You may have been breaking Google’s guidelines or posting spammy and harmful content without realizing it.
These things can incur a well-deserved manual action.
How To Uncover Manual Actions & Penalties
To find out if you have been smacked with a penalty, look inside Google Search Console.
Screenshot from Google Search Console, January 2024
If it says anything other than “No issues detected,” you are in trouble.
Luckily, Google will describe what it did and why, so you will know right away how to get the sanctions against your site lifted. Sometimes it will be as easy as deleting the offending content.
2. Inefficient Internal Linking
Hyperlinks do much more than just let users jump between pages.
They have two important jobs:
Transfer authority from one page to another.
Shape the user journey.
These jobs apply both to external and internal links. While the content of an externally linked page is often out of your control, the internal links are fully under your control, so you can’t afford to neglect how you use them.
Otherwise you will end up with a site that neither ranks in Google nor is user-friendly.
How To Make Your Links User-Friendly & Google-Friendly
To interlink the pages on your site efficiently:
Make your links’ anchor texts descriptive. Users must be able to look at a link and immediately understand where it leads.
Link from pages with high authority to pages with lower authority. That way, low-authority pages can receive more authority and rank higher. Find pages that could benefit from it with WebCEO’s Internal Links tool.
Use a navigation bar and a footer. They contain links to some of the most important pages (like homepage, FAQ, About Us, Contact Us) which users must be able to find at any time.
Create topic clusters. A good practice for increasing your site’s authority is creating multiple pages on several related topics and linking them together.
Make the user journey as short as possible. An unwritten three-click rule states that users must be able to find anything they want in three clicks or fewer. Even if no evidence supports this rule’s impact on SEO, it’s hard to deny that it’s good for user experience.
3. Lack Of User Accessibility
As we know, not all users are perfectly healthy. Old age, diseases, or disabilities can make it very hard or even impossible to comfortably browse the Internet.
If your website does not cater to all levels of users, it becomes less user-friendly, which gets in the way of a positive user experience.
Accessibility is so important that it’s even required by law in many jurisdictions. You can’t fix a lawsuit with SEO, so don’t tempt fate.
You want your site to be available to as many people as possible – and for that purpose, there is user accessibility.
How To Improve The User Accessibility Of Your Website
Is your site accessible? There are many tools to find that out, like ARIA by EquallyAI.
If your site doesn’t pass the accessibility test, or if user feedback lets you know about any problems, you should apply the practices recommended in established accessibility guidelines such as the WCAG.
4. Using The HREFLANG Tag Incorrectly
If you target a multilingual audience (for example, with an ecommerce site), you need to watch out for this one.
You don’t want to ruin the users’ experience with a language barrier.
The hreflang tag’s job is to show the visitors the version of your site they can read, based on their location and language.
How To Use The HREFLANG Tag Correctly
A properly coded HREFLANG tag (shown here with example.com as a placeholder domain) should look similar to:
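<link rel="alternate" hreflang="en-GB" href="https://example.com/en-gb/" />
<link rel="alternate" hreflang="fr-FR" href="https://example.com/fr-fr/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />

Each language version of a page should list alternates for all the others (plus an x-default fallback) in its <head> or HTTP headers.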
Parts Of An HREFLANG Tag
The “en-GB” attribute means it targets English-speaking users in the United Kingdom. “en” is the language code (ISO 639-1 format) and “GB” is the region code (ISO 3166-1 alpha-2 format). Always refer to the official ISO code lists to pick the correct codes!
If the tag has the wrong attribute or none at all, you risk giving your visitors a bunch of text they can’t read or navigate, making your site useless to them. Kiss those conversions and rankings goodbye, say hello to bounce rates.
Although, the hreflang tag isn’t a perfect solution. There’s always a non-zero chance the user’s native language isn’t even used wherever they reside. For these cases, and for all others too, it’s best to have an option to manually switch between the languages on your site.
Screenshot from Facebook.com, January 2024
5. Opening New Links On The Same Page
This problem can affect your rankings by diverting traffic before a conversion.
While Google has no strong opinion about the links’ target=”_blank” attribute, it’s still a good idea to use it on your links.
Why Should You Open All Links In A New Tab?
When the link opens in the same tab, users leave the page they were on – often before there was a conversion. That makes your high-ranking pages less useful. And if it keeps happening, Google will agree and lower the rankings.
Fortunately, the solution is simple: include the target=”_blank” attribute in your links’ HTML code. Some platforms, like WordPress, often add this attribute automatically.
Important! The target=”_blank” attribute should always be used together with rel=”noopener” or rel=”noreferrer” for security reasons. If you inadvertently link to a shady page, either of these attributes will prevent it from gaining access to the tab from which the user opened the link, protecting both your site and the user’s device.
In other words, the HTML code for a secure link will look like this:
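<a href="https://example.com/some-page/" target="_blank" rel="noopener noreferrer">Descriptive anchor text</a>

(example.com is a placeholder for whatever page you are actually linking to.)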
6. Heavy Images
Image quality is a very obvious factor, so let’s assume your site uses only good-looking images. But this factor is a double-edged sword: with high quality comes high file size.
And large images affect your site loading speed.
Website Image Best Practices
You should never sacrifice the quality of your visuals. So what can you do instead?
Save your images in the most optimal format. The logo icon should be SVG or ICO, large photos should be JPG, and WebP – now supported by all major browsers – usually produces smaller files than GIF or JPG.
Resize images to the dimensions at which they are actually displayed, and set the height and width attributes. Shave off all the excessive pixels.
Merge images into a single file (a CSS sprite) if they are placed right next to each other.
7. Caching Issues
In addition to poorly optimized images or assets like CSS and JavaScript files, your site could be loading slowly because of caching issues.
A cache is where browsers store a website’s elements upon visit.
For example, a website’s logo is the same on every page, so once it’s stored, your browser doesn’t need to load it anew – instead, it will just retrieve it from cache, and your site will load faster.
How To Properly Cache A Website
If your website suddenly stops caching properly, you should check the .htaccess file on your server.
The .htaccess file can be used for many cool things, like password-protecting your site or customizing your error 404 page.
It can also be used to set expiration dates for the cached resources.
Open your file and look for code like this:
<IfModule mod_expires.c>
  ExpiresActive on
  # Default lifetime for anything not matched below
  ExpiresDefault "access plus 2 days"
  ExpiresByType image/jpeg "access plus 1 month"
  ExpiresByType image/gif "access plus 1 month"
  ExpiresByType text/css "access plus 1 month"
  ExpiresByType text/javascript "access plus 1 month"
  # Favicons rarely change, so cache them for a long time
  ExpiresByType image/x-icon "access plus 1 year"
  # HTML changes often, so keep its lifetime short
  ExpiresByType text/html "access plus 600 seconds"
</IfModule>
If there’s nothing like that at all, feel free to copy and paste this example.
Otherwise, tweak those “access plus time” bits to your preference or add entries for additional file types.
The trick is to set a long expiration period for elements that don’t change often, like the logo icon. That way, they will stay cached for as long as you decide.
8. Failing To Match User Search Intent
You’ve probably used SEO tools to find the best keywords: long-tail, with a high search volume, not too competitive.
You’ve put them in all the right places: title, description, URL, headings, and throughout the text too.
Yet your site rankings and visits aren’t increasing. Why not?
It’s possible you forgot to account for the most important factor in keyword optimization: user search intent.
What Is User Intent In SEO?
You don’t want to bring in users without a purpose.
You want them to do specific things: learn, discuss, subscribe, purchase, share, and so on.
This variety makes it possible to categorize user search intent under several types:
Informational (learning new information).
Navigational (looking for a specific website).
Commercial (looking for a product).
Transactional (buying a product).
Locational (looking for a place or information about a place).
Seasonal (tied to a time period or a date, like Christmas).
That’s why not only your content but also your keywords must reflect user intent. And with so many types, it’s easy to make a mistake and choose the wrong keywords for the job. Knowing the difference between them can make or break your SEO.
How Do I Implement User Intent Into My SEO Strategy?
Target users with even higher accuracy by remembering the stages of the keyword funnel:
Awareness: the user has a need and starts searching for a solution, possibly not knowing yet what it could be. Example: buy laptop.
Interest: the user narrows down and explores the options. Example: best gaming laptops.
Action: the user knows exactly what they want and intends to get it. Example: laptop Lenovo IdeaPad Gaming 3.
9. Toxic Backlinks
Backlinks are necessary for building up your site’s authority – and site rankings. It’s a crucial step in SEO: very hard to do right and incredibly easy to mess up.
Having no backlinks at all is bad, but it’s even worse to have lots of low-quality backlinks.
If you don’t check your link profile from time to time, you won’t even know you have this problem and need to fix it.
WebCEO’s Toxic Pages tool will look through all of your site’s backlinks and display the potentially harmful ones in a table. If you find any backlinks from sites you actually trust, you can keep them. But the ones that do look bad have to go:
Ask somebody who can edit the linking site to delete them.
Use Google Disavow (in-built in the Toxic Pages tool) to make Google ignore those links.
10. Failing To Follow SEO Trends
Search engine optimization has come a long way since its inception.
Many new things were introduced, many old things stopped working, and even more were improved upon and made more accurate. And this process continues.
So if you don’t monitor the SEO news and innovations, you may find that your SEO efforts aren’t effective anymore.
Make it a habit to read SEO news regularly (like on this very site). Pay attention to any important updates – especially to Google’s core algorithm updates. When those roll out, SEO changes drastically for everybody.
The more you know about different SEO problems, the easier you can identify and fix them in time. All you need is a good eye and reliable tools. Stay sharp and check your site for signs of trouble often.
This post was sponsored by Ahrefs. The opinions expressed in this article are the sponsor’s own.
With generative AI and constant Google algorithm changes, how can you shift your strategy quickly enough to keep up?
How do you stay on top of the latest SEO trends, while maximizing your site performance?
2023 has forced SEO professionals to begin finding new ways to adapt their approach to SEO success in 2024.
The rise of AI technology is becoming a strong force in shaping search algorithms and user experiences.
Meanwhile, Google’s Search Generative Experience (SGE) is shaking things up and altering the dynamics of online discovery by influencing how search engines interpret and deliver information.
So, if you’re looking to not simply survive but thrive amidst these seismic changes, a strategic pivot is imperative.
The key to success in today’s world of SEO is standing out and boosting your online presence.
In this guide, we’ll not only uncover strategies to help you increase your search discoverability, but we’ll also introduce the tools you need to implement them effectively.
Some of the most significant improvements you can start making in 2024 include:
Knowing how and when to leverage AI tools for SEO.
Mastering website optimization and technical SEO.
Executing and measuring your SEO more efficiently.
Let’s dive into how these key strategy tweaks can completely transform your SEO this year and get you on track for higher SERP rankings and better performance.
How & When To Leverage AI Tools For SEO
Everyone’s talking about how to best harness the power of Artificial Intelligence (AI) in 2024.
But if there’s one thing you should take away from the AI conversation, it’s this: AI is not a human replacement.
Think of it more so as a formidable ally that exists to amplify the capabilities of human expertise.
Utilizing AI in 2024 is about striking the perfect balance and finding the synergy between human ingenuity and AI advancements.
Rather than relying solely on AI technology, find ways to use it to enhance your results.
For instance, if you’re a content marketer looking to streamline your process, AI could assist by offering insights, recommendations, and data-driven strategies to elevate the quality and relevance of your content.
What To Use AI Tools For In Content Marketing & SEO
Input your rough ideas and get an organized, well-structured outline in minutes.
Improve the quality, clarity, and readability of a sentence or paragraph with an instant content refresh.
Generate optimized meta titles for better search engine visibility.
Craft informative, SEO-friendly meta descriptions for your articles quickly and easily.
Simplify and summarize your content with precision.
Brainstorm variations of ready-to-use, SEO-friendly blog post ideas to drive more traffic to your blog.
Generate descriptive alt text for your images to improve accessibility and SEO without a hassle.
Get inspiration for your next piece of content by generating a variety of creative ideas.
And more!
Not only can AI help you save time while crafting compelling copy, but it can also automate keyword research processes.
If you’re familiar with Ahrefs’ Keywords Explorer tool, you’ll be pleased to know that they recently launched a new AI assistant to generate keyword suggestions effortlessly.
With this new update, you can get AI keyword suggestions directly within the platform, without needing to go back and forth with ChatGPT when doing your keyword research.
Stay ahead of the curve and start leveraging AI to your advantage with Ahrefs.
How To Master Website Optimization & Technical SEO
A well-optimized and technically sound website acts as a sturdy foundation, insulating you from the impact of ruthless core updates that search engines may roll out.
With search algorithms becoming increasingly sophisticated, seamless user experience, fast loading times, and adherence to Core Web Vitals have become critical factors in SEO success.
By mastering technical SEO and keeping your site in top-notch condition, you’re not just offering a seamless user experience but also signaling to search engines that your content is reliable and trustworthy.
This proactive approach helps to ensure your site remains visible and valuable amidst ever-evolving ranking criteria.
With Ahrefs’ Site Audit tool, you can run a thorough SEO audit to uncover your website’s technical and on-page SEO issues, and find out exactly what’s holding your website back.
Plus, the platform recently added some exciting new features to enhance your analysis efforts.
Here are some key updates you should know about:
Core Web Vital (CWV) metrics: Filter your pages by various CWV data points during website recrawls. Visualize CrUX and Lighthouse scores, historical changes, and individual metric performance in the Performance report.
Google Data Studio integration: Build personalized reports by blending data from different sources and visualizing everything in the form of reports and dashboards.
Search by HTML code and page text: Easily search for specific elements extracted during a crawl, such as Google Analytics IDs or “out of stock” labels.
List of issues for a particular URL: Addressing an issue on a specific URL is streamlined with a dedicated tab showcasing all related issues, allowing comprehensive fixes.
Links Report in Site Audit Issues: Navigate issues more effectively with an additional links report tab, facilitating in-depth analysis at the links level. For instance, browse and export links related to 404 pages for a thorough understanding.
Easy Way: Use An SEO Plugin To Help Optimize Your Content
Ahrefs also launched a WordPress SEO plugin to further assist in your optimization efforts.
With this new plugin, you can automate content audits and see how each article performs for a chosen target keyword.
The tool also provides recommendations on how you can improve the article for better results.
Here are a few key capabilities:
Import “focus keywords” for each article from Yoast, RankMath, and AIOSEO.
Track audits more smoothly, with real-time audit completion percentages.
Export all metrics to CSV for every analyzed article, or only for articles with a selected performance status.
Ready to get your site in tip-top shape for the year ahead? Check out Ahrefs’ latest offerings today.
How To Execute & Measure Your SEO KPIs More Efficiently
Keeping your website up-to-date with SEO trends is important – but it doesn’t stop there.
Staying competitive requires you to keep a vigilant eye on your rival sites as well, dissecting their strategies, and adapting your own accordingly.
Success in SEO isn’t just about visibility; it’s about outperforming your peers.
Reassessing the metrics that matter most and recalibrating your strategy based on real-time insights are the secrets to staying ahead of the curve.
In this evolving landscape, the ability to execute with precision and measure impact accurately will be the key to unlocking sustained SEO success.
You can start by knowing how to spot SERP changes effectively.
For example, with Ahrefs’ historical SERP checker tool, you can go back in time and see what the SERP looked like for any given query in their index.
And if you already use Ahrefs Site Explorer tool, you’ll love the recent updates they’ve made to their overview report.
Not only does it load noticeably faster than the previous version, but you get access to the following new features:
New history chart.
Comparison mode.
Paid traffic data.
Year-over-year mode.
Google algorithm updates.
Calendar report.
Site structure report.
Visit Ahrefs to learn more.
Start Ranking Higher & Maximizing Opportunities With Ahrefs Now
In the rapidly evolving SEO landscape, Ahrefs emerges as a strategic ally, providing the tools you need to not only stay afloat but rise to new heights.
From leveraging AI tools intelligently to mastering technical SEO and executing with precision, we’ve dissected the key strategies that will redefine your SEO in 2024 and beyond.
It’s time to embrace change and elevate your search performance.
Start your journey to sustained SEO success with Ahrefs’ vast array of tools and exciting new features.
With their updated usage-based pricing model, you can access the features you need most. Try it today!
Consumers often search a company’s name on Google when considering a purchase. Branded search is crucial for just about any business.
Brand name queries may include other keywords, such as “reviews” or “coupon codes.” Most imply a commercial intent — an opportunity to convert a searcher into a buyer.
If Google’s Search Generative Experience becomes public, converting such shopping journeys will likely be harder because the AI-driven snapshots are personalized and may include answers to a searcher’s past queries, not necessarily the brand.
The image below is an AI-generated answer to the query “spirit halloween reviews.” The snapshot states, “Reviews for Spirit Halloween are generally negative.” Not many consumers will likely buy from that brand if they read it.
Branded search can produce negative results, such as this example for “spirit halloween.”
Controlling branded search results, especially in Google’s autocomplete suggestions, is key for the coming AI-powered search.
Here’s how.
Branded Subreddit
Reddit threads increasingly appear in search results, frequently in higher rankings. Google’s “Discussions and forums” section is mostly Reddit. Here’s an example of Reddit dialog ranking for “spirit halloween reviews” in “Discussions” and immediately below it in an organic listing.
Reddit threads increasingly appear in search results, such as this example for “spirit halloween reviews.”
“Reddit” itself is a popular query, indicating searchers value discussions of a product or brand on that forum. Thus a good way to control branded queries is creating a Subreddit for your company and using it for customer support and community building.
Comcast has two Subreddits, which rank 1 and 2 on Google for “comcast reddit.”
A search on “comcast reddit” produces two Subreddits ranking 1 and 2.
It also comes up in the branded SGE answer.
An SGE query for “comcast reddit” also shows the company’s Subreddits.
And a query for “comcast” shows “comcast reddit” in the autocomplete suggestions. By maintaining its own Subreddits, Comcast helps control top organic positions for its name search, SGE answers, and autocomplete suggestions.
Searching “comcast” on Google produces a “comcast reddit” autocomplete suggestion.
Knowledge Base
Google’s organic results encourage searchers to ask (more) questions. “People also ask” boxes from brand queries often prompt searchers to explore the business further. SGE snapshots include “Ask a follow-up” questions that frequently correlate with “People also ask.”
Answering all questions in brand-driven “People also ask” boxes is crucial for bringing these searchers to your site.
AlsoAsked is a handy tool for collecting questions from “People also ask” boxes. However, for vital brand queries, manually search and interact with the questions directly. “People also ask” boxes are dynamic – clicking results produces more questions. No tool can surface every question searchers ask about your company.
A Q&A knowledge base section on your site can answer many of these questions. Some could combine into a single answer on a page. Others require unique responses. Regardless, interlink all questions and answers so the pages rank well in brand queries.
YouTube
YouTube is a sure-fire traffic and conversion generator. It also ranks well in organic search and can produce listings for brand queries.
To drive branded search results from YouTube, turn user-generated content into videos. Start by inviting customers to send videos of unpacking and using your products. Separately, create YouTube alerts for new videos mentioning your company or products.
Next, curate those videos via YouTube playlists titled with primary branded queries (e.g., “spirit halloween reviews”). The videos on the playlists will interlink, helping to rank in organic search.
Videos are prominent in Google’s search results. Brand-driven videos can drive traffic and conversions.
Searching Google for “spirit halloween reviews” produces YouTube videos.
Celebrate the Holidays with some of SEJ’s best articles of 2023.
Our Festive Flashback series runs from December 21 – January 5, featuring daily reads on significant events, fundamentals, actionable strategies, and thought leader opinions.
2023 has been quite eventful in the SEO industry and our contributors produced some outstanding articles to keep pace and reflect these changes.
Catch up on the best reads of 2023 to give you plenty to reflect on as you move into 2024.
PageRank was once at the very core of search – and was what made Google the empire it is today.
Even if you believe that search has moved on from PageRank, there’s no denying that it has long been a pervasive concept in the industry.
Every SEO pro should have a good grasp of what PageRank was – and what it still is today.
In this article, we’ll cover:
What is PageRank?
The history of how PageRank evolved.
How PageRank revolutionized search.
Toolbar PageRank vs. PageRank.
How PageRank works.
How PageRank flows between pages.
Is PageRank still used?
Let’s dive in.
What Is PageRank?
Created by Google founders Larry Page and Sergey Brin, PageRank is an algorithm based on the combined relative strengths of all the hyperlinks on the Internet.
Most people argue that the name was based on Larry Page’s surname, whilst others suggest “Page” refers to a web page. Both positions are likely true, and the overlap was probably intentional.
Published in January 1999, the original PageRank paper, “The PageRank Citation Ranking: Bringing Order to the Web,” demonstrates a relatively simple algorithm for evaluating the strength of web pages.
Image from patents.google.com, April 2023
The paper went on to become a patent in the U.S. (but not in Europe, where mathematical formulas are not patentable).
Image from patents.google.com, April 2023
Stanford University owns the patent and has licensed it to Google. The patent is currently due to expire in 2027.
Image from patents.google.com, April 2023
The History Of How PageRank Evolved
During their time at Stanford in the late 1990s, both Brin and Page were looking at information retrieval methods.
At that time, using links to work out how “important” each page was relative to another was a revolutionary way to order pages. It was computationally difficult but by no means impossible.
The idea quickly turned into Google, which at that time was a minnow in the world of search.
There was so much institutional belief in Google’s approach from some parties that the business initially launched its search engine with no ability to earn revenue.
And while Google (known at the time as “BackRub”) was the search engine, PageRank was the algorithm it used to rank pages in the search engine results pages (SERPs).
The Google Dance
One of the challenges of PageRank was that the math, whilst simple, needed to be iteratively processed. The calculation runs multiple times, over every page and every link on the Internet. At the turn of the millennium, this math took several days to process.
The Google SERPs moved up and down during that time. These changes were often erratic, as new PageRanks were being calculated for every page.
This was known as the “Google Dance,” and it notoriously stopped SEO pros of the day in their tracks every time Google started its monthly update.
(The Google Dance later became the name of an annual party that Google ran for SEO experts at its headquarters in Mountain View.)
Trusted Seeds
A later iteration of PageRank introduced the idea of a “trusted seed” set to start the algorithm rather than giving every page on the Internet the same initial value.
Reasonable Surfer
Another iteration of the model introduced the idea of a “reasonable surfer.”
This model suggests that the PageRank of a page might not be shared evenly with the pages it links out to – but could weight the relative value of each link based on how likely a user might be to click on it.
The Retreat Of PageRank
Google’s algorithm was initially believed internally to be “unspammable,” since the importance of a page was dictated not just by its content but also by a sort of “voting system” generated by links to the page.
Google’s confidence did not last, however.
PageRank started to become problematic as the backlink industry grew. So Google withdrew it from public view, but continued to rely on it for its ranking algorithms.
The PageRank Toolbar was withdrawn by 2016, and eventually, all public access to PageRank was curtailed. But by this time, Majestic (an SEO tool), in particular, had been able to correlate its own calculations quite well with PageRank.
Google spent many years encouraging SEO professionals away from manipulating links through its “Google Guidelines” documentation and through advice from its spam team, headed up by Matt Cutts, until January 2017.
The company was relying less on PageRank, and following its purchase of Metaweb and the proprietary Freebase database that went on to power its Knowledge Graph (Freebase itself was retired starting in 2014), Google started to index the world’s information in different ways.
Toolbar PageRank Vs. PageRank
Google was initially so proud of its algorithm that it was happy to publicly share the result of its calculation to anyone who wanted to see it.
The most notable representation was a toolbar extension for browsers like Firefox, which showed a score between 0 and 10 for every page on the Internet.
In truth, PageRank has a much wider range of scores, but 0-10 gave SEO pros and consumers an instant way to assess the importance of any page on the Internet.
The PageRank Toolbar made the algorithm extremely visible, which also came with complications. In particular, it meant that it was clear that links were the easiest way to “game” Google.
The more links (or, more accurately, the better the link), the better a page could rank in Google’s SERPs for any targeted keyword.
This meant that a secondary market was formed, buying and selling links valued on the PageRank of the URL where the link was sold.
This problem was exacerbated when Yahoo launched a free tool called Yahoo Site Explorer, which allowed anyone to start finding links into any given page.
Later, two tools – Moz and Majestic – built on the free option by building their own indexes on the Internet and separately evaluating links.
How PageRank Revolutionized Search
Other search engines relied heavily on analyzing the content on each page individually. These methods had little to identify the difference between an influential page and one simply written with random (or manipulative) text.
This meant that the retrieval methods of other search engines were extremely easy for SEO pros to manipulate.
Google’s PageRank algorithm, then, was revolutionary.
Combined with a relatively simple concept of “nGrams” to help establish relevancy, Google found a winning formula.
It soon overtook the main incumbents of the day, such as AltaVista and Inktomi (which powered MSN, amongst others).
By operating at a page level, Google also found a much more scalable solution than the “directory”-based approach adopted by Yahoo and later DMOZ – although DMOZ (also called the Open Directory Project) did initially provide Google with an open directory of its own.
How PageRank Works
The formula for PageRank comes in a number of forms but can be explained in a few sentences.
Initially, every page on the internet is given an estimated PageRank score. This could be any number. Historically, PageRank was presented to the public as a score between 0 and 10, but in practice, the estimates do not have to start in this range.
The PageRank for that page is then divided by the number of links out of the page, resulting in a smaller fraction.
The PageRank is then distributed out to the linked pages – and the same is done for every other page on the Internet.
Then for the next iteration of the algorithm, the new estimate for PageRank for each page is the sum of all the fractions of pages that link into each given page.
The formula also contains a “damping factor,” which was described as the chance that a person surfing the web might stop surfing altogether.
Before each subsequent iteration of the algorithm starts, the proposed new PageRank is reduced by the damping factor.
This methodology is repeated until the PageRank scores reach a settled equilibrium. The resulting numbers were then generally transposed into a more recognizable range of 0 to 10 for convenience.
One way to represent this mathematically is:

\[ PR_{i+1}(j) = \frac{1-d}{n} + d \sum_{k \in B_j} \frac{PR_i(k)}{L(k)} \]

Where:
PR = PageRank in the next iteration of the algorithm.
d = damping factor.
j = the page number on the Internet (if every page had a unique number).
n = total number of pages on the Internet.
i = the iteration of the algorithm (initially set as 0).
B_j = the set of pages that link to page j.
L(k) = the number of links out of page k.
The formula can also be expressed in Matrix form.
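For reference, here is a sketch of that matrix form, consistent with the definitions above:

\[ PR_{i+1} = \frac{1-d}{n}\,\mathbf{1} + d\,M\,PR_i \]

Here, PR_i is the vector of every page’s PageRank at iteration i, \mathbf{1} is a vector of ones, and M is the link matrix with M_{jk} = 1/L(k) if page k links to page j, and 0 otherwise.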
Problems And Iterations To The Formula
The formula has some challenges.
If a page does not link out to any other page, then the formula will not reach an equilibrium.
In this event, therefore, that page’s PageRank is treated as if it were distributed amongst every page on the Internet. In this way, even a page with no incoming links could get some PageRank – but it would not accumulate enough to be significant.
Another less documented challenge is that newer pages, whilst potentially more important than older pages, will have a lower PageRank. This means that over time, old content can have a disproportionately high PageRank.
The time a page has been live is not factored into the algorithm.
How PageRank Flows Between Pages
If a page starts with a value of 5 and has 10 links out, then every page it links to is given 0.5 PageRank (less the damping factor).
In this way, the PageRank flows around the Internet between iterations.
As new pages come onto the Internet, they start with only a tiny amount of PageRank. But as other pages start to link to these pages, their PageRank increases over time.
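To make the iteration concrete, here is a minimal Python sketch of the process described above – a toy illustration with a made-up four-page link graph, not Google’s production algorithm:

# Simplified PageRank: each page shares its score equally among
# its outbound links, damped by factor d, plus a base (1-d)/n.
links = {  # made-up link graph: page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],  # D links out but has no inbound links
}
n = len(links)
d = 0.85  # damping factor
pr = {page: 1.0 / n for page in links}  # initial estimate

for _ in range(50):  # iterate until the scores settle
    new_pr = {page: (1 - d) / n for page in links}
    for page, outlinks in links.items():
        share = pr[page] / len(outlinks)  # PageRank divided by links out
        for target in outlinks:
            new_pr[target] += d * share
    pr = new_pr

print(pr)  # C, with the most inbound links, settles at the highest score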
Is PageRank Still Used?
Although public access to PageRank was removed in 2016, it is believed the score is still available to search engineers within Google.
A leak of the factors used by Yandex showed that PageRank remained as a factor that it could use.
Google engineers have suggested that the original form of PageRank was replaced with a new approximation that requires less processing power to calculate. Whilst the formula is less important in how Google ranks pages, it remains a constant for each web page.
And regardless of what other algorithms Google might choose to call upon, PageRank likely remains embedded in many of the search giant’s systems to this day.
Dixon explains how PageRank works in more detail in this video:
As an SEO professional, you’ve heard about ChatGPT and BARD – or even used them yourself.
You might also be familiar with a handful of machine learning models from Google, such as BERT and RankBrain.
These are all great applications of machine learning.
But it isn’t always immediately apparent how machine learning can be useful for your BAU (business as usual) tasks or daily SEO to-dos.
There’s also the question of ethics, as in data privacy and fairness, when it comes to using machine learning in your work.
However, understanding how machine learning works in search (and in real life) can only work to your advantage as an SEO pro – whether you’re technical or not.
In this comprehensive article, learn the types of machine learning and ethics associated with the technology, then take inspiration from seven machine learning examples that have had a great impact on our world.
As an added bonus, at the end of this article, explore eight machine learning examples directly applied to SEO.
What Is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) which enables systems to learn and improve without the need to be explicitly programmed. This is achieved by using large data sets that train the parameters in the model.
Perhaps the most well-known examples of machine learning currently are ChatGPT and BARD – and while this post won’t be focusing on them, I decided to ask them both to explain to me what machine learning is. Here are their responses:
ChatGPT
“Machine learning is a type of artificial intelligence that allows software applications to learn from the data and become more accurate in predicting outcomes without explicit programming. In other words, it is a method of teaching computers to learn from the data they are given, and then to make decisions or predictions based on that learning.
The goal of machine learning is to develop algorithms that can learn patterns in data, and then use those patterns to make decisions or predictions about new data. This is done by training the machine learning algorithm on a dataset of known inputs and outputs, and then using that knowledge to make predictions on new, unseen data.
Some common applications of machine learning include image recognition, natural language processing, fraud detection, and recommendation systems.”
Screenshot from ChatGPT, April 2023
BARD
“Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a subset of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.
Machine learning can be used for a variety of tasks, including:
Classification: Sorting data into categories. For example, a spam filter might classify emails as spam or not spam.
Regression: Predicting a continuous value. For example, a weather forecast might predict the temperature tomorrow.
Clustering: Grouping data points together based on their similarity. For example, a recommender system might cluster users together based on their past purchases.
Anomaly detection: Identifying unusual data points. For example, a fraud detection system might identify transactions that are likely to be fraudulent.
Machine learning is a powerful tool that can be used to solve a variety of problems. However, it is important to note that machine learning algorithms are only as good as the data they are trained on. If the data is biased, the algorithm will be biased as well. Additionally, machine learning algorithms can be susceptible to overfitting, which is when the algorithm learns the training data too well and does not generalize well to new data.
Despite these challenges, machine learning is a rapidly growing field with a wide range of applications. As the amount of data available to us continues to grow, machine learning is likely to play an even greater role in our lives in the future.”
Screenshot from BARD, April 2023
Supervised Learning
As the name suggests, in supervised learning, the model learns under supervision. It is trained with labeled data, and the output is manually reviewed.
The machine learns from the labeled data, which is then used to make future predictions.
Once the output is received, the model remembers it and will use it for the next operation.
Image from author, April 2023
There are two main types of supervised learning: classification and regression.
Classification
Classification is when the output variable is categorical, with two or more classes that the model can identify; for example, true or false and dog or cat.
Examples of this include predicting whether emails are likely to be spam or whether an image is of a dog or cat.
In both of these examples, the model will be trained on data that is either classified as spam or not spam, and whether an image contains a dog or cat.
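As a minimal sketch of what this looks like in practice (with a tiny, invented training set – a real spam filter would learn from many thousands of labeled emails), here is a spam classifier in Python using scikit-learn:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny, made-up labeled training set: 1 = spam, 0 = not spam
emails = [
    "win a free prize now",
    "cheap pills limited offer",
    "meeting agenda for tomorrow",
    "quarterly report attached",
]
labels = [1, 1, 0, 0]

# Turn each email into word counts the model can learn from
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train a Naive Bayes classifier on the labeled examples
model = MultinomialNB()
model.fit(X, labels)

# Predict the class of a new, unseen email
test = vectorizer.transform(["free offer win now"])
print(model.predict(test))  # [1] -> classified as spam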
Regression
This is when the output variable is a real or continuous value, and there is a relationship between the variables. Essentially, a change in one variable is associated with a change that occurs in the other variable.
The model then learns the relationship between them and predicts what the outcome will be depending on the data it is given.
For example, predicting humidity based on a given temperature value or what the stock price is likely to be at a given time.
Unsupervised Learning
Unsupervised learning is when the model uses unlabeled data and learns by itself, without any supervision. Essentially, unlike supervised learning, the model will act on the input data without any guidance.
It does not require any labeled data, as its job is to look for hidden patterns or structures in the input data and then organize it according to any similarities and differences.
For example, if a model is given pictures of both dogs and cats, it isn’t already trained to know the features that differentiate both. Still, it can categorize them based on patterns of similarities and differences.
Image from author, April 2023
There are also two main types of unsupervised learning: clustering and association.
Clustering
Clustering is the method of sorting objects into groups: objects that are similar to each other belong to one cluster, while objects dissimilar to that cluster belong in another.
Examples of this include recommendation systems and image classifying.
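As a minimal sketch (with made-up, unlabeled data points), k-means clustering in scikit-learn groups points purely by similarity:

import numpy as np
from sklearn.cluster import KMeans

# Made-up, unlabeled data points: no categories are given up front
points = np.array([
    [1, 2], [1, 4], [2, 3],        # one natural group
    [10, 10], [11, 12], [9, 11],   # another natural group
])

# Ask k-means to find two clusters based purely on similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # e.g. [1 1 1 0 0 0] -- grouped without any labels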
Association
Association is rule-based and is used to discover the probability of the co-occurrence of items within a collection of values.
Examples include fraud detection, customer segmentation, and discovering purchasing habits.
Semi-supervised Learning
Semi-supervised learning bridges both supervised and unsupervised learning by using a small section of labeled data, together with unlabeled data, to train the model. It, therefore, works for various problems, from classification and regression to clustering and association.
Semi-supervised learning can be used if there is a large amount of unlabeled data, as it only requires a small portion of the data to be labeled to train the model, which can then be applied to the remaining unlabeled data.
Google has used semi-supervised learning to better understand language used within a search to ensure it serves the most relevant content for a particular query.
Reinforcement Learning
Reinforcement learning is when a model is trained to return the optimum solution to a problem by taking a sequential approach to decision-making.
It uses trial and error from its own experiences to define the output, with rewards for positive behavior and negative reinforcement if it is not working towards the goal.
The model interacts with the environment that has been set up and comes up with solutions without human guidance.
Humans then step in to provide either positive or negative reinforcement, depending on how close the output is to the goal.
Examples include robotics – think robots working in a factory assembly line – and gaming, with AlphaGo as the most famous example. There, the model was trained to beat the world’s best Go players by using reinforcement learning to define the best approach to winning the game.
Machine Learning Ethics
There is no doubt that machine learning has many benefits, and the use of machine learning models is ever-growing.
However, it’s important to consider the ethical concerns that come with using technology of this kind. These concerns include:
The accuracy of a machine learning model and whether it will generate the correct output.
Bias in the data that is used to train models, which causes a bias in the model itself, and, therefore, a bias in the outcome. If there is historical bias in data, that bias will often be replicated throughout.
The fairness in outcomes and the overall process.
Privacy – particularly with data that is used to train machine learning models – as well as the accuracy of the outcomes and predictions.
7 Machine Learning Examples In The Real World
1. Netflix
Netflix uses machine learning in a number of ways to provide the best experience for its users.
The company is also continually collecting large amounts of data, including ratings, the location of users, the length of time for which something is watched, if content is added to a list, and even whether something has been binge-watched.
This data is then used to further improve its machine learning models.
Content Recommendations
TV and movie recommendations on Netflix are personalized to each individual user’s preferences. To do this, Netflix deployed a recommendation system that considers previous content consumed, users’ most viewed genres, and content watched by users with similar preferences.
Auto-generated Thumbnails
Netflix discovered that the images used on the browse screen make a big difference in whether users watch something or not.
It, therefore, uses machine learning to create and display different images according to a user’s individual preferences. It does this by analyzing a user’s previous content choices and learning the kind of image that is more likely to encourage them to click.
These are just two examples of how Netflix uses machine learning on its platform. If you want to learn more about how it is used, you can check out the company’s research areas blog.
2. Airbnb
With millions of listings in locations across the globe at different price points, Airbnb uses machine learning to ensure users can find what they are looking for quickly and to improve conversions.
There are a number of ways the company deploys machine learning, and it shares a lot of details on its engineering blog.
Image Classification
As hosts can upload images for their properties, Airbnb found that a lot of images were mislabeled. To try and optimize user experience, it deployed an image classification model that used computer vision and deep learning.
The project aimed to categorize photos based on different rooms. This enabled Airbnb to show listing images grouped by room type and ensure the listing follows Airbnb’s guidelines.
In order to do this, it retrained the image classification neural network ResNet50, with a small number of labeled photos. This enabled it to accurately classify current and future images uploaded to the site.
Search Ranking
To provide a personalized experience for users, Airbnb deployed a ranking model that optimized search and discovery. The data for this model came from user engagement metrics such as clicks and bookings.
Listings started by being ordered randomly, and then various factors were given a weight within the model – including price, quality, and popularity with users. The more weight a listing had, the higher it would be displayed in listings.
This has since been optimized further, with training data including the number of guests, price, and availability also included within the model to discover patterns and preferences to create a more personalized experience.
3. Spotify
Spotify uses a recommendation algorithm that predicts a user’s preferences based on data collected from other users. This works because of the numerous similarities between the types of music that clusters of people listen to.
Playlists are one way it can do this, using statistical methods to create personalized playlists for users, such as Discover Weekly and daily mixes.
It can then use further data to adjust these depending on a user’s behavior.
With personal playlists also being created in the millions, Spotify has a huge database to work with – particularly if songs are grouped and labeled with semantic meaning.
This has allowed the company to recommend songs to users with similar music tastes. The machine learning model can serve songs to users with a similar listening history to aid music discovery.
Natural Language
With Natural Language Processing (NLP) algorithms enabling computers to understand text better than ever before, Spotify is able to categorize music based on the language used to describe it.
It can scrape the web for text on a particular song and then use NLP to categorize songs based on this context.
This also helps algorithms identify songs or artists that belong in similar playlists, which further helps the recommendation system.
4. Detecting Fake News
While AI tools such as machine learning content generation can be a source for creating fake news, machine learning models that use natural language processing can also be used to assess articles and determine if they include false information.
Social network platforms use machine learning to find words and patterns in shared content that could indicate fake news is being shared and flag it appropriately.
5. Health Detection
There is an example of a neural network that was trained on over 100,000 images to distinguish dangerous skin lesions from benign ones. When tested against human dermatologists, the model could accurately detect 95% of skin cancer from the images provided, compared to 86.6% by the dermatologists.
As the model missed fewer melanomas, it was determined to have a higher sensitivity and was continually trained throughout the process.
There is hope that machine learning and AI, together with human intelligence, may become a useful tool for faster diagnosis.
Other ways image detection is being used in healthcare include identifying abnormalities in X-rays or scans and identifying key markups that may indicate an underlying illness.
6. Wildlife Security
Protection Assistant for Wildlife Security (PAWS) is an AI system used to evaluate information about poaching activity and create patrol routes that help conservationists prevent poaching attacks.
The system is continually being provided with more data, such as locations of traps and sightings of animals, which helps it to become smarter.
The predictive analysis enables patrol units to identify areas where it is likely animal poachers will visit.
8 Machine Learning Examples In SEO
1. Content Quality
Machine learning models can be trained to improve the quality of website content by predicting what both users and search engines would prefer to see.
The model can be trained on the most important insights, including search volume and traffic, conversion rate, internal links, and word count.
A content quality score can then be generated for each page, which will help inform where optimizations need to be made and can be particularly useful for content audits.
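As a rough sketch of the idea – the metrics and “quality” labels below are invented, since the right training signal depends on your site – a regression model could be trained to score pages like this:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up page metrics: [search volume, traffic, internal links, word count]
X = np.array([
    [1000, 400, 12, 1800],
    [500, 80, 3, 600],
    [2000, 900, 20, 2400],
    [300, 20, 1, 350],
])
# Invented quality labels, e.g. derived from conversion rate
y = np.array([0.9, 0.4, 0.95, 0.2])

model = RandomForestRegressor(random_state=0).fit(X, y)

# Score a new page to decide whether to flag it in the content audit
new_page = np.array([[800, 150, 5, 900]])
print(model.predict(new_page))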
2. Natural Language Processing
Natural Language Processing (NLP) uses machine learning to reveal the structure and meaning of text. It analyzes text to understand the sentiment and extract key information.
NLP focuses on understanding context rather than just words. It is more about the content around keywords and how they fit together into sentences and paragraphs, than keywords on their own.
The overall sentiment is also taken into account, as it refers to the feeling behind the search query. The types of words used within the search help to determine whether it is classified as having a positive, negative, or neutral sentiment.
The key areas of importance for NLP are:
Entity – Words representing tangible objects such as people, places, and things that are identified and evaluated.
Categories – Text separated into categories.
Salience – How relevant the entity is.
Google has a free NLP API demo that can be used to analyze how text is seen and understood by Google. This enables you to identify improvements to content.
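Beyond the web demo, the same analysis is available programmatically through the Cloud Natural Language API. A minimal sketch, assuming the google-cloud-language client library is installed and credentials are configured:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="The Eiffel Tower is a landmark in Paris.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Entity analysis returns each detected entity with a salience score
response = client.analyze_entities(request={"document": document})
for entity in response.entities:
    print(entity.name, language_v1.Entity.Type(entity.type_).name, entity.salience)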
Recommendations In The World Of NLP
NLP is also being used to review and understand the anchor text used to link pages. With that in mind, some recommendations:
Ensure anchor text is relevant and informative – it matters more than ever.
Give each page a natural flow, with headings providing hierarchy and readability.
Answer the question the article targets as quickly as possible, so that users and search engines can discover key information without too much effort.
Use correct spelling and punctuation to signal authority and trustworthiness.
3. Google’s Models
AI and machine learning are used throughout Google’s many products and services. The most popular use in the context of search is to understand language and the intent behind search queries.
Previously, the search systems looked for matching words only, which didn’t even consider misspellings. Eventually, algorithms were created to find patterns that identified misspellings and potential typos.
There have been several systems introduced throughout the last few years after Google confirmed in 2016 its intention to become a machine learning first company.
RankBrain
The first of these was RankBrain, which was introduced in 2015 and helps Google to understand how different words are related to different concepts.
This enables Google to take a broad query and better define how it relates to real-world concepts.
Google’s systems learn from seeing words used in a query on the page, which it can then use to understand terms and match them to related concepts to understand what a user is searching for.
Neural Matching
Neural matching was launched in 2018 and introduced to local search in 2019.
This helps Google understand how queries relate to pages by looking at the content on a page, or a search query, and understanding it within the context of the page content or query.
Most queries made today make use of neural matching, and it is used in rankings.
BERT
Introduced to Google Search in 2019, BERT enables Google to understand how combinations of words express different meanings and intent by reviewing the whole sequence of words on a page.
BERT is now used in most queries, as it helps Google understand what a user is looking for to surface the best results related to the search.
MUM
MUM, which means Multitask Unified Model, was introduced in 2021 and is used to understand languages and variations in search terms.
This uses the latest advancements to find patterns in sentences and correlations between different words to understand nuanced questions – and even predict which words are likely to come next.
4. Predictive Prefetching
By combining historical website data on user behavior with the capabilities of machine learning, some tools can guess which page a user is likely to navigate to next and begin prefetching the necessary resources to load the page.
This is known as predictive prefetching and can enhance website performance.
Predictive prefetching can also apply to other scenarios, such as forecasting pieces of content or widgets that users are most likely to view or interact with and personalizing the experience based on that information.
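Under the hood, a prefetch is just a resource hint. A tool that predicts the next page might inject something like this into the current page’s <head> (the URL is a placeholder):

<link rel="prefetch" href="/likely-next-page/" />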
5. Testing
Running SEO A/B tests is one of the most effective ways to prove the SEO impact of changes, and the ability to generate statistically significant results is possible with the use of machine learning algorithms and neural networks.
SearchPilot is an example of SEO A/B testing that is powered by machine learning and neural network models.
Starting with a bucketing algorithm that creates statistically similar buckets of control and variant pages to perform tests on, a neural network model then forecasts expected traffic to the pages the test is being run on.
The neural network model, which is trained to account for any and all external influences such as seasonality, competitor activity, and algorithm updates, will also analyze the organic search traffic to the variant pages and identify how they perform against the control group throughout the test.
This also enables users to calculate whether any difference in traffic is statistically significant.
(Disclaimer: I work for SearchPilot.)
6. Internal Linking
Machine learning can help with internal linking in two ways:
Updating broken links: Machine learning can crawl your site to spot any broken internal links and then replace them with a link to the best alternative page.
Suggesting relevant internal linking: These tools can leverage big data to suggest relevant internal links during the article creation process and over time.
The other internal linking task is an internal link audit. This includes analyzing the number of internal links to a page, the placement of the links together with the anchor text, and the overall crawl depth of the page.
Anchor text classification can also be performed to identify the phrases used most frequently in anchor text and categorize them based on topics and whether they are branded or non-branded terms.
7. Image Captioning For Alt Text
As SEO pros, we understand the importance of image alt text: it improves accessibility for people who use screen readers, while also helping search engine crawlers understand the content of the page the image is placed on.
Language vision models can be used to automatically caption images, therefore providing content that can be used as alt text. Image captioning is used to describe what is shown within an image in a single sentence.
Two models are used for image captioning, both as important as the other. The image-based model will start by extracting features from the image, while the language-based model will translate those features into a logical sentence.
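As one sketch of how this can be done, the Hugging Face transformers library wraps pretrained captioning models in a simple pipeline – the model and file name below are illustrative choices, not the only options:

from transformers import pipeline

# Load a pretrained image-to-text (captioning) model
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

# Generate a candidate caption to review and reuse as alt text
result = captioner("product-photo.jpg")  # illustrative file name
print(result[0]["generated_text"])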
If you’re interested in how machine learning can be used in daily SEO tasks, this article by Lazarina Stoy is a must-read – and if you would like to play around with some super interesting scripts, this collection of Colab notebooks from Britney Muller is the perfect place to start.
In Conclusion
Machine learning isn’t limited to just ChatGPT and BARD.
There are many practical applications for machine learning, both in the real world and specifically in the world of SEO – and these are likely just the beginning.
And while it will be vital to remain cognizant of the ethical questions associated with machine learning, it has exciting implications for the future of SEO.
There is a lot of confusion about how SEO pros should both understand and, more importantly, leverage “entities” in SEO.
I understand where this comes from, especially with the traditional approach to SEO being around words and phrases.
Indeed, most of the algorithms that the first wave of SEO pros (like me) grew up with had no concept of an “entity” in search. SEO principles – from content writing to anchor text in links to SERP tracking – were (and largely still are) keyword-driven, and many people still find it hard to understand what has changed.
But over the last decade, all search has been moving away from understanding the world as strings of words and towards a series of interconnected entities.
An entity (in search terms) is a record in a database. An entity generally has a specific record identifier.
In Google, that might be:
“MREID=/m/23456” or “KGMID=/g/121y50m4.”
It is certainly not a “word” or “phrase.” I believe that the confusion with keywords stems from two root causes:
The first is that SEO pros learned their craft pre-2010 in terms of keywords and phrases. Many still do.
The second is that every entity comes with a label – which is generally a keyword or descriptor.
So while “Eiffel Tower” might seem like a perfectly identifiable “entity” to us as humans, Google sees it as “KGMID=/m/02j81” and really doesn’t care if you call it “Eiffel Tower,” “Torre Eiffel,” or “ایفل بورجو” (which is Azerbaijani for “Eiffel Tower”). It knows that you are probably referring to that underlying entity in its Knowledge Graph.
This comes on to the next point:
What Is “The Knowledge Graph”?
There are subtle but important differences between “a knowledge graph,” “The Knowledge Graph,” and “The Knowledge Panel.”
A knowledge graph is a semi-structured database containing entities.
The Knowledge Graph is generally the name given to Google’s Knowledge Graph, although thousands of others exist. Wikidata (itself a knowledge graph) attempts to cross-reference identifiers from different reputable data sources.
The Knowledge Panel is a specific representation of results from Google’s Knowledge Graph. It is the panel often shown on the right of the results (SERPs) in a desktop search, giving more details about a person, place, event, or other entity.
A Brief History Of Entities In Search
Metaweb
In 2005, Metaweb started to build out a database, then called Freebase, which it described as an “open, shared database of the world’s knowledge.”
I would describe it as a semi-structured encyclopedia.
It gave every “entity” (or article, to extend the metaphor) its own unique ID number – and from there, instead of a traditional article in words, the system tried to connect articles through their relationships with other ID numbers in the system.
Some $50 million in capital funding and five years later, the project was sold to Google.
No commercial product was ever built, but the foundation was set for Google’s 10-year transition from a keyword-based search engine to an entity-based one.
Wikidata
In 2016 – some six years after the purchase – Google formally closed down Freebase because it had migrated and developed the ideas into its own “knowledge graph,” the modern term for these databases.
It is useful to note that, at the time, Google publicly stated that it had synced much of its entity data with Wikidata and that, moving forward, Wikidata (which underpins the data used in Wikipedia) would be one way in which Google’s Knowledge Graph could interface with the outside world.
How Entities Work And How They Are Used For Ranking
Entities In The Core Algorithm
Entities are primarily used to disambiguate ideas, not to rank pages with the same ideas.
That is not to say that clever use of entities can’t help your site’s content rank more effectively. It can. But when Google tries to serve up results to a user search, it aims first and foremost for an accurate answer.
Not necessarily the most deserving.
Therefore, Google spends considerable time converting text passages into underlying entities. This happens both when indexing your site and when analyzing a user query.
For example, if I type in “The names of the restaurants underneath the Eiffel Tower,” Google knows that the searcher is not looking for “names” or the “Eiffel Tower.”
They are looking for restaurants – not any restaurant, but ones in a specific location. The two relevant entities in this search are “restaurant” and the location “Champ de Mars, 5 Av. Anatole France, Paris” (the address of the Eiffel Tower).
This helps Google to decide how to blend its various search results – images, Maps, Google businesses, adverts, and organic web pages, to name a few.
Most importantly for the SEO pro, it is vital for (say) the Jules Verne restaurant’s site to talk about its spectacular view of the Eiffel Tower if it wants Google to recognize that the page is relevant to this search query.
This might be tricky since the Jules Verne restaurant is inside the Eiffel Tower.
Language Agnostic
Entities are great for search engines because they are language-agnostic. That also means an entity can be described through multiple media.
An image would be an obvious way to describe the Eiffel Tower since it is so iconic. It might also be a speech file or the official page for the tower.
These all represent valid labels for the entity and, in some cases, valid identifiers in other knowledge graphs.
Connections Between Entities
The interplay between entities allows an SEO pro to develop coherent strategies for developing relevant organic traffic.
Naturally, the most “authoritative” page for the Eiffel Tower is likely to be the official page or Wikipedia. Unless you are literally the SEO pro for the Eiffel Tower, there is little that you can do to challenge this fact.
However, the interplay between entities allows you to write content that will rank. We already mentioned “restaurants” and “Eiffel Tower” – but what about “Metro” and “Eiffel Tower,” or “Discounts” and “Eiffel Tower”?
As soon as two entities come into play, the number of relevant search results drops dramatically. By the time you get to “Discounted Eiffel Tower tickets when you travel by Metro,” you become one of just a tiny selection of pages focusing on the juxtaposition between Metro tickets, Eiffel Tower tickets, and discounts.
Far fewer people type in this phrase, but the conversion rate will be much higher.
It may also prove a more monetizable concept for you! (This example is to explain the principle. I do not know if such discounts exist. But they should.)
This concept can be scaled to create exceptionally strong pages by first breaking all the competing pages for a search phrase into a table showing the underlying entities and their relative importance to the main query.
This can then act as a content plan for a writer to build up a new piece of content that is more authoritative than any of the other competing pieces.
So although a search engine may claim that entities are not a ranking factor, the strategy goes to the heart of the philosophy that “If you write good content, they will come.”
Examples Of Entities In Google
Entities In Image Search
Screenshot from search for [dog on a skateboard], Google, August 2023
Google has worked very hard to analyze images using machine learning. So typically, Google knows the main imagery in most photos.
So take [dog on a skateboard] as a search term: making sure that your content fully supports the image can help your content be more visible just when the user is searching for it.
Entities In Google Discover
One of the most underrated traffic sources for SEO professionals is Google Discover.
Google provides a feed of interesting pages for users, even when they are not actively looking for something.
This happens on Android phones and also in the Google app on iPhones. Whilst news heavily influences this feed, non-news sites can get traffic from “Discover.”
How? Well – I believe that entities play a big factor!
Screenshot from Google Search Console, August 2023
Don’t be disheartened if you do not see a “Discover” tab in your Google Search Console. When you do see it, it is a welcome sign that at least one of your web pages aligns with entities strongly enough to overlap with a user’s interests and appear in a feed targeted specifically to that user.
In the example above, even though “Discover” results are not displayed at the exact time that a user is searching, there is still a 4.2% click-through rate.
This is because Google can align the interests and habits of many of its users to the content on the Internet by mapping entities.
Where a strong correlation occurs, Google can offer up a page for a user.
How To Optimize For Entities
Some Research From A Googler
In 2014, a paper came out that I find quite helpful in demonstrating that Google (or at least its researchers) was keen to separate the idea of using keywords to understand topics from that of using entities.
In this paper, Dunietz and Gillick note how NLP systems moved towards entity-based processing. They highlight how a binary “salience” system can be used on large data sets to define the entities in a document (webpage).
A “binary scoring system” suggests that Google might decide that a document either IS or ISN’T about any given entity.
Later clues suggest that “salience” is now measured by Google on a sliding scale from 0 to 1 (for example, the scoring given in its NLP API).
Even so, I find this paper really helpful in seeing where Google’s research thinks “entities” should appear on a page to “count” as being salient.
I recommend reading the paper for serious research, but in it, they list how they classified salience in a study of New York Times articles.
Specifically, they cited:
1st-loc
This is the first sentence in which a mention of the entity appears.
The suggestion is that mentioning the entity early in your web page might increase the chances of an entity being seen as “salient” to the article.
Head-count
This is basically the number of times the “head” word of the entity’s first mention appears.
“Head word” is not specifically defined in the paper, but I take it to mean the word reduced to its simplest form.
Mentions
This refers not just to the words/labels of the entity, but also to other references to it, such as pronouns (he/she/it).
Headline
Whether an entity appears in a headline.
Head-lex
Described as the “lowercased head word of the first mention.”
Entity Centrality
The paper also talks about using a variation of PageRank – where they switched out web pages for Freebase articles!
The example they shared was a Senate floor debate involving FEMA, the Republican Party, (President) Obama, and a Republican senator.
After applying a PageRank-like iterative algorithm to these entities and their proximity to each other in the knowledge graph, they were able to change the weightings of the importance of those entities in the document.
Putting These Entity Signals Together In SEO
Without being specific to Google here, an algorithm would create values for all the above variables for every entity that an NLP or named entity extraction program (NEEP) finds on a page of text (or, for that matter, every entity recognized in an image).
Then a weighting would be applied to each variable to give a score. In the paper discussed, this score turns into a 1 or 0 (salient or not salient), but a value from 0-1 is more likely.
Google will never share the details of those weightings, but the paper also shows that the weightings are determined only after hundreds of millions of pages are “read.”
This is the nature of large language models.
But here are some top tips for SEO pros who want to rank content around two or more entities. Returning to the example “restaurants near the Eiffel Tower”:
Decide on a “head” term for each entity. I might choose “restaurant,” “Eiffel Tower,” and “distance” because distance has a valid meaning and article in Wikipedia. “Cafe” might be a suitable synonym for restaurant, as might “restaurants” in the plural.
Aim to have all three entities in the header and first sentence. For example: “Restaurants a small distance from the Eiffel Tower.”
Aim in the text to talk about the inter-relationship between these entities. For example: “The Jules Verne restaurant is literally inside it.” Assuming “it” clearly refers to the Eiffel Tower in the context of the writing, it does not need to be written out every time. Keep the language natural. A minimal sketch follows this list.
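Here is what those tips might look like applied to a page (illustrative markup only):

<h1>Restaurants a Small Distance From the Eiffel Tower</h1>
<p>These restaurants are all a short distance from the Eiffel Tower – and the Jules Verne restaurant is literally inside it.</p>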
Is This Enough For Entity SEO?
No. Probably not. (You are welcome to read my book!) However, not all factors are in your control as a writer or website owner.
Two ideas that do seem to have an impact, though, are linking content from other pages in context and adding schema to help with the definitions.
Using Schema To Help Define Entities
Further clarity might be given to search engines by using the “about” and “mentions” schema properties to help a search engine disambiguate content.
These two schema types help to describe what a page is talking about.
By making a page “about” one or two entities and “mentions” of maybe a few more, an SEO professional can quickly summarize a long piece of content into its key areas in a way that is ready-made for knowledge graphs to consume.
It should be noted, though, that Google has not expressly stated one way or another whether it uses this schema in its core algorithms.
I would probably add schema along these lines to my article (a sketch – the headline and the entities chosen are illustrative):
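{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Entities In Search",
  "about": [
    {
      "@type": "Thing",
      "name": "Knowledge Graph",
      "sameAs": "https://en.wikipedia.org/wiki/Knowledge_graph"
    },
    {
      "@type": "Thing",
      "name": "Search engine optimization",
      "sameAs": "https://en.wikipedia.org/wiki/Search_engine_optimization"
    }
  ],
  "mentions": [
    {
      "@type": "Thing",
      "name": "Eiffel Tower",
      "sameAs": "https://en.wikipedia.org/wiki/Eiffel_Tower"
    },
    {
      "@type": "Thing",
      "name": "Freebase",
      "sameAs": "https://en.wikipedia.org/wiki/Freebase_(database)"
    }
  ]
}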
The exact choice of schema is as much a philosophical question as an SEO question.
But think of the schema you use as “disambiguating” your content rather than “optimizing your content,” and you will hopefully end up with more targeted search traffic.
There are tons of headless CMS options out there, and it’s more likely than not that, as an SEO pro, you’ll have to work with one in the future.
This comes with some advantages, like being able to integrate with third-party technologies more easily or being able to reuse content across channels to meet users’ new search behaviors.
But SEO pros are used to working with traditional CMS, and adapting to this new way of thinking about content might take some work.
What Is Headless SEO?
Headless SEO refers to the unique processes required to optimize content for search using a headless CMS.
Fundamentally, a headless CMS detaches content from its presentation.
In a traditional CMS like WordPress, content and presentation are intertwined: you create pages rather than content.
Image from Sanity.io, November 2023
In the headless world, instead of pages, you create content units that contain different fields. This content can then be displayed across different surfaces.
Some of these surfaces are quite basic, like a website or an app, but you can take it beyond this and integrate it with social media, digital signage, or marketplaces like Etsy or Amazon.
So, headless SEO is less about creating content and building links and more focused on grasping the nuances of a decoupled system.
The Technical Challenge
There is a specific challenge that comes with working with a headless CMS. In a traditional CMS, the fact that content and presentation are strictly tied together has allowed for very good out-of-the-box websites.
In a headless set-up, we don’t have these out-of-the-box guardrails, and we need to be a lot more specific with our technical implementation.
When doing SEO in a headless CMS, you need to worry about two very different things from a technical perspective:
The front-end: Auditing how the content is presented to users and search engines is a standard part of technical SEO, which is not new to most SEO professionals.
The CMS: A headless CMS allows you to add and remove editable fields from your content. SEO pros working in a headless CMS need to understand how each field is connected to the front-end presentation and whether they have all the fields they need to do their job well – an editable title tag, slug, or meta description – or even whether the content allows them to add internal links and images.
Your Headless SEO Checklist In 8 Steps
Let’s put the guardrails back on in your headless CMS so you can focus on growing your site instead.
This checklist will help you communicate your technical SEO requirements to your development team and diagnose the main issues that can come up in this setup.
1. Check All The Essential Meta Tags
While these are usually in the remit of front-end developers, they will impact your SEO performance, so it’s key to audit them as part of your launch.
Image from Sanity.io, November 2023
While there are many different ones, here are the basics for a sound headless SEO implementation (a combined example follows this list):
Title – Check how it’s generated across the site. On some pages, you will want to have a specific field in your CMS to edit these. For other pages, such as categories, tags, or archives, you want to implement rules on how to auto-generate these. You can even implement validation rules within your CMS that force you to stay within a certain character limit.
Meta description – Like with the title, you will want to have a field within your CMS that lets you edit it directly on most pages. For some, you’ll need to implement rules to auto-generate them. Some headless CMSs allow you to integrate AI capabilities to help you with this. You can also incorporate validation rules in the CMS to keep your meta descriptions under 160 characters.
Meta robots – Depending on your chosen indexation management method, you will have to check whether this tag is present in your HTML head and if it’s behaving correctly. I’ll dive into indexation management a little bit later in the article.
Content type – This meta tag is used to tell the browser what type of content is on the page and the character set and encoding that’s being used. This is especially relevant when working in an international context and helps ensure that special characters, such as accent marks and umlauts, are displayed correctly. Again, you can include validation rules so that the content of this meta tag always matches the ISO standards required.
Viewport – The viewport tag tells browsers how to manage the dimensions of a page, and it’s essential for responsive design. Your job here is to check that the meta tag is correctly implemented and to check that the site is mobile-friendly, according to Google.
Language tag – This meta tag is used to declare the language that the content will be in. In an international setup, you want to make sure that this is correct across all pages so that you can create a correct hreflang markup by querying the lang attribute of each document. Again, you can set up validation rules to keep this tag ISO-compliant.
Open graph tags – While these are not SEO-related, we have become sort of the guardians of these tags over time. You’ll want to ensure that all the basic ones (og:title, og:type, og:image, and og:url) are implemented correctly. Most of these simply pull content from other fields, so you won’t always need to have a field within your CMS to change these, but you might want to create unique title rules or create a field to override your description and image.
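Pulled together, a bare-bones head section covering these tags might look like this (every value is illustrative):

<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Summer T-Shirts | Example Store</title>
<meta name="description" content="Shop lightweight cotton t-shirts for the summer months.">
<meta name="robots" content="index, follow">
<meta property="og:title" content="Summer T-Shirts | Example Store">
<meta property="og:type" content="website">
<meta property="og:image" content="https://www.example.com/images/summer-t-shirt.jpg">
<meta property="og:url" content="https://www.example.com/products/summer-t-shirt">
</head>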
2. Indexing Management
You can manage whether you allow search engines to index your page through the meta robots tag, as we covered above, or you can do it through the x-robots-tag on the HTTP header response.
The x-robots-tag is best for PDFs and other non-HTML files, but for page-level management, the robots meta tag is easier to manage and diagnose.
You will want to have a field within your CMS that allows you to control indexation on a page-by-page basis. A toggle with a clear description of what it means to allow search engines to index the page is the best solution.
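For reference, the page-level tag and its header-based equivalent look like this (illustrative values):

<meta name="robots" content="noindex, follow">

X-Robots-Tag: noindex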
When building on a headless CMS, you must collaborate with your development team to decide the best approach to indexing management.
There might be conflicting priorities or complex integrations that keep you from getting the setup you want. You need to review these with your development team to find a happy solution.
3. Ensure URL Slugs Are Editable
Without direct input from your SEO team, you might end up with a CMS implementation that uses random strings of numbers and letters as URLs or a copy of the title.
Ensure your development team includes an editable field for your URL slug for the right pages.
Because keeping a stable URL structure is essential, you might not want to give everyone editing permissions on the URL slug.
You can tailor your CMS so that only members of the SEO team can edit a URL after the page is published. You can even build an automation that creates a redirect automatically whenever a URL is changed.
4. Establish Canonical URL Rules
Canonical URLs indicate to search engines what’s the main version of the content and help you manage potential duplicate content issues.
Here are some basic instructions to share with your development team and keep in mind during your audit (an example tag follows this list):
Define your canonicals in the head of the page or the HTTP header.
Use absolute URLs, including the protocol and subdomain, such as https://www.google.com.
Define only one canonical per page.
The pages that you want to be indexed need to be self-canonicalized. That is, they should point at their own URL within the canonical tag.
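For example, an indexable product page would self-canonicalize like this (the URL is illustrative):

<link rel="canonical" href="https://www.example.com/products/summer-t-shirt">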
Ecommerce sites have some extra layers of complexity when it comes to canonicalization, as they often have to manage larger duplicate content issues relating to categories and filters.
In this case, it’s best to work with your development team to find the best way to define canonicalization rules for your business.
5. Define Your XML Sitemap Setup
While this is obvious to any SEO, sitemaps are dynamic files that need to be updated at specific intervals or when triggered by certain actions, so it’s important to agree with your development team on how your sitemap will be updated.
Your sitemap should contain only indexable canonical URLs with a 200 HTTP response code.
It should live in the root directory of your site, but if for any reason that’s not possible, you can indicate it in your robots.txt file like this:
Sitemap: https://www.example.com/sitemap.xml
Depending on the specific needs of your site, you have to consider if you’d like to split your sitemap by content type and if you want to have a sitemap for images, videos, or news articles.
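For illustration, a minimal sitemap containing a single URL entry (the URL and date are illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/products/summer-t-shirt</loc>
    <lastmod>2023-11-01</lastmod>
  </url>
</urlset>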
6. Request Your Schema Markup
Schema markup offers search engines a richer understanding of your content.
Without SEO plugins doing the heavy lifting for you, you’ll have to request the right markup for your type of content and site. This should be added to the HTML head as a script. The code will look something like this (a generic Article example – the headline, date, and author are illustrative):
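<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Headless SEO Checklist In 8 Steps",
  "datePublished": "2023-11-01",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  }
}
</script>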
Relax; give me a moment. I have not said a word yet.
Let me first clarify the entities this article revolves around so that you understand who and what I am actually talking about:
Google Search: Just plain old Google Search as we have known it since 1998.
Google Shopping: A Google service that allows you to search, view, and compare products since 2010.
Google Merchant Center (GMC): A platform that, since 2010, allows you to manage how your products appear across some of the different services Google provides. These include organic results in Google Search [1], Google Shopping [2], Google Maps, YouTube, Google Images, and Google Lens, as well as Google Ads channels like Google Search [1], Google Shopping [2], Google Maps, YouTube, Google Images, and Google Discover.
As for the “gap” you asked about – this has to do with the complex relationship between Google Search, Google Shopping, and their intermediary, Google Merchant Center, and the fact that there is still a technical gap to close for that relationship to start working efficiently.
They have been working together on closing this gap over the last three to four years, yet it still needs to be fully bridged.
It’s a gap that shows itself in the way we (a) can use GMC to get product variants into Google Shopping and update them vs. (b) cannot do so efficiently through structured data markup and Google Search.
This, in practice, still makes product feeds the preferred and most solid solution for any merchant out there.
Conflicting Structured Data Markup Requirements
For a long time, this gap also extended into some of the structured data markup requirements both parties upheld.
These discrepancies would often lead to the problem that, even though our Product and Offer markup for Google Search was in order, Google Merchant Center would not be satisfied with it – and vice versa.
Often, we could only resolve this situation by duplicating some of our markup and modeling it ever-so-slightly differently.
This forced us to publish superfluous information because different sides of Google were not taking each other’s requirements into account (or at least not enough) – leaving it up to us to make sense of and deal with that mess.
Different Parts Of Google Finally Began Building A Bridge
Eventually, the different Google parties involved started aligning their structured data markup requirements.
This was even before it was announced that Google Shopping would start including organic results, which was a very important step forward, as it meant products could now finally get into Google Shopping for free.
Hooray for us – the folks who need to produce the structured data they require yet often get lost in the how-to-do-it – as well as those who always wanted to be included in Google Shopping yet could not afford to be.
This signaled a shift in stance, the effects of which we could already see happening over at schema.org as early as mid-2019, the moment the first requests for changes to the vocabulary were coming in and being released.
To illustrate that point, here’s a list of types, properties, and enumerations that have been added and/or modified in the first year following that moment:
Obstacles Being Replaced By Semantic Building Blocks
Since then, many of GMC’s product attributes have been translated into new schema.org types, properties, and data shapes.
This is a serious attempt at allowing us to express everything we can express in GMC product feeds through schema.org annotations – thus reducing Google’s reliance on feeds while offering us a new way to achieve the same thing.
It’s a process that is far from done yet, though by now, it is no longer just theory.
But there is still more to come, most notably the biggest (SEO) elephant in the room for ecommerce sites: efficient support for product variants via Google Search.
Google Shopping Shows Product Variants In A Single Result – Google Search Does Not (Yet?)
Image from Google Merchant Center Help, November 2023
A Google Shopping ad type but also a feature for organic product variant listings.
Product variants are grouped together in product feeds via the [item_group_id] attribute, together with other product feed attributes like [color], [size], [material], [pattern], etc.
This is an attribute that, until not so long ago, did not have a schema.org equivalent and, therefore, could not be expressed in any form of structured data outside GMC product feeds – thus preventing us from expressing product variant relations in structured data markup and keeping us dependent on GMC.
Introducing A (Somewhat New) Fundamental Piece Of The Puzzle
The inProductGroupWithID property – a translation of a GMC product attribute into schema.org lingo – was added to schema.org in July 2020 but did not appear in GMC’s documentation or Google’s structured data feature guide until September 2022.
This is a property that, at first glance, might give the impression that the problem has been solved, as it allows us to create Product markup that describes the same relations our GMC XML feeds do:
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Yellow summer t-shirt XL",
"description": "Extra large yellow t-shirt for the summer months.",
"image": "https://example.com/content/yellow-summer-t-shirt.jpg",
"inProductGroupWithID": "abc123",
"gtin": "8710123456789",
"color": "yellow",
"size": { // Make sure to also look into the new size property. It is a beast.
"@type": "SizeSpecification",
"name": "XL",
"sizeSystem": "https://schema.org/WearableSizeSystemUS",
"sizeGroup": [
"https://schema.org/WearableSizeGroupBig",
"https://schema.org/WearableSizeGroupTall"
]
},
"offers": {
"@type": "Offer",
"url": "https://www.example.com/products/summer-t-shirt?size=xl",
"priceCurrency": "USD",
"price": 19.99,
"itemCondition": "https://schema.org/NewCondition",
"availability": "https://schema.org/InStock"
}
}
Note: You need to add the inProductGroupWithID property to the markup of all the product variant detail pages that are part of a group – although this also is exactly where the next issue arises.
XML Feeds Vs. Semantic Annotations – A Battle Of Efficiency
Many do not spend much time thinking about it, but the reality is that gathering structured data markup from web pages – at web scale – is quite a complex and resource-heavy task.
This means that Google has to put in quite the effort (and money) to accumulate all that information. And it does not end there.
After collecting all that information, Google still needs to combine all that product variant information and figure out how all those different products fit together.
Oh, and then there is also the fact it has to deal with the enormous amount of markup mistakes out there – which makes it even more difficult to process such information.
It’s a process that never stops because Google also needs to stay up to date on values like availability, price, shipping costs, etc. – information that does not tend to be very static.
And this is something Google must do for every single product detail page on your site, over and over again.
Canonical URLs Are Counterproductive For Keeping Product Variant Info Up To Date
You might not be aware that both Google Search and GMC have some guidelines in regard to product variants and how to canonicalize them.
However, if you have ever dipped your toes into technical SEO, this will not be news to you. You might also be aware that Google’s bots tend to crawl canonicalized URLs substantially less than the URLs they refer to.
That crawl rate tends to drop further the longer a canonical referral stays in place.
The reason is that it is a clear signal to Google Search, one that indicates (near) duplicate content as well as a location of the most important version of that (near) duplicate content.
It makes it easy for Google to figure out which URLs should get the most attention – something which is important for Google as it can only assign so many resources to your site(s).
This is highly efficient for Google Search but not for you – nor, for example, for those of your products that were out of stock for some time and are now available to be sold again, but are not, because the Google bots are taking their sweet time to re-crawl your product pages. Meanwhile, your product search results still indicate they are out of stock, on page 3 or even deeper down in limbo.
Oh, and how about your Google Shopping out-of-stock organic product listings? How are those working out for you?
Of course, you can change the <lastmod> value of those URLs in your sitemap XML file, which works great if you have fewer than 500 products, but it does not help much when you are talking about tens or hundreds of thousands (or even more) of product URLs.
Then those updates represent raindrops in an ocean of changes, and guess what? In those cases, canonicalized URLs get crawled even less.
There goes ‘updating your Google Shopping and Google Search results via structured data markup’ flying out of the window.
GMC Feeds – Still The Most Effective Way To Inform Google About Product Updates
Google only has to fetch a limited number of product feeds. Because of that, they offer Google a much easier and cost-effective method for getting its hands on complete and up-to-date product datasets.
I am not saying that the data contained in feeds is always perfect. (Surprise! They often also look like an exploded piñata.) But at least Google does not have to crawl entire domains upfront so as to get the complete picture.
This situation brings us to the next piece of the puzzle: a new Class that helps us bundle some of that information at more centralized locations on our sites – probably the same locations your product variant URLs refer to.
This makes it easier for Google to get to some important pieces of the puzzle prior to having crawled every single product, as all of them will be able to refer to that more centralized version of your structured data markup.
And because your canonical target URLs get crawled at a much higher frequency, your product search results will also update at a higher frequency, thus closing much of the efficiency gap between product feeds and structured data markup.
Introducing Schema.org/ProductGroup
Screenshot from schema.org, November 2023
Be aware that I’m not going to repeat everything the description already expresses (Tip: Always read the manual!).
Still, I will try to explain a scenario where, in my opinion, this new Class comes in very handy and is likely to pop up somewhere in Google’s documentation in the near future.
But First, A Word Of Caution
The reason I emphasized “likely” is that, until the moment Google finally makes its announcement(s), nobody will know exactly how it will implement and make use of this new type – nor how other consumers of the same data (e.g., Bing, Facebook, Pinterest) will respond to it, if at all.
This implies that, until Google makes its announcements, you should be very wary of anybody claiming to know how this will pan out and develop over time. Even the folks over at schema.org and Google do not know for sure, so do not let anybody else tell you they do know.
One thing is sure, though: a lot of the future development will depend greatly on customer and peer feedback, meaning it will likely undergo a very agile evolution process.
This is illustrated by the fact that Google has been at it for years now and still has not finalized every single detail – leaving some room to wiggle.
What To Prepare For Before ProductGroup Is Announced
Given that I am one of the people who has been involved in the discussions surrounding this topic over at schema.org for a very long time – and that some Googlers have been playing a very active part in those discussions as well – I feel confident enough to at least try to provide you with some insights into the why and how behind some of the things heading our way in the (near) future.
By doing so, I hope to give you ample time to start looking into this before Google makes any announcements – because chances are high that implementing this type of markup will come with some technical challenges.
I expect you will need some time to familiarize yourself with everything it entails so you can figure out what needs to be done before it becomes a fact of life. For example:
Is the data you will need available to you?
Can your systems produce all the parts of the markup at the right moment?
Are you able to provide different markup on the canonical URL than you do on the other product variant URLs?
In case you have feeds, how will you ensure the data in your markup and your feed(s) stays in sync?
Etc.
And if you do not prepare yourself for this, chances are high you will be caught off guard when Google finally does make its announcements, followed by heading into a rat race, desperately trying to catch up with your competitors.
Disclaimer
You should be careful about taking my views on this as facts. The things I describe next are biased by the intentions and use cases I had in mind when starting the discussion over at schema.org.
And since I am “just Jarno,” a single person, this does not imply that others (a.k.a. the consumers of this ‘stuff’) will use it in exactly the same manner as I would like them to, nor that they have exactly the same use cases in mind as I had.
Nevertheless, I think I am safe lifting some of the curtain for you so that you can form your own ideas and opinions about where this is heading and how you might use it.
Informing Google Where ProductGroup Information Can Be Found
As we continue the markup example used earlier, you will see it is quite easy to inform consumers about the existence of pages containing ProductGroup information.
It only requires adding a few new lines to the markup on all your product variant detail pages that are part of a ProductGroup.
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Yellow summer t-shirt XL",
"description": "Extra large yellow t-shirt for the summer months.",
"image": "https://example.com/content/yellow-summer-t-shirt.jpg",
"inProductGroupWithID": "abc123",
"gtin": "8710123456789",
"color": "yellow",
"size": { // Make sure to also look into the new size property. It is a beast.
"@type": "SizeSpecification",
"name": "XL",
"sizeSystem": "https://schema.org/WearableSizeSystemUS",
"sizeGroup": [
"https://schema.org/WearableSizeGroupBig",
"https://schema.org/WearableSizeGroupTall"
]
},
"offers": {
"@type": "Offer",
"url": "https://www.example.com/products/summer-t-shirt?color=yellow&size=xl",
"priceCurrency": "USD",
"price": 19.99,
"itemCondition": "https://schema.org/NewCondition",
"availability": "https://schema.org/InStock"
},
"isVariantOf": {
"@type": "ProductGroup",
"productGroupID": "abc123",
"url": "https://www.example.com/products/summer-t-shirt"
}
}
During the discussions over at schema.org, though, it was suggested that consumers of such data should also allow for an even more compact form of annotating isVariantOf information – namely, a plain URL reference instead of a nested ProductGroup node. My sketch of what that suggestion could look like (nothing is final yet):
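"isVariantOf": "https://www.example.com/products/summer-t-shirt"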
The reason is that, from a publisher’s perspective, this is much easier to produce while still leading to the same end result: a referral to an entry point where the actual ProductGroup information can be found, without adding a bunch of markup fluff to the majority of your pages.
Here is to hoping that Google will listen to that suggestion. (Note: It does a better job at listening to suggestions than many give it credit for).
As for where to publish ProductGroup information – as I expressed earlier, I expect that Google will tell us to publish the markup on the canonical URL of the product variants that are part of a group so as to stay compliant with its current guidelines on how to deal with product variant URLs.
Some Hopeful Speculation
Maybe someday, this will lead to us not always having to choose a canonical URL for every single product variant there is – something that is quite a cumbersome and technically complex task to achieve.
As if that was not enough of a task already, there is also the fact that many ecommerce platforms do not offer any such out-of-the-box functionality.
Instead, many platforms come with a ProductGroup-like page of sorts, in the form of a product detail page with no variable properties set (awaiting user interaction).
Wouldn’t it be really neat if you could simply publish ProductGroup markup on that single URL available to you (per ProductGroup) and have it include all the Product variant information as well?
Imagine if we could do that without losing any Rich Results – which sort of feels like being punished for trying to be transparent and telling the full (data) story of your web pages!
Who knows; maybe one day we will get there. With ProductGroup, there now at least is a schema.org Class, which, in theory, offers a working solution.
I guess time will tell if Google will also see the value and need for it.
“Enough with all the blah blah already, Jarno! Give me the good stuff. Show me what I can do with ProductGroup already.”
…OK, OK, OK, I hear ya. Hold onto your hats, boys and girls, here we go.
Getting Into The Nitty-gritty Of ProductGroup: A Blueprint For Its Members
Before diving into an illustrative markup example (which probably does not cover all you have to say about your products), let me first explain the idea behind ProductGroup’s second function:
Being a prototype/template/blueprint for the Product variants that are part of a ProductGroup.
What this means is that almost all properties (and their values) you add to a ProductGroup will be inherited by its Product variants – while the variesBy property lets you express which properties the products vary by.
This is quite ingenious (if it gets used by consumers as intended) as it prevents us from having to add loads of repetitive information.
For example, because the ProductGroup in my example has the property-value pair “material”: “cotton,” all the Product variants in the ProductGroup automatically inherit that same property-value pair.
This means that the main material of all the Product variants will be cotton without having to add that information to each and every single one of the Product variants.
At the same time, we can also inform consumers that the Product variants vary by color and size and that they can expect to find that information as part of the markup of the Product variants.
This makes for a lot of cross-referencing between entities – without necessarily having to make use of loads of uri/iri identifiers (via JSON-LD’s @id, Microdata’s itemid, RDFa’s resource).
If you think this will not make much of a difference, think again; the amount of Product information that repeats itself adds up really quickly.
Just in my example alone, I have easily saved 30-40 lines of code – roughly 20-25% of the total markup – by defining the properties material, brand, aggregateRating, and offers under ProductGroup.
And that is while already condensing the markup a lot by referencing via identifiers; writing it all out in full would easily add another 100-150 lines of markup.
Lastly, I should mention that each Product still has its own Offer markup, but the repetitive parts are placed under ProductGroup.offers.Offer.
{
"@context": "https://schema.org",
"@type": "ProductGroup",
"productGroupID": "abc123",
"url": "https://www.example.com/products/summer-t-shirt",
"material": "cotton",
"brand": {
"@type": "Brand",
"name": "Some brand"
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": 4.4,
"reviewCount": 89,
"bestRating": 5,
"worstRating": 1
},
"offers": {
"@type": "Offer",
"priceCurrency": "USD",
"itemCondition": "https://schema.org/NewCondition",
"hasMerchantReturnPolicy": {
"@type": "MerchantReturnPolicy",
"applicableCountry": "CH",
"returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
"merchantReturnDays": 60,
"returnMethod": "https://schema.org/ReturnByMail",
"returnFees": "https://schema.org/FreeReturn"
},
"shippingDetails": {
"@type": "OfferShippingDetails",
"shippingRate": {
"@type": "MonetaryAmount",
"currency": "USD",
"value": 3.49
},
"shippingDestination": {
"@type": "DefinedRegion",
"addressCountry": "US"
},
"deliveryTime": {
"@type": "ShippingDeliveryTime",
"handlingTime": {
"@type": "QuantitativeValue",
"minValue": 0,
"maxValue": 1,
"unitCode": "DAY"
},
"transitTime": {
"@type": "QuantitativeValue",
"minValue": 1,
"maxValue": 5,
"unitCode": "DAY"
}
}
}
},
"variesBy": [
"https://schema.org/color",
"https://schema.org/size"
],"hasVariant": [{
"@type": "Product",
"name": "Yellow summer t-shirt XL",
"description": "Extra large yellow t-shirt for the summer months.",
"image": "https://example.com/content/yellow-summer-t-shirt.jpg",
"inProductGroupWithID": "abc123",
"gtin": "8710123456789",
"color": "yellow",
"size": { // Make sure to also look into the new size property. It is a beast.
"@type": "SizeSpecification",
"name": "XL",
"sizeSystem": "https://schema.org/WearableSizeSystemUS",
"sizeGroup": [
"https://schema.org/WearableSizeGroupBig",
"https://schema.org/WearableSizeGroupTall"
]
},
"offers": {
"@type": "Offer",
"url": "https://www.example.com/products/summer-t-shirt?color=yellow&size=xl",
"price": 19.99,
"availability": "https://schema.org/InStock"
}
}, {
"@type": "Product",
"name": "Yellow summer t-shirt L",
"description": "Large yellow t-shirt for the summer months.",
"image": "https://example.com/content/yellow-summer-t-shirt.jpg",
"inProductGroupWithID": "abc123",
"gtin": "8711234567890",
"color": "yellow","size": { // Make sure to also look into the new size property. It is a beast.
"@type": "SizeSpecification",
"name": "L",
"sizeSystem": "https://schema.org/WearableSizeSystemUS",
"sizeGroup": [
"https://schema.org/WearableSizeGroupBig",
"https://schema.org/WearableSizeGroupTall"
]
},
"offers": {
"@type": "Offer",
"url": "https://www.example.com/products/summer-t-shirt?color=yellow&size=l",
"price": 18.99,
"availability": "https://schema.org/InStock"
}
}, {
"@type": "Product",
"name": "Light blue summer t-shirt XL",
"description": "Extra large light blue t-shirt for the summer months.",
"image": "https://example.com/content/light-blue-summer-t-shirt.jpg",
"inProductGroupWithID": "abc123",
"gtin": "8712345678901",
"color": "light blue",
"size": { // Make sure to also look into the new size property. It is a beast.
"@type": "SizeSpecification",
"name": "XL",
"sizeSystem": "https://schema.org/WearableSizeSystemUS",
"sizeGroup": [
"https://schema.org/WearableSizeGroupBig",
"https://schema.org/WearableSizeGroupTall"
]
},
"offers": {
"@type": "Offer",
"url": "https://www.example.com/products/summer-t-shirt?color=yellow&size=xl",
"price": 19.99,
"availability": "https://schema.org/InStock"
}
}, {
"@type": "Product",
"name": "Light blue summer t-shirt L",
"description": "Large light blue t-shirt for the summer months.",
"image": "https://example.com/content/light-blue-summer-t-shirt.jpg",
"inProductGroupWithID": "abc123",
"gtin": "8713456789012",
"color": "light blue",
"size": { // Make sure to also look into the new size property. It is a beast.
"@type": "SizeSpecification",
"name": "L",
"sizeSystem": "https://schema.org/WearableSizeSystemUS",
"sizeGroup": [
"https://schema.org/WearableSizeGroupBig",
"https://schema.org/WearableSizeGroupTall"
]
},
"offers": {
"@type": "Offer",
"url": "https://www.example.com/products/summer-t-shirt?color=light-blue&size=l",
"price": 18.99,
"availability": "https://schema.org/OutOfStock"
}
}]
}
“Great example, Jarno – not! You should test your stuff before publishing; it produces errors and warnings, and the products do not inherit a single property!”
Yep, I am well aware of that fact. But that is not because I lack the skills or awareness to do this properly. Support for this stuff is on the horizon, but it is not here yet.
This is still reflected by how the different validators/testing tools out there react to this type of markup.
We Will Not See ProductGroup Before Google Is Confident It Is Ready (Enough)
Because of the number of possible markup combinations, the order in which this sort of markup can be expressed, and the many ways it can all be screwed up, it needs to be looked at and tested from many different angles prior to being released to the world.
Diligence is of the utmost importance, as things could easily go horribly wrong for both publishers and consumers alike.
There are plenty of tiny and large details left to discuss over at schema.org, but for now, it is in the hands of the Googlers involved in this (and their colleagues).
They need time to figure out how they will make this work within their systems and within which constraints at the start, followed by actually making it work and writing the proper documentation for it so that we can learn how to use it the way they would like us to.
This also implies that my markup examples should by no means be considered “the way it will be done” – whether in terms of the order in which I have written them, the information they contain, or the number of properties the inheritance will work for.
My examples are based on the discussions we had over at schema.org and are meant to help you make a start with your own investigations.
Leaving You On A Cliffhanger
All of us are waiting for the next episode – produced by Google. I hope we don’t have to wait too long for this one, as I really want to discover how this ends – or rather, starts – for real.
And while we wait, you should start looking into this from as many angles as possible to figure out the potential implications for you, your clients, or your employer before you get caught off guard.
A Final Note
Schema.org is all about creating a workable public vocabulary that offers a “language” we can use for publishing and consuming semantic annotations.
Schema.org does not define how publishers or consumers of its types and properties are supposed to use them – pretty much the same way dictionaries do not tell us how to write.
The people involved in schema.org have absolutely no say in how consumers of the vocabulary use its language, nor any say in the requirements parties like Google, Bing, and others document and publish.
Google’s search result pages provide many signals for improving organic rankings. That’s why it’s essential to actually search for each target keyword before optimizing a page or group of pages.
Keep an eye on the following sections.
Related Searches
The “Related searches” section usually resides after the first 10 or 20 results and contains phrases that produce a new set of results when clicked. The phrases are typically extended versions of the typed query but may include extraneous words.
For example, the results for “how to learn knitting” include related searches for more specific queries such as “How to knit for beginners step by step slowly” (presumably a section from a single page) and “how to knit with one needle” (likely a supporting page linking to a tutorial).
The results for “how to learn knitting” include related searches for more specific queries.
Another form of related queries is a list of topics that produce results when clicked without triggering a new search.
The same “how to learn knitting” query produces a “Related searches” section with topics such as “how to knit for beginners step-by-step with pictures” (links to video tutorials) and “Types of knitting” (different techniques).
A search for “how to learn knitting” query produces a “Related searches” section. Click image to enlarge.
Keyword Clusters
For popular queries, especially those at the start of a buying journey, Google often suggests narrowing the search based on the need. Searching “best laptops for teachers” produces the following topic clusters:
“Under $800,”
“Touchscreen,”
“Under 11.6 inches,”
“11.6 – 12.4 inches,”
“12.4 – 14 inches.”
Searching “best laptops for teachers” will produce the topic clusters, such as “Under $800” and “Touchscreen.” Click image to enlarge.
Google likely includes the clusters based on searchers’ query and click history. The clusters help merchants plan their site architecture to assist shoppers with computer needs without returning to Google.
For some queries, Google often shows a newer “Buying guide” section, providing even more insight into the kind of content to optimize. Google’s buying guide for “how to buy a laptop” includes these sections, among others:
“Dimensions: What are the different sizes of laptops?”
“Battery life: What is the average battery life of a laptop per charge?”
“Storage: What is the best storage capacity for a laptop?”
“Graphics: Which laptop has highest graphics?”
Google often shows a newer “Buying guide” section, such as for the query “how to buy a laptop.” Click image to enlarge.
Related Questions
“People also ask” boxes reveal questions of consumers when making buying decisions. Searching “best laptops for teachers” triggers these questions:
“What laptop do teachers recommend?”
“Why is a MacBook good for teachers?”
“Which laptop is best for education purposes?”
“Do teachers get discounts on laptops?”
“People also ask” boxes reveal questions of consumers when making buying decisions. Click image to enlarge.
Clicking any question will uncover even more info. For example, clicking “Do teachers get discounts on laptops?” provides a quick answer and follow-up queries, such as:
“Is Amazon Prime free for teachers?”
“Does Best Buy give teacher discount for Mac?”
Engaging with “People also ask” boxes provides obvious and not-so-obvious questions to answer when optimizing a page.
Search Intent Signals
Google’s SERP sections signal its interpretation of the searcher’s intent: commercial, informational, navigational, or investigational.
“Shopping” sections signal strong commercial intent, especially if repeated while scrolling down the page.
Image and video packs usually signal informational intent.
Knowledge panels, sitelinks, and same-domain results signal navigational intent and that Google recognizes the brand.
Keyword clusters and “Buying guide” sections tend to convey investigational intent of users researching options before buying.
Finding specific data on Google can be akin to searching for a needle in a haystack. Thankfully, Google’s search operators are here to transform your quests for knowledge into efficient and targeted searches. When used skillfully, these specialized commands can drastically reduce the time spent sifting through irrelevant results. They can also help in your SEO work. This article will explain how Google search operators can help you search precisely and easily.
Understanding Google’s search operators
Google’s search operators are distinct commands that empower users to conduct highly targeted and sophisticated searches on the search engine. By incorporating these operators into search queries, users can exercise intricate control over the search process, filtering results more precisely and focusing on particular content types.
These operators function as advanced directives that go beyond mere keywords, fine-tuning queries by specifying inclusivity, exclusivity, location on the web, exact phrasing, and file formats. Used properly, Google’s search operators transform rudimentary searches into finely tuned instruments, making it possible to slice through the search results and pinpoint the most relevant and specific data.
Searching with search operators can narrow down search results
The basics
Google’s search operators are special characters and commands that extend the capabilities of regular searches. They can narrow down results, search within a certain website, find specific file types, and more.
Below is a rundown of the most essential operators:
Quotation marks (“”): Placing quotation marks around a phrase or a group of words signals that you’re looking for that exact sequence without any alterations. Example: "global warming trends".
Minus (-): The minus operator is handy when excluding a particular word from your search results. You can place a minus sign immediately before the word or phrase to omit it from the search. Example: "satellite launch -Mars" to avoid results about Mars satellite launches.
Wildcard (*): The asterisk acts as a placeholder for one or more unknown words. For example, if you use the query SEO * tips, Google will return results for any phrase where the asterisk is replaced with one or more words, such as “SEO optimization tips,” “SEO beginner tips,” or “SEO best practice tips.”
Site:: This operator is incredibly useful for searching within a single website or domain. In other words, it restricts your search to a specific site. Example: "site:nytimes.com climate change".
Filetype: / Ext: If you’re looking for specific file formats, such as PDFs, PowerPoints, or Excel spreadsheets, these operators allow you to specify the format. Example: "filetype:pdf renewable energy".
Advanced techniques
Beyond the basic operators, several advanced commands cater to more specific needs:
Related: Use this operator to find websites with content similar to a URL you already know. Example: "related:yoast.com".
AROUND(X): An infrequently used but powerful operator, ‘AROUND(X)’ finds pages where the specified words are within X number of words apart. Replace X with a number to define the proximity of the search terms. Example: "solar AROUND(5) panels".
Combining operators for enhanced searching
Operators can be combined to fine-tune a search query. The key is to use proper syntax, such as including a colon after the operator with no space before the search term. Boolean logical operators like ‘AND’, ‘OR’, and ‘NOT’ (equivalent to “+”, “|”, and “-“) can also be used in uppercase to combine or exclude search terms. Remember, Google search is case-insensitive except for these Boolean operators.
Practical examples
But how would you use these Google search operators in your daily life? Here are a few examples:
To find government reports on renewable energy in PDF format, try: "renewable energy reports filetype:pdf site:gov".
For job listings that are not internships, you could search: "job listings -internship".
If you are looking for discussions about the history of coffee that leave out tea and mention Brazil within five words of ‘exporting’, you could use: "history of coffee -tea Brazil AROUND(5) exporting".
Overcoming limitations
Understanding the nuances of search operators is essential, as misuse can lead to missing critical information. Keep in mind:
Specificity vs. breadth: Too many operators may overly limit results, whereas too few may not refine them enough. Experiment with the number of operators to find a middle ground.
Syntax matters: Operators should be placed directly before the terms they modify, without spaces.
Keep updated: Google continually updates its search algorithms and features. Some operators may become deprecated over time (Google retired the ‘+’ operator back in 2011, for example), so staying informed about the latest search tools and strategies is important.
Using Google search operators for SEO
When it comes to SEO, understanding and using Google search operators can provide invaluable insight into how the search engine perceives a website and how it stacks up against the competition. With these commands, you can diagnose SEO issues, conduct competitor research, and fine-tune your online strategy. Here are some practical applications of search operators for SEO purposes:
Site: The site: operator is crucial for SEO analysis. It allows you to see how many pages of your website are indexed by Google, which can indicate the site’s health. For example, a simple search like "site:yourwebsite.com" can reveal if essential pages are missing from the index.
Cache: Using cache: lets you view the last cached version of a webpage as seen by Google. This is important for identifying when a page was last crawled and whether the current page content aligns with what is indexed.
Intitle: / Allintitle: These operators can help identify how many competing pages target specific keywords in their titles, a critical element of SEO. For instance, "allintitle: organic coffee beans" will show pages with titles directly competing for that keyword phrase.
Inurl: / Allinurl: Discover how your competitors structure their URLs and which pages are optimized for particular keywords. For example, "inurl:blog coffee brewing techniques" may unearth competitors’ blog posts targeted toward coffee enthusiasts.
Filetype: / Ext: SEO isn’t just about web pages – it also encompasses downloadable content such as eBooks, PDFs, and guides. Use the filetype: operator to find what content your competitors offer and how it’s optimized. For instance, "filetype:pdf SEO guide" will let you see the variety of SEO guides available in PDF format.
-“search term”: By prefixing a minus to a competitor’s brand or website name, you exclude them from the results and get a clearer view of the rest of the competitive landscape. For example, "SEO strategies -site:competitor.com" surfaces the SEO strategies employed by other players in the field.
More SEO tactics
Here are more practical examples demonstrating how Google’s search operators can be leveraged for various SEO tasks:
Check indexation:
To find out how many pages of a website are indexed by Google: site:example.com
To locate all indexed pages under a specific directory: site:example.com/blog/
To find indexed subdomains: site:*.example.com
Discover content and backlink opportunities:
To find guest post opportunities: "write for us" + "industry keyword" -site:example.com (exclude your site)
To locate potential resource pages for backlinks: "useful links" OR "resources" + "industry keyword" -site:example.com
To identify forums discussing your niche where you can participate: "forum" + "industry keyword"
To identify link opportunities by finding pages that mention your keywords outside your own site: "intext:beer brewing tips -site:yourwebsite.com". The sketch below generates prospecting queries like these in bulk.
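Running this kind of prospecting for more than a handful of keywords quickly becomes tedious, so a short script can generate the queries in bulk. A minimal Python sketch, assuming a hypothetical keyword list and using example.com as a stand-in for your own domain:

from urllib.parse import quote_plus

own_site = "example.com"                        # stand-in for your own domain
keywords = ["coffee brewing", "espresso gear"]  # hypothetical industry keywords

for kw in keywords:
    prospecting_queries = [
        f'"write for us" "{kw}" -site:{own_site}',
        f'("useful links" OR "resources") "{kw}" -site:{own_site}',
        f'forum "{kw}"',
    ]
    for q in prospecting_queries:
        # Print a ready-to-open Google search link for each query.
        print("https://www.google.com/search?q=" + quote_plus(q))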
Track down duplicate content issues:
To search for exact matches of a unique line of text from your content (to check for plagiarism): "A recognizable sentence from your content"
To find pages on the web that might have copied your product descriptions: "Your product description" -site:example.com
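Checking each sentence by hand is slow, so a few lines of Python can do the legwork. The sketch below (the content string and the domain are placeholders) splits your copy into sentences and turns each one into an exact-match query that excludes your own site:

from urllib.parse import quote_plus

own_site = "example.com"  # placeholder for your domain
content = (
    "Our beans are roasted in small batches every Tuesday. "
    "Free shipping applies to all orders over 40 euros."
)

# Naive sentence split; anything reasonably long is worth checking.
sentences = [s.strip() for s in content.split(".") if len(s.strip()) > 20]

for sentence in sentences:
    query = f'"{sentence}" -site:{own_site}'  # exact match, minus your own pages
    print("https://www.google.com/search?q=" + quote_plus(query))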
Perform competitor research:
To check how many and what type of pages a competitor has indexed: site:competitor.com
To see a competitor’s content on a specific topic: site:competitor.com + "topic"
To find out what PDF resources a competitor has: site:competitor.com filetype:pdf
Analyze on-site SEO elements:
To check if a website’s pages have titles that include a keyword: site:example.com intitle:"target keyword"
To find pages that may be under-optimized because the target keyword is missing from the title: site:example.com -intitle:"target keyword"
Investigate local SEO and listings:
To find local citation opportunities: "industry keyword" + "location" + "add business" OR "submit listing"
To check for consistent NAP (Name, Address, Phone Number) citations: "Business Name" + "Phone Number"
SEO audit of website content:
To audit for outdated content: site:example.com 2010..2020 "outdated term" (leave the 2010..2020 date range out of quotes so Google treats it as a numeric range rather than a literal string)
To find content that’s not performing well and may need updating: site:example.com intext:"Last updated on" -intext:"2023"
Research keywords in URLs and titles:
To find URLs that contain a keyword, which may suggest the page focuses on that keyword: inurl:"target keyword"
To see if competitors use certain words in their title tags: intitle:"competitors’ keyword" -site:example.com (intitle: combines with other operators more reliably than allintitle:, which applies to every term that follows it)
Miscellaneous checks:
To check your site (or a competitor’s) for indexed CSS, JavaScript, or HTML files: site:example.com filetype:css (swap css for js or html as needed)
To find all the images indexed from a website: site:example.com filetype:jpg | filetype:png
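Because many of these checks recur in every audit, it can help to keep them in one reusable checklist. Here is a minimal Python sketch that bundles a few of the queries above (example.com is a placeholder for the site under audit):

from urllib.parse import quote_plus

domain = "example.com"  # placeholder for the site you are auditing

audit_checks = {
    "indexed pages": f"site:{domain}",
    "subdomains": f"site:*.{domain}",
    "pdf resources": f"site:{domain} filetype:pdf",
    "stale content": f'site:{domain} intext:"Last updated on" -intext:"2023"',
    "css files": f"site:{domain} filetype:css",
}

for label, query in audit_checks.items():
    # One clickable search link per check.
    print(f"{label}: https://www.google.com/search?q={quote_plus(query)}")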
These examples can be modified and combined creatively to fulfill various SEO-related needs. The tasks range from technical audits and content strategy to competitor analysis and link-building efforts. Refining search queries with operators lets you mine valuable insights and identify areas of improvement or opportunity more efficiently.
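If you would rather run such queries programmatically than paste them into a browser, Google’s Programmable Search Engine offers a Custom Search JSON API. The sketch below is only a rough outline, not a drop-in tool: it assumes you have created your own API key and search engine ID (the placeholders here will not work), the API enforces daily quotas, and not every operator behaves exactly as it does on google.com:

import requests  # third-party HTTP client: pip install requests

API_KEY = "YOUR_API_KEY"      # placeholder; create one in Google Cloud
ENGINE_ID = "YOUR_ENGINE_ID"  # placeholder; from Programmable Search Engine

def search(query: str, num: int = 10) -> list:
    # The Custom Search JSON API returns at most 10 results per request.
    response = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": ENGINE_ID, "q": query, "num": num},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("items", [])

for item in search("site:example.com filetype:pdf"):
    print(item["title"], "->", item["link"])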
Putting Google search operators to work
SEO experts are always looking for ways to outperform competitors, and these Google search operators provide the tools to extract valuable insights and audit your online presence. More importantly, they enable a proactive approach to crafting and refining your SEO strategy, with the aim of keeping your site relevant and highly visible in search results.
SEO is a dynamic, ongoing process. Integrate Google search operators into your regular SEO audits and competitive analysis; doing so gives you a clearer perspective on your digital landscape, helps you anticipate trends, and lets you adjust your strategies to stay ahead in your online work.