Google’s Mueller On How To Handle Legacy AMP Subdomains via @sejournal, @MattGSouthern

Google’s John Mueller advises site owners on managing outdated AMP subdomains, suggesting redirects or complete DNS removal.

  • Mueller recommends either keeping 301 redirects or removing the AMP subdomain entirely from DNS.
  • For sites with 500,000 pages, crawl budget impact from legacy AMP URLs isn’t a major concern.
  • AMP subdomains have their own separate crawl budget from the main domain.
How Page Performance Hurts UX & How You Can Fix It via @sejournal, @DebugBear

This post was sponsored by DebugBear. The opinions expressed in this article are the sponsor’s own.

From a user’s perspective, a slow website can be incredibly frustrating, creating a poor experience. But the impact of sluggish load times goes deeper than just user frustration.

Poor page performance affects search rankings, overall site engagement, E-E-A-T, and conversion rates, resulting in abandoned sessions, lost sales, and damaged trust.

Even if Google’s Core Web Vitals (CWV) Report is all green.

Sure, the Chrome UX Report (CrUX) and Google’s CWV reports can indicate there’s an issue, but that’s about it. They don’t provide enough detail to identify, troubleshoot, and fix the issue.

And fixing these issues is vital to your digital success.

Core Web Vitals - DebugBear Page Performance Tool. Image from DebugBear, October 2024

This article explores why slow websites are bad for user experience (UX), the challenges that cause them, and how advanced page performance tools can help fix these issues in ways that basic tools can’t.

UX, Brand Perception & Beyond

While often at the bottom of a technical SEO checklist, site speed is critical for UX. Sites that load in one second convert 2.5 to 3 times more than sites that take five seconds to load.

And yet, today, an estimated 14% of B2C ecommerce websites require five seconds or more to load.

These numbers become even more pronounced for mobile users, for whom pages load 70.9% slower. Mobile users have 31% fewer pageviews and a 4.8% higher average bounce rate per session.

According to a recent Google study, 53% of mobile users will abandon a page if it takes more than three seconds to load.

Poor page experience can negatively affect other aspects of your site, too:

  • Search Rankings – Google considers page experience, of which CWV and page performance are factors, when ranking web pages.
  • User Trust – Poor-performing pages fail to meet a potential customer’s expectations. Users often perceive them as the brand inconveniencing them, introducing stress, negative emotions, and a loss of a sense of control to the buying process. Slower pages can also cause users to forget information gained from previous pages, reducing the effectiveness of advertising, copy, and branding campaigns between clicks.
  • User Retention – Site visitors who experience slow load times may never return, reducing retention rates and customer loyalty.

Why Basic Page Performance Tools Don’t Fully Solve The Problem

Tools like Google PageSpeed Insights or Lighthouse give valuable insights into how your website performs, but they can often be limited. They tell you that there’s an issue but often fall short of explaining what caused it or how to fix it.

Google’s Chrome User Experience Report (CrUX) and Core Web Vitals have become essential in tracking website performance and user experience.

These metrics—Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS)—offer valuable insights into how users perceive a website’s speed and stability.

However, CrUX and Core Web Vitals only tell part of the story. They indicate that a problem exists but don’t show the root cause or offer an immediate path for improvement.

For instance, your LCP might be poor, but without deeper page speed analysis, you wouldn’t know whether it’s due to an unoptimized image, a slow server response, or third-party scripts.

Page Performance Broken Down By Geolocation - DebugBear. Image from DebugBear, October 2024

Here’s where DebugBear stands out. DebugBear digs deeper, offering more granular data and unique features that basic tools don’t provide.

Continuous Monitoring and Historical Data – Many speed testing tools only offer snapshots of performance data. DebugBear, on the other hand, allows for continuous monitoring over time, providing an ongoing view of your site’s performance. This is crucial for detecting issues that crop up unexpectedly or tracking the effectiveness of your optimizations.

Granular Breakdown by Device, Location, and Browser – Basic tools often provide aggregated data, which hides the differences between user experiences across various devices, countries, and network conditions. DebugBear lets you drill down to see how performance varies, allowing you to optimize for specific user segments.

Pinpointing Content Elements Causing Delays – One of DebugBear’s standout features is its ability to show exactly which content elements—images, scripts, or third-party code—are slowing down your website. Rather than wasting hours digging through code and experimenting with trial and error, DebugBear highlights the specific elements causing delays, allowing for targeted, efficient fixes.

Why You Need Continuous Page Speed Testing

One of the biggest pitfalls in web performance optimization is relying on single-point speed tests.

Page Performance Breakdown - Content Elements in DebugBear. Image from DebugBear, October 2024

Running a one-time test may give you a snapshot of performance at that moment, but it doesn’t account for fluctuations caused by different factors, such as traffic spikes, varying user devices, or changes to site content.

Without continuous testing, you risk spending hours (or even days) trying to identify the root cause of performance issues.

DebugBear solves this problem by continuously tracking page speed across different devices and geographies, offering detailed reports that can be easily shared with team members or stakeholders.

If a performance dip occurs, DebugBear provides the data necessary to quickly identify and rectify the issue, saving you from the endless trial-and-error process of manual debugging.

Without tools like DebugBear, you’re left with only a high-level view of your website’s performance.

This means hours of trying to guess the underlying issues based on broad metrics, with no real insight into what’s dragging a site down.

Different Users Experience Performance Differently

Not all users experience your website’s performance in the same way.

Device type, geographic location, and network speed can significantly affect load times and interaction delays.

For example, a user on a fast fiber-optic connection in the U.S. may have a completely different experience than someone on a slower mobile network in India.

This variance in user experience can be hidden in aggregate data, leading you to believe your site is performing well when a significant portion of your audience is actually struggling with slow speeds.

Here’s why breaking down performance data by device, country, and browser matters:

  • Device-Specific Optimizations – Some elements, like large images or animations, may perform well on desktop but drag down speeds on mobile.
  • Geographic Performance Variations – International users may experience slower speeds due to server location or network conditions. DebugBear can highlight these differences and help you optimize your content delivery network (CDN) strategy.
  • Browser Differences – Different browsers may handle elements like JavaScript and CSS in different ways, impacting performance. DebugBear’s breakdown by browser ensures you’re not overlooking these subtleties.

Without this granular insight, you risk alienating segments of your audience and overlooking key areas for optimization.

And troubleshooting these issues becomes an expensive nightmare.

Just ask SiteCare.

SiteCare, a WordPress web development and optimization service provider, uses DebugBear to quickly troubleshoot a full range of WordPress sites, solve performance issues faster, and monitor them for changes. This lets it provide high-quality service to its clients while saving thousands of hours and dollars every year.

DebugBear offers these breakdowns, providing a clear view of how your website performs for all users, not just a select few.

Real User Monitoring: The Key To Accurate Performance Insights

In addition to synthetic testing (which mimics user interactions), real user monitoring (RUM) is another powerful feature technical SEOs and marketing teams will find valuable.

While synthetic tests offer valuable controlled insights, they don’t always reflect the real-world experiences of your users.

RUM captures data from actual users as they interact with your site, providing real-time, accurate insights into what’s working and what isn’t.

For instance, real user monitoring can help you:

  • Identify performance issues unique to specific user segments.
  • Detect trends that may not be visible in synthetic tests, such as network issues or slow third-party scripts.
  • Measure the actual experience users are having on your website, not just the theoretical one.

Without real user monitoring, you might miss critical issues that only surface under specific conditions, like a heavy user load or slow mobile networks.

If you’re not using continuous page speed testing and in-depth reports, you’re flying blind.

You may see an overall decline in performance without understanding why, or you could miss opportunities for optimization that only reveal themselves under specific conditions.

The result?

Wasted time, frustrated users, lost conversions, and a website that doesn’t perform up to its potential.

DebugBear solves this by offering both continuous monitoring and granular breakdowns, making it easier to troubleshoot issues quickly and accurately.

With detailed reports, you’ll know exactly what to fix and where to focus your optimization efforts, significantly cutting down on the time spent searching for problems.


Image Credits

Featured Image: Image by Shutterstock. Used with permission.

In-Post Images: Images by DebugBear. Used with permission.

Google To Retire Sitelinks Search Box In November via @sejournal, @MattGSouthern

Google has announced the retirement of the sitelinks search box feature.

This change, set to take effect on November 21, marks the end of a tool that has been part of Google Search for over a decade.

The sitelinks search box, introduced in 2014, allowed users to perform site-specific searches directly from Google’s search results page.

It appeared above the sitelinks for certain websites, usually when searching for a company by name.

Declining Usage

Google cites declining usage as the reason for this decision, stating:

“Over time, we’ve noticed that usage has dropped.”

Potential Impact

Google affirms that removing the sitelinks search box won’t affect search rankings or the display of other sitelinks.

This change is purely visual and doesn’t impact a site’s position in search results.

Implementation

This update will be rolled out globally, affecting search results in all languages and countries.

Google has confirmed that the change won’t be listed in the Search status dashboard, indicating that it’s not considered a significant algorithmic update.

Search Console & Rich Results Test

Following the removal of the sitelinks search box, Google plans to update the following tools:

  1. The Search Console rich results report for sitelinks search box will be removed.
  2. The Rich Results Test will no longer highlight the related markup.

Structured Data Considerations

While you can remove the sitelinks search box structured data from your site, Google says that’s unnecessary.

Unsupported structured data won’t cause issues in Search or trigger errors in Search Console reports.

It’s worth noting that the ‘WebSite’ structured data, also used for site names, continues to be supported.
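
For reference, a minimal sketch of that WebSite markup used for site names (the values below are placeholders, not a specific site’s markup) looks like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Example Site",
  "url": "https://www.example.com/"
}
</script>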

Historical Context

The sitelinks search box was initially announced in September 2014 as an improvement to help users find specific website content more easily.

It supported features like autocomplete and allowed websites to implement schema markup for better integration with their own search pages.

Looking Ahead

Website owners and SEO professionals should take note of this update, though no immediate action is required.


Featured Image: MrB11/Shutterstock

Google Explains How Cumulative Layout Shift (CLS) Is Measured via @sejournal, @MattGSouthern

Google’s Web Performance Developer Advocate, Barry Pollard, has clarified how Cumulative Layout Shift (CLS) is measured.

CLS quantifies how much unexpected layout shift occurs when a person browses your site.

This metric matters to SEO as it’s one of Google’s Core Web Vitals. Pages with low CLS scores provide a more stable experience, potentially leading to better search visibility.

How is it measured? Pollard addressed this question in a thread on X.

Understanding CLS Measurement

Pollard began by explaining the nature of CLS measurement:

“CLS is ‘unitless’ unlike LCP and INP which are measured in seconds/milliseconds.”

He further clarified:

“Each layout shift is calculated by multiplying two percentages or fractions together: what moved (impact fraction) and how much it moved (distance fraction).”

This calculation method helps quantify the severity of layout shifts.
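
As a worked example: if content filling the entire viewport (impact fraction 1.0) shifts down by a tenth of the viewport (distance fraction 0.1), that single shift scores 1.0 × 0.1 = 0.1.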

As Pollard explained:

“The whole viewport moves all the way down – that’s worse than just half the viewport moving all the way down. The whole viewport moving down a little? That’s not as bad as the whole viewport moving down a lot.”

Worst-Case Scenario

Pollard described the worst-case scenario for a single layout shift:

“The maximum layout shift is if 100% of the viewport (impact fraction = 1.0) is moved one full viewport down (distance fraction = 1.0).

This gives a layout shift score of 1.0 and is basically the worst type of shift.”

However, he reminds us of the cumulative nature of CLS:

“CLS is Cumulative Layout Shift, and that first word (cumulative) matters. We take all the individual shifts that happen within a short space of time (max 5 seconds) and sum them up to get the CLS score.”

Pollard explained the reasoning behind the 5-second measurement window:

“Originally we cumulated ALL the shifts, but that didn’t really measure the UX—especially for pages opened for a long time (think SPAs or email). Measuring all shifts meant, given enough time, even the best pages would fail!”

He also noted the theoretical maximum CLS score:

“Since each element can only shift when a frame is drawn and we have a 5 second cap and most devices run at 60fps, that gives a theoretical cap on CLS of 5 secs * 60 fps * 1.0 max shift = 300.”

Interpreting CLS Scores

Pollard addressed how to interpret CLS scores:

“… it helps to think of CLS as a percentage of movement. The good threshold of 0.1 means the page moved about 10%—which could mean the whole page moved 10%, or half the page moved 20%, or lots of little movements were equivalent to either of those.”

Regarding the specific threshold values, Pollard explained:

“So why is 0.1 ‘good’ and 0.25 ‘poor’? That’s explained here as it was a combination of what we’d want (CLS = 0!) and what is achievable … 0.05 was actually achievable at the median, but for many sites it wouldn’t be, so we went slightly higher.”

See also: What is CLS and How to Optimize It?

Why This Matters

Pollard’s insights provide web developers and SEO professionals with a clearer understanding of measuring and optimizing for CLS.

As you work with CLS, keep these points in mind:

  • CLS is unitless and calculated from impact and distance fractions.
  • It’s cumulative, measuring shifts over a 5-second window.
  • The “good” threshold of 0.1 roughly equates to 10% of viewport movement.
  • CLS scores can exceed 1.0 due to multiple shifts adding up.
  • The thresholds (0.1 for “good”, 0.25 for “poor”) balance ideal performance with achievable goals.

With this insight, you can make adjustments to achieve Google’s threshold.


Featured Image: Piscine26/Shutterstock

Google Expands Structured Data Support For Product Certifications via @sejournal, @MattGSouthern

Google has announced an update to its Search Central documentation, introducing support for certification markup in product structured data.

This change will take full effect in April and aims to provide more comprehensive and globally relevant product information.

New Certification Markup For Merchant Listings

Google has added Certification markup support for merchant listings in its product structured data documentation.

This addition allows retailers and ecommerce sites to include detailed certification information about their products.

Transition From EnergyConsumptionDetails to Certification Type

A key aspect of this update is replacing the EnergyConsumptionDetails type with the more versatile Certification type.

The new type can support a wider range of countries and broader certifications.

Google recommends that websites using EnergyConsumptionDetails in their structured data switch to the Certification type before April.

This will ensure product pages remain optimized for Google’s merchant listing experiences.

Expanded Capabilities & Global Relevance

The move to the Certification type represents an expansion in the types of product certifications that can be communicated through structured data.

While energy efficiency ratings were a primary focus of the EnergyConsumptionDetails type, the new Certification markup can encompass a much wider array of product certifications and standards.

This change is relevant for businesses operating in multiple countries, as it allows for more nuanced and locally applicable certification information to be included.

Implementation Guidelines

Google has provided examples in its updated documentation to guide webmasters in implementing the new Certification markup.

These examples include specifying certifications such as CO2 emission classes for vehicles and energy efficiency labels for electronics.

The structured data should be added to product pages using JSON-LD format, with the Certification type nested within the product’s structured data.
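
As a rough sketch of that pattern (property names and values here are illustrative; check Google’s documentation for its exact example markup before deploying), an energy label certification could be nested in Product markup along these lines:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Refrigerator",
  "hasCertification": {
    "@type": "Certification",
    "issuedBy": {
      "@type": "Organization",
      "name": "European Commission"
    },
    "name": "EPREL Energy label",
    "certificationIdentification": "123456",
    "certificationRating": {
      "@type": "Rating",
      "ratingValue": "C"
    }
  }
}
</script>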

Review the full documentation to ensure proper implementation.

Including certification information in structured data could lead to more informative product listings, potentially influencing user click-through rates and purchase decisions.

For consumers, this update means access to more detailed and standardized product information directly in search results, particularly regarding certifications and compliance with various standards.

Next Steps

Website owners and SEO professionals should take the following steps:

  1. Review current use of EnergyConsumptionDetails in product structured data.
  2. Plan for the transition to the Certification type before April.
  3. Implement the new Certification markup on product pages, following Google’s guidelines.
  4. Test the implementation using Google’s Rich Results Test tool.

As with any significant change to structured data implementation, it is advisable to monitor search performance and rich result appearances after making these updates.


Featured Image: lilik ferri yanto/Shutterstock

Ask An SEO: How To Stop Filter Results From Eating Crawl Budget via @sejournal, @rollerblader

Today’s Ask An SEO question comes from Michal in Bratislava, who asks:

“I have a client who has a website with filters based on map locations. When the visitor makes a move on the map, a new URL with filters is created. They are not in the sitemap. However, there are over 700,000 URLs in Search Console (not indexed), and they are eating crawl budget.

What would be the best way to get rid of these URLs? My idea is to keep the base location ‘index, follow’ and switch the newly created URLs of the surrounding areas with filters to ‘noindex, nofollow’. Also, mark surrounding areas with canonicals to the base location + disavow the unwanted links.”

Great question, Michal, and good news! The answer is an easy one to implement.

First, let’s look at what you’re trying to do and apply it to other situations, like ecommerce and publishers. This way, more people can benefit. Then, we’ll go into your strategies above and end with the solution.

What Crawl Budget Is And How Parameters Are Created That Waste It

If you’re not sure what Michal is referring to with crawl budget, this is a term some SEO pros use to explain that Google and other search engines will only crawl so many pages on your website before they stop.

If your crawl budget is used on low-value, thin, or non-indexable pages, your good pages and new pages may not be found in a crawl.

If they’re not found, they may not get indexed or refreshed. If they’re not indexed, they cannot bring you SEO traffic.

This is why optimizing a crawl budget for efficiency is important.

Michal shared an example of how “thin” URLs from an SEO point of view are created as customers use filters.

The experience for the user is value-adding, but from an SEO standpoint, a location-based page would be better. This applies to ecommerce and publishers, too.

Ecommerce stores will have searches for colors like red or green and products like t-shirts and potato chips.

These create URLs with parameters just like a filter search for locations. They could also be created by using filters for size, gender, color, price, variation, compatibility, etc. in the shopping process.

The filtered results help the end user but compete directly with the collection page, and the collection would be the “non-thin” version.

Publishers have the same issue. Someone might be on SEJ looking for SEO or PPC in the search box and get a filtered result. The filtered result will have articles, but the category of the publication is likely the best result for a search engine.

These filtered results can get indexed because they get shared on social media, or someone adds them as a comment on a blog or forum, creating a crawlable backlink. It might also be that an employee in customer service responded to a question on the company blog, or any number of other ways.

The goal now is to make sure search engines don’t spend time crawling the “thin” versions so you can get the most from your crawl budget.

The Difference Between Indexing And Crawling

There’s one more thing to learn before we go into the proposed ideas and solutions – the difference between indexing and crawling.

  • Crawling is the discovery of new pages within a website.
  • Indexing is adding to the search engine’s database the pages that are worthy of being shown to someone using the search engine.

Pages can get crawled but not indexed. Indexed pages have likely been crawled and will likely get crawled again to look for updates and server responses.

But not all indexed pages will bring in traffic or hit the first page because they may not be the best possible answer for queries being searched.

Now, let’s go into making efficient use of crawl budgets for these types of solutions.

Using Meta Robots Or X Robots

The first solution Michal pointed out was an “index,follow” directive. This tells a search engine to index the page and follow the links on it. This is a good idea, but only if the filtered result is the ideal experience.

From what I can see, this would not be the case, so I would recommend making it “noindex,follow.”

Noindex would say, “This is not an official page, but hey, keep crawling my site, you’ll find good pages in here.”

And if you have your main menu and navigational internal links done correctly, the spider will hopefully keep crawling them.
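
For a filtered URL, that directive can be set either in the page’s HTML or, for non-HTML responses, in an HTTP header. A minimal sketch:

<meta name="robots" content="noindex, follow">

Or as a response header:

X-Robots-Tag: noindex, follow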

Canonicals To Solve Wasted Crawl Budget

Canonical links are used to help search engines know what the official page to index is.

If a product exists in three categories on three separate URLs, only one should be “the official” version, so the two duplicates should have a canonical pointing to the official version. The official one should have a canonical link that points to itself. This applies to the filtered locations.

If the location search results in multiple city or neighborhood pages, the result would likely be a duplicate of the official one you have in your sitemap.

If the content on the page stays the same as the original category, have the filtered results point a canonical back to the main filtering page instead of being self-referencing.

If the content pulls in your localized page with the same locations, point the canonical to that page instead.

In most cases, the filtered version inherits the page you searched or filtered from, so that is where the canonical should point to.

If you both noindex a page and give it a self-referencing canonical, which is overkill, you send conflicting signals.

The same applies to when someone searches for a product by name on your website. The search result may compete with the actual product or service page.

With this solution, you’re telling the spider not to index this page because it isn’t worth indexing, but it is also the official version. It doesn’t make sense to do this.

Instead, use a canonical link, as I mentioned above, or noindex the result and point the canonical to the official version.
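
As a minimal sketch (URLs here are invented for illustration), a filtered results page such as https://www.example.com/t-shirts/?color=red&size=m would carry a canonical pointing at the official collection page:

<link rel="canonical" href="https://www.example.com/t-shirts/" />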

Disavow To Increase Crawl Efficiency

Disavowing doesn’t have anything to do with crawl efficiency unless the search engine spiders are finding your “thin” pages through spammy backlinks.

The disavow tool from Google is a way to say, “Hey, these backlinks are spammy, and we don’t want them to hurt us. Please don’t count them towards our site’s authority.”

In most cases, it doesn’t matter, as Google is good at detecting spammy links and ignoring them.

You do not want to add your own site and your own URLs to the disavow tool. You’re telling Google your own site is spammy and not worth anything.

Plus, submitting backlinks to disavow won’t prevent a spider from crawling pages you do or don’t want crawled, as it is only for saying a link from another site is spammy.

Disavowing won’t help with crawl efficiency or saving crawl budget.

How To Make Crawl Budgets More Efficient

The answer is robots.txt. This is how you tell specific search engines and spiders what to crawl.

You can include the folders you want them to crawl by marking them as “allow,” and you can say “disallow” on filtered results by disallowing the “?” or “&” character or whichever you use.

If some of those parameters should be crawled, add an allow rule for the main phrase, like “?filter=location,” or for a specific parameter.
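
As a simplified sketch (the exact patterns depend on your URL structure, and “filter=location” is only an example parameter), a robots.txt along these lines blocks parameterized filter URLs while keeping one valuable parameter crawlable:

User-agent: *
# Block crawling of parameterized filter URLs
Disallow: /*?*
# The longer, more specific rule takes precedence for Googlebot, so this parameter stays crawlable
Allow: /*?filter=location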

Robots.txt is how you define crawl paths and work on crawl efficiency. Once you’ve optimized that, look at your internal links: links from one page on your site to another.

These help spiders find your most important pages while learning what each is about.

Internal links include:

  • Breadcrumbs.
  • Menu navigation.
  • Links within content to other pages.
  • Sub-category menus.
  • Footer links.

You can also use a sitemap if you have a large site, and the spiders are not finding the pages you want with priority.

I hope this helps answer your question. It is one I get a lot – you’re not the only one stuck in that situation.

More resources: 


Featured Image: Paulo Bobita/Search Engine Journal

Google Phases Out Support For Noarchive Meta Tag via @sejournal, @MattGSouthern

In a recent update to its Search Central documentation, Google has officially relegated the ‘noarchive’ rule to a historical reference section.

The new text in Google’s help document reads:

“The noarchive rule is no longer used by Google Search to control whether a cached link is shown in search results, as the cached link feature no longer exists.”

This move follows Google’s earlier decision to remove the cache: search operator, which was reported last week.

Implications For Websites

While Google says websites don’t need to remove the meta tag, it noted that “other search engines and services may be using it.”

The ‘noarchive‘ tag has been a staple of SEO practices for years, allowing websites to prevent search engines from storing cached versions of their pages.

Its relegation to a historical reference highlights the dynamic nature of Google Search.
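
For reference, the rule was typically set as a robots meta tag in a page’s HTML:

<meta name="robots" content="noarchive">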

The Gradual Phasing Out of Cached Pages

This documentation update aligns with Google’s gradual phasing out of the cached page feature.

Last week, Google removed the documentation for the cache: search operator, which had allowed users to view Google’s stored version of a webpage.

At the time, Google’s Search Liaison explained on social media that the cache feature was originally intended to help users access pages when loading was unreliable.

With improvements in web technology, Google deemed the feature no longer necessary.

As an alternative, Google has begun incorporating links to the Internet Archive’s Wayback Machine in its “About this page” feature, providing searchers with a way to view historical versions of webpages.

Controlling Archiving In The Wayback Machine

The ‘noarchive’ tag doesn’t affect the Internet Archive’s Wayback Machine.

The Wayback Machine, which Google now links to in search results pages, has its own rules for archiving and exclusion.

To prevent pages from being archived by the Wayback Machine, you have several options:

  1. Robots.txt: Adding specific directives to the robots.txt file can prevent the Wayback Machine from crawling and archiving pages. For example:

     User-agent: ia_archiver
     Disallow: /
  2. Direct Request: Website owners can contact the Internet Archive to request removal of specific pages or domains from the Wayback Machine.
  3. Password Protection: Placing content behind a login wall effectively prevents it from being archived.

Note that these methods are specific to the Wayback Machine and differ from Google’s now-deprecated ‘noarchive’ tag.

Conclusion

As search technology advances, it’s common to see legacy features retired in favor of new solutions.

It’s time to update those best practice guides to note Google’s deprecation of noarchive.


Featured Image: Tada Images/Shutterstock

Google Adds Two New Best Practices For Product Markup via @sejournal, @MattGSouthern

Google updates guidance on Product markup, advising ecommerce sites to prioritize HTML implementation and use JavaScript cautiously.

  • Google recommends including Product markup in initial HTML for best results.
  • JavaScript-generated markup can lead to less frequent and reliable crawls.
  • E-commerce sites using JavaScript for product data should ensure servers can handle increased traffic.
Faceted Navigation: Best Practices For SEO via @sejournal, @natalieannhoben

When it comes to large websites, such as ecommerce sites with thousands upon thousands of pages, the importance of things like crawl budget cannot be overstated.

Building a website with an organized architecture and smart internal linking strategy is key for these types of sites.

However, doing that properly often involves new challenges when trying to accommodate the various attributes that are common in ecommerce (sizes, colors, price ranges, etc.).

Faceted navigation can help solve these challenges on large websites.

However, faceted navigation must be well thought out and executed properly so that both users and search engine bots remain happy.

What Is Faceted Navigation?

To begin, let’s dive into what faceted navigation actually is.

Faceted navigation is, in most cases, located in the sidebar of an ecommerce website and offers multiple categories, filters, and facets.

It essentially allows people to customize their search based on what they are looking for on the site.

For example, a visitor may want a purple cardigan, in a size medium, with black trim.

Facets are indexed categories that help to narrow down a product listing and also function as an extension of a site’s main categories.

Facets, at their best, should provide unique value for each selection. Because they are indexed, each facet on a site should send relevancy signals to search engines by making sure that all critical attributes appear within the content of the page.

Example of Facet Navigation from newegg.com, August 2024

Filters are used to sort items within a listings page.

While the user can use this to narrow down what they are looking for, the actual content on the page remains the same.

This can potentially lead to multiple URLs creating duplicate content, which is a concern for SEO.

There are a few potential issues that faceted navigation can create that negatively affect SEO. The three main issues boil down to:

  • Duplicate content.
  • Wasted crawl budget.
  • Diluted link equity.

The number of highly related pieces of content grows significantly, and different links may point to all of these different versions of a page, which can dilute link equity, hurt the page’s ranking ability, and create infinite crawl space.

You need to take certain steps to ensure that search engine crawlers aren’t wasting valuable crawl budgets on pages that have little to no value.

Canonicalization

Turning facet search pages into SEO-friendly canonical URLs for collection landing pages is a common SEO strategy.

For example, if you want to target the keyword “gray t-shirts,” which is broad in context, it would not be ideal to focus on a single specific t-shirt. Instead, the keyword should be used on a page that lists all available gray t-shirts. This can be achieved by turning facets into user-friendly URLs and canonicalizing them.

Zalando is a great example of a site that uses facets as collection pages.

Facets as collection pages. Screenshot from Zalando, August 2024

When you search Google for [gray t-shirts], you can see Zalando’s facet page ranking in the top 10.

Zalando's gray t-shirts page ranking in Google search. Screenshot from search for [gray t-shirts], Google, August 2024

If you try to add another filter over a gray t-shirt, let’s say the brand name ‘Adidas,’ you will get a new SEO-friendly URL with canonical meta tags and proper hreflangs for multiple languages in the source code:

https://www.zalando.co.uk/t-shirts/adidas_grey/
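
To illustrate the pattern (a simplified sketch, not Zalando’s exact source code), the head of such a page carries a self-referencing canonical plus hreflang alternates for its other language and country versions:

<link rel="canonical" href="https://www.zalando.co.uk/t-shirts/adidas_grey/" />
<link rel="alternate" hreflang="en-GB" href="https://www.zalando.co.uk/t-shirts/adidas_grey/" />
<!-- ...plus one hreflang alternate per additional language/country version -->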



However, if you decide to include copy on those pages, make sure you change the H1 tag and copy accordingly to avoid keyword cannibalization.

Noindex

Noindex tags can be implemented to inform bots of which pages not to include in the index.

For example, if you wanted to include a page for “gray t-shirt” in the index but did not want pages with a price filter in the index, then adding a noindex tag to the latter would exclude them.

For example, if you have price filters that have these URLs…

https://www.exampleshop.com/t-shirts/grey/?price_from=82

…And if you don’t want them to appear in the index, you can use the “noindex” robots meta tag in the page’s <head> tag:
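
<meta name="robots" content="noindex">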

This method tells search engines to “noindex” the page filtered by price.

Note that even though this approach removes pages from the index, crawl budget will still be spent on them if search engine bots find those links and crawl the pages. To optimize crawl budget, using robots.txt is the best approach.

Robots.txt

Disallowing facet search pages via robots.txt is the best way to manage crawl budget. To disallow pages with price parameters, e.g. ‘/?price=50_100’, you can use the following robots.txt rule.

Disallow: *price=*

This directive informs search engines not to crawl any URL that includes the ‘price=’ parameter, thus optimizing the crawl budget by excluding these pages.

However, if any links pointing to a URL with that parameter exist, Google could still index it. If the quality of those backlinks is high, you may consider using the noindex and canonical approach to consolidate the link equity to a preferred URL.

Otherwise, you don’t need to worry about that, as Google has confirmed such URLs will drop out of the index over time.

Other Ways To Get The Most Out Of Faceted Navigation

  • Implement pagination with rel=”next” and rel=”prev” in order to group indexing properties from individual pages into the series as a whole.
  • Each page needs to link to its child pages and its parent. This can be done with breadcrumbs.
  • Only include canonical URLs in sitemaps if you choose to canonicalize your faceted search pages.
  • Include unique H1 tags and content on canonicalized facet URLs.
  • Facets should always be presented in a unified, logical manner (i.e., alphabetical order).
  • Implement AJAX for filtering to allow users to see results without reloading the page. However, always change the URL after filtering so users can bookmark their searched pages and visit them later. Never implement AJAX without changing the URL.
  • Make sure faceted navigation is optimized for all devices, including mobile, through responsive design.

Conclusion

Although faceted navigation can be great for UX, it can cause a multitude of problems for SEO.

Duplicate content, wasted crawl budget, and diluted link equity can all cause severe problems on a site. However, you can fix those issues by applying one of the strategies discussed in this article.

It is crucial to carefully plan and implement faceted navigation in order to avoid many issues down the line.

More resources:


Featured Image: RSplaneta/Shutterstock

All screenshots taken by author