The Modern Guide To Robots.txt: How To Use It While Avoiding The Pitfalls via @sejournal, @abbynhamilton

Robots.txt just turned 30 – cue the existential crisis! Like many hitting the big 3-0, it’s wondering if it’s still relevant in today’s world of AI and advanced search algorithms.

Spoiler alert: It definitely is!

Let’s take a look at how this file still plays a key role in managing how search engines crawl your site, how to leverage it correctly, and common pitfalls to avoid.

What Is A Robots.txt File?

A robots.txt file provides crawlers like Googlebot and Bingbot with guidelines for crawling your site. Like a map or directory at the entrance of a museum, it acts as a set of instructions at the entrance of the website, including details on:

  • Which crawlers are/aren’t allowed to enter.
  • Any restricted areas (pages) that shouldn’t be crawled.
  • Priority pages to crawl – via the XML sitemap declaration.

Its primary role is to manage crawler access to certain areas of a website by specifying which parts of the site are “off-limits.” This helps ensure that crawlers focus on the most relevant content rather than wasting the crawl budget on low-value content.

While a robots.txt guides crawlers, it’s important to note that not all bots follow its instructions, especially malicious ones. But for most legitimate search engines, adhering to the robots.txt directives is standard practice.

What Is Included In A Robots.txt File?

Robots.txt files consist of lines of directives for search engine crawlers and other bots.

Valid lines in a robots.txt file consist of a field, a colon, and a value.

Robots.txt files also commonly include blank lines to improve readability and comments to help website owners keep track of directives.

Sample robots.txt file. Image from author, November 2024
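For illustration, a minimal robots.txt combining these elements might look like the following (the blocked path and sitemap URL are placeholders):

#Rules for all crawlers
user-agent: *
disallow: /do-not-enter

#Location of the XML sitemap
sitemap: https://www.example.com/sitemap.xml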

To get a better understanding of what is typically included in a robots.txt file and how different sites leverage it, I looked at robots.txt files for 60 domains with a high share of voice across health, financial services, retail, and high-tech.

Excluding comments and blank lines, the average number of lines across 60 robots.txt files was 152.

Large publishers and aggregators, such as hotels.com, forbes.com, and nytimes.com, typically had longer files, while hospitals like pennmedicine.org and hopkinsmedicine.com typically had shorter files. Retail sites’ robots.txt files typically fell close to the average of 152.

All sites analyzed include the fields user-agent and disallow within their robots.txt files, and 77% of sites included a sitemap declaration with the field sitemap.

Fields leveraged less frequently were allow (used by 60% of sites) and crawl-delay (used by 20% of sites).

Field | % Of Sites Leveraging
user-agent | 100%
disallow | 100%
sitemap | 77%
allow | 60%
crawl-delay | 20%

Robots.txt Syntax

Now that we’ve covered what types of fields are typically included in a robots.txt, we can dive deeper into what each one means and how to use it.

For more information on robots.txt syntax and how it is interpreted by Google, check out Google’s robots.txt documentation.

User-Agent

The user-agent field specifies what crawler the directives (disallow, allow) apply to. You can use the user-agent field to create rules that apply to specific bots/crawlers or use a wild card to indicate rules that apply to all crawlers.

For example, the below syntax indicates that the directives that follow apply only to Googlebot.

user-agent: Googlebot

If you want to create rules that apply to all crawlers, you can use a wildcard instead of naming a specific crawler.

user-agent: *

You can include multiple user-agent fields within your robots.txt to provide specific rules for different crawlers or groups of crawlers, for example:

user-agent: *

#Rules here would apply to all crawlers

user-agent: Googlebot

#Rules here would only apply to Googlebot

user-agent: otherbot1

user-agent: otherbot2

user-agent: otherbot3

#Rules here would apply to otherbot1, otherbot2, and otherbot3

Disallow And Allow

The disallow field specifies paths that designated crawlers should not access. The allow field specifies paths that designated crawlers can access.

Because Googlebot and other crawlers will assume they can access any URLs that aren’t specifically disallowed, many sites keep it simple and only specify what paths should not be accessed using the disallow field.

For example, the below syntax would tell all crawlers not to access URLs matching the path /do-not-enter.

user-agent: *

disallow: /do-not-enter

#All crawlers are blocked from crawling pages with the path /do-not-enter

If you’re using both allow and disallow fields within your robots.txt, make sure to read the section on order of precedence for rules in Google’s documentation.

Generally, in the case of conflicting rules, Google will use the more specific rule.

For example, in the below case, Google won’t crawl pages with the path /do-not-enter because the disallow rule is more specific than the allow rule.

user-agent: *

allow: /

disallow: /do-not-enter

If neither rule is more specific, Google will default to using the less restrictive rule.

In the instance below, Google would crawl pages with the path /do-not-enter because the allow rule is less restrictive than the disallow rule.

user-agent: *

allow: /do-not-enter

disallow: /do-not-enter

Note that if there is no path specified for the allow or disallow fields, the rule will be ignored.

user-agent: *

disallow:

This is very different from only including a forward slash (/) as the value for the disallow field, which would match the root domain and any lower-level URL (translation: every page on your site).  

If you want your site to show up in search results, make sure you don’t have the following code. It will block all search engines from crawling all pages on your site.

user-agent: *

disallow: /

This might seem obvious, but believe me, I’ve seen it happen.

URL Paths

URL paths are the portion of the URL after the protocol, subdomain, and domain beginning with a forward slash (/). For the example URL https://www.example.com/guides/technical/robots-txt, the path would be /guides/technical/robots-txt.

Example URL structure. Image from author, November 2024

URL paths are case-sensitive, so be sure to double-check that the use of uppercase and lowercase letters in the robots.txt aligns with the intended URL path.

Special Characters

Google, Bing, and other major search engines also support a limited number of special characters to help match URL paths.

A special character is a symbol that has a unique function or meaning instead of just representing a regular letter or number. Special characters supported by Google in robots.txt are:

  • Asterisk (*) – matches 0 or more instances of any character.
  • Dollar sign ($) – designates the end of the URL.

To illustrate how these special characters work, assume we have a small site with the following URLs:

  • https://www.example.com/
  • https://www.example.com/search
  • https://www.example.com/guides
  • https://www.example.com/guides/technical
  • https://www.example.com/guides/technical/robots-txt
  • https://www.example.com/guides/technical/robots-txt.pdf
  • https://www.example.com/guides/technical/xml-sitemaps
  • https://www.example.com/guides/technical/xml-sitemaps.pdf
  • https://www.example.com/guides/content
  • https://www.example.com/guides/content/on-page-optimization
  • https://www.example.com/guides/content/on-page-optimization.pdf

Example Scenario 1: Block Site Search Results

A common use of robots.txt is to block internal site search results, as these pages typically aren’t valuable for organic search results.

For this example, assume when users conduct a search on https://www.example.com/search, their query is appended to the URL.

If a user searched “xml sitemap guide,” the new URL for the search results page would be https://www.example.com/search?search-query=xml-sitemap-guide.

When you specify a URL path in the robots.txt, it matches any URLs with that path, not just the exact URL. So, to block both the URLs above, using a wildcard isn’t necessary.

The following rule would match both https://www.example.com/search and https://www.example.com/search?search-query=xml-sitemap-guide.

user-agent: *

disallow: /search

#All crawlers are blocked from crawling pages with the path /search

If a wildcard (*) were added, the results would be the same.

user-agent: *

disallow: /search*

#All crawlers are blocked from crawling pages with the path /search

Example Scenario 2: Block PDF files

In some cases, you may want to use the robots.txt file to block specific types of files.

Imagine the site decided to create PDF versions of each guide to make it easy for users to print. The result is two URLs with exactly the same content, so the site owner may want to block search engines from crawling the PDF versions of each guide.

In this case, using a wildcard (*) would be helpful to match the URLs where the path starts with /guides/ and ends with .pdf, but the characters in between vary.

user-agent: *

disallow: /guides/*.pdf

#All crawlers are blocked from crawling pages with URL paths that contain: /guides/, 0 or more instances of any character, and .pdf

The above directive would prevent search engines from crawling the following URLs:

  • https://www.example.com/guides/technical/robots-txt.pdf
  • https://www.example.com/guides/technical/xml-sitemaps.pdf
  • https://www.example.com/guides/content/on-page-optimization.pdf

Example Scenario 3: Block Category Pages

For the last example, assume the site created category pages for technical and content guides to make it easier for users to browse content in the future.

However, since the site only has three guides published right now, these pages aren’t providing much value to users or search engines.

The site owner may want to temporarily prevent search engines from crawling the category page only (e.g., https://www.example.com/guides/technical), not the guides within the category (e.g., https://www.example.com/guides/technical/robots-txt).

To accomplish this, we can leverage “$” to designate the end of the URL path.

user-agent: *

disallow: /guides/technical$

disallow: /guides/content$

#All crawlers are blocked from crawling pages with URL paths that end with /guides/technical and /guides/content

The above syntax would prevent the following URLs from being crawled:

  • https://www.example.com/guides/technical
  • https://www.example.com/guides/content

While allowing search engines to crawl:

  • https://www.example.com/guides/technical/robots-txt
  • https://www.example.com/guides/content/on-page-optimization

Sitemap

The sitemap field is used to provide search engines with a link to one or more XML sitemaps.

While not required, it’s a best practice to include XML sitemaps within the robots.txt file to provide search engines with a list of priority URLs to crawl.  

The value of the sitemap field should be an absolute URL (e.g., https://www.example.com/sitemap.xml), not a relative URL (e.g., /sitemap.xml). If you have multiple XML sitemaps, you can include multiple sitemap fields.

Example robots.txt with a single XML sitemap:

user-agent: *

disallow: /do-not-enter

sitemap: https://www.example.com/sitemap.xml

Example robots.txt with multiple XML sitemaps:

user-agent: *

disallow: /do-not-enter

sitemap: https://www.example.com/sitemap-1.xml

sitemap: https://www.example.com/sitemap-2.xml

sitemap: https://www.example.com/sitemap-3.xml

Crawl-Delay

As mentioned above, 20% of sites also include the crawl-delay field within their robots.txt file.

The crawl-delay field tells bots how fast they can crawl the site and is typically used to slow down crawling to avoid overloading servers.

The value for crawl-delay is the number of seconds crawlers should wait to request a new page. The below rule would tell the specified crawler to wait five seconds after each request before requesting another URL.

user-agent: FastCrawlingBot

crawl-delay: 5

Google has stated that it does not support the crawl-delay field, and it will be ignored.

Other major search engines like Bing and Yahoo respect crawl-delay directives for their web crawlers.

Search Engine | Primary User-Agent For Search | Respects Crawl-Delay?
Google | Googlebot | No
Bing | Bingbot | Yes
Yahoo | Slurp | Yes
Yandex | YandexBot | Yes
Baidu | Baiduspider | No

Sites most commonly include crawl-delay directives for all user agents (using user-agent: *), search engine crawlers mentioned above that respect crawl-delay, and crawlers for SEO tools like AhrefsBot and SemrushBot.

The number of seconds crawlers were instructed to wait before requesting another URL ranged from one second to 20 seconds, but crawl-delay values of five seconds and 10 seconds were the most common across the 60 sites analyzed.
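As an illustrative example (not taken from any specific file analyzed), a rule asking two common SEO tool crawlers to wait 10 seconds between requests might look like this:

user-agent: AhrefsBot
user-agent: SemrushBot
crawl-delay: 10

#AhrefsBot and SemrushBot are asked to wait 10 seconds between requests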

Testing Robots.txt Files

Any time you’re creating or updating a robots.txt file, make sure to test directives, syntax, and structure before publishing.

This robots.txt Validator and Testing Tool makes this easy to do (thank you, Max Prin!).

To test a live robots.txt file, simply:

  • Add the URL you want to test.
  • Select your user agent.
  • Choose “live.”
  • Click “test.”

The below example shows that Googlebot smartphone is allowed to crawl the tested URL.

Example robots.txt test - crawling allowed. Image from author, November 2024

If the tested URL is blocked, the tool will highlight the specific rule that prevents the selected user agent from crawling it.

Example robots.txt test - crawling disallowed. Image from author, November 2024

To test new rules before they are published, switch to “Editor” and paste your rules into the text box before testing.
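If you prefer to sanity-check rules programmatically, Python’s built-in urllib.robotparser module can fetch a robots.txt file and report whether a given user agent may crawl a given URL. Below is a minimal sketch using the example site from earlier; note that the standard library parser follows the original robots.txt specification and does not replicate Google’s wildcard (*) and end-of-URL ($) handling.

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

# can_fetch(user_agent, url) returns True if the URL is allowed for that user agent
print(parser.can_fetch("Googlebot", "https://www.example.com/search?search-query=xml-sitemap-guide"))
print(parser.can_fetch("Googlebot", "https://www.example.com/guides/technical/robots-txt"))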

Common Uses Of A Robots.txt File

While what is included in a robots.txt file varies greatly by website, analyzing 60 robots.txt files revealed some commonalities in how it is leveraged and what types of content webmasters commonly block search engines from crawling.

Preventing Search Engines From Crawling Low-Value Content

Many websites, especially large ones like ecommerce or content-heavy platforms, often generate “low-value pages” as a byproduct of features designed to improve the user experience.

For example, internal search pages and faceted navigation options (filters and sorts) help users find what they’re looking for quickly and easily.

While these features are essential for usability, they can result in duplicate or low-value URLs that aren’t valuable for search.

The robots.txt is typically leveraged to block these low-value pages from being crawled.

Common types of content blocked via the robots.txt include the following (an illustrative example combining several of these follows the list):

  • Parameterized URLs: URLs with tracking parameters, session IDs, or other dynamic variables are blocked because they often lead to the same content, which can create duplicate content issues and waste the crawl budget. Blocking these URLs ensures search engines only index the primary, clean URL.
  • Filters and sorts: Blocking filter and sort URLs (e.g., product pages sorted by price or filtered by category) helps avoid indexing multiple versions of the same page. This reduces the risk of duplicate content and keeps search engines focused on the most important version of the page.
  • Internal search results: Internal search result pages are often blocked because they generate content that doesn’t offer unique value. If a user’s search query is injected into the URL, page content, and meta elements, sites might even risk some inappropriate, user-generated content getting crawled and indexed (see the sample screenshot in this post by Matt Tutt). Blocking them prevents this low-quality – and potentially inappropriate – content from appearing in search.
  • User profiles: Profile pages may be blocked to protect privacy, reduce the crawling of low-value pages, or ensure focus on more important content, like product pages or blog posts.
  • Testing, staging, or development environments: Staging, development, or test environments are often blocked to ensure that non-public content is not crawled by search engines.
  • Campaign sub-folders: Landing pages created for paid media campaigns are often blocked when they aren’t relevant to a broader search audience (i.e., a direct mail landing page that prompts users to enter a redemption code).
  • Checkout and confirmation pages: Checkout pages are blocked to prevent users from landing on them directly through search engines, enhancing user experience and protecting sensitive information during the transaction process.
  • User-generated and sponsored content: Sponsored content or user-generated content created via reviews, questions, comments, etc., are often blocked from being crawled by search engines.
  • Media files (images, videos): Media files are sometimes blocked from being crawled to conserve bandwidth and reduce the visibility of proprietary content in search engines. It ensures that only relevant web pages, not standalone files, appear in search results.
  • APIs: APIs are often blocked to prevent them from being crawled or indexed because they are designed for machine-to-machine communication, not for end-user search results. Blocking APIs protects their usage and reduces unnecessary server load from bots trying to access them.
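As an illustrative example only (the paths below are placeholders, not recommendations for any specific site), a robots.txt blocking several of these content types might look like this:

user-agent: *
disallow: /search
disallow: /checkout
disallow: /cart
disallow: /api/
disallow: /*?sessionid=
disallow: /staging/

#Illustrative only: blocks internal search, checkout and cart pages, API endpoints, session-ID parameters, and a staging folder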

Blocking “Bad” Bots

Bad bots are web crawlers that engage in unwanted or malicious activities such as scraping content and, in extreme cases, looking for vulnerabilities to steal sensitive information.

Other bots without any malicious intent may still be considered “bad” if they flood websites with too many requests, overloading servers.

Additionally, webmasters may simply not want certain crawlers accessing their site because they don’t stand to gain anything from it.

For example, you may choose to block Baidu if you don’t serve customers in China and don’t want to risk requests from Baidu impacting your server.

Though some of these “bad” bots may disregard the instructions outlined in a robots.txt file, websites still commonly include rules to disallow them.

Out of the 60 robots.txt files analyzed, 100% disallowed at least one user agent from accessing all content on the site (via the disallow: /).
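For example, a rule disallowing a single unwanted crawler from the entire site (using Baidu’s crawler, as mentioned above) looks like this:

user-agent: Baiduspider
disallow: /

#Baiduspider is blocked from crawling any page on the site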

Blocking AI Crawlers

Across sites analyzed, the most blocked crawler was GPTBot, with 23% of sites blocking GPTBot from crawling any content on the site.

Originality.ai’s live dashboard that tracks how many of the top 1,000 websites are blocking specific AI web crawlers found similar results, with 27% of the top 1,000 sites blocking GPTBot as of November 2024.

Reasons for blocking AI web crawlers may vary – from concerns over data control and privacy to simply not wanting your data used in AI training models without compensation.

The decision on whether or not to block AI bots via the robots.txt should be evaluated on a case-by-case basis.

If you don’t want your site’s content to be used to train AI but also want to maximize visibility, you’re in luck. OpenAI is transparent on how it uses GPTBot and other web crawlers.

At a minimum, sites should consider allowing OAI-SearchBot, which is used to feature and link to websites in SearchGPT, ChatGPT’s recently launched real-time search feature.

Blocking OAI-SearchBot is far less common than blocking GPTBot, with only 2.9% of the top 1,000 sites blocking the SearchGPT-focused crawler.
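For example, a site that wants to opt out of AI training crawls while remaining eligible to appear in SearchGPT might use rules like the following (illustrative only; check OpenAI’s crawler documentation for the current list of user agents):

user-agent: GPTBot
disallow: /

user-agent: OAI-SearchBot
allow: /

#GPTBot (used to gather training data) is blocked, while OAI-SearchBot (used for SearchGPT) is allowed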

Getting Creative

In addition to being an important tool in controlling how web crawlers access your site, the robots.txt file can also be an opportunity for sites to show their “creative” side.

While sifting through files from over 60 sites, I also came across some delightful surprises, like the playful illustrations hidden in the comments on Marriott and Cloudflare’s robots.txt files.

Marriott robots.txt file. Screenshot of marriott.com/robots.txt, November 2024
Screenshot of cloudflare.com/robots.txt, November 2024

Multiple companies are even turning these files into unique recruitment tools.

TripAdvisor’s robots.txt doubles as a job posting with a clever message included in the comments:

“If you’re sniffing around this file, and you’re not a robot, we’re looking to meet curious folks such as yourself…

Run – don’t crawl – to apply to join TripAdvisor’s elite SEO team[.]”

If you’re looking for a new career opportunity, you might want to consider browsing robots.txt files in addition to LinkedIn.

How To Audit Robots.txt

Auditing your Robots.txt file is an essential part of most technical SEO audits.

Conducting a thorough robots.txt audit ensures that your file is optimized to enhance site visibility without inadvertently restricting important pages.

To audit your Robots.txt file:

  • Crawl the site using your preferred crawler. (I typically use Screaming Frog, but any web crawler should do the trick.)
  • Filter crawl for any pages flagged as “blocked by robots.txt.” In Screaming Frog, you can find this information by going to the response codes tab and filtering by “blocked by robots.txt.”
  • Review the list of URLs blocked by the robots.txt to determine whether they should be blocked. Refer to the above list of common types of content blocked by robots.txt to help you determine whether the blocked URLs should be accessible to search engines.
  • Open your robots.txt file and conduct additional checks to make sure your robots.txt file follows SEO best practices (and avoids common pitfalls) detailed below.
Sample Screaming Frog report. Image from author, November 2024

Robots.txt Best Practices (And Pitfalls To Avoid)

The robots.txt is a powerful tool when used effectively, but there are some common pitfalls to steer clear of if you don’t want to harm the site unintentionally.

The following best practices will help you set yourself up for success and avoid unintentionally blocking search engines from crawling important content:

  • Create a robots.txt file for each subdomain. Each subdomain on your site (e.g., blog.yoursite.com, shop.yoursite.com) should have its own robots.txt file to manage crawling rules specific to that subdomain. Search engines treat subdomains as separate sites, so a unique file ensures proper control over what content is crawled or indexed.
  • Don’t block important pages on the site. Make sure priority content, such as product and service pages, contact information, and blog content, are accessible to search engines. Additionally, make sure that blocked pages aren’t preventing search engines from accessing links to content you want to be crawled and indexed.
  • Don’t block essential resources. Blocking JavaScript (JS), CSS, or image files can prevent search engines from rendering your site correctly. Ensure that important resources required for a proper display of the site are not disallowed.
  • Include a sitemap reference. Always include a reference to your sitemap in the robots.txt file. This makes it easier for search engines to locate and crawl your important pages more efficiently.
  • Don’t only allow specific bots to access your site. If you disallow all bots from crawling your site, except for specific search engines like Googlebot and Bingbot, you may unintentionally block bots that could benefit your site. Example bots include:
    • FacebookExternalHit – used to fetch Open Graph metadata for link previews.
    • GooglebotNews – used for the News tab in Google Search and the Google News app.
    • AdsBot-Google – used to check webpage ad quality.
  • Don’t block URLs that you want removed from the index. Blocking a URL in robots.txt only prevents search engines from crawling it, not from indexing it if the URL is already known. To remove pages from the index, use other methods like the “noindex” tag or URL removal tools, ensuring they’re properly excluded from search results (see the noindex example after this list).
  • Don’t block Google and other major search engines from crawling your entire site. Just don’t do it.
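For reference (as mentioned in the list above), a noindex directive can be declared either as a meta tag in the page’s HTML head or as an HTTP response header:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex

Remember that search engines must be able to crawl a page to see its noindex directive, so don’t combine noindex with a robots.txt disallow for the same URL.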

TL;DR

  • A robots.txt file guides search engine crawlers on which areas of a website to access or avoid, optimizing crawl efficiency by focusing on high-value pages.
  • Key fields include “User-agent” to specify the target crawler, “Disallow” for restricted areas, and “Sitemap” for priority pages. The file can also include directives like “Allow” and “Crawl-delay.”
  • Websites commonly leverage robots.txt to block internal search results, low-value pages (e.g., filters, sort options), or sensitive areas like checkout pages and APIs.
  • An increasing number of websites are blocking AI crawlers like GPTBot, though this might not be the best strategy for sites looking to gain traffic from additional sources. To maximize site visibility, consider allowing OAI-SearchBot at a minimum. 
  • To set your site up for success, ensure each subdomain has its own robots.txt file, test directives before publishing, include an XML sitemap declaration, and avoid accidentally blocking key content.

Featured Image: Se_vector/Shutterstock

7 Things To Look For In An SEO-Friendly WordPress Host

This post was sponsored by Bluehost. The opinions expressed in this article are the sponsor’s own.

When trying to improve your WordPress site’s search rankings, hosting might not be the first thing on your mind.

But your choice of hosting provider can significantly impact your SEO efforts.

A poor hosting setup can slow down your site, compromise its stability and security, and drain valuable time and resources.

The answer? Choosing the right WordPress hosting provider.

Here are seven essential features to look for in an SEO-friendly WordPress host.

1. Reliable Uptime & Speed for Consistent Performance

A website’s uptime and speed can significantly influence your site’s rankings and the success of your SEO strategies.

Users don’t like sites that suffer from significant downtime or sluggish load speeds. Not only are these sites inconvenient, but they also reflect negatively on the brand and their products and services, making them appear less trustworthy and of lower quality.

For these reasons, Google values websites that load quickly and reliably. So, if your site suffers from significant downtime or sluggish load times, it can negatively affect your site’s position in search results as well as frustrate users.

Reliable hosting with minimal downtime and fast server response times helps ensure that both users and search engines can access your content seamlessly.

Performance-focused infrastructure, optimized for fast server responses, is essential for delivering a smooth and engaging user experience.

When evaluating hosting providers, look for high uptime guarantees through a robust Service Level Agreement (SLA), which assures site availability and speed.

Bluehost Cloud, for instance, offers a 100% SLA for uptime, response time, and resolution time.

Built specifically with WordPress users in mind, Bluehost Cloud leverages an infrastructure optimized to deliver the speed and reliability that WordPress sites require, enhancing both SEO performance and user satisfaction. This guarantee provides you with peace of mind.

Your site will remain accessible and perform optimally around the clock, and you’ll spend less time troubleshooting and dealing with your host’s support team trying to get your site back online.

2. Data Center Locations & CDN Options For Global Reach

Fast load times are crucial not only for providing a better user experience but also for reducing bounce rates and boosting SEO rankings.

Since Google prioritizes websites that load quickly for users everywhere, having data centers in multiple locations and Content Delivery Network (CDN) integration is essential for WordPress sites with a global audience.

To ensure your site loads quickly for all users, no matter where they are, choose a WordPress host with a distributed network of data centers and CDN support. Consider whether it offers CDN options and data center locations that align with your audience’s geographic distribution.

This setup allows your content to reach users swiftly across different regions, enhancing both user satisfaction and search engine performance.

Bluehost Cloud integrates with a CDN to accelerate content delivery across the globe. This means that whether your visitors are in North America, Europe, or Asia, they’ll experience faster load times.

By leveraging global data centers and a CDN, Bluehost Cloud ensures your site’s SEO remains strong, delivering a consistent experience for users around the world.

3. Built-In Security Features To Protect From SEO-Damaging Attacks

Security is essential for your brand, your SEO, and overall site health.

Websites that experience security breaches, malware, or frequent hacking attempts can be penalized by search engines, potentially suffering from ranking drops or even removal from search indexes.

Therefore, it’s critical to select a host that offers strong built-in security features to safeguard your website and its SEO performance.

When evaluating hosting providers, look for options that include additional security features.

Bluehost Cloud, for example, offers comprehensive security features designed to protect WordPress sites, including free SSL certificates to encrypt data, automated daily backups, and regular malware scans.

These features help maintain a secure environment, preventing security issues from impacting your potential customers, your site’s SEO, and ultimately, your bottom line.

With Bluehost Cloud, your site’s visitors, data, and search engine rankings remain secure, providing you with peace of mind and a safe foundation for SEO success.

4. Optimized Database & File Management For Fast Site Performance

A poorly managed database can slow down site performance, which affects load times and visitor experience. Therefore, efficient data handling and optimized file management are essential for fast site performance.

Choose a host with advanced database and file management tools, as well as caching solutions that enhance site speed. Bluehost Cloud supports WordPress sites with advanced database optimization, ensuring quick, efficient data handling even as your site grows.

With features like server-level caching and optimized databases, Bluehost Cloud is built to handle WordPress’ unique requirements, enabling your site to perform smoothly without additional plugins or manual adjustments.

Bluehost Cloud contributes to a better user experience and a stronger SEO foundation by keeping your WordPress site fast and efficient.

5. SEO-Friendly, Scalable Bandwidth For Growing Sites

As your site’s popularity grows, so does its bandwidth requirements. Scalable or unmetered bandwidth is vital to handle traffic spikes without slowing down your site and impacting your SERP performance.

High-growth websites, in particular, benefit from hosting providers that offer flexible bandwidth options, ensuring consistent speed and availability even during peak traffic.

To avoid disaster, select a hosting provider that offers scalable or unmetered bandwidth as part of their package. Bluehost Cloud’s unmetered bandwidth, for instance, is designed to accommodate high-traffic sites without affecting load times or user experience.

This ensures that your site remains responsive and accessible during high-traffic periods, supporting your growth and helping you maintain your SEO rankings.

For websites anticipating growth, unmetered bandwidth with Bluehost Cloud provides a reliable, flexible solution to ensure long-term performance.

6. WordPress-Specific Support & SEO Optimization Tools

WordPress has unique needs when it comes to SEO, making specialized hosting support essential.

Hosts that cater specifically to WordPress provide an added advantage by offering tools and configurations such as staging environments and one-click installations specifically for WordPress.

WordPress-specific hosting providers also have an entire team of knowledgeable support and technical experts who can help you significantly improve your WordPress site’s performance.

Bluehost Cloud is a WordPress-focused hosting solution that offers priority, 24/7 support from WordPress experts, ensuring any issue you encounter is dealt with effectively.

Additionally, Bluehost’s staging environments enable you to test changes and updates before going live, reducing the risk of SEO-impacting errors.

Switching to Bluehost is easy, affordable, and stress-free, too.

Bluehost offers a seamless migration service designed to make switching hosts simple and stress-free. Our dedicated migration support team handles the entire transfer process, ensuring your WordPress site’s content, settings, and configurations are moved safely and accurately.

Currently, Bluehost also covers all migration costs, so you can make the switch with zero out-of-pocket expenses. We’ll credit the remaining cost of your existing contract, making the transition financially advantageous.

You can actually save money or even gain credit by switching.

7. Integrated Domain & Site Management For Simplified SEO Administration

SEO often involves managing domain settings, redirects, DNS configurations, and SSL updates, which can become complicated without centralized management.

An integrated hosting provider that allows you to manage your domain and hosting in one place simplifies these SEO tasks and makes it easier to maintain a strong SEO foundation.

When selecting a host, look for providers that integrate domain management with hosting. Bluehost offers a streamlined experience, allowing you to manage both domains and hosting from a single dashboard.

SEO-related site administration becomes more manageable, and you can focus on the things you do best: growth and optimization.

Find An SEO-Friendly WordPress Host

Choosing an SEO-friendly WordPress host can have a significant impact on your website’s search engine performance, user experience, and long-term growth.

By focusing on uptime, global data distribution, robust security, optimized database management, scalable bandwidth, WordPress-specific support, and integrated domain management, you create a solid foundation that supports both SEO and usability.

Ready to make the switch?

As a trusted WordPress partner with over 20 years of experience, Bluehost offers a hosting solution designed to meet the unique demands of WordPress sites big and small.

Our dedicated migration support team handles every detail of your transfer, ensuring your site’s content, settings, and configurations are moved accurately and securely.

Plus, we offer eligible customers a credit toward their remaining contracts, making the transition to Bluehost not only seamless but also cost-effective.

Learn how Bluehost Cloud can elevate your WordPress site. Visit us today to get started.


Image Credits

Featured Image: Image by Bluehost. Used with permission.

In-Post Image: Images by Bluehost. Used with permission.

HTTP Archive Report: 61% Of Cookies Enable Third-Party Tracking via @sejournal, @MattGSouthern

HTTP Archive published 12 chapters of its annual Web Almanac, revealing disparities between mobile and desktop web performance.

The Almanac analyzes data from millions of sites to track trends in web technologies, performance metrics, and user experience.

This year’s Almanac details changes in technology adoption patterns that will impact businesses and users.

Key Highlights

Mobile Performance Gap

The most significant finding centers on the growing performance gap between desktop and mobile experiences.

With the introduction of Google’s new Core Web Vital metric, Interaction to Next Paint (INP), the gap has become wider than ever.

“Web performance is tied to what devices and networks people can afford,” the report notes, highlighting the socioeconomic implications of this growing divide.

The data shows that while desktop performance remains strong, mobile users—particularly those with lower-end devices—face challenges:

  • Desktop sites achieve 97% “good” INP scores
  • Mobile sites lag at 74% “good” INP scores
  • Mobile median Total Blocking Time is 18 times higher than desktop

Third-Party Tracking

The report found that tracking remains pervasive across the web.

“We find that 61% of cookies are set in a third-party context,” the report states, noting that these cookies can be used for cross-site tracking and targeted advertising.

Key privacy findings include:

  • Google’s DoubleClick sets cookies on 44% of top websites
  • Only 6% of third-party cookies use partitioning for privacy protection
  • 11% of first-party cookies have SameSite set to None, potentially enabling tracking

CMS Market Share

In the content management space, WordPress continues its dominance, with the report stating:

“Of the over 16 million mobile sites in this year’s crawl, WordPress is used by 5.7 millions sites for a total of 36% of sites.”

However, among the top 1,000 most-visited websites, only 8% use identifiable CMS platforms, suggesting larger organizations opt for custom solutions.

In the ecommerce sector, WooCommerce leads with 38% market share, followed by Shopify at 18%.

The report found that “OpenCart is the last of the 362 detected shop systems that manage to secure a share above 1% of the market.”

PayPal remains the most detected payment method (3.5% of sites), followed by Apple Pay and Shop Pay.

Performance By Platform

Some platforms markedly improved Core Web Vitals scores over the past year.

Squarespace increased from 33% good scores in 2022 to 60% in 2024, while others like Magento and WooCommerce continue to face performance challenges.

The remaining chapters of the Web Almanac are expected to be published in the coming weeks.

Structured Data Trends

The deprecation of FAQ and HowTo rich results by Google hasn’t significantly impacted their implementation.

This suggests website owners find value in these features beyond search.

Google expanded support for structured data types for various verticals, including vehicles, courses, and vacation rentals.

Why This Matters

These findings highlight that mobile optimization remains a challenge for developers and businesses.

HTTP Archive researchers noted in the report:

“These results highlight the ongoing need for focused optimization efforts, particularly in mobile experience.

The performance gap between devices suggests that many users, especially those on lower-end mobile devices, may be experiencing a significantly degraded web experience.”

Additionally, as privacy concerns grow, the industry faces pressure to balance user tracking with privacy protection.

Businesses reliant on third-party tracking mechanisms may need to adapt their marketing and analytics strategies accordingly.

The 2024 Web Almanac is available on HTTP Archive’s website; the remaining chapters are expected to be published in the coming weeks.


Featured Image: BestForBest/Shutterstock

Google’s Martin Splitt: Duplicate Content Doesn’t Impact Site Quality via @sejournal, @MattGSouthern

Google’s Search Central team has released a new video in its “SEO Made Easy” series. In it, Search Advocate Martin Splitt addresses common concerns about duplicate content and provides practical solutions for website owners.

Key Takeaways

Despite concerns in the SEO community, Google insists that duplicate content doesn’t harm a site’s perceived quality.

Splitt states:

“Some people think it influences the perceived quality of a site but it doesn’t. It does cause some challenges for website owners though, because it’s harder to track performance of pages with duplicates.”

However, it can create several operational challenges that website owners should address:

  • Difficulty in tracking page performance metrics
  • Potential competition between similar content pieces
  • Slower crawling speeds, especially at scale

Splitt adds:

“It might make similar content compete with each other and it can cause pages to take longer to get crawled if this happens at a larger scale. So it’s not great and is something you might want to clean up, but it isn’t something that you should lose sleep over.”

Three Solutions

1. Implement Canonical Tags

Splitt recommends using canonical tags in HTML or HTTP headers to indicate preferred URLs for duplicate content.

While Google treats these as suggestions rather than directives, they help guide the search engine’s indexing decisions.

Splitt clarifies:

“This tag is often used incorrectly by website owners so Google search can’t rely on it and treats it as a hint but might choose a different URL anyway.”
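For reference, a canonical hint can be declared either as a link tag in the page’s HTML head or, for non-HTML resources such as PDFs, as an HTTP response header (the URLs below are placeholders):

<link rel="canonical" href="https://www.example.com/preferred-page/">

Link: <https://www.example.com/preferred-page.pdf>; rel="canonical"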

2. Manage Internal Links and Redirects

When Google chooses different canonical URLs than specified, website owners should:

  • Review and update internal links to point to preferred canonical URLs
  • Consider implementing 301 redirects for external links
  • Ensure redirects are appropriately configured to maintain site performance

3. Consolidate Similar Content

The most strategic approach involves combining similar pages to:

  • Improve user experience
  • Streamline Search Console reporting
  • Reduce site clutter

Splitt explains:

“If you find that you have multiple very similar pages, even if Google doesn’t consider them duplicates, try to combine them. It makes information easier to find for your users, will make reporting in Google Search Console easier to work with, and will reduce clutter on your site.”

Search Console Notices

Google Search Console may flag pages with various duplicate content notices:

  • “Duplicate without user-selected canonical”
  • “Alternate page with proper canonical tag”
  • “Duplicate Google chose different canonical than user”

These notifications indicate that Google has indexed the content, possibly under different URLs than initially intended.

International SEO Considerations

Splitt addresses duplicate content in international contexts, noting that similar content across multiple language versions is acceptable and handled appropriately by Google’s systems.


Why This Matters

This guidance represents Google’s current stance on duplicate content and clarifies best practices for content organization and URL structure optimization.

See the full video below:


Featured Image: AnnaKu/Shutterstock

4 New Techniques To Speed Up Your Website & Fix Core Web Vitals via @sejournal, @DebugBear

This post was sponsored by DebugBear. The opinions expressed in this article are the sponsor’s own.

Want to make your website fast?

Luckily, many techniques and guides exist to help you speed up your website.

In fact, just in the last year, several new browser features have been released that offer:

  • New ways to optimize your website.
  • New ways to identify causes of slow performance.

All within your browser.

So, this article looks at these new browser SEO features and how you can use them to pass Google’s Core Web Vitals assessment.

Why Website Performance Is Key For User Experience & SEO

Having a fast website will make your users happier and increase conversion rates.

But performance is also a Google ranking factor.

Google has defined three user experience metrics, called the Core Web Vitals:

  • Largest Contentful Paint: how quickly does page content appear?
  • Cumulative Layout Shift: does content move around after loading?
  • Interaction to Next Paint: how responsive is the page to user input?

For each of these metrics, there’s a maximum threshold that shouldn’t be exceeded to pass the Core Web Vitals assessment.

Metric thresholds for Google Core Web Vitals, October 2024

1. Add Instant Navigation With “Speculation Rules”


When websites are slow to load that’s usually because various resources have to be loaded from the website server. But what if there was a way to achieve instant navigations, where visitors don’t have to wait?

This year Chrome launched a new feature called speculation rules, which can achieve just that. After loading the initial page on a website, other pages can be preloaded in the background. Then, when the visitor clicks on a link, the new page appears instantly.

Best of all, this feature is easy to implement just by adding a script tag with the type "speculationrules" to your pages.
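A minimal sketch of the list-based form of speculation rules is shown below (the URLs are hypothetical placeholders):

<script type="speculationrules">
{
  "prerender": [
    { "source": "list", "urls": ["/pricing", "/features"] }
  ]
}
</script>

When a visitor later clicks a link to one of the listed pages, the browser can swap in the prerendered copy, making the navigation feel instant.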

Google Updates Crawl Budget Best Practices via @sejournal, @MattGSouthern

Google has updated its crawl budget guidelines, stressing the need to maintain consistent link structures between mobile and desktop websites.

  • Large websites must ensure mobile versions contain all desktop links or risk slower page discovery.
  • The update mainly impacts sites with over 10,000 pages or those experiencing indexing issues.
  • Link structure consistency across mobile and desktop is now a Google-recommended best practice for crawl budget optimization.

The SEO Agency Guide To Efficient WordPress Hosting & Management via @sejournal, @kinsta

This post was sponsored by Kinsta. The opinions expressed in this article are the sponsor’s own.

Managing client sites can quickly become costly in terms of time, money, and expertise, especially as your agency grows.

You’re constantly busy fixing slow WordPress performance, handling downtime, or regularly updating and backing up ecommerce sites and small blogs.

The solution to these challenges might lie in fully managed hosting for WordPress sites.

Opting for a fully managed hosting provider that specializes in WordPress and understands agency needs can save you both time and money. By making the switch, you can focus on what truly matters: serving your current clients and driving new business into your sales funnel.

WordPress Worries & How To Keep Clients Happy

For SEO agencies managing multiple client sites, ensuring consistently fast performance across the board is essential. Websites with poor performance metrics are more likely to see a dip in traffic, increased bounce rates, and lost conversion opportunities.

Managed hosting, especially hosting that specializes and is optimized for WordPress, offers agencies a way to deliver high-speed, well-performing sites without constantly battling technical issues.

Clients expect seamless performance, but handling these technical requirements for numerous websites can be a time-consuming process. While WordPress is versatile and user-friendly, it does come with performance challenges.

SEO agencies must deal with frequent updates, plugin management, security vulnerabilities, and optimization issues.

Challenges like bloated themes, inefficient plugins, and poor hosting infrastructure can lead to slow load times. You also need to ensure that client WordPress sites are secured against malware and hackers, which requires regular monitoring and updates.

With managed hosting, many of these tasks are automated, significantly reducing the workload on your team.

Managed hosting for WordPress simplifies the process by providing a full suite of performance, security, and maintenance services.

Instead of spending valuable time on manual updates, backups, and troubleshooting, you can rely on your hosting provider to handle these tasks automatically, resulting in reduced downtime, improved site performance, and a more efficient use of resources.

Ultimately, you can focus your energy on SEO strategies that drive results for your clients.

Basics Of Managed Hosting For WordPress

Managed hosting providers like Kinsta take care of all the technical aspects of running WordPress websites, including performance optimization, security, updates, backups, and server management.

We take over these responsibilities to ensure the platform runs smoothly and securely without the constant need for manual intervention.

Kinsta also eliminates common WordPress performance bottlenecks, including slow-loading themes, outdated plugins, inefficient database queries, and suboptimal server configurations.

Key Benefits Of Efficient Managed Hosting For SEO

1. Performance & Speed

Core Web Vitals, Google’s user experience metrics, play a significant role in determining search rankings. Managed hosting improves metrics like LCP, FID, and CLS by offering high-performance servers and built-in caching solutions.

CDNs reduce latency by serving your website’s static files from servers closest to the user, significantly improving load times.

Kinsta, for example, uses Google Cloud’s premium tier network and C2 virtual machines, ensuring the fastest possible load times for WordPress sites. We also provide integrated CDN services, along with advanced caching configurations, which ensure that even resource-heavy WordPress sites load quickly.

And the benefits are instantly noticeable.

Before the switch, Torro Media faced performance issues, frequent downtimes, and difficulties scaling their websites to handle traffic growth. These issues negatively affected their clients’ user experience and SEO results.

After migrating to Kinsta, Torro Media saw notable improvements:

  • Faster website performance – Site load times significantly improved, contributing to better SEO rankings and overall user experience.
  • Reduced downtime – Kinsta’s reliable infrastructure ensured that Torro Media’s websites experienced minimal downtime, keeping client websites accessible.
  • Expert support – Our support team helped Torro Media resolve technical issues efficiently, allowing the agency to focus on growth rather than troubleshooting.

As a result, Torro was able to scale its operations and deliver better results for its clients.

2. WP-Specific Security

Security is a critical component of managed hosting. Platforms like Kinsta offer automatic security patches, malware scanning, and firewalls tailored specifically for WordPress.

These features are vital to protecting your clients’ sites from cyber threats, which, if left unchecked, can lead to ranking drops due to blacklisting by search engines.

Downtime and security breaches negatively impact SEO. Google devalues sites that experience frequent downtime or security vulnerabilities.

Managed hosting providers minimize these risks by maintaining secure, stable environments with 24/7 monitoring, helping ensure that your clients’ sites remain online and safe from attacks.

3. Automatic Backups & Recovery

Automatic daily backups are a standard feature of managed hosting, protecting against data loss due to server crashes or website errors. For agencies, this means peace of mind, knowing that they can restore their clients’ sites quickly in case of a problem. The ability to quickly recover from an issue helps maintain SEO rankings, as prolonged downtime can hurt search performance.

Managed hosting providers often include advanced tools such as one-click restore points and robust disaster recovery systems. Additionally, having specialized support means that you have access to experts who understand WordPress and can help troubleshoot complex issues that affect performance and SEO.

Importance Of An Agency-Focused Managed WordPress Hosting Provider

For SEO agencies, uptime guarantees are essential to maintaining site availability. Managed hosting providers, like Kinsta, who specialize in serving agencies, offer a 99.9% uptime SLA and multiple data center locations, ensuring that websites remain accessible to users across the globe.

Scalability and flexibility matter, too. As your agency grows, your clients’ hosting needs may evolve. Managed hosting platforms designed for agencies offer scalability, allowing you to easily add resources as your client portfolio expands.

With scalable solutions, you can handle traffic surges without worrying about site downtime or slowdowns.

Agency Dashboard - Managed Hosting for WordPress

1. The Right Dashboards

A user-friendly dashboard is crucial for managing multiple client sites efficiently. Kinsta’s MyKinsta dashboard, for example, allows agencies to monitor performance, uptime, and traffic across all sites in one centralized location, providing full visibility into each client’s website performance.

Hosting dashboards like Kinsta’s MyKinsta provide real-time insights into key performance metrics such as server response times, resource usage, and traffic spikes. These metrics are essential for ensuring that sites remain optimized for SEO.

2. Balance Costs With Performance Benefits

For agencies, managing hosting costs is always a consideration. While managed hosting may come with a higher price tag than traditional shared hosting, the benefits, such as faster performance, reduced downtime, and enhanced security, translate into better client results and long-term cost savings.

Kinsta offers flexible pricing based on traffic, resources, and features, making it easier for agencies to align their hosting solutions with client budgets.

By automating tasks like backups, updates, and security management, managed hosting allows agencies to significantly reduce the time and resources spent on day-to-day maintenance. This frees up your team to focus on delivering SEO results, ultimately improving efficiency and client satisfaction.

Don’t think it makes that big of a difference? Think again.

After migrating to Kinsta, 5Tales experienced:

  • Improved site speed – Load times dropped by over 50%, which enhanced user experience and SEO performance.
  • Better support – Kinsta’s specialized support team helped troubleshoot issues quickly and provided expert-level advice.
  • Streamlined management – With our user-friendly dashboard and automated features, 5Tales reduced the time spent on maintenance and troubleshooting.

Overall, 5Tales saw an increase in both client satisfaction and SEO rankings after moving to Kinsta.

3. Managed Hosting & Page Speed Optimization

Tools like Kinsta’s Application Performance Monitoring (APM) provide detailed insights into website performance, helping agencies identify slow-loading elements and optimize them. This level of transparency enables faster troubleshooting and more precise optimization efforts, which are critical for maintaining fast page speeds.

It’s also easy to integrate managed hosting platforms with your existing tech stack. Kinsta works seamlessly with SEO tools like Google Analytics, DebugBear, and others, allowing agencies to track site performance, analyze traffic patterns, and ensure sites are running at peak efficiency.

Conclusion

Managed hosting is not just a convenience. It’s a critical component of success for SEO agencies managing WordPress sites.

By leveraging the performance, security, and time-saving benefits of a managed hosting provider like Kinsta, agencies can improve client results, enhance their relationships, and streamline their operations.

When it comes to SEO, every second counts. A fast, secure, and well-maintained website will always perform better in search rankings. For agencies looking to deliver maximum value to their clients, investing in managed hosting is a smart, long-term decision.

Ready to make the switch?

Kinsta offers no shared hosting, a 99.99% uptime guarantee, and 24/7/365 support, so we’re here when you need us. Plus, we make it easy, effortless, and free to move to Kinsta.

Our team of migration experts has experience switching from all web hosts. And when you make the switch to Kinsta, we’ll give you up to $10,000 in free hosting to ensure you avoid paying double hosting bills.


Image Credits

Featured Image: Image by Kinsta. Used with permission.

In-Post Image: Images by Kinsta. Used with permission.

Google Revises URL Parameter Best Practices via @sejournal, @MattGSouthern

In a recent update to its Search Central documentation, Google has added specific guidelines for URL parameter formatting.

The update brings parameter formatting recommendations from a faceted navigation blog post into the main URL structure documentation, making these guidelines more accessible.

Key Updates

The new documentation specifies that developers should use the following:

  • Equal signs (=) to separate key-value pairs
  • Ampersands (&) to connect multiple parameters

Google recommends against using alternative separators such as:

  • Colons and brackets
  • Single or double commas
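For illustration (these URLs are placeholders, not examples from Google’s documentation), the contrast looks like this:

Recommended: https://www.example.com/products?category=shoes&sort=price-asc

Not recommended: https://www.example.com/products?category:shoes,sort:price-asc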

Why This Matters

URL parameters play a role in website functionality, particularly for e-commerce sites and content management systems.

They control everything from product filtering and sorting to tracking codes and session IDs.

While powerful, they can create SEO challenges like duplicate content and crawl budget waste.

Proper parameter formatting ensures better crawling efficiency and can help prevent common indexing issues that affect search performance.

The documentation addresses broader URL parameter challenges, such as managing dynamic content generation, handling session IDs, and effectively implementing sorting parameters.

Previous Guidance

Before this update, developers had to reference an old blog post about faceted navigation to find specific URL parameter formatting guidelines.

Consolidating this information into the main guidelines makes it easier to find.

The updated documentation can be found in Google’s Search Central documentation under the Crawling and Indexing section.

Looking Ahead

If you’re using non-standard parameter formats, start planning a migration to the standard format. Ensure proper redirects, and monitor your crawl stats during the switch.

While Google has not said non-standard parameters will hurt rankings, this update clarifies what they prefer. New sites and redesigns should adhere to the standard format to avoid future headaches.


Featured Image: Vibe Images/Shutterstock

Google’s Mueller Dismisses Core Web Vitals Impact On Rankings via @sejournal, @MattGSouthern

Google Search Advocate John Mueller has reaffirmed that Core Web Vitals are not major ranking factors, responding to data that suggested otherwise.

His statements come amid growing industry discussion about the immediate impact of site performance on search visibility.

Mueller’s Stance

Mueller stated on LinkedIn:

“We’ve been pretty clear that Core Web Vitals are not giant factors in ranking, and I doubt you’d see a big drop just because of that.”

The main benefit of improving website performance is providing a better user experience.

A poor experience could naturally decrease traffic by discouraging return visitors, regardless of how they initially found the site.

Mueller continues:

“Having a website that provides a good experience for users is worthwhile, because if users are so annoyed that they don’t want to come back, you’re just wasting the first-time visitors to your site, regardless of where they come from.”

Small Sites’ Competitive Edge

Mueller believes smaller websites have a unique advantage when it comes to implementing SEO changes.

Recalling his experience of trying to get a big company to change a robots.txt line, he explains:

“Smaller sites have a gigantic advantage when it comes to being able to take advantage of changes – they can be so much more nimble.”

Mueller noted that larger organizations may need extensive processes for simple changes, while smaller sites can update things like robots.txt in just 30 minutes.

He adds:

“None of this is easy, you still need to figure out what to change to adapt to a dynamic ecosystem online, but I bet if you want to change your site’s robots.txt (for example), it’s a matter of 30 minutes at most.”

Context

Mueller’s response followed research presented by Andrew Mcleod, who documented consistent patterns across multiple websites indicating rapid ranking changes after performance modifications.

In one case, a site with over 50,000 monthly visitors experienced a drop in traffic within 72 hours of implementing advertisements.

Mcleod’s analysis, which included five controlled experiments over three months, showed:

  • Traffic drops of up to 20% within 48 hours of enabling ads
  • Recovery periods of 1-2 weeks after removing ads
  • Consistent patterns across various test cases

Previous Statements

This latest guidance aligns with Mueller’s previous statements on Core Web Vitals.

In a March podcast, Mueller confirmed that Core Web Vitals are used in “ranking systems or in Search systems,” but emphasized that perfect scores won’t notably affect search results.

Mueller’s consistent message is clear: while Core Web Vitals are important for user experience and are part of Google’s ranking systems, you should prioritize content quality rather than focus on metrics.

Looking Ahead

Core Web Vitals aren’t major ranking factors, per Mueller.

While Google’s stance on ranking factors remains unchanged, the reality is that technical performance and user experience work together to influence traffic.


Featured Image: Ye Liew/Shutterstock