How Google Detects Duplicate Content

The myth of a duplicate content penalty has persisted for years. In reality, Google seeks diverse search results: when two or more pages are the same or similar, it chooses one to show, and the others lose organic traffic. That is filtering, not a penalty.

Google’s “Search Central” blog includes a guide on ranking systems that describes deduplication:

Searches on Google may find thousands or even millions of matching web pages. Some of these may be very similar to each other. In such cases, our systems show only the most relevant results to avoid unhelpful duplication.

Yet the guide doesn’t specify how the deduplication system chooses a page. In my experience, duplication occurs in four ways.

Similar pages

When a site has similar product or category pages or syndicates content (knowingly or not), Google will likely show only one page in search results. It’s not a penalty, but it does dilute traffic among the identical pages. Thus, ensure Google ranks the original, up-to-date, detailed, and relevant page (not a syndicated or scraped version).

Canonical tags and 301 redirects can point Google to the best page. Neither is foolproof, as Google views them as suggestions. The only way to force the best page is to avoid duplicating it.
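To see which signals a given page is sending, a quick spot check takes only a few lines of Python. The sketch below is a minimal example, assuming the requests and beautifulsoup4 packages are installed and using placeholder URLs: it reads the canonical tag a page declares and checks whether an old duplicate URL returns a 301 redirect.

```python
# Minimal sketch: read the canonical URL a page declares and check whether
# an old/duplicate URL 301-redirects. URLs are hypothetical placeholders;
# assumes the `requests` and `beautifulsoup4` packages are installed.
import requests
from bs4 import BeautifulSoup

def declared_canonical(url):
    """Return the href of the page's rel=canonical link, if any."""
    html = requests.get(url, timeout=10).text
    tag = BeautifulSoup(html, "html.parser").find("link", rel="canonical")
    return tag.get("href") if tag else None

def redirect_status(url):
    """Return the status code and Location header without following redirects."""
    resp = requests.get(url, allow_redirects=False, timeout=10)
    return resp.status_code, resp.headers.get("Location")

preferred = "https://example.com/product"      # hypothetical preferred page
duplicate = "https://example.com/product-old"  # hypothetical duplicate URL

print("Canonical declared on preferred page:", declared_canonical(preferred))
status, location = redirect_status(duplicate)
print(f"Duplicate URL returns {status}, redirecting to: {location}")
```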

The danger of duplicate content is when a third-party scraped version overranks the original. Google can usually identify scraped content, which is typically on low-quality sites with few or no authority signals. Thus a higher-ranking scraped version implies a problem with the original site.

Featured snippets

Featured snippets appear above organic search results and provide a quick answer to a query. Google removes featured snippet URLs from lower organic positions to avoid duplication.

The purpose of featured snippets is to answer queries, removing the need to click. Thus a featured snippet page likely receives less organic traffic, and there is no surefire method to prevent it. If a page suddenly loses traffic, check Search Console to see if it’s featured.

Google will likely deduplicate AI Overviews in the same way.

Top stories

“Top stories” is a separate search-result section for breaking or relevant news. A URL in top stories typically loses its organic position.

Domains

Domain names trigger a different type of duplication beyond content. Google typically won’t show the same domain repeatedly in the top results, even for brand-name queries. Monitor queries for your brand to know which other domains rank for it and how to respond.

Google Clarifies 404 & Redirect Validation In Search Console via @sejournal, @MattGSouthern

Google’s Search Advocate, John Mueller, has provided insights into Search Console’s validation process, addressing how it handles 404 errors and redirects during site migrations.

Key Points

A Reddit user shared their experience with a client’s website migration that led to a loss in rankings.

They explained that they took several steps to address the issues, including:

  • Fixing on-site technical problems.
  • Redirecting 404 pages to the appropriate URLs.
  • Submitting these changes for validation in Google Search Console.

Although they confirmed that all redirects and 404 pages were working correctly, the changes failed validation in Search Console.

Feeling frustrated, the user sought advice on what to do next.

This prompted a response from Mueller, who provided insights into how Google processes these changes.

Mueller’s Response

Mueller explained how Google manages 404 errors and redirect validations in Search Console.

He clarified that the “mark as fixed” feature doesn’t speed up Google’s reprocessing of site changes. Instead, it’s a tool for site owners to monitor their progress.

Mueller noted:

“The ‘mark as fixed’ here will only track how things are being reprocessed. It won’t speed up reprocessing itself.”

He also questioned the purpose of marking 404 pages as fixed, noting that no further action is needed if a page intentionally returns a 404 error.

Mueller adds:

“If they are supposed to be 404s, then there’s nothing to do. 404s for pages that don’t exist are fine. It’s technically correct to have them return 404. These being flagged don’t mean you’re doing something wrong, if you’re doing the 404s on purpose.”

For pages that aren’t meant to be 404, Mueller advises:

“If these aren’t meant to be 404 – the important part is to fix the issue though, set up the redirects, have the new content return 200, check internal links, update sitemap dates, etc. If it hasn’t been too long (days), then probably it’ll pick up again quickly. If it’s been a longer time, and if it’s a lot of pages on the new site, then (perhaps obviously) it’ll take longer to be reprocessed.”

Key Takeaways From Mueller’s Advice

Mueller outlined several key points in his response.

Let’s break them down:

For Redirects and Content Updates

  • Ensure that redirects are correctly set up and new content returns a 200 (OK) status code.
  • Update internal links to reflect the new URLs.
  • Refresh the sitemap with updated dates to signal changes to Google (a verification sketch follows this list).
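The sketch below is a minimal way to run that verification, assuming the requests package and a hypothetical mapping of old to new URLs: it confirms each old URL issues a permanent redirect to the expected destination and that the destination returns 200.

```python
# Minimal sketch: confirm each old URL issues a permanent redirect to the
# expected new URL, and that the destination returns 200. The mapping is a
# hypothetical placeholder; assumes the `requests` package is installed.
import requests

REDIRECT_MAP = {
    "https://example.com/old-page": "https://example.com/new-page",
    "https://example.com/legacy/pricing": "https://example.com/pricing",
}

for old_url, expected in REDIRECT_MAP.items():
    hop = requests.get(old_url, allow_redirects=False, timeout=10)
    final = requests.get(old_url, allow_redirects=True, timeout=10)

    ok = (
        hop.status_code in (301, 308)                # permanent redirect in place
        and hop.headers.get("Location") == expected  # pointing where intended
        and final.status_code == 200                 # destination resolves cleanly
    )
    print(f"{old_url} -> {hop.headers.get('Location')} "
          f"(hop {hop.status_code}, final {final.status_code}): {'OK' if ok else 'CHECK'}")
```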

Reprocessing Timeline

  • If changes were made recently (within a few days), Google will likely process them quickly.
  • For larger websites or older issues, reprocessing may take more time.

Handling 404 Pages

  • If a page is no longer meant to exist, returning a 404 error is the correct approach.
  • Seeing 404s flagged in Search Console doesn’t necessarily indicate a problem, provided the 404s are intentional.

Why This Matters

Website migrations can be complicated and may temporarily affect search rankings if not done correctly.

Google Search Console is useful for tracking changes, but it has limitations.

The validation process tracks whether fixes have been reprocessed; it doesn’t speed up Google’s reprocessing of changes.

Practice patience and ensure all technical details—redirects, content updates, and internal linking—are adequately addressed.


Featured Image: Sammby/Shutterstock

Google’s JavaScript Warning & How It Relates To AI Search via @sejournal, @MattGSouthern

A recent discussion among the Google Search Relations team highlights a challenge in web development: getting JavaScript to work well with modern search tools.

In Google’s latest Search Off The Record podcast, the team discussed the rising use of JavaScript, and the tendency to use it when it’s not required.

Martin Splitt, a Search Developer Advocate at Google, noted that JavaScript was created to help websites compete with mobile apps, bringing in features like push notifications and offline access.

However, the team cautioned that excitement around JavaScript functionality can lead to overuse.

While JavaScript is practical in many cases, it’s not the best choice for every part of a website.

The JavaScript Spectrum

Splitt described the current landscape as a spectrum between traditional websites and web applications.

He says:

“We’re in this weird state where websites can be just that – websites, basically pages and information that is presented on multiple pages and linked, but it can also be an application.”

He offered the following example of the JavaScript spectrum:

“You can do apartment viewings in the browser… it is a website because it presents information like the square footage, which floor is this on, what’s the address… but it’s also an application because you can use a 3D view to walk through the apartment.”

Why Does This Matter?

John Mueller, Google Search Advocate, noted a common tendency among developers to over-rely on JavaScript:

“There are lots of people that like these JavaScript frameworks, and they use them for things where JavaScript really makes sense, and then they’re like, ‘Why don’t I just use it for everything?’”

As I listened to the discussion, I was reminded of a study I covered weeks ago. According to the study, over-reliance on JavaScript can lead to potential issues for AI search engines.

Given the growing prominence of AI search crawlers, I thought it was important to highlight this conversation.

While traditional search engines typically support JavaScript well, its implementation demands greater consideration in the age of AI search.

The study finds AI bots make up an increasing percentage of search crawler traffic, but these crawlers can’t render JavaScript.

That means you could lose out on traffic from search engines like ChatGPT Search if you rely too much on JavaScript.

Things To Consider

The use of JavaScript and the limitations of AI crawlers present several important considerations:

  1. Server-Side Rendering: Since AI crawlers can’t execute client-side JavaScript, server-side rendering is essential for ensuring visibility.
  2. Content Accessibility: Major AI crawlers, such as GPTBot and Claude, have distinct preferences for content consumption. GPTBot prioritizes HTML content (57.7%), while Claude focuses more on images (35.17%).
  3. New Development Approach: These new constraints may require reevaluating the traditional “JavaScript-first” development strategy.

The Path Forward

As AI crawlers become more important for indexing websites, you need to balance modern features and accessibility for AI crawlers.

Here are some recommendations:

  • Use server-side rendering for key content.
  • Make sure to include core content in the initial HTML.
  • Apply progressive enhancement techniques.
  • Be cautious about when to use JavaScript.

To succeed, adapt your website for traditional search engines and AI crawlers while ensuring a good user experience.

Listen to the full podcast episode below:


Featured Image: Ground Picture/Shutterstock

AI Crawlers Account For 28% Of Googlebot’s Traffic, Study Finds via @sejournal, @MattGSouthern

A report released by Vercel highlights the growing impact of AI bots in web crawling.

OpenAI’s GPTBot and Anthropic’s Claude generate nearly 1 billion requests monthly across Vercel’s network.

The data indicates that GPTBot made 569 million requests in the past month, while Claude accounted for 370 million.

Additionally, PerplexityBot contributed 24.4 million fetches, and AppleBot added 314 million requests.

Together, these AI crawlers made roughly 1.28 billion fetches (569 million + 370 million + 24.4 million + 314 million), approximately 28% of Googlebot’s total volume of 4.5 billion.

Here’s what this could mean for SEO.

Key Findings On AI Crawlers

The analysis looked at traffic patterns on Vercel’s network and various web architectures. It found some key features of AI crawlers:

  • Major AI crawlers do not render JavaScript, though they do pull JavaScript files.
  • AI crawlers are often inefficient, with ChatGPT and Claude spending over 34% of their requests on 404 pages.
  • The type of content these crawlers focus on varies. ChatGPT prioritizes HTML (57.7%), while Claude focuses more on images (35.17%).

Geographic Distribution

Unlike traditional search engines that operate from multiple regions, AI crawlers currently maintain a concentrated U.S. presence:

  • ChatGPT operates from Des Moines (Iowa) and Phoenix (Arizona)
  • Claude operates from Columbus (Ohio)

Web Almanac Correlation

These findings align with data shared in the Web Almanac’s SEO chapter, which also notes the growing presence of AI crawlers.

According to the report, websites now use robots.txt files to set rules for AI bots, telling them what they can or cannot crawl.

GPTBot is the most mentioned bot, appearing on 2.7% of mobile sites studied. The Common Crawl bot, often used to collect training data for language models, is also frequently noted.

Both reports stress that website owners need to adjust to how AI crawlers behave.

3 Ways To Optimize For AI Crawlers

Based on recent data from Vercel and the Web Almanac, here are three ways to optimize for AI crawlers.

1. Server-Side Rendering

AI crawlers don’t execute JavaScript. This means any content that relies on client-side rendering might be invisible.
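Because a non-rendering crawler only sees the server's initial HTML response, one quick check is to fetch a page without executing JavaScript and confirm that key content is already present. The sketch below assumes the requests package and uses a placeholder URL and phrases.

```python
# Minimal sketch: fetch the raw HTML (no JavaScript execution) and check
# whether key content is already present, approximating what a non-rendering
# AI crawler sees. URL and phrases are placeholders; assumes `requests`.
import requests

URL = "https://example.com/article"
KEY_PHRASES = ["Product overview", "Pricing", "Contact us"]

raw_html = requests.get(URL, timeout=10).text

for phrase in KEY_PHRASES:
    found = phrase in raw_html
    print(f"{phrase!r}: {'present in initial HTML' if found else 'MISSING - likely injected client-side'}")
```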

Recommended actions:

  • Implement server-side rendering for critical content
  • Ensure main content, meta information, and navigation structures are present in the initial HTML
  • Use static site generation or incremental static regeneration where possible

2. Content Structure & Delivery

Vercel’s data shows distinct content type preferences among AI crawlers:

ChatGPT:

  • Prioritizes HTML content (57.70%)
  • Spends 11.50% of fetches on JavaScript files

Claude:

  • Focuses heavily on images (35.17%)
  • Dedicates 23.84% of fetches to JavaScript files

Optimization recommendations:

  • Structure HTML content clearly and semantically
  • Optimize image delivery and metadata
  • Include descriptive alt text for images
  • Implement proper header hierarchy (an audit sketch follows this list)
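As a starting point for those recommendations, the sketch below (assuming requests and beautifulsoup4, with a placeholder URL) flags images that lack alt text and heading levels that skip a step.

```python
# Minimal sketch: audit a page for missing image alt text and skipped
# heading levels. The URL is a hypothetical placeholder; assumes `requests`
# and `beautifulsoup4` are installed.
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/"
soup = BeautifulSoup(requests.get(URL, timeout=10).text, "html.parser")

# Images without alt text (or with an empty alt attribute)
missing_alt = [img.get("src") for img in soup.find_all("img") if not img.get("alt")]
print(f"Images missing alt text: {len(missing_alt)}")

# Heading hierarchy: flag jumps such as an h2 followed directly by an h4
levels = [int(h.name[1]) for h in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])]
for previous, current in zip(levels, levels[1:]):
    if current > previous + 1:
        print(f"Heading jump detected: h{previous} followed by h{current}")
```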

3. Technical Considerations

High 404 rates from AI crawlers mean you need to keep these technical considerations top of mind:

  • Maintain updated sitemaps
  • Implement proper redirect chains
  • Use consistent URL patterns
  • Regularly audit 404 errors (a log-audit sketch follows this list)
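Given the reported 404 rates, a periodic look at your own access logs is worthwhile. The sketch below is one way to do it, assuming a combined-format log at a hypothetical path and the listed user-agent substrings; adjust both to match your server.

```python
# Minimal sketch: count 404 responses served to AI crawlers in a
# combined-format access log. The log path, user-agent substrings, and
# log layout are assumptions; adjust them to your server's setup.
import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical path
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Applebot"]

# Roughly matches: ... "GET /path HTTP/1.1" 404 ... "user agent"
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$')

not_found = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match or match.group("status") != "404":
            continue
        for bot in AI_BOTS:
            if bot.lower() in match.group("ua").lower():
                not_found[(bot, match.group("path"))] += 1

for (bot, path), count in not_found.most_common(20):
    print(f"{count:5d}  {bot:15s}  {path}")
```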

Looking Ahead

For search marketers, the message is clear: AI chatbots are a new force in web crawling, and sites need to adapt their SEO accordingly.

Although AI bots may rely on cached or dated information now, their capacity to parse fresh content from across the web will grow.

You can help ensure your content is crawled and indexed with server-side rendering, clean URL structures, and updated sitemaps.


Featured Image: tete_escape/Shutterstock

18 Essential Accessibility Changes To Drive Increased Website Growth via @sejournal, @skynet_lv

This post was sponsored by Skynet Technologies USA LLC.

Did you know that 1 billion people have not yet reached you or your customers’ websites?

1 billion potential customers are waiting for businesses to step up and do what’s right.

Find out if your website is accessible to 1 billion people >>>

Accessibility isn’t just a compliance checkbox anymore – it’s a growth strategy.

The demand for scalable, innovative accessibility solutions has skyrocketed.

And your competition is already making these improvements.

For agencies, this means an unprecedented opportunity to meet clients’ needs while driving revenue.

Learn how you can generate additional revenue and boost your clients’ SERP rankings with accessibility improvements.

Ready to get started?

How Accessibility Improvements Can Increase Growth

The digital economy thrives on inclusion.

There is a large market of individuals who are not included in modern website usability.

With over a billion people globally living with disabilities, accessible digital experiences open doors to untapped markets.

Do Websites Need To Be Accessible?

The short answer is yes.

How Does An Accessible Website Drive Traffic?

Traffic comes from people who have needs. Of course, everyone has needs, including people with disabilities.

Accessible websites and tools cater to all users, expanding reach to a diverse and often overlooked customer base.

Global Potential & Unlocking New Audiences

The global community of people with disabilities represents a market estimated to hold a staggering $13 trillion in spending power.

By removing barriers and ensuring inclusive digital experiences, you can tap into this 1 billion-person market and drive substantial economic growth.

Digital accessibility helps to increase employment opportunities, education options, and simple access to various banking and financial services for everybody.

Boosts User Experience & Engagement 

Accessibility improvements run parallel with SEO improvements.

In fact, they often enhance overall website performance, which leads to:

  • Better user experience.
  • Higher rankings.
  • Increased traffic.
  • Higher conversion rates.

Ensures Your Websites Are Compliant

A growing number of lawsuits against businesses that fail to comply with accessibility regulations has pressured them to make their digital assets accessible.

Compliance with ADA, WCAG 2.0, 2.1, 2.2, Section 508, Australian DDA, European EAA EN 301 549, UK Equality Act (EA), Indian RPD Act, Israeli Standard 5568, California Unruh, Ontario AODA, Canada ACA, German BITV, Brazilian Inclusion Law (LBI 13.146/2015), Spain UNE 139803:2012, France RGAA standards, JIS X 8341 (Japan), Italian Stanca Act, Switzerland DDA, and Austrian Web Accessibility Act (WZG) guidelines isn’t optional. Accessibility solution partnerships help you stay ahead of potential lawsuits while fostering goodwill.

6 Steps To Boost Your Growth With Accessibility

  1. To drive growth, your agency should prioritize digital accessibility by following WCAG standards, regularly testing with tools like AXE, WAVE, or Skynet Technologies Website Accessibility Checker, and addressing accessibility gaps. Build accessible design frameworks with high-contrast colors, scalable text, and clear navigation (a contrast-check sketch follows this list).
  2. Integrate assistive technologies such as keyboard navigation, screen reader compatibility, and video accessibility. Focus on responsive design, accessible forms, and inclusive content strategies like descriptive link text, simplified language, and alternative formats.
  3. Providing accessibility training and creating inclusive marketing materials will further support compliance and growth.
  4. To ensure the website thrives, prioritize mobile-first design for responsiveness across all devices, adhere to WCAG accessibility standards, and incorporate keyboard-friendly navigation and alt text for media.
  5. Optimize page speed and core web vitals while using an intuitive interface with clear navigation and effective call-to-action buttons, and use SEO-friendly content with proper keyword optimization and schema markups to boost visibility.
  6. Ensure security with SSL certificates, clear cookie consent banners, and compliance with privacy regulations like GDPR and CCPA. Finally, implement analytics and conversion tracking tools to gather insights and drive long-term growth.
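For the high-contrast colors mentioned in step one, the WCAG contrast ratio can be computed directly. The sketch below implements the standard relative-luminance formula and checks a sample color pair against the 4.5:1 threshold for normal-size text (WCAG AA).

```python
# Minimal sketch: compute the WCAG contrast ratio between two hex colors
# and compare it with the 4.5:1 threshold for normal-size text (WCAG AA).
def channel(value):
    """Linearize an sRGB channel (0-255) per the WCAG relative-luminance formula."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(hex_color):
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(foreground, background):
    lighter, darker = sorted([luminance(foreground), luminance(background)], reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

ratio = contrast_ratio("#767676", "#ffffff")  # example: mid-gray text on white
print(f"Contrast ratio: {ratio:.2f}:1 - {'passes' if ratio >= 4.5 else 'fails'} WCAG AA for normal text")
```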

We know this is a lot.

If this sounds good to you, let us help you get set up.

How Can Digital Accessibility Partnerships Supercharge Your Clients’ SEO?

Partnering for digital accessibility isn’t just about inclusivity — it’s a game-changer for SEO, too!

Accessible websites are built with cleaner code, smarter structures, and user-friendly features like alt text and clear headings that search engines love.

Plus, faster load times, mobile-friendly designs, and seamless navigation keep users engaged, reducing bounce rates and boosting rankings. When you focus on making a site accessible to everyone, you’re not just widening your audience—you’re signaling to search engines that the website is high-quality and relevant. It’s a win-win for accessibility and SEO!

12 Essential Factors To Consider For Successful Accessibility Partnerships

  1. Expertise: Look for a provider with a proven track record in digital accessibility, including knowledge of relevant global website accessibility standards and best practices.
  2. Experience: Consider their experience working with similar industries or organizations.
  3. Tools and technologies: Evaluate their use of automated and manual testing tools to identify and remediate accessibility issues.
  4. Price Flexibility: Explore pricing models that align with both the budget and project requirements. Whether for a single site or multiple sites, the service should be compatible and scalable to meet the needs.
  5. Platform Compatibility: Ensure seamless accessibility integration across various platforms, providing a consistent and accessible experience for all users, regardless of the website environment.
  6. Multi-language support: Enhance user experience with global language support, making websites more inclusive and accessible to a global audience.
  7. Regular check-ins: Schedule regular meetings to discuss project progress, address any issues, and make necessary adjustments.
  8. Clear communication channels: Establish clear communication channels (for example, email and project management tools) to facilitate efficient collaboration.
  9. Transparent reporting: Request detailed reports on the progress of accessibility testing, remediation efforts, and overall project status.
  10. KPIs to measure success: Review the partner’s historical data, especially from projects similar in scale, complexity, and industry.
  11. Evaluate technical expertise: Assess their proficiency in using various accessibility testing tools and ability to integrate different APIs.
  12. Long-term partnership strategy: Compare previous data with current data to measure improvement and optimize the process. A long-term partnership should include reviews and improvements at defined intervals.

    Scaling Accessibility With Smart Partnerships

    All in One Accessibility®: Simplicity meets efficiency!

    The All in One Accessibility® is an AI-powered accessibility tool that helps organizations to enhance their website accessibility level for ADA, WCAG 2.0, 2.1, 2.2, Section 508, Australian DDA, European EAA EN 301 549, UK Equality Act (EA), Indian RPD Act, Israeli Standard 5568, California Unruh, Ontario AODA, Canada ACA, German BITV, Brazilian Inclusion Law (LBI 13.146/2015), Spain UNE 139803:2012, France RGAA standards, JIS X 8341 (Japan), Italian Stanca Act, Switzerland DDA, Austrian Web Accessibility Act (WZG), and more.

    It is available with features like sign language LIBRAS (Brazilian Portuguese Only) integration, 140+ multilingual support, screen reader, voice navigation, smart language auto-detection and voice customization, talk & type, Google and Adobe Analytics tracking, along with premium add-ons including white label and custom branding, VPAT/ACR reports, manual accessibility audit and remediation, PDF remediation, and many more.

    • Quick Setup: Install the widget to any site with ease—no advanced coding required.
    • Feature-Rich Design: From text resizing and color contrast adjustments to screen reader support, it’s packed with tools that elevate the user experience.
    • Revenue Opportunities: Agencies can resell the solution to clients, adding a high-value service to their offerings while earning attractive commissions through the affiliate program.
    • Reduced development costs: Minimizes the financial impact of accessibility remediation by implementing best practices and quick tools.

    Agency Partnership: Scaling accessibility with ease!

    • Extended Service Offerings: The All in One Accessibility® Agency Partnership allows agencies to add a powerful accessibility widget, a quick accessibility solution that is in high demand, to their service offerings.
    • White Label: As an agency partner, you can offer All in One Accessibility® under your own brand name.
    • Centralized Management: It simplifies oversight by consolidating accessibility data and reporting, allowing enterprises to manage multiple websites seamlessly.
    • Attractive Revenue Streams: Agencies can resell the widget to clients, earning significant revenue through competitive pricing structures and repeat business opportunities.
    • Boost Client Retention: By addressing accessibility needs proactively, agencies build stronger relationships with clients, fostering long-term loyalty and recurring contracts.
    • Increase Market Reach: Partnering with All in One Accessibility® positions agencies as leaders in inclusivity, attracting businesses looking for reliable accessibility solutions.
    • NO Investment, High Return: With no setup costs, scalable features, and up to 30% commission, the partnership enables agencies to maximize profitability with their clients.

    Affiliate Partnership: A revenue opportunity for everyone!

    The All in One Accessibility® Affiliate Partnership program is for content creators, marketers, accessibility advocates, web professionals, 501 (c) organizations (non-profit), and law firms.

    • Revenue Growth through Referrals: The All in One Accessibility® affiliate partnership allows affiliates to earn competitive commissions by promoting a high-demand accessibility solution, turning referrals into consistent revenue.
    • Expanding Market Reach: Affiliates can tap into a diverse audience of businesses seeking ADA and WCAG compliance, scaling both revenue and the adoption of accessibility solutions.
    • Fostering Accessibility Awareness: By promoting the All in One Accessibility® widget, affiliates play a pivotal role in driving inclusivity, helping more websites become accessible to users with disabilities.
    • Leveraging Trusted Branding: Affiliates benefit from partnering with a reliable and recognized quick accessibility improvement tool, boosting their credibility and marketing impact.
    • Scaling with Zero Investment: With user-friendly promotional resources and a seamless onboarding process, affiliates can maximize returns without any costs.

    Use Accessibility As A Growth Engine

    Strategic partnerships with accessibility solution providers are a win-win for agencies aiming to meet the diverse needs of their clients. These partnerships not only enhance the accessibility of digital assets but also create opportunities for growth and loyalty, improve search engine rankings, boost revenue, strengthen compliance with legal standards, and contribute to a more accessible digital world.

    With Skynet Technologies USA LLC, transform accessibility from a challenge into a revenue-driving partnership. Let inclusivity power your success.

    Ready to get started? Embarking on a digital accessibility journey is simpler than you think! Take the first step by evaluating your website’s current WCAG compliance with a manual accessibility audit.

    For more information, reach out to hello@skynettechnologies.com.


    Image Credits

    Featured Image: Image by Skynet Technologies. Used with permission.

    Google Formalizes Decade-Old Faceted Navigation Guidelines via @sejournal, @MattGSouthern

    Google has updated its guidelines on faceted navigation by turning an old blog post into an official help document.

    What started as a blog post in 2014 is now official technical documentation.

    This change reflects the complexity of ecommerce and content-heavy websites, as many sites adopt advanced filtering systems for larger catalogs.

    Faceted Navigation Issues

    Ever used filters on an e-commerce site to narrow down products by size, color, and price?

    That’s faceted navigation – the system allowing users to refine search results using multiple filters simultaneously.

    While this feature is vital for users, it can create challenges for search engines, prompting Google to release new official documentation on managing these systems.

    Modern Challenges

    The challenge with faceted navigation lies in the mathematics of combinations: each additional filter option multiplies the potential URLs a search engine might need to crawl.

    For example, a simple product page with options for size (5 choices), color (10 choices), and price range (6 ranges) could generate 300 unique URLs – for just one product.

    According to Google Analyst Gary Illyes, this multiplication effect makes faceted navigation the leading cause of overcrawling issues reported by website owners.

    The impact includes:

    • Wasting Server Resources: Many websites use too much computing power on unnecessary URL combinations.
    • Inefficient Crawl Budget: Crawlers may take longer to find important new content because they are busy with faceted navigation.
    • Weakening SEO Performance: Having several URLs for the same content can hurt a website’s SEO.

    What’s Changed?

    The new guidance is similar to the 2014 blog post, but it includes some important updates:

    1. Focus on Performance: Google now clearly warns about the costs of using computing resources.
    2. Clear Implementation Options: The documentation gives straightforward paths for different types of websites.
    3. Updated Technical Recommendations: Suggestions now account for single-page applications and modern SEO practices.

    Implementation Guide

    For SEO professionals managing sites with faceted navigation, Google now recommends a two-track approach (a robots.txt test sketch follows these lists):

    Non-Critical Facets:

    • Block via robots.txt
    • Use URL fragments (#)
    • Implement consistent rel=”nofollow” attributes

    Business-Critical Facets:

    • Maintain standardized parameter formats
    • Implement proper 404 handling
    • Use strategic canonical tags
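One way to test a robots.txt approach before deploying it is Python's built-in robot parser. The sketch below assumes a hypothetical site convention where non-critical facet URLs live under a dedicated /products/filter/ path; note that the standard-library parser matches simple path prefixes rather than wildcard patterns.

```python
# Minimal sketch: test which faceted URLs a robots.txt ruleset blocks, using
# the standard library's parser. The rules and URLs are hypothetical; this
# parser matches simple path prefixes, not wildcards.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /products/filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

test_urls = [
    "https://example.com/products/running-shoes/",             # business-critical category page
    "https://example.com/products/filter/color-blue/",         # non-critical facet
    "https://example.com/products/filter/size-10/color-blue/", # non-critical facet combination
]

for url in test_urls:
    verdict = "ALLOW" if parser.can_fetch("Googlebot", url) else "BLOCK"
    print(f"{verdict}  {url}")
```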

    Looking Ahead

    This documentation update suggests Google is preparing for increasingly complex website architectures.

    SEO teams should evaluate their current faceted navigation against these guidelines to ensure optimal crawling efficiency and indexing performance.


    Featured Image: Shutterstock/kenchiro168

    Google Warns: Beware Of Fake Googlebot Traffic via @sejournal, @MattGSouthern

    Google’s Developer Advocate, Martin Splitt, warns website owners to be cautious of traffic that appears to come from Googlebot. Many requests pretending to be Googlebot are actually from third-party scrapers.

    He shared this in the latest episode of Google’s SEO Made Easy series, emphasizing that “not everyone who claims to be Googlebot actually is Googlebot.”

    Why does this matter?

    Fake crawlers can distort analytics, consume resources, and make it difficult to assess your site’s performance accurately.

    Here’s how to distinguish between legitimate Googlebot traffic and fake crawler activity.

    Googlebot Verification Methods

    You can distinguish real Googlebot traffic from fake crawlers by looking at overall traffic patterns rather than unusual requests.

    Real Googlebot traffic tends to have consistent request frequency, timing, and behavior.

    If you suspect fake Googlebot activity, Splitt advises using the following Google tools to verify it:

    URL Inspection Tool (Search Console)

    • Finding specific content in the rendered HTML confirms that Googlebot can successfully access the page.
    • Provides live testing capability to verify current access status.

    Rich Results Test

    • Acts as an alternative verification method for Googlebot access
    • Shows how Googlebot renders the page
    • Can be used even without Search Console access

    Crawl Stats Report

    • Shows detailed server response data specifically from verified Googlebot requests
    • Helps identify patterns in legitimate Googlebot behavior

    There’s a key limitation worth noting: These tools verify what real Googlebot sees and does, but they don’t directly identify impersonators in your server logs.

    To fully protect against fake Googlebots, you would need to (see the verification sketch after this list):

    • Compare server logs against Google’s official IP ranges
    • Implement reverse DNS lookup verification
    • Use the tools above to establish baseline legitimate Googlebot behavior
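The reverse DNS check can be scripted with the standard library alone. The sketch below performs the reverse-then-forward lookup that Google documents for verifying Googlebot; the sample IP is a placeholder to replace with one from your server logs.

```python
# Minimal sketch: verify that an IP address claiming to be Googlebot resolves,
# via reverse then forward DNS, to a googlebot.com or google.com hostname.
# The sample IP is a placeholder taken from hypothetical server logs.
import socket

def is_real_googlebot(ip_address):
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)    # reverse DNS lookup
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        resolved_ips = socket.gethostbyname_ex(hostname)[2]  # forward-confirm the name
    except OSError:
        return False
    return ip_address in resolved_ips

print(is_real_googlebot("66.249.66.1"))  # example IP seen in access logs
```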

    Monitoring Server Responses

    Splitt also stressed the importance of monitoring server responses to crawl requests, particularly:

    • 500-series errors
    • Fetch errors
    • Timeouts
    • DNS problems

    These issues can significantly impact crawling efficiency and search visibility for larger websites hosting millions of pages.

    Splitt says:

    “Pay attention to the responses your server gave to Googlebot, especially a high number of 500 responses, fetch errors, timeouts, DNS problems, and other things.”

    He noted that while some errors are transient, site owners “might want to investigate further” if issues persist.

    Splitt suggested using server log analysis to make a more sophisticated diagnosis, though he acknowledged that it’s “not a basic thing to do.”

    However, he emphasized its value, noting that “looking at your web server logs… is a powerful way to get a better understanding of what’s happening on your server.”

    Potential Impact

    Beyond security, fake Googlebot traffic can impact website performance and SEO efforts.

    Splitt emphasized that website accessibility in a browser doesn’t guarantee Googlebot access, citing various potential barriers, including:

    • Robots.txt restrictions
    • Firewall configurations
    • Bot protection systems
    • Network routing issues

    Looking Ahead

    Fake Googlebot traffic can be annoying, but Splitt says you shouldn’t worry too much about rare cases.

    If fake crawler activity becomes a problem or consumes too much server capacity, you can take steps like rate-limiting requests, blocking specific IP addresses, or using better bot detection methods.

    For more on this issue, see the full video below:


    Featured Image: eamesBot/Shutterstock

    Google: Focus On Field Data For Core Web Vitals via @sejournal, @MattGSouthern

    Google stresses the importance of using actual user data to assess Core Web Vitals instead of relying only on lab data from tools like PageSpeed Insights (PSI) and Lighthouse.

    This reminder comes as the company prepares to update the throttling settings in PSI. These updates are expected to increase the performance scores of websites in Lighthouse.

    Field Data vs. Lab Data

    Core Web Vitals measure a website’s performance in terms of loading speed, interactivity, and visual stability from the user’s perspective.

    Field data shows users’ actual experiences, while lab data comes from tests done in controlled environments using tools like Lighthouse.
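For pulling both kinds of data programmatically, the PageSpeed Insights API returns the field section and the lab Lighthouse result in one response. The sketch below is a minimal example assuming the requests package and a placeholder URL; treat the exact field metric keys as assumptions and read them from the actual response.

```python
# Minimal sketch: pull field data (real-user Core Web Vitals) and the lab
# Lighthouse score for one URL from the PageSpeed Insights API, keeping the
# two clearly separated. The URL is a placeholder; field metric keys vary.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {"url": "https://example.com/", "strategy": "mobile"}  # add "key" for higher quotas

data = requests.get(PSI_ENDPOINT, params=params, timeout=60).json()

# Field data: what real users experienced (the top section of the PSI UI).
field = data.get("loadingExperience", {}).get("metrics", {})
for metric, values in field.items():
    print(f"[field] {metric}: p75={values.get('percentile')} ({values.get('category')})")

# Lab data: a single Lighthouse run under simulated conditions.
lab_score = data.get("lighthouseResult", {}).get("categories", {}).get("performance", {}).get("score")
print(f"[lab] Lighthouse performance score: {lab_score}")
```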

    Barry Pollard, a Web Performance Developer Advocate at Google, recently emphasized focusing on field data.

    In a LinkedIn post, he stated:

    “You should concentrate on your field Core Web Vitals (the top part of PageSpeed Insights), and only use the lab Lighthouse Score as a very rough guide of whether Lighthouse has recommendations to improve performance or not…

    The Lighthouse Score is best for comparing two tests made on the same Lighthouse (e.g. to test and compare fixes).

    Performance IS—and hence LH Scores also ARE—highly variable. LH is particularly affected by where it is run from (PSI, DevTools, CI…), but also on the lots of other factors.

    Lighthouse is a GREAT tool but it also can only test some things, under certain conditions.

    So while it’s great to see people interested in improving webperf, make sure you’re doing just that (improve performance) and not just improving the score”

    Upcoming Changes To PageSpeed Insights

    Pollard discussed user concerns about slow PageSpeed Insights servers, which can cause Lighthouse tests to take longer than expected.

    To fix this, Google is changing the throttling settings in PageSpeed Insights, which should lead to better performance scores when the update is released in the coming weeks.

    These changes will affect both the web interface and the API but will not impact other versions of Lighthouse.

    However, Pollard reminds users that “a score of 100 doesn’t mean perfect; it just means Lighthouse can’t help anymore.”

    Goodhart’s Law & Web Performance

    Pollard referenced Goodhart’s Law, which says that when a measure becomes a goal, it stops being a good measure.

    In the web performance context, focusing only on improving Lighthouse scores may not improve actual user experience.

    Lighthouse is a helpful tool, but it can only assess certain aspects of performance in specific situations.

    Alon Kochba, Web Performance and Software Engineer at Wix, added context to the update, stating:

    “Lighthouse scores may not be the most important – but this is a big deal for Lighthouse scores in PageSpeed Insights.

    4x -> 1.2x CPU throttling for Mobile device simulation, which was way off for quite a while.”

    Key Takeaway: Prioritize User Experience

    As the update rolls out, website owners and developers should focus on user experience using field data for Core Web Vitals.

    While Lighthouse scores can help find areas for improvement, they shouldn’t be the only goal.

    Google encourages creating websites that load quickly, respond well, and are visually stable.


    Featured Image: GoodStudio/Shutterstock

    Google Uses About 40 Signals To Determine Canonical URLs via @sejournal, @MattGSouthern

    In a recent episode of Google’s Search Off the Record podcast, Allan Scott from the “Dups” team explained how Google decides which URL to consider as the main one when there are duplicate pages.

    He revealed that Google looks at about 40 different signals to pick the main URL from a group of similar pages.

    Around 40 Signals For Canonical URL Selection

    Duplicate content is a common problem for search engines because many websites have multiple pages with the same or similar content.

    To solve this, Google uses a process called canonicalization. This process allows Google to pick one URL as the main version to index and show in search results.

    Google has discussed the importance of using signals like rel=”canonical” tags, sitemaps, and 301 redirects for canonicalization. However, the number of signals involved in this process is more than you may expect.

    Scott revealed during the podcast:

    “I’m not sure what the exact number is right now because it goes up and down, but I suspect it’s somewhere in the neighborhood of 40.”

    Some of the known signals mentioned include:

    1. rel=”canonical” tags
    2. 301 redirects
    3. HTTPS vs. HTTP
    4. Sitemaps
    5. Internal linking
    6. URL length

    The weight and importance of each signal may vary, and some signals, like rel=”canonical” tags, can influence both the clustering and canonicalization process.

    Balancing Signals

    With so many signals at play, Scott acknowledged the challenges in determining the canonical URL when signals conflict.

    He stated:

    “If your signals conflict with each other, what’s going to happen is the system will start falling back on lesser signals.”

    This means that while strong signals like rel=”canonical” tags and 301 redirects are crucial, other factors can come into play when these signals are unclear or contradictory.

    As a result, Google’s canonicalization process involves a delicate balancing act to determine the most appropriate canonical URL.

    Best Practices For Canonicalization

    Clear signals help Google identify the preferred canonical URL.

    Best practices include:

    1. Use rel=”canonical” tags correctly.
    2. Implement 301 redirects for permanently moved content.
    3. Ensure HTTPS versions of pages are accessible and linked.
    4. Submit sitemaps with preferred canonical URLs.
    5. Keep internal linking consistent.

    These signals help Google find the correct canonical URLs, improving your site’s crawling, indexing, and search visibility.
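A lightweight audit can confirm those signals line up for a given URL. The sketch below, assuming requests and beautifulsoup4 and using a placeholder URL, reports the page's status code, any redirect, its declared canonical, a noindex check, and whether the canonical target itself returns 200.

```python
# Minimal sketch: spot-check a page's canonicalization signals: status code,
# redirect target, declared canonical, noindex, and whether the canonical
# target returns 200. URLs are hypothetical placeholders; assumes `requests`
# and `beautifulsoup4` are installed.
import requests
from bs4 import BeautifulSoup

def audit(url):
    resp = requests.get(url, allow_redirects=False, timeout=10)
    report = {"url": url, "status": resp.status_code,
              "redirects_to": resp.headers.get("Location")}

    if resp.status_code == 200:
        soup = BeautifulSoup(resp.text, "html.parser")
        canonical = soup.find("link", rel="canonical")
        robots = soup.find("meta", attrs={"name": "robots"})
        report["canonical"] = canonical.get("href") if canonical else None
        report["noindex"] = bool(robots and "noindex" in robots.get("content", "").lower())

        # The canonical target should itself be a 200, indexable page.
        if report["canonical"] and report["canonical"] != url:
            target = requests.get(report["canonical"], allow_redirects=False, timeout=10)
            report["canonical_target_status"] = target.status_code
    return report

print(audit("https://example.com/page?utm_source=newsletter"))
```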

    Mistakes To Avoid

    Here are a few common mistakes to watch out for.

    1. Incorrect or conflicting canonical tags:

    • Pointing to non-existent or 404 pages
    • Multiple canonical tags with different URLs on one page
    • Pointing to a different domain entirely

    Fix: Double-check canonical tags, use only one per page, and use absolute URLs.

    2. Canonical chains or loops

    When Page A points to Page B as canonical, but Page B points back to A or another page, creating a loop.

    Fix: Ensure canonical URLs always point to the final, preferred version of the page (a chain-detection sketch follows this list).

    3. Using noindex and canonical tags together

    Sending mixed signals to search engines. Noindex means don’t index the page at all, making canonicals irrelevant.

    Fix: Use canonical tags for consolidation and noindex for exclusion.

    4. Canonicalizing to redirect or noindex pages

    Pointing canonicals to redirected or noindex pages confuses search engines.

    Fix: Canonical URLs should be 200 status and indexable.

    5. Ignoring case sensitivity

    Inconsistent URL casing can cause duplicate content issues.

    Fix: Keep URL and canonical tag casing consistent.

    6. Overlooking pagination and parameters

    Paginated content and parameter-heavy URLs can cause duplication if mishandled.

    Fix: Use canonical tags pointing to the first page or “View All” for pagination, and keep parameters consistent.
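To catch chains and loops specifically, the sketch below (assuming requests and beautifulsoup4, with a placeholder starting URL and absolute canonical hrefs) follows rel=canonical declarations hop by hop and stops when it reaches a self-referencing canonical, detects a loop, or exceeds a hop limit.

```python
# Minimal sketch: follow a page's rel=canonical declarations to detect chains
# and loops. The starting URL is hypothetical; assumes absolute canonical
# hrefs and that `requests` and `beautifulsoup4` are installed.
import requests
from bs4 import BeautifulSoup

def canonical_of(url):
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    tag = soup.find("link", rel="canonical")
    return tag.get("href") if tag else None

def trace_canonical_chain(start_url, max_hops=5):
    seen, current = [start_url], start_url
    for _ in range(max_hops):
        target = canonical_of(current)
        if not target or target == current:
            return seen  # settled on a final, self-referencing canonical
        if target in seen:
            print(f"Canonical loop detected: {' -> '.join(seen + [target])}")
            return seen + [target]
        seen.append(target)
        current = target
    print("Canonical chain longer than expected; check the intermediate URLs.")
    return seen

print(trace_canonical_chain("https://example.com/a"))
```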

    Key Takeaways

    It’s unlikely the complete list of 40+ signals used to determine canonical URLs will be made publicly available.

    However, this was still an insightful discussion worth highlighting.

    Here are the key takeaways:

    • Google uses approximately 40 different signals to determine canonical URLs, with rel=”canonical” tags and 301 redirects being among the strongest indicators
    • When signals conflict, Google falls back on secondary signals to make its determination
    • Clear, consistent implementation of canonicalization signals (tags, redirects, sitemaps, internal linking) is crucial
    • Common mistakes like canonical chains, mixed signals, or incorrect implementations can confuse search engines

    Hear the full discussion in the video below:


    Featured Image: chatiyanon/Shutterstock