Google: Focus On Field Data For Core Web Vitals via @sejournal, @MattGSouthern

Google stresses the importance of using actual user data to assess Core Web Vitals instead of relying only on lab data from tools like PageSpeed Insights (PSI) and Lighthouse.

This reminder comes as the company prepares to update the throttling settings in PSI. These updates are expected to increase the performance scores of websites in Lighthouse.

Field Data vs. Lab Data

Core Web Vitals measure a website’s performance in terms of loading speed, interactivity, and visual stability from the user’s perspective.

Field data shows users’ actual experiences, while lab data comes from tests done in controlled environments using tools like Lighthouse.
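For illustration, both kinds of data can be pulled for a single URL from the PageSpeed Insights API: the “loadingExperience” object carries field metrics from CrUX, while “lighthouseResult” carries the lab score. The following is a minimal TypeScript sketch, not production code, and the exact metric key names in the response are assumptions that may differ from what the API returns.

```typescript
// Minimal sketch: request one URL from the PageSpeed Insights v5 API and read
// both field (CrUX) metrics and the lab Lighthouse performance score.
// The metric key names below are assumptions and may not match the live API.
const PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed";

async function compareFieldAndLab(url: string): Promise<void> {
  const res = await fetch(`${PSI_ENDPOINT}?url=${encodeURIComponent(url)}&strategy=mobile`);
  const data = await res.json();

  // Field data: Core Web Vitals aggregated from real Chrome users (CrUX).
  const field = data.loadingExperience?.metrics ?? {};
  console.log("Field LCP (ms):", field.LARGEST_CONTENTFUL_PAINT_MS?.percentile);
  console.log("Field INP (ms):", field.INTERACTION_TO_NEXT_PAINT?.percentile);
  console.log("Field CLS (x100):", field.CUMULATIVE_LAYOUT_SHIFT_SCORE?.percentile);

  // Lab data: one synthetic Lighthouse run under simulated conditions.
  const labScore = data.lighthouseResult?.categories?.performance?.score;
  console.log("Lab performance score:", labScore != null ? labScore * 100 : "n/a");
}

compareFieldAndLab("https://example.com").catch(console.error);
```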

Barry Pollard, a Web Performance Developer Advocate at Google, recently emphasized focusing on field data.

In a LinkedIn post, he stated:

“You should concentrate on your field Core Web Vitals (the top part of PageSpeed Insights), and only use the lab Lighthouse Score as a very rough guide of whether Lighthouse has recommendations to improve performance or not…

The Lighthouse Score is best for comparing two tests made on the same Lighthouse (e.g. to test and compare fixes).

Performance IS—and hence LH Scores also ARE—highly variable. LH is particularly affected by where it is run from (PSI, DevTools, CI…), but also by lots of other factors.

Lighthouse is a GREAT tool but it also can only test some things, under certain conditions.

So while it’s great to see people interested in improving webperf, make sure you’re doing just that (improve performance) and not just improving the score”

Upcoming Changes To PageSpeed Insights

Pollard discussed user concerns about PageSpeed Insights’ slow servers, which can cause Lighthouse tests to take longer than expected.

To fix this, Google is changing the throttling settings in PageSpeed Insights, which should lead to better performance scores when the update is released in the coming weeks.

These changes will affect both the web interface and the API but will not impact other versions of Lighthouse.

However, Pollard reminds users that “a score of 100 doesn’t mean perfect; it just means Lighthouse can’t help anymore.”

Goodhart’s Law & Web Performance

Pollard referenced Goodhart’s Law, which says that when a measure becomes a goal, it stops being a good measure.

In the web performance context, focusing only on improving Lighthouse scores may not improve actual user experience.

Lighthouse is a helpful tool, but it can only assess certain aspects of performance in specific situations.

Alon Kochba, Web Performance and Software Engineer at Wix, added context to the update, stating:

“Lighthouse scores may not be the most important – but this is a big deal for Lighthouse scores in PageSpeed Insights.

4x -> 1.2x CPU throttling for Mobile device simulation, which was way off for quite a while.”
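For developers who want to see how the CPU multiplier affects their own lab results, Lighthouse can be run locally with an explicit throttling setting. The sketch below assumes the “lighthouse” and “chrome-launcher” npm packages and uses the 1.2x figure Kochba mentions; the config shape is an assumption about the package’s API, and PSI’s own throttling settings are not user-configurable.

```typescript
// Minimal sketch: run Lighthouse locally with an explicit CPU slowdown multiplier.
// Assumes the "lighthouse" and "chrome-launcher" npm packages; the config shape
// is an assumption, and PSI's own throttling settings are not user-configurable.
import lighthouse from "lighthouse";
import { launch } from "chrome-launcher";

async function runThrottledAudit(url: string): Promise<void> {
  const chrome = await launch({ chromeFlags: ["--headless"] });
  try {
    const result = await lighthouse(
      url,
      { port: chrome.port, output: "json", onlyCategories: ["performance"] },
      {
        extends: "lighthouse:default",
        settings: {
          // Slow the CPU down by this factor during the simulated run.
          throttling: { cpuSlowdownMultiplier: 1.2 },
        },
      }
    );
    const score = result?.lhr.categories.performance.score;
    console.log(`Performance score for ${url}:`, score != null ? score * 100 : "n/a");
  } finally {
    await chrome.kill();
  }
}

runThrottledAudit("https://example.com").catch(console.error);
```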

Key Takeaway: Prioritize User Experience

As the update rolls out, website owners and developers should focus on user experience using field data for Core Web Vitals.

While Lighthouse scores can help find areas for improvement, they shouldn’t be the only goal.

Google encourages creating websites that load quickly, respond well, and are visually stable.


Featured Image: GoodStudio/Shutterstock

OpenAI Releases ChatGPT o1, ‘World’s Smartest Language Model’ via @sejournal, @martinibuster

Today OpenAI rolled out what Sam Altman says is the world’s smartest language model, plus a brand new Pro tier that comes with unlimited usage and a higher level of computing resources.

OpenAI ChatGPT o1 Model

Sam Altman announced on X (formerly Twitter) that the new AI model is live in ChatGPT now and will arrive in the API soon.

He tweeted:

“o1, the smartest model in the world. smarter, faster, and more features (eg multimodality) than o1-preview. live in chatgpt now, coming to api soon.

chatgpt pro. $200/month. unlimited usage and even-smarter mode for using o1. more benefits to come!”

Screenshot Of ChatGPT o1 Model Availability

ChatGPT Pro Mode $200/Month

ChatGPT Pro Mode is a new tier that has more “thinking power” than the standard version of o1, which increases its reliability. Answers in Pro mode take longer to generate, displaying a progress bar and triggering an in-app notification if the user navigates to a different conversation.

OpenAI describes the new ChatGPT Pro Mode:

“ChatGPT Pro provides access to a version of our most intelligent model that thinks longer for the most reliable responses. In evaluations from external expert testers, o1 pro mode produces more reliably accurate and comprehensive responses, especially in areas like data science, programming, and case law analysis.

Compared to both o1 and o1-preview, o1 pro mode performs better on challenging ML benchmarks across math, science, and coding.”

The new tier is not a price increase from the regular plan, which is called Plus. It’s an entirely new plan called Pro.

OpenAI’s new o1 Pro plan provides unlimited access to its new o1 model, along with o1-mini, GPT-4o, and Advanced Voice. It also includes o1 Pro Mode, which has access to increased computational power to generate more refined and insightful responses to complex queries.

Read more about OpenAI’s new Pro plan and o1 model:

Introducing ChatGPT Pro

Featured Image by Shutterstock/One Artist

Google Uses About 40 Signals To Determine Canonical URLs via @sejournal, @MattGSouthern

In a recent episode of Google’s Search Off the Record podcast, Allan Scott from the “Dups” team explained how Google decides which URL to consider as the main one when there are duplicate pages.

He revealed that Google looks at about 40 different signals to pick the main URL from a group of similar pages.

Around 40 Signals For Canonical URL Selection

Duplicate content is a common problem for search engines because many websites have multiple pages with the same or similar content.

To solve this, Google uses a process called canonicalization. This process allows Google to pick one URL as the main version to index and show in search results.

Google has discussed the importance of using signals like rel=”canonical” tags, sitemaps, and 301 redirects for canonicalization. However, the number of signals involved in this process is more than you may expect.

Scott revealed during the podcast:

“I’m not sure what the exact number is right now because it goes up and down, but I suspect it’s somewhere in the neighborhood of 40.”

Some of the known signals mentioned include:

  1. rel=”canonical” tags
  2. 301 redirects
  3. HTTPS vs. HTTP
  4. Sitemaps
  5. Internal linking
  6. URL length

The weight and importance of each signal may vary, and some signals, like rel=”canonical” tags, can influence both the clustering and canonicalization process.

Balancing Signals

With so many signals at play, Scott acknowledged the challenges in determining the canonical URL when signals conflict.

He stated:

“If your signals conflict with each other, what’s going to happen is the system will start falling back on lesser signals.”

This means that while strong signals like rel=”canonical” tags and 301 redirects are crucial, other factors can come into play when these signals are unclear or contradictory.

As a result, Google’s canonicalization process involves a delicate balancing act to determine the most appropriate canonical URL.

Best Practices For Canonicalization

Clear signals help Google identify the preferred canonical URL.

Best practices include:

  1. Use rel=”canonical” tags correctly.
  2. Implement 301 redirects for permanently moved content.
  3. Ensure HTTPS versions of pages are accessible and linked.
  4. Submit sitemaps with preferred canonical URLs.
  5. Keep internal linking consistent.

These signals help Google find the correct canonical URLs, improving your site’s crawling, indexing, and search visibility.
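A quick spot check of these signals can be scripted. The following TypeScript sketch fetches a handful of URLs and flags missing, duplicate, relative, or conflicting canonical signals; it relies on fetch and a rough regex rather than a real HTML parser, and the URLs are hypothetical, so treat it as illustrative only.

```typescript
// Minimal sketch: flag common canonicalization problems for a few URLs.
// Uses fetch plus a rough regex instead of a real HTML parser, so it is
// illustrative only. The URLs are hypothetical.
const urls = ["https://example.com/page-a", "https://example.com/page-b"];

async function auditCanonical(url: string): Promise<void> {
  const res = await fetch(url, { redirect: "manual" });
  if (res.status !== 200) {
    console.warn(`${url}: returned ${res.status}; canonical targets should be 200 and indexable`);
    return;
  }
  const html = await res.text();

  // Collect every rel=canonical link; more than one is a conflicting signal.
  const canonicals = [...html.matchAll(/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/gi)]
    .map((m) => m[1]);
  if (canonicals.length === 0) console.warn(`${url}: no canonical tag found`);
  if (canonicals.length > 1) console.warn(`${url}: multiple canonical tags (${canonicals.join(", ")})`);
  if (canonicals[0] && !/^https?:\/\//i.test(canonicals[0])) {
    console.warn(`${url}: canonical is not an absolute URL (${canonicals[0]})`);
  }

  // noindex plus a canonical tag sends mixed signals to search engines.
  const noindex = /<meta[^>]+name=["']robots["'][^>]*content=["'][^"']*noindex/i.test(html);
  if (noindex && canonicals.length > 0) {
    console.warn(`${url}: page is noindexed but also declares a canonical`);
  }
}

Promise.all(urls.map(auditCanonical)).catch(console.error);
```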

Mistakes To Avoid

Here are a few common mistakes to watch out for.

1. Incorrect or conflicting canonical tags:

  • Pointing to non-existent or 404 pages
  • Multiple canonical tags with different URLs on one page
  • Pointing to a different domain entirely

Fix: Double-check canonical tags, use only one per page, and use absolute URLs.

2. Canonical chains or loops

When Page A points to Page B as canonical, but Page B points back to A or another page, creating a loop.

Fix: Ensure canonical URLs always point to the final, preferred version of the page.

3. Using noindex and canonical tags together

Sending mixed signals to search engines. Noindex means don’t index the page at all, making canonicals irrelevant.

Fix: Use canonical tags for consolidation and noindex for exclusion.

4. Canonicalizing to redirect or noindex pages

Pointing canonicals to redirected or noindex pages confuses search engines.

Fix: Canonical URLs should be 200 status and indexable.

5. Ignoring case sensitivity

Inconsistent URL casing can cause duplicate content issues.

Fix: Keep URL and canonical tag casing consistent.

6. Overlooking pagination and parameters

Paginated content and parameter-heavy URLs can cause duplication if mishandled.

Fix: Use canonical tags pointing to the first page or “View All” for pagination, and keep parameters consistent.

Key Takeaways

It’s unlikely the complete list of 40+ signals used to determine canonical URLs will be made publicly available.

However, this was still an insightful discussion worth highlighting.

Here are the key takeaways:

  • Google uses approximately 40 different signals to determine canonical URLs, with rel=”canonical” tags and 301 redirects being among the strongest indicators
  • When signals conflict, Google falls back on secondary signals to make its determination
  • Clear, consistent implementation of canonicalization signals (tags, redirects, sitemaps, internal linking) is crucial
  • Common mistakes like canonical chains, mixed signals, or incorrect implementations can confuse search engines

Hear the full discussion in the video below:


Featured Image: chatiyanon/Shutterstock

Google Warns Of Duplicate Content “Black Holes” Caused By Error Pages via @sejournal, @MattGSouthern

Google’s “Search Off the Record” podcast recently highlighted an SEO issue that can make web pages disappear from search results.

In the latest episode, Google Search team member Allan Scott discussed “marauding black holes” formed by grouping similar-looking error pages.

Google’s system can accidentally cluster error pages that look alike, causing regular pages to get included in these groups.

This means Google may not crawl these pages again, which can lead to them being de-indexed, even after fixing the errors.

The podcast explained how this happens, its effects on search traffic, and how website owners can keep their pages from getting lost.

How Google Handles Duplicate Content

To understand content black holes, you must first know how Google handles duplicate content.

Scott explains this happens in two steps:

  1. Clustering: Google groups pages that have the same or very similar content.
  2. Canonicalization: Google then chooses the best URL from each group.

After clustering, Google stops re-crawling these pages. This saves resources and avoids unnecessary indexing of duplicate content.

How Error Pages Create Black Holes

The black hole problem happens when error pages group together because they have similar content, such as generic “Page Not Found” messages. Regular pages with occasional errors or temporary outages can get stuck in these error clusters.

The duplication system prevents the re-crawling of pages within a cluster. This makes it hard for mistakenly grouped pages to escape the “black hole,” even after fixing the initial errors. As a result, these pages can get de-indexed, leading to a loss of organic search traffic.

Scott explained:

“Only the things that are very towards the top of the cluster are likely to get back out. Where this really worries me is sites with transient errors… If those fail to fetch, they might break your render, in which case we’ll look at your page, and we’ll think it’s broken.”

How To Avoid Black Holes

To avoid problems with duplicate content black holes, Scott shared the following advice:

  1. Use the Right HTTP Status Codes: For error pages, use proper status codes (like 404, 403, and 503) instead of a 200 OK status. Only pages marked as 200 OK may be grouped together (see the sketch after this list).
  2. Create Unique Content for Custom Error Pages: If you have custom error pages that use a 200 OK status (common in single-page apps), make sure these pages contain specific content to prevent grouping. For example, include the error code and name in the text.
  3. Caution with Noindex Tags: Do not use noindex tags on error pages unless you want them permanently removed from search results. This tag strongly indicates that you want the pages removed, more so than using error status codes.
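As a concrete example of the first tip (and the unique-content advice in the second), the sketch below shows a server route that returns a real 404 status with error-specific content instead of a “soft 404” that answers 200 OK with a generic message. It assumes the “express” npm package, and the route and helper names are hypothetical.

```typescript
// Minimal sketch: return a real 404 status with error-specific content instead
// of a "soft 404" that answers 200 OK with a generic message. Assumes the
// "express" npm package; the route and helpers are hypothetical.
import express from "express";

const app = express();

// Hypothetical lookup: returns null when nothing matches.
function findProduct(id: string): { name: string } | null {
  return null;
}

app.get("/products/:id", (req, res) => {
  const product = findProduct(req.params.id);
  if (!product) {
    // A proper 404 keeps this URL out of Google's duplicate clusters entirely;
    // the error code and name also make the page clearly an error page.
    res.status(404).send("<h1>Error 404: Page Not Found</h1>");
    return;
  }
  res.send(`<h1>${product.name}</h1>`);
});

app.listen(3000);
```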

Following these tips can help ensure regular pages aren’t accidentally mixed with error pages, keeping them in Google’s index.

Regularly checking your site’s crawl coverage and indexation can help catch duplication issues early.

In Summary

Google’s “Search Off the Record” podcast highlighted a potential SEO issue where error pages can be seen as duplicate content. This can cause regular pages to be grouped with errors and removed from Google’s index, even if the errors are fixed.

To prevent duplicate content issues, website owners should:

  1. Use the correct HTTP status codes for error pages.
  2. Ensure custom error pages have unique content.
  3. Monitor their site’s crawl coverage and indexation.

Following technical SEO best practices is essential for maintaining strong search performance, as emphasized by Google’s Search team.

Hear the full discussion in the video below:


Featured Image: Nazarii_Neshcherenskyi/Shutterstock

Wix Integrates Session Recording Toolkit Into Analytics Interface via @sejournal, @martinibuster

Wix has integrated TWIPLA’s session recording toolkit into its analytics reports, enabling users to replay visitor interactions and make data-driven decisions to improve user experience and conversions.

Session Recordings Toolkit

TWIPLA, a German website analytics company, announced the native integration of its session recording toolkit directly within the Wix analytics reports. Wix publishers can now replay recordings of actual customer journeys through a site to better understand customer behavior, directly within the Wix analytics interface.

Noa Kroytoro, Product Manager at Wix Analytics, commented:

“The launch of the session recordings toolkit enhances the reporting tools available to Wix users through Wix Analytics, providing them with deeper insights into customer behavior for more effective user experience optimization. Our partnership with TWIPLA enables us to deliver our users a powerful solution for data-driven decision making.”

TWIPLA CEO Tim Hammermann, said:

“It’s one of the most popular tools we have and our clients have found that it helps them to make tangible improvements to online success, particularly because the granular filtering system makes it so easy to find session replays that match specific visitor segments.”

Full instructions and details of how to use the new session recordings are available on the Wix website:

Wix Analytics: Adding and Setting Up Session Recordings

Read the official announcement by Twipla:

TWIPLA expands partnership with Wix, powering new session recordings toolkit for advanced UX optimization (PDF)

Gravatar Offers Free Domains To Use As Bluesky Handles via @sejournal, @martinibuster

Gravatar announced a free domain name offer and Bluesky integration that makes it relatively easy to use a custom domain name as a Bluesky handle. A custom handle provides a more professional Bluesky presence compared to the standard generic versions like username.bsky.social. Integrating a domain name through Gravatar involves several steps on both platforms but is simpler than going through a domain registrar.

Using a custom domain name through a domain name registrar requires a TXT record verification and adding a DNS record to the domain at the registrar. Claiming a domain name through Gravatar doesn’t require those extra steps, which significantly simplifies the process of using a custom domain at Bluesky.
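For readers curious what the registrar route involves, Bluesky’s DNS method places a TXT record at “_atproto.” followed by the domain, with a value beginning “did=”. The TypeScript sketch below checks whether such a record is in place for a hypothetical domain; the record name and value format are stated here as assumptions about Bluesky’s DNS verification convention.

```typescript
// Minimal sketch: check whether a domain already carries the Bluesky TXT record.
// Assumes Bluesky's DNS method of a "_atproto" TXT record whose value starts
// with "did="; the domain below is hypothetical.
import { resolveTxt } from "node:dns/promises";

async function checkBlueskyHandle(domain: string): Promise<void> {
  const records = await resolveTxt(`_atproto.${domain}`);
  const values = records.map((chunks) => chunks.join(""));
  const didRecord = values.find((value) => value.startsWith("did="));

  if (didRecord) {
    console.log(`${domain} is linked to a Bluesky identity: ${didRecord}`);
  } else {
    console.log(`No did= TXT record found at _atproto.${domain}`);
  }
}

checkBlueskyHandle("example.com").catch(console.error);
```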

Gravatar Domain Name Offer

The offer makes it simple to grab a domain name, free for the first year, directly from a Gravatar profile, which requires registering a free Gravatar account (if one isn’t already registered). The option for selecting a domain name appears in the Gravatar dashboard.

Gravatar currently offers the following domains free for the first year, with renewals at the standard rates listed:

  • .link $9.00/year
  • .life $31.00/year
  • .live $28.00/year
  • .social $32.00/year
  • .bio $62.00/year
  • .fyi $18.00/year
  • .pro $21.00/year
  • .guru $35.00/year
  • .world $35.00/year
  • .ninja $19.00/year

Gravatar lists steps for completing the process:

1. Get Your Free Gravatar Domain

2. Verify Your Bluesky

3. Change Your Bluesky Handle

4. Reconnect Bluesky in Gravatar

Featured Image by Shutterstock/Koshiro K

Google Announces New ‘Dating & Companionship’ Ads Policy via @sejournal, @MattGSouthern

Google announced it will launch a new Dating and Companionship Ads policy and certification program on March 4.

This update aims to improve oversight of dating ads on Google’s advertising platforms.

New Policy Highlights

Advertisers must get certification from Google to run dating or companionship ads under the new policy.

The policy bans certain ads, including those that:

  • Promote underage dating
  • Use misleading images or text
  • Promote paid companionship or sexual acts
  • Support exploitative or deceptive practices
  • Advertise mail-order spouses

Some ads for hook-up, fling, swinger sites, affair services, sexual fetish dating, and apps with nudity or suggestive content will face additional restrictions.

Ad serving restrictions will depend on the ad type, user age, local laws, SafeSearch settings, and past searches for sexual content.

Transition Period

Google’s new Dating and Companionship Ads policy will take effect on March 4.

Advertisers should review their ads now to ensure compliance, either obtaining certification or removing non-compliant ads. Enforcement will gradually increase after the launch.

While this is a standalone policy, it incorporates relevant rules from Google’s existing policies on Inappropriate Content and sexual content, which will also be updated at the same time.

Implications For Advertisers

Brands in the dating and companionship industry must review their ads and landing pages to comply with Google’s new policy rules.

Certification will be mandatory to continue advertising in this area.

This policy aims to create a safer advertising environment by reducing misleading and inappropriate ads, helping to build trust among users of dating services.

As the March implementation date approaches, Google will share more details about the certification process and policy updates.


Featured Image: MicroOne/Shutterstock

Google On Robots.txt: When To Use Noindex vs. Disallow via @sejournal, @MattGSouthern

In a recent YouTube video, Google’s Martin Splitt explained the differences between the “noindex” tag in robots meta tags and the “disallow” command in robots.txt files.

Splitt, a Developer Advocate at Google, pointed out that both methods help manage how search engine crawlers work with a website.

However, they have different purposes and shouldn’t be used in place of each other.

When To Use Noindex

The “noindex” directive tells search engines not to include a specific page in their search results. You can add this instruction in the HTML head section using the robots meta tag or the X-Robots-Tag HTTP header.

Use “noindex” when you want to keep a page from showing up in search results but still allow search engines to read the page’s content. This is helpful for pages that users can see but that you don’t want search engines to display, like thank-you pages or internal search result pages.
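As a rough illustration, a “noindex” directive can be sent as an HTTP header so the page stays fetchable but never appears in results. The sketch below assumes the “express” npm package and a hypothetical thank-you route; a robots meta tag in the page’s head achieves the same effect.

```typescript
// Minimal sketch: keep a page crawlable but out of search results by sending
// the X-Robots-Tag header. Assumes the "express" npm package; the route is
// hypothetical, and a <meta name="robots" content="noindex"> tag works too.
import express from "express";

const app = express();

app.get("/thank-you", (_req, res) => {
  // Googlebot must be able to fetch this response to see the directive,
  // so the URL must NOT also be disallowed in robots.txt.
  res.setHeader("X-Robots-Tag", "noindex");
  res.send("<h1>Thanks for signing up!</h1>");
});

app.listen(3000);
```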

When To Use Disallow

The “disallow” directive in a website’s robots.txt file stops search engine crawlers from accessing specific URLs or patterns. When a page is disallowed, search engines will not crawl or index its content.

Splitt advises using “disallow” when you want to block search engines completely from retrieving or processing a page. This is suitable for sensitive information, like private user data, or for pages that aren’t relevant to search engines.
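By contrast, “disallow” lives in robots.txt and stops crawling before the page is ever fetched. The sketch below serves a robots.txt that blocks a couple of hypothetical private paths; it again assumes the “express” npm package, and a static robots.txt file works just as well.

```typescript
// Minimal sketch: serve a robots.txt that blocks crawling of URLs that should
// never be fetched at all. Assumes the "express" npm package; the paths are
// hypothetical, and a static robots.txt file works just as well.
import express from "express";

const app = express();

app.get("/robots.txt", (_req, res) => {
  res.type("text/plain").send(
    [
      "User-agent: *",
      "Disallow: /account/", // private user areas: crawlers should not fetch these
      "Disallow: /cart/",    // not useful in search results, so skip crawling
    ].join("\n")
  );
});

app.listen(3000);
```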

Common Mistakes to Avoid

One common mistake website owners make is using “noindex” and “disallow” for the same page. Splitt advises against this because it can cause problems.

If a page is disallowed in the robots.txt file, search engines cannot see the “noindex” command in the page’s meta tag or X-Robots-Tag header. As a result, the page might still get indexed, but with limited information.

To stop a page from appearing in search results, Splitt recommends using the “noindex” command without disallowing the page in the robots.txt file.

Google provides a robots.txt report in Google Search Console to test and monitor how robots.txt files affect search engine indexing.

Why This Matters

Understanding the proper use of “noindex” and “disallow” directives is essential for SEO professionals.

Following Google’s advice and using the available testing tools will help ensure your content appears in search results as intended.

See the full video below:


Featured Image: Asier Romero/Shutterstock

Accessibility Champ: Wix, WordPress, Squarespace, Duda, Or…? via @sejournal, @martinibuster

The HTTP Archive published its report on the state of accessibility on the web, based on scores generated with the Lighthouse Accessibility Audit, a feature of Google’s Lighthouse website auditing tool that also measures website performance, best practices, and SEO. The report compared traditional content management systems with website building platforms, with WordPress scoring surprisingly well.

Lighthouse is available through Chrome DevTools, which is built into every Chromium-based browser, and as one of the audits in the standalone PageSpeed Insights tool.
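For a single URL, the same accessibility category can be requested through the PageSpeed Insights API, which runs Lighthouse’s accessibility audit and returns a 0-100 score. The TypeScript sketch below is illustrative only; the “category=ACCESSIBILITY” parameter and response shape are assumptions about the API, and HTTP Archive’s report is based on its own crawl rather than this endpoint.

```typescript
// Minimal sketch: request only the accessibility category for one URL from the
// PageSpeed Insights v5 API. The category parameter and response shape are
// assumptions; HTTP Archive's report comes from its own crawl, not this endpoint.
const ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed";

async function accessibilityScore(url: string): Promise<number | null> {
  const res = await fetch(`${ENDPOINT}?url=${encodeURIComponent(url)}&category=ACCESSIBILITY`);
  const data = await res.json();
  const score = data.lighthouseResult?.categories?.accessibility?.score;
  return score != null ? Math.round(score * 100) : null;
}

accessibilityScore("https://example.com").then((s) => console.log("Accessibility score:", s));
```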

HTTP Archive

The research was conducted by the HTTP Archive, a community-driven open source project that tracks data about how sites are built and perform. They offer a configurable report of how different content management platforms perform that is updated monthly.

The accessibility report was done using data collected by the WebAIM Million study, which is based on the top one million website home pages. The WebAIM Million uses data from the Tranco list, which draws on multiple source lists to produce a ranking of a million sites designed to be resistant to manipulation.

The Tranco List site explains:

“Researchers in web security or Internet measurements often use rankings of popular websites. However, in our paper we showed that these rankings disagree on which domains are most popular, can change significantly on a daily basis and can be manipulated (by malicious actors).

As the research community still benefits from regularly updated lists of popular domains, we provide Tranco, a ranking that improves upon the shortcomings of current lists. We also emphasize the reproducibility of these rankings and the studies using them by providing permanent citable references.

We currently use the lists from five providers: Cisco Umbrella (available free of charge), and Majestic (available under a CC BY 3.0 license), Farsight (only for the default list), the Chrome User Experience Report (CrUX) (available under a CC BY-SA 4.0 license), and Cloudflare Radar (available under a CC BY-NC 4.0 license). Tranco is not affiliated with any of these providers.”

Top CMS Accessibility Performance

HTTP Archive performed its research to identify the best-performing platforms and the shortcomings of each.

Accessibility: Traditional CMS

Adobe Experience Manager and Contentful were the top traditional content management systems for accessibility, tied with a score of 87%, followed by Sitecore and WordPress in second place. An interesting fact about the top-ranked CMSs is that three of the four, Adobe Experience Manager (AEM), Contentful, and Sitecore, are closed source; WordPress is the exception.

Accessibility Scores By CMS:

  • Adobe Experience Manager 87%
  • Contentful 87%
  • Sitecore 85%
  • WordPress 85%
  • Craft CMS 84%
  • Contao 84%
  • Drupal 84%
  • Liferay 83%
  • TYPO3 CMS 83%
  • DNN 82%

What’s going on with the CMS scores? HTTP Archive explains:

“When most folks think about CMS, they think about the ones that you can download and install yourself. This is predominantly made up of open source tools, but not exclusively. Adobe Experience Manager (AEM), Contentful and Sitecore were the most accessible three in this list of top 10. A possible explanation for this is that closed-source software like AEM is more likely to be used by larger corporations, which have more resources to address accessibility issues. Additionally, open-source software gives website owners a lot of freedom, which in some cases can lead to worse accessibility.”

Accessibility: Website Platforms

This comparison is by website building platform, comparing platforms like Wix, Duda, and Squarespace. The accessibility scores for the platforms were higher than the scores for traditional CMSs, reflecting how private platforms are better able to control variables as opposed to an open source CMS that offers users a more open ended experience.

Accessibility Scores By Website Platform

  • Wix 94%
  • Squarespace 92%
  • Google Sites 90%
  • Duda 87%
  • Hubspot CMS Hub 87%
  • Pixnet 87%
  • Weebly 86%
  • GoDaddy Website Builder 85%
  • Webnode 84%
  • Tilda 83%

Wix Beats Out All CMS & Platforms

What’s notable about these scores is that sites built with Wix score higher for accessibility than sites built on any other CMS or website building platform, at 94%. That’s a reflection of Wix’s well-known effort to create a product that is strong in performance, SEO, and accessibility.

Here is the list arranged in descending order by percentage:

1. Wix – 94%
2. Squarespace – 92%
3. Google Sites – 90%
4. Adobe Experience Manager – 87%
5. Contentful – 87%
6. Duda – 87%
7. Hubspot CMS Hub – 87%
8. Pixnet – 87%
9. Weebly – 86%
10. Sitecore – 85%
11. WordPress – 85%
12. GoDaddy Website Builder – 85%
13. Craft CMS – 84%
14. Contao – 84%
15. Drupal – 84%
16. Webnode – 84%
17. Liferay – 83%
18. TYPO3 CMS – 83%
19. Tilda – 83%
20. DNN – 82%

Website Accessibility

SEOs are understandably motivated by best practices for ranking better. For example, many didn’t prioritize site performance until it became a ranking factor, even though website performance improves sales and advertising performance and may have an indirect impact on rankings.

Accessibility also has indirect advantages for improved search performance. For example, about 0.5% of the female population and 8% of males are color blind. Why would anyone who cares about their rankings alienate, frustrate, and exclude roughly 4% of website visitors?

Wix and Squarespace are prioritizing accessibility. Everyone else should as well, because it’s both ethical and a sound business practice.

Read the HTTP Archive report here.

Featured Image by Shutterstock/Happy_Nati

Google: Host Resources On Different Hostname To Save Crawl Budget via @sejournal, @MattGSouthern

Google Search Central has launched a new series called “Crawling December” to provide insights into how Googlebot crawls and indexes webpages.

Google will publish a new article each week this month exploring various aspects of the crawling process that are not often discussed but can significantly impact website crawling.

The first post in the series covers the basics of crawling and sheds light on essential yet lesser-known details about how Googlebot handles page resources and manages crawl budgets.

Crawling Basics

Today’s websites are complex due to advanced JavaScript and CSS, making them harder to crawl than old HTML-only pages. Googlebot works like a web browser but on a different schedule.

When Googlebot visits a webpage, it first downloads the HTML from the main URL, which may link to JavaScript, CSS, images, and videos. Then, Google’s Web Rendering Service (WRS) uses Googlebot to download these resources to create the final page view.

Here are the steps in order:

  1. Initial HTML download
  2. Processing by the Web Rendering Service
  3. Resource fetching
  4. Final page construction

Crawl Budget Management

Crawling extra resources can reduce the main website’s crawl budget. To help with this, Google says that “WRS tries to cache every resource (JavaScript and CSS) used in the pages it renders.”

It’s important to note that the WRS cache lasts up to 30 days and is not influenced by the HTTP caching rules set by developers.

This caching strategy helps to save a site’s crawl budget.

Recommendations

This post gives site owners tips on how to optimize their crawl budget:

  1. Reduce Resource Use: Use fewer resources to create a good user experience. This helps save crawl budget when rendering a page.
  2. Host Resources Separately: Place resources on a different hostname, like a CDN or subdomain. This can help shift the crawl budget burden away from your main site.
  3. Use Cache-Busting Parameters Wisely: Be careful with cache-busting parameters. Changing resource URLs can make Google recheck them, even if the content is the same. This can waste your crawl budget (see the sketch after this list).
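As a sketch of the second and third tips, the snippet below publishes a script under a content-hashed filename and references it from a separate static hostname, so the URL only changes when the file actually changes. The CDN hostname and file paths are hypothetical.

```typescript
// Minimal sketch: publish a script under a content-hashed filename and reference
// it from a separate static hostname, instead of appending ?v= cache-busting
// parameters. The CDN hostname and file paths are hypothetical.
import { createHash } from "node:crypto";
import { readFileSync, copyFileSync } from "node:fs";

const STATIC_ORIGIN = "https://static.example-cdn.com"; // separate hostname / CDN

function publishWithHash(localPath: string): string {
  // Hash the file contents so the URL changes only when the content changes.
  const hash = createHash("sha256").update(readFileSync(localPath)).digest("hex").slice(0, 8);
  const hashedName = localPath.replace(/\.js$/, `.${hash}.js`);
  copyFileSync(localPath, hashedName);
  return `${STATIC_ORIGIN}/${hashedName}`;
}

// A stable URL means Google's 30-day WRS cache (and your crawl budget) is not
// invalidated for unchanged resources.
console.log(`<script src="${publishWithHash("app.js")}"></script>`);
```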

Also, Google warns that blocking resource crawling with robots.txt can be risky.

If Google can’t access a necessary resource for rendering, it may have trouble getting the page content and ranking it properly.

Monitoring Tools

The Search Central team says the best way to see what resources Googlebot is crawling is by checking a site’s raw access logs.

You can identify Googlebot by its IP address using the ranges published in Google’s developer documentation.
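A rough way to do that check programmatically is to download Google’s published range file and test each log IP against its IPv4 prefixes. In the TypeScript sketch below, the JSON URL and its “prefixes”/“ipv4Prefix” shape are assumptions about how Google publishes the ranges, and IPv6 prefixes are ignored for brevity.

```typescript
// Minimal sketch: test whether an IP from the access logs falls inside Google's
// published Googlebot ranges. The JSON URL and its prefixes/ipv4Prefix shape are
// assumptions about how the ranges are published; IPv6 prefixes are skipped.
const RANGES_URL = "https://developers.google.com/search/apis/ipranges/googlebot.json";

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split("/");
  const mask = Number(bits) === 0 ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

async function isGooglebotIp(ip: string): Promise<boolean> {
  const res = await fetch(RANGES_URL);
  const data = await res.json();
  const cidrs: string[] = (data.prefixes ?? [])
    .map((p: { ipv4Prefix?: string }) => p.ipv4Prefix)
    .filter((p: string | undefined): p is string => Boolean(p));
  return cidrs.some((cidr) => inCidr(ip, cidr));
}

isGooglebotIp("66.249.66.1").then((ok) => console.log("Googlebot IP?", ok));
```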

Why This Matters

This post clarifies three key points that impact how Google finds and processes your site’s content:

  • Resource management directly affects your crawl budget, so hosting scripts and styles on CDNs can help preserve it.
  • Google caches resources for 30 days regardless of your HTTP cache settings, which helps conserve your crawl budget.
  • Blocking critical resources in robots.txt can backfire by preventing Google from properly rendering your pages.

Understanding these mechanics helps SEOs and developers make better decisions about resource hosting and accessibility – choices that directly impact how well Google can crawl and index their sites.


Featured Image: ArtemisDiana/Shutterstock