How To Use XML Sitemaps To Boost SEO via @sejournal, @jes_scholz

What was considered best practice yesterday does not hold true today, and this is especially relevant for XML sitemaps, which are almost as old as SEO itself.

The problem is, it’s time-consuming to sort valuable advice from all the misinformation on forums and social media about how to optimize XML sitemaps.

So, while most of us recognize the importance of submitting sitemaps to Google Search Console and Bing Webmaster Tools, as well as referencing them in the robots.txt file – for faster content discovery and refresh, more efficient crawling of SEO-relevant pages, and valuable indexing reports that surface SEO issues – the finer details of implementing sitemaps to improve SEO performance are often missed.

Let’s clear up the confusion and dive into the current best practices for sitemap optimization.

In this article, we cover:

  • What is an XML sitemap?
  • How to create a sitemap.
  • Valid XML sitemap format.
  • Types of sitemaps.
  • Optimization of XML sitemaps.
  • XML sitemap best practice checklist.

What Is An XML Sitemap?

An XML sitemap is a file that lists all of your website’s URLs.

It acts as a roadmap to tell the crawlers of indexing platforms (like search engines, but also large language models (LLMs)) what content is available and how to reach it.

Sitemap vs. website crawling. Image from author, February 2025

In the example above, a search engine will find all nine pages in a sitemap with one visit to the XML sitemap file.

On the website, it will have to jump through five internal links on five pages to find page 9.

This ability of XML sitemaps to assist crawlers in faster indexing is especially important for websites that:

  • Have thousands of pages and/or a deep website architecture.
  • Frequently add new pages.
  • Frequently change the content of existing pages.
  • Suffer from weak internal linking and orphan pages.
  • Lack a strong external link profile.

Even though indexing platforms could technically find your URLs without it, by including pages in an XML sitemap, you’re indicating that you consider them to be quality landing pages.

And while there is no guarantee that an XML sitemap will get your pages crawled faster, let alone indexed or ranked, submitting one certainly increases your chances.

How To Create A Sitemap

There are two types of sitemaps: static sitemaps, which must be manually updated, and dynamic sitemaps, which are updated in real time or by a regular cron job.

Static sitemaps are simple to create using a tool such as Screaming Frog.

The problem is that as soon as you create or remove a page, your sitemap is outdated. If you modify the content of a page, the sitemap won’t automatically update the lastmod tag.

So, unless you love manually creating and uploading sitemaps for every single change, it’s best to avoid static sitemaps.

Dynamic XML sitemaps, on the other hand, are automatically updated by your server to reflect relevant website changes.

To create a dynamic XML sitemap you can do one of the following:

  • Ask your developer to code a custom script, being sure to provide clear specifications (see the sketch after this list).
  • Use a dynamic sitemap generator tool.
  • Install a plugin for your content management system (CMS), for example, Yoast plugin for WordPress.
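
If you go the custom-script route, a minimal sketch in Python might look like the following, assuming a Flask app and a hypothetical get_pages() helper that reads URLs and modification dates from your CMS or database:

```python
from datetime import datetime, timezone
from flask import Flask, Response

app = Flask(__name__)

def get_pages():
    """Hypothetical data layer: in production, query your CMS or database."""
    return [{"url": "https://www.example.com/", "updated": datetime(2025, 2, 1, tzinfo=timezone.utc)}]

@app.route("/sitemap.xml")
def sitemap():
    # Rebuilt on every request, so new, removed, and updated pages are always reflected.
    entries = "".join(
        f"<url><loc>{p['url']}</loc><lastmod>{p['updated'].isoformat()}</lastmod></url>"
        for p in get_pages()
    )
    xml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
           f"{entries}</urlset>")
    return Response(xml, mimetype="application/xml")

if __name__ == "__main__":
    app.run(debug=True)
```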

Valid XML Sitemap Format

Image from author, February 2025

Your sitemap needs three items to introduce itself to indexing platforms:

  • XML Version Declaration: Specifies the file type to inform indexing platforms what they can expect from the file.
  • UTF-8 Encoding: Ensures all the characters used can be understood.
  • Namespace Declaration: Communicates what rules the sitemap follows. Most sitemaps use the “http://www.sitemaps.org/schemas/sitemap/0.9” namespace to show that the file conforms to standards set by sitemaps.org.

This is followed by a URL container for each page. In a standard XML sitemap, there are only two tags that should be included for a URL:

  1. Loc (a.k.a. Location) Tag: This compulsory tag contains the absolute, canonical version of the URL. It should accurately reflect your site protocol (http or https) and whether you have chosen to include or exclude www.
  2. Lastmod (a.k.a. Last Modified) Tag: An optional but highly recommended tag to communicate the date and time the page was published or the last meaningful change. This helps indexing platforms understand which pages have fresh content and prioritize them for crawling.
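
Putting the three header items and these two tags together, a minimal one-URL sitemap can be written with a short Python script (the URL and date below are placeholders):

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # namespace declaration on the urlset element

urlset = ET.Element(f"{{{NS}}}urlset")
url = ET.SubElement(urlset, f"{{{NS}}}url")
ET.SubElement(url, f"{{{NS}}}loc").text = "https://www.example.com/page/"  # absolute, canonical URL
ET.SubElement(url, f"{{{NS}}}lastmod").text = datetime(
    2025, 2, 1, 9, 30, tzinfo=timezone.utc
).isoformat()  # last meaningful change, not the sitemap generation time

# xml_declaration adds the XML version line; encoding ensures UTF-8.
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```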

Google’s documentation on sitemaps states:

“Google uses the value if it’s consistently and verifiably (for example by comparing to the last modification of the page) accurate. The value should reflect the date and time of the last significant update to the page. For example, an update to the main content, the structured data, or links on the page is generally considered significant, however an update to the copyright date is not.”

Bing’s documentation agrees on the importance of the lastmod tag:

“The “lastmod” tag is used to indicate the last time the web pages linked by the sitemaps were modified. This information is used by search engines to determine how frequently to crawl your site, and to decide which pages to index and which to leave out.”

Mistakes, such as updating the value when the sitemap is generated rather than when the individual page was last modified, or worse, trying to manipulate crawlers by updating the date without significantly altering the page, may result in search engines ignoring this signal for your website, damaging your ability to have your content efficiently crawled.

Do not include the changefreq (a.k.a. change frequency) or priority tags. Once upon a time, these hinted at how often to crawl a page and how important it was, but they are now ignored by search engines.

Types Of Sitemaps

There are many different types of sitemaps. Let’s look at the ones you actually need.

XML Sitemap Index

XML sitemaps have a couple of limitations:

  • A maximum of 50,000 URLs.
  • An uncompressed file size limit of 50 MB.

Sitemaps can be compressed using gzip to save bandwidth for your server. But once unzipped, the sitemap still can’t exceed either limit.

Whenever you exceed either limit, you will need to split your URLs across multiple XML sitemaps.

Those sitemaps can then be combined into a single XML sitemap index file, often named sitemap-index.xml. Essentially, it is a sitemap for sitemaps.

You can create multiple sitemap index files. But be aware that you cannot nest sitemap index files.
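
As a rough illustration, the sketch below splits a hypothetical URL list into 50,000-URL sitemap files and writes a matching index; the file names and example.com domain are placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)

all_urls = [f"https://www.example.com/page-{i}/" for i in range(120_000)]  # placeholder URLs
CHUNK = 50_000  # protocol limit per sitemap file

index = ET.Element(f"{{{NS}}}sitemapindex")
for n, start in enumerate(range(0, len(all_urls), CHUNK), start=1):
    urlset = ET.Element(f"{{{NS}}}urlset")
    for page_url in all_urls[start:start + CHUNK]:
        ET.SubElement(ET.SubElement(urlset, f"{{{NS}}}url"), f"{{{NS}}}loc").text = page_url
    ET.ElementTree(urlset).write(f"sitemap-{n}.xml", encoding="utf-8", xml_declaration=True)

    # The index lists each child sitemap file, not individual page URLs.
    ET.SubElement(ET.SubElement(index, f"{{{NS}}}sitemap"), f"{{{NS}}}loc").text = (
        f"https://www.example.com/sitemap-{n}.xml"
    )

ET.ElementTree(index).write("sitemap-index.xml", encoding="utf-8", xml_declaration=True)
```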

For indexing platforms to easily find every one of your sitemap files, you will want to:

  • Submit your sitemap index to Google Search Console and Bing Webmaster Tools.
  • Specify your sitemap or sitemap index URL(s) in your robots.txt file, pointing indexing platforms directly to your sitemap as you welcome them to crawl.
Image from author, February 2025

Image Sitemap

Image sitemaps were designed to improve the indexing of image content, originally offering additional tags.

In modern-day SEO, however, it’s best practice to utilize JSON-LD schema.org/ImageObject markup to call out image properties to indexing platforms, as it provides more attributes than an image XML sitemap.

Because of this, a dedicated XML image sitemap is unnecessary. Simply add the image XML namespace declaration and the image tags directly to the main XML sitemap within the associated URL.

Image from author, February 2025

Know that images don’t have to be on the same domain as your website to be submitted in a sitemap. You can use a CDN as long as it’s verified in Google Search Console.
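
Here is a minimal sketch of that approach in Python, assuming Google’s image sitemap namespace; the page URL and CDN-hosted image URL are placeholders.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SM)
ET.register_namespace("image", IMG)  # extra namespace declaration for the image extension

urlset = ET.Element(f"{{{SM}}}urlset")
url = ET.SubElement(urlset, f"{{{SM}}}url")
ET.SubElement(url, f"{{{SM}}}loc").text = "https://www.example.com/trail-running-shoes/"

# Image tags sit inside the page's own url container in the main sitemap.
image = ET.SubElement(url, f"{{{IMG}}}image")
ET.SubElement(image, f"{{{IMG}}}loc").text = "https://cdn.example.com/img/trail-running-shoes.jpg"

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```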

Video Sitemap

Similar to images, Google says video sitemap tags can be added within an existing sitemap.

However, unlike images, video extensions in sitemaps offer a multitude of additional tags.

Image from author, February 2025

If you leverage these tags extensively, consider a dedicated video sitemap.

Adding these extensions significantly increases the file size of your sitemap and may push you past the file size limits.

Either method will help Google discover, crawl, and index your video content as long as the video is related to the content of the host page and is accessible to Googlebot.

While Bing does support video extensions in XML sitemaps, Fabrice Canel confirmed to me that they prefer submission via IndexNow, although Bing’s documentation still mentions the mRSS format.

Google News Sitemap

Google News sitemaps can only be used for article content that was created in the last two days. Once the articles are older than 48 hours, remove the URLs from the sitemap.

Again, while Google News sitemap tags can be included in your regular sitemap, this is not recommended.

Unlike for image and video, only Google leverages the news sitemap extension, not Bing or other indexers.

Image from author, February 2025

Contrary to some online advice, Google News sitemaps don’t support image URLs.
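
For reference, a single-article Google News sitemap written from Python looks roughly like the sketch below (the publication name, URL, and date are placeholders); remember to drop the URL once the article passes the 48-hour window.

```python
from datetime import datetime, timezone

article = {  # hypothetical article record
    "url": "https://www.example.com/news/search-update/",
    "title": "Search Update Announced",
    "published": datetime(2025, 2, 1, 9, 30, tzinfo=timezone.utc),
}

news_sitemap = f"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>{article['url']}</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>{article['published'].isoformat()}</news:publication_date>
      <news:title>{article['title']}</news:title>
    </news:news>
  </url>
</urlset>"""

with open("sitemap-news.xml", "w", encoding="utf-8") as f:
    f.write(news_sitemap)
```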

HTML Sitemap

XML sitemaps take care of indexing platform needs. HTML sitemaps were designed to assist human users in finding content.

The question becomes: If you have a good user experience and well-crafted internal links, do you need an HTML sitemap?

Check the page views of your HTML sitemap in Google Analytics. Chances are, it’s very low. If it isn’t, that’s a good indication that you need to improve your website navigation.

HTML sitemaps are generally linked in website footers, taking link equity from every single page of your website.

Ask yourself: Is that the best use of that link equity? Or are you including an HTML sitemap as a nod to legacy website best practices?

If few humans use it, and indexing platforms don’t need it as you have strong internal linking and an XML sitemap, does that HTML sitemap have a reason to exist? I would argue no.

XML Sitemap Optimization

XML sitemap optimization involves how you structure your sitemaps and what URLs are included.

How you choose to do this impacts how efficiently indexing platforms crawl your website and, thus, your content visibility.

Here are four ways to optimize XML sitemaps:

1. Only Include SEO Relevant Pages In XML Sitemaps

An XML sitemap is a list of pages you want to be crawled (and subsequently given visibility to by indexing platforms), which isn’t necessarily every page of your website.

A bot arrives at your website with an “allowance” for how many pages it will crawl.

The XML sitemap indicates that you consider the included URLs more important than the pages that aren’t in the sitemap but also aren’t blocked from crawling.

You’re using it to tell indexing platforms, “I’d really appreciate it if you’d focus on these URLs in particular.”

To help them crawl your site more intelligently and reap the benefits of faster (re)indexing, do not include:

  • 301 redirect URLs.
  • 404 or 410 URLs.
  • Non-canonical URLs.
  • Pages with noindex tags.
  • Pages blocked by robots.txt.
  • Paginated pages.
  • Parameter URLs that aren’t SEO-relevant.
  • Resource pages accessible by a lead gen form (e.g., white paper PDFs).
  • Utility pages that are useful to users, but not intended to be landing pages (login page, contact us, privacy policy, account pages, etc.).
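
When sitemaps are generated dynamically, the exclusions above can be applied as a simple filter. A minimal sketch, assuming a hypothetical page record with status, canonical, and robots fields pulled from your CMS or a crawl export:

```python
def is_sitemap_eligible(page: dict) -> bool:
    """Keep only indexable, canonical, SEO-relevant landing pages (fields are hypothetical)."""
    return (
        page["status_code"] == 200                  # no 3xx, 404, or 410 URLs
        and page["canonical_url"] == page["url"]    # no non-canonical URLs
        and not page["noindex"]                     # no noindex pages
        and not page["robots_blocked"]              # nothing blocked by robots.txt
        and page["is_landing_page"]                 # no utility, paginated, or gated pages
    )

pages = [
    {"url": "https://www.example.com/widgets/", "canonical_url": "https://www.example.com/widgets/",
     "status_code": 200, "noindex": False, "robots_blocked": False, "is_landing_page": True},
    {"url": "https://www.example.com/login/", "canonical_url": "https://www.example.com/login/",
     "status_code": 200, "noindex": True, "robots_blocked": False, "is_landing_page": False},
]

sitemap_urls = [p["url"] for p in pages if is_sitemap_eligible(p)]
print(sitemap_urls)  # only the widgets page survives
```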

I’ve seen recommendations to add 3xx, 4xx, or non-indexable pages to sitemaps in the hope it will speed up deindexing.

But similar to manipulating the lastmod date, such attempts to get these pages processed faster may result in search engines ignoring your sitemaps as a signal, damaging your ability to have your valuable content efficiently crawled.

But remember, Google is going to use your XML submission only as a hint about what’s important on your site.

Just because it’s not in your XML sitemap doesn’t necessarily mean that Google won’t index those pages.

2. Ensure Your XML Sitemap Is Valid

XML sitemap validators can tell you if the XML code is valid. But this alone is not enough.

There might be another reason why Google or Bing can’t fetch your sitemap, such as robots directives. Third-party tools won’t be able to identify this.
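
You can run a quick pre-flight check yourself. This sketch, assuming a placeholder sitemap URL and the requests package, confirms that robots.txt allows the fetch, the file returns a 200, and the XML parses:

```python
import requests
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_URL = "https://www.example.com/sitemap-index.xml"  # placeholder

robots = RobotFileParser("https://www.example.com/robots.txt")
robots.read()
print("Googlebot may fetch sitemap:", robots.can_fetch("Googlebot", SITEMAP_URL))

resp = requests.get(SITEMAP_URL, timeout=10)
print("HTTP status:", resp.status_code)

root = ET.fromstring(resp.content)  # raises ParseError if the XML is malformed
print("Root element:", root.tag, "with", len(root), "entries")
```

Even so, a script like this only approximates what the search engines themselves see.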

As such, the most efficient way to ensure your sitemap is valid is to submit it directly to Google Search Console and Bing Webmaster Tools.

Image from author, February 2025

When the sitemap is valid, you will see the green “Success” status in both Google Search Console and Bing Webmaster Tools.

Image from author, February 2025

If you get a red message instead, click on the error to find out why, fix it, and resubmit.

But in Google Search Console and Bing Webmaster Tools, you can do so much more than simple validation.

3. Leverage Sitemap Reporting For Indexing Analysis

Image from author, February 2025

Say you submit 80,000 pages all in one sitemap index, and 9,000 are excluded by both Google and Bing.

Sitemap reporting will help you understand the overarching reasons why, but it provides limited detail on which specific URLs are problematic.

So, while it’s valuable information, it’s not easily actionable. You need to discover which types of pages were left out.

What if you use descriptive sitemap names that reflect the sections of your website – one for categories, products, articles, etc.?

Image from author, February 2025

Then, we can drill down to see that 7,000 of the 9,000 non-indexed URLs are category pages – and clearly know where to focus attention.

This can also be done within a sitemap index file.

Now, I know both Google and Schema.org show examples encouraging numbered naming. So, you may have ended up with a /sitemap-products-index.xml file containing something like this:

  • /products-1.xml
  • /products-2.xml

That is not the most insightful naming convention. What if we break it down into parent categories? For example:

  • /products-mens.xml
  • /products-womens.xml
  • /products-kids.xml

And if your website is multilingual, be sure to leverage language as an additional separation layer.

Such smart structuring of sitemaps to group by page type allows you to dive into the data more efficiently and isolate indexing issues.

Just remember, for this to effectively work, sitemaps need to be mutually exclusive, with each URL existing in only one sitemap. The exception is the Google News sitemap.
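
A rough sketch of that grouping step, using the first path segment of each URL as the page type (the URLs are placeholders); because each URL lands in exactly one group, the mutually exclusive rule holds by construction:

```python
from collections import defaultdict
from urllib.parse import urlparse

urls = [
    "https://www.example.com/products/mens/trail-shoe/",
    "https://www.example.com/products/womens/road-shoe/",
    "https://www.example.com/articles/sitemap-guide/",
    "https://www.example.com/categories/running/",
]

groups = defaultdict(list)
for url in urls:
    segments = urlparse(url).path.strip("/").split("/")
    # The first one or two path segments become the descriptive sitemap name.
    name = "-".join(segments[:2]) if segments[0] == "products" else segments[0]
    groups[f"sitemap-{name}.xml"].append(url)

for filename, members in groups.items():
    print(filename, len(members))  # e.g., sitemap-products-mens.xml 1
```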

4. Strategize Sitemap Size

As mentioned before, search engines impose a limit of 50,000 URLs per sitemap file.

Some SEO specialists intentionally reduce this number to, say, 10,000. This can help speed up indexing.

However, you can only download 1,000 URLs in GSC. So, if 2,000 URLs in a certain sitemap are not indexed, you can only access half of them. If you are trying to do content cleanup, this will not be enough.

Image from author, February 2025

To gain full visibility on all URLs causing issues, break sitemaps down into groups of 1,000.

The obvious downside is the higher setup time, as every sitemap needs to be submitted in Google Search Console and Bing Webmaster Tools. This may also require a high level of ongoing management.

XML Sitemap Best Practice Checklist

Do invest time to:

✓ Dynamically generate XML sitemaps.

✓ Compress sitemap files.

✓ Use a sitemap index file.

✓ Include the loc and lastmod tags.

✓ Use image tags in existing sitemaps.

✓ Use video and Google News sitemaps if relevant.

✓ Reference sitemap URLs in robots.txt.

✓ Submit sitemaps to both Google Search Console and Bing Webmaster Tools.

✓ Include only SEO-relevant pages in XML sitemaps.

✓ Ensure URLs are included only in a single sitemap.

✓ Ensure the sitemap code is error-free.

✓ Group URLs in descriptively named sitemaps based on page type.

✓ Strategize how to break down large sitemap files.

✓ Use Google Search Console and Bing Webmaster Tools to analyze indexing rates.

Now, go check your own sitemaps and make sure you’re doing it right.

Featured Image: BEST-BACKGROUNDS/Shutterstock

Google’s JavaScript SERPs Impact Trackers, AI

Google’s search engine results pages now require JavaScript, effectively “hiding” the listings from organic rank trackers, artificial intelligence models, and other optimization tools.

The world’s most popular search engine began requiring JavaScript on search pages last month. Google stated the move aimed to protect its services from bots and “abuse,” perhaps a thinly veiled allusion to competitive AI.

These changes could complicate search engine optimization in at least three ways: rank tracking, keyword research, and AI visibility.

Google Search now requires browsers to have JavaScript enabled.

Impact of JavaScript

Web crawlers can scrape and index JavaScript-enabled pages even when the JavaScript itself renders the content. Googlebot does this, for example.

A web-scraping bot grabs the content of an HTML page in four steps, more or less:

  • Request. The crawler sends a simple HTTP GET request to the URL.
  • Response. The server returns the HTML content.
  • Parse. The crawler parses (analyzes) the HTML, gathering the content.
  • Use. The content is passed on for storage or use.

For example, before the JavaScript switch, bots from Ahrefs and Semrush crawled Google SERPs. A bot could visit the SERP for, say, “men’s running shoes,” parse the HTML, and use the data to produce rank-tracking and traffic reports.
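
In code, those four steps are only a few lines. A minimal sketch with the requests and BeautifulSoup libraries, using a placeholder URL rather than a live results page:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/"  # placeholder; a rank tracker would request a results-page URL

resp = requests.get(url, headers={"User-Agent": "example-bot/1.0"}, timeout=10)  # Request + Response
soup = BeautifulSoup(resp.text, "html.parser")                                   # Parse
links = [{"text": a.get_text(strip=True), "href": a["href"]}                     # Use
         for a in soup.find_all("a", href=True)]
print(len(links), "links extracted from static HTML")
```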

The process is relatively more complicated with JavaScript.

  • Request. The crawler sends a simple HTTP GET request to the URL.
  • Response. The server returns a basic HTML skeleton, often without much content (e.g., an empty placeholder element that JavaScript later populates).

  • Execute. To run the JavaScript and load dynamic content, the crawler renders the page in a headless browser such as Puppeteer, Playwright, or Selenium.
  • Wait. The crawler waits for the page to load, including API calls and data updates. A few milliseconds might seem insignificant, but it slows down the crawlers and adds costs.
  • Parse. The crawler parses the dynamic and static HTML, gathering the content as before.
  • Use. The content is passed on for storage or use.
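
The extra Execute and Wait steps look roughly like this sketch using Playwright’s Python API (again with a placeholder URL); note that a full Chromium instance is launched just to obtain the HTML:

```python
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

url = "https://example.com/"  # placeholder for a JavaScript-dependent page

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)   # Execute: full browser simulation
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")     # Wait: let scripts and API calls settle
    html = page.content()                        # rendered DOM, not the initial skeleton
    browser.close()

soup = BeautifulSoup(html, "html.parser")        # Parse and Use as before
print(len(soup.find_all("a", href=True)), "links found after rendering")
```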

The two additional steps — Execute and Wait — are far from trivial since they require full browser simulation and thus much more CPU and RAM. Some have estimated that JavaScript-enabled crawling takes three to 10 times more computing resources than scraping static HTML.

Feature | HTML Scraping | JavaScript Scraping
Initial response | Full HTML content | Minimal HTML with placeholders
JavaScript execution | Not required | Required
Tools | Requests, BeautifulSoup, Scrapy | Puppeteer, Playwright, Selenium
Performance | Faster, lightweight | Slower, resource-heavy
Content availability | Static content only | Both static and dynamic content
Complexity | Low | High

It is worth clarifying that Google does not render the entire SERP with JavaScript; instead, it requires that visitors’ browsers enable JavaScript, which has essentially the same impact on crawlers.

The time and resources needed to crawl a SERP vary greatly. Hence, one cannot easily assess the impact of Google’s new JavaScript requirement on any given tool beyond an educated guess.

Rank tracking

Marketers use organic rank-tracking tools to monitor where a web page appears on Google SERPs — listings, featured snippets, knowledge panels, local packs — for target keywords.

Semrush, Ahrefs, and other tools crawl millions, if not billions, of SERPs monthly. Rendering and parsing those dynamic results pages could raise costs significantly, perhaps fivefold.

For marketers, this potential increase might mean tracking tools become more expensive or relatively less accurate if they crawl SERPs infrequently.

Keyword research

Google’s JavaScript requirement may also impact keyword research since identifying relevant, high-traffic keywords could become imprecise and more costly.

These changes may force marketers to find other ways to identify content topics and keyword gaps. Kevin Indig, a respected search engine optimizer, suggested that marketers turn to page- or domain-level traffic metrics if keyword data becomes unreliable.

AI models

The hype surrounding AI engines reminds me of voice search a few years ago, although the former is becoming much more transformative.

AI models likely crawled Google results to discover pages and content. An AI model asked to find the best running shoe for a 185-pound male might scrape a Google SERP and follow links to the top 10 sites. Thus, some marketers expected a halo effect from ranking well on Google.

But AI models must now spend extra time and computing power to parse Google’s JavaScript-driven results pages.

Wait and Adapt

As is often the case with Google’s changes, marketers must wait to gauge the JavaScript effect, but one thing is certain: SEO is changing.

AI Search Optimization: Make Your Structured Data Accessible via @sejournal, @MattGSouthern

A recent investigation has uncovered a problem for websites relying on JavaScript for structured data.

This data, often in JSON-LD format, is difficult for AI crawlers to access if not in the initial HTML response.

Crawlers like GPTBot (used by ChatGPT), ClaudeBot, and PerplexityBot can’t execute JavaScript and miss any structured data added later.

This creates challenges for websites using tools like Google Tag Manager (GTM) to insert JSON-LD on the client side, as many AI crawlers can’t read dynamically generated content.

Key Findings About JSON-LD & AI Crawlers

Elie Berreby, the founder of SEM King, examined what happens when JSON-LD is added using Google Tag Manager (GTM) without server-side rendering (SSR).

He found out why this type of structured data is often not seen by AI crawlers:

  1. Initial HTML Load: When a crawler requests a webpage, the server returns the first HTML version. If structured data is added with JavaScript, it won’t be in this initial response.
  2. Client-Side JavaScript Execution: JavaScript runs in the browser and changes the Document Object Model (DOM) for users. At this stage, GTM can add JSON-LD to the DOM.
  3. Crawlers Without JavaScript Rendering: AI crawlers that can’t run JavaScript cannot see changes in the DOM. This means they miss any JSON-LD added after the page loads.

In summary, structured data added only through client-side JavaScript is invisible to most AI crawlers.
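
You can approximate what a non-rendering crawler sees by fetching the raw HTML and looking for JSON-LD script tags. A minimal sketch with the requests and BeautifulSoup libraries and a placeholder URL:

```python
import json
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/product/"  # placeholder page

resp = requests.get(url, timeout=10)  # the initial HTML, before any JavaScript runs
soup = BeautifulSoup(resp.text, "html.parser")
blocks = soup.find_all("script", type="application/ld+json")

if not blocks:
    print("No JSON-LD in the initial HTML: GTM-injected markup would be invisible to these crawlers.")
for block in blocks:
    try:
        data = json.loads(block.string or "")
        print("Found JSON-LD in initial HTML:", str(data)[:100])
    except json.JSONDecodeError:
        print("Found a JSON-LD block that does not parse.")
```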

Why Traditional Search Engines Are Different

Traditional search crawlers like Googlebot can read JavaScript and process changes made to a webpage after it loads, including JSON-LD data injected by Google Tag Manager (GTM).

In contrast, many AI crawlers can’t read JavaScript and only see the raw HTML from the server. As a result, they miss dynamically added content, like JSON-LD.

Google’s Warning on Overusing JavaScript

This challenge ties into a broader warning from Google about the overuse of JavaScript.

In a recent podcast, Google’s Search Relations team discussed the growing reliance on JavaScript. While it enables dynamic features, it’s not always ideal for essential SEO elements like structured data.

Martin Splitt, Google’s Search Developer Advocate, explained that websites range from simple pages to complex applications. It’s important to balance JavaScript use with making key content available in the initial HTML.

John Mueller, another Google Search Advocate, agreed, noting that developers often turn to JavaScript when simpler options, like static HTML, would be more effective.

What To Do Instead

Developers and SEO professionals should ensure structured data is accessible to all crawlers to avoid issues with AI search crawlers.

Here are some key strategies:

  1. Server-Side Rendering (SSR): Render pages on the server to include structured data in the initial HTML response (see the sketch after this list).
  2. Static HTML: Use schema markup directly in the HTML to limit reliance on JavaScript.
  3. Prerendering: Offer prerendered pages where JavaScript has already been executed, providing crawlers with fully rendered HTML.
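
As a small illustration of the first option, this Flask sketch serializes the schema markup into the initial HTML response, so even crawlers that never run JavaScript receive it (the route and product data are hypothetical):

```python
import json
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """<!doctype html>
<html>
  <head>
    <title>{{ title }}</title>
    <script type="application/ld+json">{{ schema | safe }}</script>
  </head>
  <body><h1>{{ title }}</h1></body>
</html>"""

@app.route("/product/<slug>")
def product(slug):
    # In production this would come from a database; hard-coded fields keep the sketch short.
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": slug.replace("-", " ").title(),
    }
    # json.dumps runs on the server, so the JSON-LD is present in the very first response.
    return render_template_string(PAGE, title=data["name"], schema=json.dumps(data))
```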

These approaches align with Google’s advice to prioritize HTML-first development and include important content like structured data in the initial server response.

Why This Matters

AI crawlers will only grow in importance, and they play by different rules than traditional search engines.

If your site depends on GTM or other client-side JavaScript for structured data, you’re missing out on opportunities to rank in AI-driven search results.

By shifting to server-side or static solutions, you can future-proof your site and ensure visibility in traditional and AI searches.


Featured Image: nexusby/Shutterstock

What to Know about Meta Descriptions

A meta description summarizes the content on a web page. Google has long stated that meta descriptions do not impact rankings, yet business execs often misunderstand their function.

Here’s what to know about meta descriptions from a search engine optimization perspective.

Not a ranking factor

When ranking web pages, Google doesn’t consider meta descriptions, although they can appear in the snippets of organic listings, informing searchers of what the page is about.

Note the example below for a Google search of “practical ecommerce.” The snippet shows the query (“practical ecommerce”) in bold text, likely increasing the clicks on the listing. Thus meta descriptions containing popular keywords typically attract more attention — and clicks.

A search for “practical ecommerce” produces a snippet using the page’s meta description.

Not always in search results

Nonetheless, Google usually ignores a page’s meta description and uses body content in the search snippet. Google confirms this in a “Search Central” blog post:

Google primarily uses the content on the page to automatically determine the appropriate snippet. We may also use descriptive information in the meta description element when it describes the page better than other parts of the content.

A search snippet is query-dependent — Google attempts to generate a snippet relevant to the searcher’s word or phrase. Including all potential queries in a meta description is impossible, but a couple of tactics apply:

  • Include the page’s primary keyword. Google will likely display the meta description for those queries, giving page owners control over what searchers see on popular terms.
  • Use variations of the brand name. Optimize brand searches with common deviations, such as one word or two. Each option will appear in bold, driving clicks to the page.

Low priority

Unlike other on-page elements, meta descriptions are not visible on the page itself and are not ranking-driven. In its Search Central post, Google even encourages machine-generated versions, provided they are aimed at humans and relevant to the page:

…programmatic generation of the descriptions can be appropriate and is encouraged. Good descriptions are human-readable and diverse. Page-specific data is a good candidate for programmatic generation. Keep in mind that meta descriptions comprised of long strings of keywords don’t give users a clear idea of the page’s content and are less likely to be displayed as a snippet.
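
For page-specific data, a tiny sketch of that idea in Python, assembling a human-readable description from hypothetical product fields with the page’s primary keyword up front:

```python
def build_meta_description(page: dict) -> str:
    """Assemble a description from page-specific data; field names are hypothetical."""
    return (
        f"{page['primary_keyword'].capitalize()}: {page['summary']} "
        f"{page['call_to_action']}"
    )

print(build_meta_description({
    "primary_keyword": "trail running shoes",
    "summary": "Compare lightweight, waterproof models built for rocky terrain.",
    "call_to_action": "Shop the full range with free returns.",
}))
```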

ChatGPT and Gemini can generate meaningful meta descriptions. Here’s my go-to prompt:

My target keyword is [KEYWORD]. Here’s my page copy: [TEXT]. Generate a meta description containing my keyword in the first sentence. Make the description engaging — for example, include a call-to-action.

Other AI-driven tools can produce the descriptions, too.

No ideal length

Countless search-engine tools will claim a meta description is too long or short. Always ignore them. Google continually experiments with the length and content of search snippets, such as showing the date and rich elements. Most snippets in 2025 will be just one sentence (roughly 140 characters), although that will likely change.

Insert top keywords at the beginning of a meta description instead of guessing the length. This will ensure Google uses it more often and displays those queries in bold text.

How Rendering Affects SEO: Takeaways From Google’s Martin Splitt via @sejournal, @MattGSouthern

Google has released a new episode of its Search Central Lightning Talks, which focuses on rendering strategies, an important topic for web developers.

In this video, Martin Splitt, a Developer Advocate at Google, explains the intricacies of different rendering methods and how these approaches impact website performance, user experience, and search engine optimization (SEO).

This episode also connects to recent discussions about the overuse of JavaScript and its effects on AI search crawlers, a topic previously addressed by Search Engine Journal.

Splitt’s insights offer practical guidance for developers who want to optimize their websites for modern search engines and users.

What Is Rendering?

Splitt begins by explaining what rendering means in the context of websites.

He explains rendering in simple terms, saying:

“Rendering in this context is the process of pulling data into a template. There are different strategies as to where and when this happens, so let’s take a look together.”

In the past, developers would directly edit and upload HTML files to servers.

However, modern websites often use templates to simplify the creation of pages with similar structures but varying content, such as product listings or blog posts.

Splitt categorizes rendering into three main strategies:

  1. Pre-Rendering (Static Site Generation)
  2. Server-Side Rendering (SSR)
  3. Client-Side Rendering (CSR)

1. Pre-Rendering

Screenshot from: YouTube.com/GoogleSearchCentral, January 2025.

Pre-rendering, also known as static site generation, generates HTML files in advance and serves them to users.

Splitt highlights its simplicity and security:

“It’s also very robust and very secure, as there isn’t much interaction happening with the server, and you can lock it down quite tightly.”

However, he also notes its limitations:

“It also can’t respond to interactions from your visitors. So that limits what you can do on your website.”

Tools such as Jekyll, Hugo, and Gatsby automate this process by combining templates and content to create static files.

Advantages:

  • Simple setup with minimal server requirements
  • High security due to limited server interaction
  • Robust and reliable performance

Disadvantages:

  • Requires manual or automated regeneration whenever content changes
  • Limited interactivity, as pages cannot dynamically respond to user actions
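
A toy static-generation sketch in Python using Jinja2, with hard-coded posts standing in for a real content source such as Markdown files or a CMS export:

```python
from pathlib import Path
from jinja2 import Template

TEMPLATE = Template(
    "<html><head><title>{{ title }}</title></head>"
    "<body><h1>{{ title }}</h1><p>{{ body }}</p></body></html>"
)

posts = [  # placeholder content source
    {"slug": "xml-sitemaps", "title": "XML Sitemaps", "body": "How sitemaps help crawlers."},
    {"slug": "rendering", "title": "Rendering Strategies", "body": "Pre-rendering vs. SSR vs. CSR."},
]

out = Path("public")
out.mkdir(exist_ok=True)
for post in posts:
    # Every page is rendered once, ahead of time; the server later only serves static files.
    (out / f"{post['slug']}.html").write_text(TEMPLATE.render(**post), encoding="utf-8")
    print("wrote", out / f"{post['slug']}.html")
```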

2. Server-Side Rendering (SSR): Flexibility with Trade-Offs

Screenshot from: YouTube.com/GoogleSearchCentral, January 2025.

Server-side rendering dynamically generates web pages on the server each time a user visits a site.

This approach enables websites to deliver personalized content, such as user-specific dashboards and interactive features, like comment sections.

Splitt says:

“The program decides on things like the URL, visitor, cookies, and other things—what content to put into which template and return it to the user’s browser.”

Splitt also points out its flexibility:

“It can respond to things like a user’s login status or actions, like signing up for a newsletter or posting a comment.”

But he acknowledges its downsides:

“The setup is a bit more complex and requires more work to keep it secure, as users’ input can now reach your server and potentially cause problems.”

Advantages:

  • Supports dynamic user interactions and tailored content
  • Can accommodate user-generated content, such as reviews and comments

Disadvantages:

  • Complex setup and ongoing maintenance
  • Higher resource consumption, as pages are rendered for each visitor
  • Potentially slower load times due to server response delays

To alleviate resource demands, developers can use caching or proxies to minimize redundant processing.
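
As a minimal illustration, this Flask sketch renders the page on each request and reacts to the visitor’s cookie (the cookie name and copy are placeholders):

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)
TEMPLATE = "<html><body><h1>{{ heading }}</h1><p>{{ message }}</p></body></html>"

@app.route("/dashboard")
def dashboard():
    # Content is decided per request: the URL, cookies, and login state can all feed the template.
    user = request.cookies.get("username", "guest")
    return render_template_string(
        TEMPLATE,
        heading=f"Welcome back, {user}",
        message="This page was rendered on the server for this specific visit.",
    )
```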

3. Client-Side Rendering (CSR): Interactivity with Risks

Screenshot from: YouTube.com/GoogleSearchCentral, January 2025.

Client-side rendering uses JavaScript to fetch and display data in the user’s browser.

This method creates interactive websites and web applications, especially those with real-time updates or complex user interfaces.

Splitt highlights its app-like functionality:

“The interactions feel like they’re in an app. They happen smoothly in the background without the page reloading visibly.”

However, he cautions about its risks:

“The main issue with CSR usually is the risk that, in case something goes wrong during transmission, the user won’t see any of your content. That can also have SEO implications.”

Advantages:

  • Users enjoy a smooth, app-like experience without page reloads.
  • It allows features like offline access using progressive web apps (PWAs).

Disadvantages:

  • It depends heavily on the user’s device and browser.
  • Search engines may have trouble indexing JavaScript-rendered content, leading to SEO challenges.
  • Users might see blank pages if JavaScript fails to load or run.

Splitt suggests a hybrid approach called “hydration” to improve SEO.

In this method, the server initially renders the content, and then client-side rendering handles further interactions.

Screenshot from: YouTube.com/GoogleSearchCentral, January 2025.

How to Choose the Right Rendering Strategy

Splitt points out that there is no one-size-fits-all solution for website development.

Developers should consider what a website needs by looking at specific factors.

Splitt says:

“In the end, that depends on a bunch of factors, such as what does your website do? How often does the content change? What kind of interactions do you want to support? And what kind of resources do you have to build, run, and maintain your setup?”

He provides a visual summary of the pros and cons of each approach to help developers make informed choices.

Screenshot from: YouTube.com/GoogleSearchCentral, January 2025.

Connecting the Dots: Rendering and JavaScript Overuse

This episode continues earlier discussions about the drawbacks of excessive JavaScript use, especially regarding SEO in the age of AI search crawlers.

As previously reported, AI crawlers like GPTBot often have difficulty processing websites that rely heavily on JavaScript, which can decrease their visibility in search results.

To address this issue, Splitt recommends using server-side rendering or pre-rendering to ensure that essential content is accessible to both users and search engines. Developers are encouraged to implement progressive enhancement techniques and to limit JavaScript usage to situations where it genuinely adds value.

See the video below to learn more about rendering strategies.


Featured Image: Screenshot from: YouTube.com/GoogleSearchCentral, January 2025

How Google Detects Duplicate Content

The myth of a duplicate content penalty has existed for years. Google seeks diverse search results and must choose when two or more pages are the same or similar, resulting in the others losing organic traffic — different from a penalty.

Google’s “Search Central” blog includes a guide on ranking systems that describes deduplication:

Searches on Google may find thousands or even millions of matching web pages. Some of these may be very similar to each other. In such cases, our systems show only the most relevant results to avoid unhelpful duplication.

Yet the guide doesn’t specify how the deduplication system chooses a page. In my experience, duplication occurs in four ways.

Similar pages

When a site has similar product or category pages or syndicates content (knowingly or not), Google will likely show only one page in search results. It’s not a penalty, but it does dilute traffic among the identical pages. Thus, ensure Google ranks the original, up-to-date, detailed, and relevant page (not a syndicated or scraped version).

Canonical tags and 301 redirects can point Google to the best page. Neither is foolproof, as Google views them as suggestions. The only way to force the best page is to avoid duplicating it.

The danger of duplicate content is when a third-party scraped version overranks the original. Google can usually identify scraped content, which is typically on low-quality sites with few or no authority signals. Thus a higher-ranking scraped version implies a problem with the original site.

Featured snippets

Featured snippets appear above organic search results and provide a quick answer to a query. Google removes featured snippet URLs from lower organic positions to avoid duplication.

The purpose of featured snippets is to answer queries, removing the need to click. Thus a featured snippet page likely receives less organic traffic, and there is no surefire method to prevent it. If a page suddenly loses traffic, check Search Console to see if it’s featured.

Google will likely deduplicate AI Overviews in the same way.

Top stories

“Top stories” is a separate search-result section for breaking or relevant news. A URL in top stories typically loses its organic position.

Domains

Domain names trigger a different type of duplication beyond content. Google won’t typically show the same domain in top results, even for brand name queries. Keep an eye on queries for your brand to know other domains that rank for it and how to combat them.

Google Clarifies 404 & Redirect Validation In Search Console via @sejournal, @MattGSouthern

Google’s Search Advocate, John Mueller, has provided insights into Search Console’s validation process, addressing how it handles 404 errors and redirects during site migrations.

Key Points

A Reddit user shared their experience with a client’s website migration that led to a loss in rankings.

They explained that they took several steps to address the issues, including:

  • Fixing on-site technical problems.
  • Redirecting 404 pages to the appropriate URLs.
  • Submitting these changes for validation in Google Search Console.

Although they confirmed that all redirects and 404 pages were working correctly, they failed to validate the changes in Search Console.

Feeling frustrated, the user sought advice on what to do next.

This prompted a response from Mueller, who provided insights into how Google processes these changes.

Mueller’s Response

Mueller explained how Google manages 404 errors and redirect validations in Search Console.

He clarified that the “mark as fixed” feature doesn’t speed up Google’s reprocessing of site changes. Instead, it’s a tool for site owners to monitor their progress.

Mueller noted:

“The ‘mark as fixed’ here will only track how things are being reprocessed. It won’t speed up reprocessing itself.”

He also questioned the purpose of marking 404 pages as fixed, noting that no further action is needed if a page intentionally returns a 404 error.

Mueller adds:

“If they are supposed to be 404s, then there’s nothing to do. 404s for pages that don’t exist are fine. It’s technically correct to have them return 404. These being flagged don’t mean you’re doing something wrong, if you’re doing the 404s on purpose.”

For pages that aren’t meant to be 404, Mueller advises:

“If these aren’t meant to be 404 – the important part is to fix the issue though, set up the redirects, have the new content return 200, check internal links, update sitemap dates, etc. If it hasn’t been too long (days), then probably it’ll pick up again quickly. If it’s been a longer time, and if it’s a lot of pages on the new site, then (perhaps obviously) it’ll take longer to be reprocessed.”

Key Takeaways From Mueller’s Advice

Mueller outlined several key points in his response.

Let’s break them down:

For Redirects and Content Updates

  • Ensure that redirects are correctly set up and new content returns a 200 (OK) status code.
  • Update internal links to reflect the new URLs.
  • Refresh the sitemap with updated dates to signal changes to Google.
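
Those checks can be scripted after a migration. This sketch, assuming a hypothetical old-to-new URL map and the requests package, follows each redirect and confirms a 301 hop lands on a page returning 200:

```python
import requests

redirect_map = {  # hypothetical migration mapping
    "https://www.example.com/old-page/": "https://www.example.com/new-page/",
}

for old_url, expected in redirect_map.items():
    resp = requests.get(old_url, allow_redirects=True, timeout=10)
    hops = [r.status_code for r in resp.history]  # e.g., [301]
    landed_ok = resp.status_code == 200 and resp.url.rstrip("/") == expected.rstrip("/")
    verdict = "OK" if landed_ok and 301 in hops else "CHECK"
    print(f"{verdict}: {old_url} -> {resp.url} via {hops} (final {resp.status_code})")
```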

Reprocessing Timeline

  • If changes were made recently (within a few days), Google will likely process them quickly.
  • For larger websites or older issues, reprocessing may take more time.

Handling 404 Pages

  • If a page is no longer meant to exist, returning a 404 error is the correct approach.
  • Seeing 404s flagged in Search Console doesn’t necessarily indicate a problem, provided the 404s are intentional.

Why This Matters

Website migrations can be complicated and may temporarily affect search rankings if not done correctly.

Google Search Console is useful for tracking changes, but it has limitations.

The validation process checks if fixes are implemented correctly, not how quickly changes will be made.

Practice patience and ensure all technical details—redirects, content updates, and internal linking—are adequately addressed.


Featured Image: Sammby/Shutterstock

Google’s JavaScript Warning & How It Relates To AI Search via @sejournal, @MattGSouthern

A recent discussion among the Google Search Relations team highlights a challenge in web development: getting JavaScript to work well with modern search tools.

In Google’s latest Search Off The Record podcast, the team discussed the rising use of JavaScript, and the tendency to use it when it’s not required.

Martin Splitt, a Search Developer Advocate at Google, noted that JavaScript was created to help websites compete with mobile apps, bringing in features like push notifications and offline access.

However, the team cautioned that excitement around JavaScript functionality can lead to overuse.

While JavaScript is practical in many cases, it’s not the best choice for every part of a website.

The JavaScript Spectrum

Splitt described the current landscape as a spectrum between traditional websites and web applications.

He says:

“We’re in this weird state where websites can be just that – websites, basically pages and information that is presented on multiple pages and linked, but it can also be an application.”

He offered the following example of the JavaScript spectrum:

“You can do apartment viewings in the browser… it is a website because it presents information like the square footage, which floor is this on, what’s the address… but it’s also an application because you can use a 3D view to walk through the apartment.”

Why Does This Matter?

John Mueller, Google Search Advocate, noted a common tendency among developers to over-rely on JavaScript:

“There are lots of people that like these JavaScript frameworks, and they use them for things where JavaScript really makes sense, and then they’re like, ‘Why don’t I just use it for everything?’”

As I listened to the discussion, I was reminded of a study I covered weeks ago. According to the study, over-reliance on JavaScript can lead to potential issues for AI search engines.

Given the growing prominence of AI search crawlers, I thought it was important to highlight this conversation.

While traditional search engines typically support JavaScript well, its implementation demands greater consideration in the age of AI search.

The study finds AI bots make up an increasing percentage of search crawler traffic, but these crawlers can’t render JavaScript.

That means you could lose out on traffic from search engines like ChatGPT Search if you rely too much on JavaScript.

Things To Consider

The use of JavaScript and the limitations of AI crawlers present several important considerations:

  1. Server-Side Rendering: Since AI crawlers can’t execute client-side JavaScript, server-side rendering is essential for ensuring visibility.
  2. Content Accessibility: Major AI crawlers, such as GPTBot and Claude, have distinct preferences for content consumption. GPTBot prioritizes HTML content (57.7%), while Claude focuses more on images (35.17%).
  3. New Development Approach: These new constraints may require reevaluating the traditional “JavaScript-first” development strategy.

The Path Forward

As AI crawlers become more important for indexing websites, you need to balance modern features and accessibility for AI crawlers.

Here are some recommendations:

  • Use server-side rendering for key content.
  • Make sure to include core content in the initial HTML.
  • Apply progressive enhancement techniques.
  • Be cautious about when to use JavaScript.

To succeed, adapt your website for traditional search engines and AI crawlers while ensuring a good user experience.

Listen to the full podcast episode below:


Featured Image: Ground Picture/Shutterstock

AI Crawlers Account For 28% Of Googlebot’s Traffic, Study Finds via @sejournal, @MattGSouthern

A report released by Vercel highlights the growing impact of AI bots in web crawling.

OpenAI’s GPTBot and Anthropic’s Claude generate nearly 1 billion requests monthly across Vercel’s network.

The data indicates that GPTBot made 569 million requests in the past month, while Claude accounted for 370 million.

Additionally, PerplexityBot contributed 24.4 million fetches, and AppleBot added 314 million requests.

Together, these AI crawlers represent approximately 28% of Googlebot’s total volume, which stands at 4.5 billion fetches.

Here’s what this could mean for SEO.

Key Findings On AI Crawlers

The analysis looked at traffic patterns on Vercel’s network and various web architectures. It found some key features of AI crawlers:

  • Major AI crawlers do not render JavaScript, though they do pull JavaScript files.
  • AI crawlers are often inefficient, with ChatGPT and Claude spending over 34% of their requests on 404 pages.
  • The type of content these crawlers focus on varies. ChatGPT prioritizes HTML (57.7%), while Claude focuses more on images (35.17%).

Geographic Distribution

Unlike traditional search engines that operate from multiple regions, AI crawlers currently maintain a concentrated U.S. presence:

  • ChatGPT operates from Des Moines (Iowa) and Phoenix (Arizona)
  • Claude operates from Columbus (Ohio)

Web Almanac Correlation

These findings align with data shared in the Web Almanac’s SEO chapter, which also notes the growing presence of AI crawlers.

According to the report, websites now use robots.txt files to set rules for AI bots, telling them what they can or cannot crawl.

GPTBot is the most mentioned bot, appearing on 2.7% of mobile sites studied. The Common Crawl bot, often used to collect training data for language models, is also frequently noted.

Both reports stress that website owners need to adjust to how AI crawlers behave.

3 Ways To Optimize For AI Crawlers

Based on recent data from Vercel and the Web Almanac, here are three ways to optimize for AI crawlers.

1. Server-Side Rendering

AI crawlers don’t execute JavaScript. This means any content that relies on client-side rendering might be invisible.

Recommended actions:

  • Implement server-side rendering for critical content
  • Ensure main content, meta information, and navigation structures are present in the initial HTML
  • Use static site generation or incremental static regeneration where possible

2. Content Structure & Delivery

Vercel’s data shows distinct content type preferences among AI crawlers:

ChatGPT:

  • Prioritizes HTML content (57.70%)
  • Spends 11.50% of fetches on JavaScript files

Claude:

  • Focuses heavily on images (35.17%)
  • Dedicates 23.84% of fetches to JavaScript files

Optimization recommendations:

  • Structure HTML content clearly and semantically
  • Optimize image delivery and metadata
  • Include descriptive alt text for images
  • Implement proper header hierarchy

3. Technical Considerations

High 404 rates from AI crawlers mean you need to keep these technical considerations top of mind:

  • Maintain updated sitemaps
  • Implement proper redirect chains
  • Use consistent URL patterns
  • Regular audit of 404 errors
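
One way to stay on top of the last two items is to audit the URLs you expose in your sitemap. A rough sketch that reads a placeholder sitemap and flags anything not returning a 200:

```python
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text for loc in root.findall(".//sm:loc", NS)]

problems = []
for url in urls:
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        problems.append((url, status))  # 404s and unexpected redirects both show up here

print(f"{len(problems)} of {len(urls)} sitemap URLs need attention")
for url, status in problems:
    print(status, url)
```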

Looking Ahead

For search marketers, the message is clear: AI chatbots are a new force in web crawling, and sites need to adapt their SEO accordingly.

Although AI bots may rely on cached or dated information now, their capacity to parse fresh content from across the web will grow.

You can help ensure your content is crawled and indexed with server-side rendering, clean URL structures, and updated sitemaps.


Featured Image: tete_escape/Shutterstock