Why Now’s The Time To Adopt Schema Markup via @sejournal, @marthavanberkel

There is no better time for organizations to prioritize Schema Markup.

Why is that so, you might ask?

First of all, Schema Markup (aka structured data) is not new.

Google has long rewarded sites that implement structured data with rich results. If you haven’t taken advantage of rich results in search, it’s time to earn a higher click-through rate from these visual features.

Secondly, with AI playing an ever larger role in search, helping search engines understand your content is more important than ever.

Schema Markup allows your organization to clearly articulate what your content means and how it relates to other things on your website.

The final reason to adopt Schema Markup is that, when done correctly, you can build a content knowledge graph, which is a critical enabler in the age of generative AI. Let’s dig in.

Schema Markup For Rich Results

Schema.org has been around since 2011. Back then, Google, Bing, Yahoo, and Yandex worked together to create the standardized Schema.org vocabulary, giving website owners a common way to describe their content so search engines could understand it.

Since then, Google has incentivized websites to implement Schema Markup by awarding rich results to websites with certain types of markup and eligible content.

Websites that achieve these rich results tend to see higher click-through rates from the search engine results page.

In fact, Schema Markup is one of the best-documented SEO tactics that Google explicitly tells you to use. With so many things in SEO that have to be reverse-engineered, this one is straightforward and highly recommended.

You might have delayed implementing Schema Markup due to the lack of applicable rich results for your website. That might have been true at one point, but I’ve been doing Schema Markup since 2013, and the number of rich results available is growing.

Even though Google deprecated how-to rich results and changed the eligibility of FAQ rich results in August 2023, it introduced seven new rich results in the months following – the most new rich results introduced in a single year!

These rich results include vehicle listing, course info, profile page, discussion forum, organization, vacation rental, and product variants.

There are now 35 rich results that you can use to stand out in search, and they apply to a wide range of industries such as healthcare, finance, and tech.

Here are some widely applicable rich results you should consider utilizing:

  • Breadcrumb.
  • Product.
  • Reviews.
  • JobPosting.
  • Video.
  • Profile Page.
  • Organization.

With so many opportunities to take control of how you appear in search, it’s surprising that more websites haven’t adopted it.

A statistic from Web Data Commons’ October 2023 Extractions Report showed that only 50% of pages had structured data.

Of the pages with JSON-LD markup, these were the top types of entities found.

  • http://schema.org/ListItem (2,341,592,788 Entities)
  • http://schema.org/ImageObject (1,429,942,067 Entities)
  • http://schema.org/Organization (907,701,098 Entities)
  • http://schema.org/BreadcrumbList (817,464,472 Entities)
  • http://schema.org/WebSite (712,198,821 Entities)
  • http://schema.org/WebPage (691,208,528 Entities)
  • http://schema.org/Offer (623,956,111 Entities)
  • http://schema.org/SearchAction (614,892,152 Entities)
  • http://schema.org/Person (582,460,344 Entities)
  • http://schema.org/EntryPoint (502,883,892 Entities)

(Source: October 2023 Web Data Commons Report)

Most of the types on the list are related to the rich results mentioned above.

For example, ListItem and BreadcrumbList are required for the Breadcrumb Rich Result, SearchAction is required for Sitelink Search Box, and Offer is required for the Product Rich Result.

This tells us that most websites are using Schema Markup for rich results.
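To make that concrete, here is a minimal sketch of the kind of markup behind the Breadcrumb rich result mentioned above, written as a TypeScript object that would be serialized into a JSON-LD script tag. The page names and URLs are placeholder assumptions, not taken from any real site.

// A minimal BreadcrumbList sketch; names and URLs are placeholders.
const breadcrumbMarkup = {
  '@context': 'https://schema.org',
  '@type': 'BreadcrumbList',
  itemListElement: [
    { '@type': 'ListItem', position: 1, name: 'Home', item: 'https://www.example.com/' },
    { '@type': 'ListItem', position: 2, name: 'Blog', item: 'https://www.example.com/blog/' },
    // The last item can describe the current page and omit the item URL.
    { '@type': 'ListItem', position: 3, name: 'Schema Markup Guide' },
  ],
};

// Embedded in the page as:
// <script type="application/ld+json">${JSON.stringify(breadcrumbMarkup)}</script>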

Even though these Schema.org types can help your site achieve rich results and stand out in search, they don’t necessarily tell search engines in detail what each page is about, nor do they make your site more semantic.

Help AI Search Engines Understand Your Content

Have you ever seen your competitors’ sites using specific Schema.org types that are not found in Google’s structured data documentation (e.g., MedicalClinic, IndividualPhysician, Service, etc.)?

The Schema.org vocabulary has over 800 types and properties to help websites explain what the page is about. However, Google’s structured data features only require a small subset of these properties for websites to be eligible for a rich result.

Many websites that solely implement Schema Markup to get rich results tend to be less descriptive with their Schema Markup.

AI search engines now look at the meaning and intent behind your content to provide users with more relevant search results.

Therefore, organizations that want to stay ahead should use more specific Schema.org types and leverage appropriate properties to help search engines better understand and contextualize their content. You can be descriptive with your content while still achieving rich results.

For example, each type (e.g. Article, Person, etc.) in the Schema.org vocabulary has 40 or more properties to describe the entity.

The properties are there to help you fully describe what the page is about and how it relates to other things on your website and the web. In essence, it’s asking you to describe the entity or topic of the page semantically.
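As a hedged illustration of what fuller, more semantic markup can look like, here is a sketch that goes beyond the rich-result minimum by adding extra properties and @id references so the page’s entities link to one another. All names and URLs are illustrative placeholders, not a prescription.

// An Article described semantically and connected to other entities via @id.
const articleMarkup = {
  '@context': 'https://schema.org',
  '@type': 'Article',
  '@id': 'https://www.example.com/blog/schema-markup/#article',
  headline: 'Why Now Is the Time to Adopt Schema Markup',
  about: { '@type': 'Thing', name: 'Structured data' },
  author: {
    '@type': 'Person',
    '@id': 'https://www.example.com/about/#jane-doe',
    name: 'Jane Doe',
    jobTitle: 'Co-founder',
    sameAs: ['https://www.linkedin.com/in/jane-doe'],
  },
  publisher: {
    '@type': 'Organization',
    '@id': 'https://www.example.com/#organization',
    name: 'Example Inc.',
    url: 'https://www.example.com/',
  },
};

The @id values are what let separate pages point at the same entity, which is the connective tissue of the content knowledge graph discussed later in this article.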

The word ‘semantic’ is about understanding the meaning of language.

Note that the word “understanding” is part of that definition. Funnily enough, in October 2023, John Mueller at Google released a Search Update video. In this six-minute video, he leads with an update on Schema Markup.

For the first time, Mueller described Schema Markup as “a code you can add to your web pages, which search engines can use to better understand the content.”

While Mueller has historically spoken a lot about Schema Markup, he typically talked about it in the context of rich result eligibility. So, why the change?

This shift in thinking about Schema Markup for enhanced search engine understanding makes sense. With AI’s growing role and influence in search, we need to make it easy for search engines to consume and understand the content.

Take Control Of AI By Shaping Your Data With Schema Markup

Now, if being understood and standing out in search is not a good enough reason to get started, then doing it to help your enterprise take control of your content and prepare it for artificial intelligence is.

In February 2024, Gartner published a report on “30 Emerging Technologies That Will Guide Your Business Decisions,” highlighting generative AI and knowledge graphs as critical emerging technologies companies should invest in within the next 0-1 years.

Knowledge graphs are collections of relationships between entities defined using a standardized vocabulary that enables new knowledge to be gained by way of inferencing.

Good news! When you implement Schema Markup to define and connect the entities on your site, you are creating a content knowledge graph for your organization.

Thus, your organization gains a critical enabler for generative AI adoption while reaping its SEO benefits.

Learn more about building content knowledge graphs in my article, Extending Your Schema Markup From Rich Results to Knowledge Graphs.

We can also look at other experts in the knowledge graph field to understand the urgency of implementing Schema Markup.

In his LinkedIn post, Tony Seale, Knowledge Graph Architect at UBS in the UK, said,

“AI does not need to happen to you; organizations can shape AI by shaping their data.

It is a choice: We can allow all data to be absorbed into huge ‘data gravity wells’ or we can create a network of networks, each of us connecting and consolidating our data.”

The “network of networks” Seale refers to is the concept of knowledge graphs – the same kind of knowledge graph that can be built from your web data using semantic Schema Markup.

The AI revolution has only just begun, and there is no better time than now to shape your data, starting with your web content through the implementation of Schema Markup.

Use Schema Markup As The Catalyst For AI

In today’s digital landscape, organizations must invest in new technology to keep pace with the evolution of AI and search.

Whether your goal is to stand out on the SERP or ensure your content is understood as intended by Google and other search engines, the time to implement Schema Markup is now.

With Schema Markup, SEO pros can become heroes, enabling generative AI adoption through content knowledge graphs while delivering tangible benefits, such as increased click-through rates and improved search visibility.


AI Overviews: Initial Tests and Studies

Google went public with AI Overviews in late May. I’ve been tracking queries and third-party studies since then.

AI Overviews occasionally produces incorrect answers. An account on X, Goog Enough, curates some of these errors. To its credit, Google has fixed many glitches, mainly by limiting the occurrence of Overviews and excluding user-generated content from the sources.

Here’s what we know thus far in AI Overview’s rollout.

15% of Queries

An article last week in Search Engine Land cited a study from BrightEdge, the search engine optimization firm, which found that AI Overviews now show for just 15% of queries. This aligns with my observations for the queries I am following. Others in the industry report similar experiences.

Public AI Overviews only show for U.S.-based searchers signed in to Google accounts. Hence the visibility is likely even lower.

Labs vs. Public

Searchers who signed up for the Labs version of Search Generative Experience — the predecessor of AI Overviews — may see Labs-only results. To access public results, those searchers must opt out of Labs or create a Google profile in Chrome to test both.

I tested the query “how to choose a career.” When logged in to my Labs profile, I see an Overview with the notation “Search Labs” above it.

Search Labs participants see a unique notation above the Overview.

When using my non-Labs profile, I see the same AI Overviews with visible sources.

Public, non-Labs searchers see sources below the Overview.

Optimizing for AI Overviews

In testing SGE results before the public rollout of Overviews, we knew getting a site listed in an AI answer wasn’t too difficult. From Google’s 2023 SGE patent:

  • AI answers in Google summarize existing search results.
  • Google generates references (i.e., links in the answers) after creating the summaries.

The relative ease apparently remains. A simple test by search optimizer Cyrus Shepard immediately generated a link in AI Overviews. He first identified a query that produced an Overview for which his page ranked organically with a featured snippet. He then updated that page with a paragraph closely matching the text in the Overview. A link to his page appeared right away.

However, Shepard listed two caveats:

  • His page lost the featured snippet.
  • The link in the Overview disappeared quickly.

The link disappeared too soon for meaningful conclusions, such as the impact on featured snippets. However, the rapid change — in and then out of Overviews — underscores the fluidity of AI answers and argues for caution when implementing new optimization tactics. Google is changing AI Overviews seemingly daily. While short-lived gains are possible, there’s too little data for long-term action.

How SEO Experts Can Utilize ChatGPT For BigQuery With Examples via @sejournal, @vahandev

AI is reshaping every field by making skills that weren’t widely accessible in the past, such as coding or data visualization, available to everyone.

An AI operator who can run the right prompts can perform tasks of low and medium difficulty, freeing up more time for strategic decision-making.

In this guide, we will walk you through, step by step, how to use AI chatbots (with ChatGPT as an example) to run complex BigQuery queries for your SEO reporting needs.

We will review two examples:

  • How to analyze a traffic decline caused by a Google algorithm update.
  • How to combine search traffic data with engagement metrics from GA4.

These examples will also give you an overall idea of how you can use chatbots to reduce the burden when running SEO reports.

Why Do You Need To Learn BigQuery?

SEO tools like Google Search Console or Google Analytics 4 have accessible user interfaces for working with your data. But they often limit what you can do and show incomplete data, an issue usually called data sampling.

In GSC, this happens because the tool omits anonymized queries and limits exports to 1,000 rows.

Screenshot from Google Search Console, May 2024

By using BigQuery, you can solve that problem and run any complex reports you want, eliminating the data sampling issue that occurs quite often when working with large websites.

(Alternatively, you may try using Looker Studio, but the purpose of this article is to illustrate how you can operate ChatGPT for BigQuery.)

For this article, we assume you have already connected your GSC and GA4 accounts to BigQuery. If you haven’t done that yet, you may want to check our guides on connecting GSC and GA4 to BigQuery.

SQL Basics

If you know Structured Query Language (SQL), you may skip this section. But for those who don’t, here is a quick reference to SQL statements:

Statement Description
SELECT Retrieves data from tables
INSERT Inserts new data into a table
UNNEST Flattens an array into a set of rows
UPDATE Updates existing data within a table
DELETE Deletes data from a table
CREATE Creates a new table or database
ALTER Modifies an existing table
DROP Deletes a table or a database.

And here are the conditions we will be using, so you can familiarize yourself with them:

Condition Description
WHERE Filters records for specific conditions
AND Combines two or more conditions where all conditions must be true
OR Combines two or more conditions where at least one condition must be true
NOT Negates a condition
LIKE Searches for a specified pattern in a column
IN Checks if a value is within a set of values
BETWEEN Selects values within a given range
IS NULL Checks for null values
IS NOT NULL Checks for non-null values
EXISTS Checks if a subquery returns any records

Now, let’s dive into examples of how you can use BigQuery via ChatGPT.

1. How To Analyze Traffic Decline Because Of Google Algorithm Impact 

If you have been affected by a Google algorithm update, the first thing you should do is run reports on affected pages and analyze why you have been impacted.

Remember, the worst thing you can do is start changing something on the website right away in panic mode. This may cause fluctuations in search traffic and make analyzing the impact even harder.

If you have only a few pages in the index, you may find the GSC UI data satisfactory for analyzing your data. But if you have tens of thousands of pages, it won’t let you export more than 1,000 rows (either pages or queries) of data.

Say you have a week of data since the algorithm update finished rolling out and want to compare it with the previous week’s data. To run that report in BigQuery, you may start with this simple prompt:

Imagine you are a data analyst experienced in Google Analytics 4 (GA4), Google Search Console, SQL, and BigQuery.
Your task is to generate an SQL query to compare 'WEB' Search Console data for the periods '2024-05-08' to '2024-05-20' and '2024-04-18' to '2024-04-30'. 
Extract the total clicks, impressions, and average position for each URL for each period. 
Additionally, calculate the differences in these metrics between the periods for each URL 
(where average position should be calculated as the sum of positions divided by the sum of impressions).

Details:

BigQuery project name: use_your_bigquery_projectname
Dataset name: searchconsole
Table name: searchdata_url_impression
Please provide the SQL query that meets these requirements.

Once you get the SQL code, copy and paste it into the BigQuery SQL editor. But I bet the initial code you get will have errors. For example, table column names may not match what is in your BigQuery dataset.

Error in BigQuery SQL when a column name doesn’t match the dataset column.

Things like this happen quite often when performing coding tasks via ChatGPT. Now, let’s dive into how you can quickly fix issues like this.

Simply click on your dataset in the left-hand panel, select all columns on the right side, and click Copy as Table.

How to select all columns of the table in BigQuery.

Once you have it, just copy and paste it as a follow-up prompt and hit enter.

Generate SQL again by taking into account that my table schema is as follows : 
"fullname mode type description
data_date DATE 
site_url STRING 
url STRING 
query STRING 
is_anonymized_query BOOLEAN 
is_anonymized_discover BOOLEAN 
country STRING 
search_type STRING 
device STRING 
is_amp_top_stories BOOLEAN 
is_amp_blue_link BOOLEAN 
is_job_listing BOOLEAN 
is_job_details BOOLEAN 
is_tpf_qa BOOLEAN 
is_tpf_faq BOOLEAN 
is_tpf_howto BOOLEAN 
is_weblite BOOLEAN 
is_action BOOLEAN 
is_events_listing BOOLEAN 
is_events_details BOOLEAN 
is_search_appearance_android_app BOOLEAN 
is_amp_story BOOLEAN 
is_amp_image_result BOOLEAN 
is_video BOOLEAN 
is_organic_shopping BOOLEAN 
is_review_snippet BOOLEAN 
is_special_announcement BOOLEAN 
is_recipe_feature BOOLEAN 
is_recipe_rich_snippet BOOLEAN 
is_subscribed_content BOOLEAN 
is_page_experience BOOLEAN 
is_practice_problems BOOLEAN 
is_math_solvers BOOLEAN 
is_translated_result BOOLEAN 
is_edu_q_and_a BOOLEAN 
impressions INTEGER 
clicks INTEGER 
sum_position INTEGER 
is_product_snippets BOOLEAN 
is_merchant_listings BOOLEAN 
is_learning_videos BOOLEAN "

This will fix the generated SQL according to the GSC table structure as follows:

WITH period1 AS (
 SELECT 
   url,
   SUM(clicks) AS clicks,
   SUM(impressions) AS impressions,
   IFNULL(SUM(sum_position) / NULLIF(SUM(impressions), 0), 0) AS avg_position
 FROM 
   `use_your_bigquery_projectname.searchconsole.searchdata_url_impression`
 WHERE 
   data_date BETWEEN '2024-05-08' AND '2024-05-20' 
 AND search_type = 'WEB'
  GROUP BY 
  url
), 
period2 AS (
 SELECT 
  url,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions,
  IFNULL(SUM(sum_position) / NULLIF(SUM(impressions), 0), 0) AS avg_position
 FROM 
  `use_your_bigquery_projectname.searchconsole.searchdata_url_impression`
  WHERE 
  data_date BETWEEN '2024-04-18' AND '2024-04-30'
 AND search_type = 'WEB'
 GROUP BY 
 url
),
comparison AS (
 SELECT
  COALESCE(p1.url, p2.url) AS url,
  IFNULL(p1.clicks, 0) AS clicks_period1,
  IFNULL(p2.clicks, 0) AS clicks_period2,
  IFNULL(p1.impressions, 0) AS impressions_period1,
  IFNULL(p2.impressions, 0) AS impressions_period2,
  IFNULL(p1.avg_position, 0) AS avg_position_period1,
  IFNULL(p2.avg_position, 0) AS avg_position_period2,
  IFNULL(p1.clicks, 0) - IFNULL(p2.clicks, 0) AS clicks_diff,
  IFNULL(p1.impressions, 0) - IFNULL(p2.impressions, 0) AS impressions_diff,
  IFNULL(p1.avg_position, 0) - IFNULL(p2.avg_position, 0) AS avg_position_diff
 FROM
  period1 p1
 FULL OUTER JOIN
   period2 p2
 ON
  p1.url = p2.url
)
SELECT 
 url,
 clicks_period1,
 clicks_period2,
 clicks_diff,
 impressions_period1,
 impressions_period2,
 impressions_diff,
 avg_position_period1,
 avg_position_period2,
 avg_position_diff
FROM 
 comparison
ORDER BY 
 url;

Now, run it and enjoy the data, either by exporting it to CSV or Google Sheets.

How to run SQL in BigQuery.
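If you prefer to script this step instead of using the BigQuery console, here is a minimal sketch with the official Node.js client. It assumes you have installed @google-cloud/bigquery and configured Google Cloud credentials; the output file name is arbitrary.

import { BigQuery } from '@google-cloud/bigquery';
import { writeFileSync } from 'node:fs';

const bigquery = new BigQuery({ projectId: 'use_your_bigquery_projectname' });

async function runReport(sql: string): Promise<void> {
  // Runs the generated SQL and returns the result rows as plain objects.
  const [rows] = await bigquery.query({ query: sql });

  // Naive CSV export so results don't have to go through Google Sheets.
  const header = Object.keys(rows[0] ?? {}).join(',');
  const lines = rows.map((row) => Object.values(row).join(','));
  writeFileSync('gsc_comparison.csv', [header, ...lines].join('\n'));
}

// Usage: paste the SQL generated by ChatGPT into a string and call runReport(sql).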

In case you have millions of URLs, you may not be able to work in Google Sheets or CSV export because the data is too big. Plus, those apps have limitations on how many rows you can have in a single document. In that case, you can save results as a BigQuery table and connect to it with Looker Studio to view the data.

But please remember that BigQuery is a freemium service. It is free up to 1 TB of processed query data a month. Once you exceed that limit, your credit card will be automatically charged based on your usage.

That means if you connect your BigQuery to Looker Studio and browse your data there, it will count against your billing every time you open your Looker dashboard.

That is why, when exports have a few tens of thousands or hundreds of thousands of rows, I like using Google Sheets. I can easily connect it to Looker Studio for data visualization and blending, and this will not count against my billing.

If you have ChatGPT Plus, you can simply use this custom GPT I’ve made, which takes into account table schemas for GA4 and Search Console. In the above guide, I assumed you were using the free version, and it illustrated how you can use ChatGPT overall for running BigQuery.

In case you want to know what is in that custom GPT, here is the screenshot of the backend.

Custom GPT with BigQuery table schemas.

Nothing complicated – you just need to copy tables from BigQuery as JSON in the step explained above and upload them into the custom GPT so it can refer to the table structure. Additionally, there is a prompt that asks GPT to refer to the JSON files attached when composing queries.

This is another illustration of how you can use ChatGPT to perform tasks more effectively, eliminating repetitive tasks.

If you need to work with another dataset (different from GA4 or GSC) and you don’t know SQL, you can upload the table schema from BigQuery into ChatGPT and compose SQLs specific to that table structure. Easy, isn’t it?

As homework, I suggest you analyze which queries have been affected by AI Overviews.

There is no differentiator in the Google Search Console table to do that, but you can run a query to see which pages didn’t lose ranking but had a significant CTR drop after May 14, 2024, when Google introduced AI Overviews.

You can compare the two-week period after May 14th with the two weeks prior. There is still a possibility that the CTR drop happened because of other search features, like a competitor getting a Featured Snippet, but you should find enough valid cases where your clicks were affected by AI Overviews (formerly Search Generative Experience or “SGE”).
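As a starting point for that homework, here is a hedged sketch. It reuses the searchdata_url_impression table from earlier, compares the two weeks before and after May 14, and keeps URLs whose average position barely moved while CTR dropped. The 20% CTR drop and the one-position tolerance are arbitrary assumptions you should tune for your site.

// SQL kept in a TypeScript string so it can be run with the client sketch above,
// or pasted directly into the BigQuery editor.
const aiOverviewImpactSql = `
WITH before_ai AS (
  SELECT url,
    SUM(clicks) / NULLIF(SUM(impressions), 0) AS ctr,
    SUM(sum_position) / NULLIF(SUM(impressions), 0) AS avg_position
  FROM \`use_your_bigquery_projectname.searchconsole.searchdata_url_impression\`
  WHERE data_date BETWEEN '2024-04-30' AND '2024-05-13' AND search_type = 'WEB'
  GROUP BY url
),
after_ai AS (
  SELECT url,
    SUM(clicks) / NULLIF(SUM(impressions), 0) AS ctr,
    SUM(sum_position) / NULLIF(SUM(impressions), 0) AS avg_position
  FROM \`use_your_bigquery_projectname.searchconsole.searchdata_url_impression\`
  WHERE data_date BETWEEN '2024-05-14' AND '2024-05-27' AND search_type = 'WEB'
  GROUP BY url
)
SELECT b.url, b.ctr AS ctr_before, a.ctr AS ctr_after,
       b.avg_position AS position_before, a.avg_position AS position_after
FROM before_ai AS b
JOIN after_ai AS a ON a.url = b.url
WHERE ABS(a.avg_position - b.avg_position) <= 1  -- ranking roughly stable
  AND a.ctr < b.ctr * 0.8                        -- CTR dropped by 20% or more
ORDER BY b.ctr - a.ctr DESC;
`;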

2. How To Combine Search Traffic Data With Engagement Metrics From GA4 

When analyzing search traffic, it is vital to understand how much users engage with content because user engagement signals are ranking factors. Please note that I don’t mean the exact metrics defined in GA4.

However, GA4’s engagement metrics – such as “average engagement time per session,” which is the average time your website was in focus in a user’s browser – may hint at whether your articles are good enough for users to read.

If it is too low, it means your blog pages may have an issue, and users don’t read them.

If you combine that metric with Search Console data, you may find that pages with low rankings also have a low average engagement time per session.

Please note that GA4 and GSC have different attribution models. GA4 uses data-driven or last-click attribution, which means if someone visits an article page from Google once and then comes back directly two more times, GA4 may attribute all three visits to Google, whereas GSC will report only one.

So, it is not 100% accurate and may not be suitable for corporate reporting, but having engagement metrics from GA4 alongside GSC data provides valuable information to analyze your rankings’ correlations with engagement.

Using ChatGPT with BigQuery requires a little preparation. Before we jump into the prompt, I suggest you read how GA4 tables are structured, as it is not as simple as GSC’s tables.

It has an event_params column, which has a record type and contains dimensions like page_location, ga_session_id, and engagement_time_msec, which tracks how long a user actively engages with your website.

The event_params key engagement_time_msec is not the total time on the site but the time spent on specific interactions (like clicking or scrolling), where each interaction adds a new piece of engagement time. It is like adding up all the little moments when users are actively using your website or app.

Therefore, if we sum that metric and average it across sessions for the pages, we obtain the average engagement time per session.

Now, once you understand engagement_time_msec, let’s ask ChatGPT to help us construct a query that pulls GA4 “average engagement time per session” for each URL and combines it with GSC search performance data for articles.

The prompt I would use is:

Imagine you are a data analyst experienced in Google Analytics 4 (GA4), Google Search Console, SQL, and BigQuery.
Compose a SQL query that pulls the following data from Google Search Console for each URL for the previous 7 days, excluding the current day:

1. Clicks,
2. Impressions,
3. Average position (calculated as the sum of positions divided by the sum of impressions).

From the GA4 BigQuery table, unnest ga_session_id, engagement_time_msec, and page_location from event_params.
Select only rows where engagement_time_msec is not null, group all sessions with the same IDs and page_location, SUM engagement_time_msec, and divide it by the number of sessions.

Join GA4 and Google Search Console data by URLs for the same period.
Additionally, optimize the query to pull from GA4's table partitions and not query the entire table.

Details:
1. BigQuery project name: use_your_bigquery_projectname
2. GA4 dataset name: use_your_ga4_bigquery_dataset_name
3. GA4 table name: use_your_ga4_bigquery_table_name_under_dataset
4. Google Search Console dataset name: use_your_gsc_bigquery_dataset_name
5. Google Search Console table name: use_your_gsc_bigquery_table_name_under_dataset
6. Here is the BigQuery table schema for GA4: [copy table schema here]
7. Here is the BigQuery table schema for Google Search Console: [copy table schema here]


Once I copied and pasted it into BigQuery, the results showed “average engagement time per session” as all nulls. So, apparently, ChatGPT needed more context and guidance on how GA4 works.

I provided additional knowledge in a follow-up prompt, taken from GA4’s official documentation on how it calculates engagement_time_msec. I copied and pasted the documentation into the follow-up prompt and asked ChatGPT to refer to that knowledge when composing the query, which helped. (If you get any syntax error, just copy/paste it as a follow-up question and ask ChatGPT to fix it.)

As a result of 15 minutes of effort, I got the right SQL:


WITH
-- Step 1: Filter GA4 data for the last 7 days and unnest event_params
ga4_data AS (
  SELECT
    event_date,
    event_timestamp,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
  FROM
    `your_bq_project_name.your_bq_ga4_dataset.events_*`
  WHERE
     _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 8 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') IS NOT NULL
),

-- Step 2: Aggregate GA4 data by session ID and page location
aggregated_ga4_data AS (
  SELECT
    page_location,
    SUM(engagement_time_msec/1000) / NULLIF(COUNT(DISTINCT ga_session_id), 0) AS avg_engagement_time_msec
  FROM
    ga4_data
  GROUP BY
    page_location
),
-- Step 3: Filter GSC data for the last 7 days and select urls which had clicks
gsc_data AS (
  SELECT
    url,
    SUM(clicks) AS clicks,
    SUM(impressions) AS impressions,
    SUM(sum_position) / SUM(impressions) AS avg_position
  FROM
    `your_bq_project_name.searchconsole.searchdata_url_impression`
  WHERE
    data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 8 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    and 
    clicks > 0
  GROUP BY
    url
)

-- Joining Google Search Console data with GA4 data by page_location and url 
SELECT
  gsc.url,
  gsc.clicks,
  gsc.impressions,
  gsc.avg_position,
  ga4.avg_engagement_time_msec
FROM
  gsc_data AS gsc
LEFT JOIN
  aggregated_ga4_data AS ga4
ON
  gsc.url = ga4.page_location
ORDER BY
  gsc.clicks DESC;

This pulls GSC data with engagement metrics from GA4.

Search Console data combined with GA4.

Please note that you might notice discrepancies between the numbers in the GA4 UI and the data queried from BigQuery tables.

This happens because GA4 focuses on “Active Users” and groups rare data points into an “(other)” category, while BigQuery shows all raw data. GA4 also uses modeled data for gaps when consent isn’t given, which BigQuery doesn’t include.

Additionally, GA4 may sample data for quicker reports, whereas BigQuery includes all data. These variations mean GA4 offers a quick overview, while BigQuery provides detailed analysis. Learn a more detailed explanation of why this happens in this article.

You may also try modifying the queries to include only active users, to bring the results one step closer to the GA4 UI.
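As one possible (and very much assumed) proxy, you could filter the GA4 CTE to events from engaged sessions via the session_engaged key in event_params. This approximates, but does not exactly match, GA4’s definition of an active user, and you should check how the key is typed in your own export before relying on it.

// Extra condition to append to the WHERE clause of the ga4_data CTE above.
// Assumption: session_engaged arrives as a string value of '1' in event_params;
// in some exports it may appear under int_value instead.
const activeUsersCondition = `
  AND (
    SELECT value.string_value
    FROM UNNEST(event_params)
    WHERE key = 'session_engaged'
  ) = '1'
`;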

Alternatively, you can use Looker Studio to blend data, but it has limitations with very large datasets. BigQuery offers scalability by processing terabytes of data efficiently, making it ideal for large-scale SEO reports and detailed analyses.

Its advanced SQL capabilities allow complex queries for deeper insights that Looker Studio or other dashboarding tools cannot match.

Conclusion

Using ChatGPT’s coding abilities to compose BigQuery queries for your reporting needs elevates you and opens new horizons where you can combine multiple sources of data.

This demonstrates how ChatGPT can streamline complex data analysis tasks, enabling you to focus on strategic decision-making.

At the same time, these examples show that humans still need to supervise AI chatbots, because chatbots may hallucinate or produce wrong answers.


Why Using A Log Analyzer Is A Must For Big Websites

This post was sponsored by JetOctopus. The opinions expressed in this article are the sponsor’s own.

If you manage a large website with over 10,000 pages, you can likely appreciate the unique SEO challenges that come with such scale.

Sure, the traditional tools and tactics — keyword optimization, link building, etc. — are important to establish a strong foundation and maintain basic SEO hygiene.

However, they may not fully address the technical complexities of Site Visibility for Searchbots and the dynamic needs of a large enterprise website.

This is where log analyzers become crucial. An SEO log analyzer monitors and analyzes server access logs to give you real insights into how search engines interact with your website. It allows you to take strategic action that satisfies both search crawlers and users, leading to stronger returns on your efforts.

In this post, you’ll learn what a log analyzer is and how it can enable your enterprise SEO strategy to achieve sustained success. But first, let’s take a quick look at what makes SEO tricky for big websites with thousands of pages.

The Unique SEO Challenges For Large Websites

Managing SEO for a website with over 10,000 pages isn’t just a step up in scale; it’s a whole different ball game.

Relying on traditional SEO tactics alone limits your site’s potential for organic growth. You can have the best titles and content on your pages, but if Googlebot can’t crawl them effectively, those pages will be ignored and may never get ranked.

Image created by JetOctopus, May 2024

For big websites, the sheer volume of content and pages makes it difficult to ensure every (important) page is optimized for visibility to Googlebot. Then, the added complexity of an elaborate site architecture often leads to significant crawl budget issues. This means Googlebot is missing crucial pages during its crawls.

Image created by JetOctopus, May 2024

Furthermore, big websites are more vulnerable to technical glitches — such as unexpected tweaks in the code from the dev team — that can impact SEO. This often exacerbates other issues like slow page speeds due to heavy content, broken links in bulk, or redundant pages that compete for the same keywords (keyword cannibalization).

All in all, these issues that come with size necessitate a more robust approach to SEO. One that can adapt to the dynamic nature of big websites and ensure that every optimization effort is more meaningful toward the ultimate goal of improving visibility and driving traffic.

This strategic shift is where the power of an SEO log analyzer becomes evident, providing granular insights that help prioritize high-impact actions. The primary action is to treat Googlebot like your website’s most important user: until Googlebot accesses your important pages, they won’t rank and drive traffic.

What Is An SEO Log Analyzer?

An SEO log analyzer is essentially a tool that processes and analyzes the data generated by web servers every time a page is requested. It tracks how search engine crawlers interact with a website, providing crucial insights into what happens behind the scenes. A log analyzer can identify which pages are crawled, how often, and whether any crawl issues occur, such as Googlebot being unable to access important pages.

By analyzing these server logs, log analyzers help SEO teams understand how a website is actually seen by search engines. This enables them to make precise adjustments to enhance site performance, boost crawl efficiency, and ultimately improve SERP visibility.

Put simply, a deep dive into the logs data helps discover opportunities and pinpoint issues that might otherwise go unnoticed in large websites.
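To make that concrete, here is a toy sketch (not JetOctopus code) of the kind of question a log analyzer answers at scale: how often Googlebot requests each URL path. It assumes access logs in the common or combined log format, one request per line.

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

async function countGooglebotHits(logPath: string): Promise<Map<string, number>> {
  const hitsByPath = new Map<string, number>();
  const lines = createInterface({ input: createReadStream(logPath) });

  for await (const line of lines) {
    // Count only requests whose user agent mentions Googlebot.
    // (A production tool would also verify the crawler's IP address.)
    if (!line.includes('Googlebot')) continue;

    // The request portion looks like: "GET /some/path HTTP/1.1"
    const match = line.match(/"(?:GET|POST|HEAD) ([^ ]+) HTTP/);
    if (!match) continue;

    hitsByPath.set(match[1], (hitsByPath.get(match[1]) ?? 0) + 1);
  }
  return hitsByPath;
}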

But why exactly should you focus your efforts on treating Googlebot as your most important visitor?

Why is crawl budget a big deal?

Let’s look into this.

Optimizing Crawl Budget For Maximum SEO Impact

Crawl budget refers to the number of pages a search engine bot — like Googlebot — will crawl on your site within a given timeframe. Once a site’s budget is used up, the bot will stop crawling and move on to other websites.

Crawl budgets vary for every website. Your site’s budget is determined by Google, based on a range of factors such as the site’s size, performance, frequency of updates, and links. When you optimize these factors strategically, you can increase your crawl budget and speed up the ranking of new pages and content.

As you’d expect, making the most of this budget ensures that your most important pages are frequently visited and indexed by Googlebot. This typically translates into better rankings (provided your content and user experience are solid).

And here’s where a log analyzer tool makes itself particularly useful by providing detailed insights into how crawlers interact with your site. As mentioned earlier, it allows you to see which pages are being crawled and how often, helping identify and resolve inefficiencies such as low-value or irrelevant pages that are wasting valuable crawl resources.

An advanced log analyzer like JetOctopus offers a complete view of all the stages from crawling and indexation to getting organic clicks. Its SEO Funnel covers all the main stages, from your website being visited by Googlebot to being ranked in the top 10 and bringing in organic traffic.

Image created by JetOctopus, May 2024

As you can see above, the tabular view shows how many pages are open to indexation versus those closed from indexation. Understanding this ratio is crucial because if commercially important pages are closed from indexation, they will not appear in subsequent funnel stages.

The next stage examines the number of pages crawled by Googlebot. “Green pages” represent pages that are crawled and within the site structure; “gray pages” are visited by Googlebot but sit outside the structure (possibly orphan pages or pages accidentally excluded from the structure), indicating potential crawl budget waste. Hence, it’s vital to analyze this part of your crawl budget for optimization.

The later stages include analyzing what percentage of pages are ranked in Google SERPs, how many of these rankings are in the top 10 or top three, and, finally, the number of pages receiving organic clicks.

Overall, the SEO funnel gives you concrete numbers, with links to lists of URLs for further analysis, such as indexable vs. non-indexable pages and how crawl budget waste is occurring. It is an excellent starting point for crawl budget analysis, allowing a way to visualize the big picture and get insights for an impactful optimization plan that drives tangible SEO growth.

Put simply, by prioritizing high-value pages — ensuring they are free from errors and easily accessible to search bots — you can greatly improve your site’s visibility and ranking.

Using an SEO log analyzer, you can understand exactly what should be optimized on pages that are being ignored by crawlers, work on them, and thus attract Googlebot visits. A log analyzer also helps you optimize other crucial aspects of your website:

Image created by JetOctopus, May 2024
  • Detailed Analysis of Bot Behavior: Log analyzers allow you to dissect how search bots interact with your site by examining factors like the depth of their crawl, the number of internal links on a page, and the word count per page. This detailed analysis provides you with the exact to-do items for optimizing your site’s SEO performance.
  • Improves Internal Linking and Technical Performance: Log analyzers provide detailed insights into the structure and health of your site. They help identify underperforming pages and optimize the internal links placement, ensuring a smoother user and crawler navigation. They also facilitate the fine-tuning of content to better meet SEO standards, while highlighting technical issues that may affect site speed and accessibility.
  • Aids in Troubleshooting JavaScript and Indexation Challenges: Big websites, especially eCommerce, often rely heavily on JavaScript for dynamic content. In the case of JS websites, the crawling process is lengthy. A log analyzer can track how well search engine bots are able to render and index JavaScript-dependent content, underlining potential pitfalls in real-time. It also identifies pages that are not being indexed as intended, allowing for timely corrections to ensure all relevant content can rank.
  • Helps Optimize Distance from Index (DFI): The concept of Distance from Index (DFI) refers to the number of clicks required to reach any given page from the home page. A lower DFI is generally better for SEO as it means important content is easier to find, both by users and search engine crawlers. Log analyzers help map out the navigational structure of your site, suggesting changes that can reduce DFI and improve the overall accessibility of key content and product pages (a conceptual sketch of DFI follows below).
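For readers who like to see the idea in code, here is a conceptual sketch of how DFI can be computed as a breadth-first search over your internal link graph. The link graph itself is assumed to come from a crawl, and dedicated tools handle all of this for you.

// Click depth (DFI) of every page reachable from the home page.
function computeDfi(
  links: Map<string, string[]>, // page URL -> internal links found on that page
  homePage: string
): Map<string, number> {
  const dfi = new Map<string, number>([[homePage, 0]]);
  const queue: string[] = [homePage];

  while (queue.length > 0) {
    const page = queue.shift()!;
    const depth = dfi.get(page)!;
    for (const target of links.get(page) ?? []) {
      if (!dfi.has(target)) {
        dfi.set(target, depth + 1); // one more click away from the home page
        queue.push(target);
      }
    }
  }
  return dfi; // pages missing from the result were not reachable via internal links
}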

Besides, historical log data offered by a log analyzer can be invaluable. It helps make your SEO performance not only understandable but also predictable. Analyzing past interactions allows you to spot trends, anticipate future hiccups, and plan more effective SEO strategies.

With JetOctopus, you benefit from no volume limits on logs, enabling comprehensive analysis without the fear of missing out on crucial data. This approach is fundamental in continually refining your strategy and securing your site’s top spot in the fast-evolving landscape of search.

Real-World Wins Using Log Analyzer

Big websites in various industries have leveraged log analyzers to attain and maintain top spots on Google for profitable keywords, which has significantly contributed to their business growth.

For example, Skroutz, Greece’s biggest marketplace website with over 1 million sessions daily, set up a real-time crawl and log analyzer tool that helped them know things like:

  • Does Googlebot crawl pages that have more than two filters activated?
  • How extensively does Googlebot crawl a particularly popular category?
  • What are the main URL parameters that Googlebot crawls?
  • Does Googlebot visit pages with filters like “Size,” which are typically marked as nofollow?

The ability to see real-time visualization tables and more than ten months of historical log data for monitoring Googlebot crawls enabled Skroutz to find crawling loopholes and decrease its index size, thus optimizing its crawl budget.

Eventually, they also saw a reduced time for new URLs to be indexed and ranked — instead of taking 2-3 months to index and rank new URLs, the indexing and ranking phase took only a few days.

This strategic approach to technical SEO using log files has helped Skroutz cement its position as one of the top 1,000 websites globally according to SimilarWeb, and the fourth most visited website in Greece (after Google, Facebook, and YouTube), with over 70% of its traffic coming from organic search.

Image created by JetOctopus, May 2024

Another case in point is DOM.RIA, Ukraine’s popular real estate and rental listing website, which doubled the Googlebot visits by optimizing their website’s crawl efficiency. As their site structure is huge and elaborate, they needed to optimize the crawl efficiency for Googlebot to ensure the freshness and relevance of content appearing in Google.

Initially, they implemented a new sitemap to improve the indexing of deeper directories. Despite these efforts, Googlebot visits remained low.

By using JetOctopus to analyze their log files, DOM.RIA identified and addressed issues with their internal linking and DFI. They then created mini-sitemaps for poorly scanned directories (such as for the city, including URLs for streets, districts, metro, etc.) while assigning meta tags with links to pages that Googlebot often visits. This strategic change resulted in a more than twofold increase in Googlebot activity on these crucial pages within two weeks.

Image created by JetOctopus, May 2024

Getting Started With An SEO Log Analyzer

Now that you know what a log analyzer is and what it can do for big websites, let’s take a quick look at the steps involved in logs analysis.

Here is an overview of using an SEO log analyzer like JetOctopus for your website:

  • Integrate Your Logs: Begin by integrating your server logs with a log analysis tool. This step is crucial for capturing all data related to site visits, which includes every request made to the server.
  • Identify Key Issues: Use the log analyzer to uncover significant issues such as server errors (5xx), slow load times, and other anomalies that could be affecting user experience and site performance. This step involves filtering and sorting through large volumes of data to focus on high-impact problems.
  • Fix the Issues: Once problems are identified, prioritize and address these issues to improve site reliability and performance. This might involve fixing broken links, optimizing slow-loading pages, and correcting server errors.
  • Combine with Crawl Analysis: Merge log analysis data with crawl data. This integration allows for a deeper dive into crawl budget analysis and optimization. Analyze how search engines crawl your site and adjust your SEO strategy to ensure that your most valuable pages receive adequate attention from search bots.

And that’s how you can ensure that search engines are efficiently indexing your most important content.

Conclusion

As you can see, the strategic use of log analyzers is more than just a technical necessity for large-scale websites. Optimizing your site’s crawl efficiency with a log analyzer can immensely impact your SERP visibility.

For CMOs managing large-scale websites, embracing a log analyzer and crawler toolkit like JetOctopus is like getting an extra tech SEO analyst that bridges the gap between SEO data integration and organic traffic growth.



What is ‘Crawled – currently not indexed’ in Search Console?

Google Search Console (GSC) is a powerful tool for site owners. It helps you monitor and maintain your site’s presence in the SERPs. One of the common issues you might encounter is the “Crawled — currently not indexed” status. This message means that Google has crawled your page but hasn’t indexed it. As a result, your page won’t appear in search results, which can affect your site’s traffic.

It is important to understand why this happens and how to fix it. Here, we’ll explain the “Crawled — currently not indexed” status and provide tips to help you improve your site’s indexing and visibility.

What is “Crawled – Currently Not Indexed” in GSC?

The “Crawled — currently not indexed” status in Search Console means that Google has crawled your page but has not indexed it. This status indicates that while Google knows your page exists, it hasn’t added it to its search index. As a result, the page won’t appear in the SERPs.

This status can occur for several reasons. It might be due to content quality, technical issues, or even Google’s indexing process. Understanding why this happens is the first step in resolving the issue. By addressing the factors that lead to this status, you can improve your chances of getting your pages indexed and visible in search results.

An example of a site with many Crawled – currently not indexed errors in Google Search Console

But first…

Before we continue, you must understand the crawling and indexing process. In short, before your content can appear in search engine results, it goes through three key stages: crawling, indexing, and ranking.

Crawling

Search engines use automated bots called crawlers or spiders to discover new and updated pages. These bots follow links from known pages to new ones, exploring and recording the web’s content.

Indexing

After crawling, the search engine processes and stores the information it finds. It analyzes the content, keywords, and usability and then adds valuable pages to its index.

Ranking

When a user searches, the search engine retrieves relevant pages from its index and ranks them based on relevance, authority, and user experience. Higher-ranked pages appear at the top of search results, attracting more traffic.

If you have that error message, the process stalls during indexing. For reasons unknown, your content does not move on to the ranking phase.

Reasons why this happens

Many factors can cause the “Crawled – currently not indexed” status in Google Search Console. Google may determine your content isn’t valuable or unique enough to be indexed. Duplicate content or thin content with little value can lead to this status.

Technical problems can also prevent Google from indexing your pages. These issues might include server errors, incorrect robots.txt configuration, or the presence of noindex tags. A poorly organized website can make it difficult for Google to understand and index your content. The lack of a clear hierarchy and insufficient internal linking can contribute to this problem.

Sometimes, Google simply needs more time to index new or recently updated content. This is especially true for new websites or pages. Addressing these factors can improve your chances of getting your pages indexed and appearing in search results.

Google’s changing indexing priorities

There’s another aspect to all of this. Due to the vast amount of content generated daily, Google has become more critical and uses fewer resources to process new pages. With the rise of generative AI, there’s a significant overlap in content, leading Google to be more selective about what it indexes. The recent updates, including the Helpful Content Update and the March 2024 Core Update, reflect this shift.

This means you might need to ask yourself the following question:

Why should Google even index your page?

Given the vast amount of similar content out there, Google needs a compelling reason to index your page. Here’s why your content should stand out:

  • Originality: Unique content that offers new insights or information is more likely to be indexed.
  • Value: Content that genuinely helps or informs users will be prioritized.
  • Quality: Google’s algorithms favor high-quality, well-written content with good structure and readability.

Tips to fix “Crawled – Currently Not Indexed”

Improving content quality, resolving technical issues, and optimizing your website’s structure are essential to addressing the “Crawled – currently not indexed” status.

Fix your content

Ensure your content is original and provides value to your audience. Avoid duplicate content. Cover topics thoroughly to offer in-depth information that meets user needs. Make your content as engaging and informative as possible.

Resolve technical issues

Make sure your robots.txt file is not blocking Google from crawling your pages. Use Search Console to test your robots.txt file. If you want your pages indexed, verify that they do not have noindex tags. Use the URL Inspection tool in GSC to check. Resolve any 5xx server errors that may prevent Google from accessing your pages. Regularly monitor server logs for issues. Optimize crawling with Yoast SEO’s tailored tools.
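If you want to spot-check a handful of URLs yourself, here is a rough sketch using Node 18+’s built-in fetch. It only looks for the most common blockers (a 5xx status, an X-Robots-Tag header, or a robots meta tag); the regex is deliberately simplistic, and a real crawler would parse the HTML properly.

async function checkIndexability(url: string): Promise<void> {
  const response = await fetch(url, { redirect: 'follow' });

  // Server errors (5xx) prevent Google from indexing the page.
  if (response.status >= 500) {
    console.log(`${url}: server error ${response.status}`);
    return;
  }

  // A noindex can be sent as an HTTP header...
  const robotsHeader = response.headers.get('x-robots-tag') ?? '';

  // ...or as a robots meta tag in the HTML.
  const html = await response.text();
  const metaNoindex = /<meta[^>]*name=["']robots["'][^>]*content=["'][^"']*noindex/i.test(html);

  if (robotsHeader.includes('noindex') || metaNoindex) {
    console.log(`${url}: blocked from indexing by a noindex directive`);
  } else {
    console.log(`${url}: no noindex found (status ${response.status})`);
  }
}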

Improve your site structure

Organize your website with a clear structure, using categories and subcategories to help Google understand the relationships between your pages. Use internal links to connect related content and help Google navigate your site more effectively. Implement breadcrumb navigation to improve site structure and user experience.

Generate an XML sitemap that lists all the important pages on your website. This helps Google discover and crawl your pages more efficiently. Upload the XML sitemap Yoast SEO generates for your site to Search Console. This ensures that Google knows all your pages and can crawl them more easily.
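For context, the file itself is simple. Here is a minimal sketch of what a plugin generates for you; the URLs are placeholders.

// Builds a bare-bones XML sitemap from a list of URLs.
function buildSitemap(urls: string[]): string {
  const entries = urls.map((url) => `  <url><loc>${url}</loc></url>`).join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    '</urlset>',
  ].join('\n');
}

// Example: buildSitemap(['https://www.example.com/', 'https://www.example.com/blog/']);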

Patience, please

Understand that indexing can take time, especially for new websites. Be patient and monitor your indexing status regularly. Regularly check Google Search Console for any issues related to crawling and indexing. Address any warnings or errors promptly. Keep track of changes you make to your site and monitor their impact on indexing and search visibility.

Additional best practices

Consider these additional SEO best practices to improve your indexing chances and enhance your site’s performance.

Add content regularly to keep your site relevant and valuable. Fresh content can attract more visitors and encourage Google to crawl your site more frequently. In addition, you should periodically review and update older content to ensure it remains accurate and useful.

Try to improve engagement on your site. Engaged users can signal to Google that your content is valuable. Promote your content on social media to increase visibility and drive traffic. Social signals can indirectly influence your SEO.

Ensure your site performs well on mobile devices. Google’s indexing processes work from a mobile perspective, so a good mobile user experience is crucial. Optimize your site’s loading speed. Faster pages provide a better user experience and can improve your search rankings.

Write guest posts for reputable websites in your niche. Quality backlinks from authoritative sites can boost your own site’s credibility. Create valuable content that others want to link to. Natural backlinks can improve your site’s authority and indexing.

Addressing the “Crawled – currently not indexed” status

Addressing the “Crawled – currently not indexed” status in Google Search Console is necessary if you want to improve your site’s visibility. Understanding the reasons behind this status and implementing practical solutions can enhance your chances of getting indexed.

Focus on improving content quality, resolving technical issues, and optimizing your website’s structure. Regularly update your content, engage with users, and monitor your site’s performance. These efforts will help you maintain a solid online presence and ensure your content reaches your audience.


Optimizing Interaction To Next Paint (INP): A Step-By-Step Guide via @sejournal, @DebugBear

This post was sponsored by DebugBear. The opinions expressed in this article are the sponsor’s own.

Keeping your website fast is important for user experience and SEO.

The Core Web Vitals initiative by Google provides a set of metrics to help you understand the performance of your website.

The three Core Web Vitals metrics are:

  • Largest Contentful Paint (LCP).
  • Interaction to Next Paint (INP).
  • Cumulative Layout Shift (CLS).

This post focuses on the recently introduced INP metric and what you can do to improve it.

How Is Interaction To Next Paint Measured?

INP measures how quickly your website responds to user interactions – for example, a click on a button. More specifically, INP measures the time in milliseconds between the user input and when the browser has finished processing the interaction and is ready to display any visual updates on the page.

Your website needs to complete this process in under 200 milliseconds to get a “Good” score. Values over half a second are considered “Poor”. A poor score in a Core Web Vitals metric can negatively impact your search engine rankings.

Google collects INP data from real visitors on your website as part of the Chrome User Experience Report (CrUX). This CrUX data is what ultimately impacts rankings.
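If you want to see the same measurement on your own pages, a minimal sketch with the open-source web-vitals library (npm install web-vitals) looks roughly like this; the /rum endpoint is a placeholder for wherever you collect the data.

import { onINP } from 'web-vitals';

onINP((metric) => {
  // metric.value is the INP in milliseconds; under 200 ms counts as "good".
  console.log('INP:', Math.round(metric.value), 'ms', `(${metric.rating})`);

  // In a real setup you would beacon this to your own collector, for example:
  // navigator.sendBeacon('/rum', JSON.stringify({ name: metric.name, value: metric.value }));
});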

Image created by DebugBear, May 2024

How To Identify & Fix Slow INP Times

The factors causing poor Interaction to Next Paint can often be complex and hard to figure out. Follow this step-by-step guide to understand slow interactions on your website and find potential optimizations.

1. How To Identify A Page With Slow INP Times

Different pages on your website will have different Core Web Vitals scores. So you need to identify a slow page and then investigate what’s causing it to be slow.

Using Google Search Console

One easy way to check your INP scores is using the Core Web Vitals section in Google Search Console, which reports data based on the Google CrUX data we’ve discussed before.

By default, page URLs are grouped into URL groups that cover many different pages. Be careful here – not all pages might have the problem that Google is reporting. Instead, click on each URL group to see if URL-specific data is available for some pages and then focus on those.

Screenshot of Google Search Console, May 2024

Using A Real-User Monitoring (RUM) Service

Google won’t report Core Web Vitals data for every page on your website, and it only provides the raw measurements without any details to help you understand and fix the issues. To get those details, you can use a real-user monitoring tool like DebugBear.

Real-user monitoring works by installing an analytics snippet on your website that measures how fast your website is for your visitors. Once that’s set up you’ll have access to an Interaction to Next Paint dashboard like this:

Screenshot of the DebugBear Interaction to Next Paint dashboard, May 2024

You can identify pages you want to optimize in the list, hover over the URL, and click the funnel icon to look at data for that specific page only.

Image created by DebugBear, May 2024

2. Figure Out What Element Interactions Are Slow

Different visitors on the same page will have different experiences. A lot of that depends on how they interact with the page: if they click on a background image, there’s no risk of the page suddenly freezing, but if they click on a button that starts some heavy processing, a freeze is much more likely. Users in that second scenario will experience a much higher INP.

To help with that, RUM data provides a breakdown of what page elements users interacted with and how big the interaction delays were.

Screenshot of the DebugBear INP Elements view, May 2024

The screenshot above shows different INP interactions sorted by how frequent these user interactions are. To make optimizations as easy as possible you’ll want to focus on a slow interaction that affects many users.

In DebugBear, you can click on the page element to add it to your filters and continue your investigation.

3. Identify What INP Component Contributes The Most To Slow Interactions

INP delays can be broken down into three different components:

  • Input Delay: Background code that blocks the interaction from being processed.
  • Processing Time: The time spent directly handling the interaction.
  • Presentation Delay: Displaying the visual updates to the screen.

You should focus on which INP component is the biggest contributor to the slow INP time, and ensure you keep that in mind during your investigation.

Screenshot of the DebugBear INP Components, May 2024

In this scenario, Processing Time is the biggest contributor to the slow INP time for the set of pages you’re looking at, but you need to dig deeper to understand why.

High processing time indicates that code handling the user interaction is running slowly. A high input delay, by contrast, suggests that background tasks are blocking the interaction from being processed, for example, due to third-party scripts.
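As a rough illustration of that triage, here is a small Python sketch that takes hypothetical component timings for a single interaction, rates the total against the 200 and 500 millisecond thresholds, and flags the dominant component:

# Hypothetical component timings (in milliseconds) for one interaction.
inp_components = {
    "input_delay": 35,
    "processing_time": 320,
    "presentation_delay": 60,
}

total_inp = sum(inp_components.values())
dominant = max(inp_components, key=inp_components.get)

if total_inp <= 200:
    rating = "Good"
elif total_inp <= 500:
    rating = "Needs improvement"
else:
    rating = "Poor"

print(f"INP: {total_inp} ms ({rating}); biggest contributor: {dominant}")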

4. Check Which Scripts Are Contributing To Slow INP

Sometimes browsers report specific scripts that are contributing to a slow interaction. Your website likely contains both first-party and third-party scripts, both of which can contribute to slow INP times.

A RUM tool like DebugBear can collect and surface this data. The main thing you want to look at is whether you mostly see your own website code or code from third parties.

Screenshot of the INP Primary Script Domain Grouping in DebugBear, May 2024

Tip: When you see a script or source code function marked as “N/A”, this can indicate that the script comes from a different origin and has additional security restrictions that prevent RUM tools from capturing more detailed information.

This now begins to tell a story: it appears that analytics/third-party scripts are the biggest contributors to the slow INP times.

5. Identify Why Those Scripts Are Running

At this point, you now have a strong suspicion that most of the INP delay, at least on the pages and elements you’re looking at, is due to third-party scripts. But how can you tell whether those are general tracking scripts or if they actually have a role in handling the interaction?

DebugBear offers a breakdown that helps you see why the code is running, called the INP Primary Script Invoker breakdown. That’s a bit of a mouthful: multiple scripts can be involved in slowing down an interaction, and here you just see the biggest contributor. The “invoker” is simply a value the browser reports describing what caused the code to run.

Screenshot of the INP Primary Script Invoker Grouping in DebugBear, May 2024

The following invoker names are examples of page-wide event handlers:

  • onclick
  • onmousedown
  • onpointerup

You can see those a lot in the screenshot above, which tells you that the analytics script is tracking clicks anywhere on the page.

In contrast, if you saw invoker names like these, that would indicate event handlers for a specific element on the page (a quick way to sort invokers into the two groups is sketched after the list):

  • .load_more.onclick
  • #logo.onclick
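If you export the raw invoker names from your RUM data, a small script can split them into those two groups. This is a rough sketch that assumes the invoker strings follow the patterns shown above:

PAGE_WIDE_INVOKERS = {"onclick", "onmousedown", "onpointerup"}

def classify_invoker(invoker):
    """Label an invoker name as page-wide or element-specific."""
    if invoker in PAGE_WIDE_INVOKERS:
        return "page-wide handler"
    if invoker.startswith((".", "#")) and ".on" in invoker:
        return "element-specific handler"
    return "other"

# Hypothetical invoker names pulled from RUM data.
for name in ["onclick", ".load_more.onclick", "#logo.onclick", "onpointerup"]:
    print(f"{name}: {classify_invoker(name)}")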

6. Review Specific Page Views

A lot of the data you’ve seen so far is aggregated. It’s now time to look at the individual INP events, to form a definitive conclusion about what’s causing slow INP in this example.

Real user monitoring tools like DebugBear generally offer a way to review specific user experiences. For example, you can see what browser they used, how big their screen is, and what element led to the slowest interaction.

Screenshot of a Page View in DebugBear Real User Monitoring, May 2024

As mentioned before, multiple scripts can contribute to overall slow INP. The INP Scripts section shows you the scripts that were run during the INP interaction:

Screenshot of the DebugBear INP script breakdown, May 2024

You can review each of these scripts in more detail to understand why they run and what’s causing them to take longer to finish.

7. Use The DevTools Profiler For More Information

Real user monitoring tools have access to a lot of data, but for performance and security reasons they can only capture a fraction of the data available in the browser. That’s why it’s a good idea to also use Chrome DevTools to measure your page performance.

To debug INP in DevTools you can measure how the browser processes one of the slow interactions you’ve identified before. DevTools then shows you exactly how the browser is spending its time handling the interaction.

Screenshot of a performance profile in Chrome DevTools, May 2024

How You Might Resolve This Issue

In this example, you or your development team could resolve this issue by:

  • Working with the third-party script provider to optimize their script.
  • Removing the script if it is not essential to the website, or finding an alternative provider.
  • Adjusting how your own code interacts with the script.

How To Investigate High Input Delay

In the previous example, most of the INP time was spent running code in response to the interaction. But often the browser is already busy running other code when a user interaction happens. When investigating the INP components, you’ll then see a high input delay value.

This can happen for various reasons, for example:

  • The user interacted with the website while it was still loading.
  • A scheduled task is running on the page, for example an ongoing animation.
  • The page is loading and rendering new content.

To understand what’s happening, you can review the invoker name and the INP scripts section of individual user experiences.

Screenshot of the INP Component breakdown within DebugBear, May 2024

In this screenshot, you can see that a timer is running code that coincides with the start of a user interaction.

The script can be opened to reveal the exact code that is run:

Screenshot of INP script details in DebugBear, May 2024

The source code shown in the previous screenshot comes from a third-party user tracking script that is running on the page.

At this stage, you and your development team can continue with the INP workflow presented earlier in this article. For example, debugging with browser DevTools or contacting the third-party provider for support.

How To Investigate High Presentation Delay

Presentation delay tends to be more difficult to debug than input delay or processing time. Often it’s caused by browser behavior rather than a specific script. But as before, you still start by identifying a specific page and a specific interaction.

You can see an example interaction with high presentation delay here:

Screenshot of an interaction with high presentation delay, May 2024

You see that this happens when the user enters text into a form field. In this example, many visitors pasted large amounts of text that the browser had to process.

Here the fix was to delay the processing, show a “Waiting…” message to the user, and then complete the processing later on. You can see how the INP score improves from May 3:

Screenshot of an Interaction to Next Paint timeline in DebugBear, May 2024

Get The Data You Need To Improve Interaction To Next Paint

Setting up real user monitoring helps you understand how users experience your website and what you can do to improve it. Try DebugBear now by signing up for a free 14-day trial.

Screenshot of the DebugBear Core Web Vitals dashboard, May 2024

Google’s CrUX data is aggregated over a 28-day period, which means that it’ll take a while before you notice a regression. With real-user monitoring you can see the impact of website changes right away and get alerted automatically when there’s a big change.

DebugBear monitors lab data, CrUX data, and real user data. That way you have all the data you need to optimize your Core Web Vitals in one place.

This article has been sponsored by DebugBear, and the views presented herein represent the sponsor’s perspective.

Ready to start optimizing your website? Sign up for DebugBear and get the data you need to deliver great user experiences.


Image Credits

Featured Image: Image by Redesign.co. Used with permission.

International SEO For 2024: 9-Point Checklist For Success via @sejournal, @LidiaInfanteM

Getting your international SEO strategy right can be an elusive feat.

There are a lot more factors at play than people give credit for, and it’s often a thankless job.

A successful international SEO strategy requires a deep knowledge of your company’s commercial strategy as well as technical SEO knowledge, cultural sensitivity, and excellent data skills.

Yet the industry often regards international SEO as just your hreflang setup.

In this article, I will distill the complexities of international SEO success into an actionable step-by-step list that will take you from beginner to advanced practitioner. Let’s begin!

Part I: Be Commercially Aware

1. Understand Why Your Company Is Going International

Companies can grow by expanding their products and services, focusing on gaining market penetration or expanding into new markets.

While your team’s goal might be traffic, leads, or revenue, the leadership team is likely working under a different set of parameters. Most of the time, leadership’s ultimate goal is to maximize shareholder value.

  • In founder-owned companies, growth goals might be slower and more sustainable, usually aimed at maintaining and growing profitability.
  • VC-owned companies have high growth goals because they must provide their investors with a return that’s higher than the stock market. This is what is known as the alpha, or your company’s ability to beat the market in growth.
  • Publicly traded companies are likely aiming to grow their share value.
  • Startups, depending on their maturity stage, are likely looking to prove product-market fit or expand their reach fast to show that their operations are scalable and have the potential to be profitable in the future. The goal of this is to aid in raising further capital from investors.

Understanding why businesses go international is essential for informing your SEO decisions. What’s best practice for SEO isn’t always what’s best for business.

You must adapt your strategy to your company’s growth model.

  • Companies choosing to grow sustainably and maintain profitability will likely expand more slowly to a market that resembles their core market.
  • VC-owned companies will be able to invest in a wider range of countries, with a smaller concern for providing their users with an experience on par with that of their core markets.
  • Startups can try to beat their competitors to market by expanding quickly and throwing a lot of money at the project, or they might be concerned with cash flow and try to expand fast but cut corners by using automatic translation.

2. Stack Rank Your Target Markets To Prioritize Your Investment

I promise I’ll get to hreflang implementation soon, but so much about international SEO has to do with commercial awareness – so bear with me; this will make you a better professional.

Many companies have different market tiers to reflect how much of a priority each market is. Market prioritization can happen using many different metrics, such as:

  • Average order value or lifetime customer value.
  • Amount of investment required.
  • Market size.
  • And market similarity.

American companies often prioritize developed English-speaking countries such as the UK, Canada, or Australia. These are most similar to their core market, and most of their market knowledge will be transferable.

After that, companies are likely to target large European economies, such as Germany and France. They might also target the LatAm market and Spain in the same effort.

The last prioritization tier can vary widely among companies, with a focus on the Nordic, Brazilian, or Asian markets.

Part II: Know Your Tech

3. Define Your International URL Structure

When doing international SEO, there are four possible URL structures, each with its pros and cons.

ccTLD Structure

A ccTLD structure is set up to target different countries based on the domain type.

This structure is not ideal for companies that target different languages rather than different countries. For example, a .es website is targeting Spain, not the Spanish language.

An advantage of this kind of structure is that the ccTLD sends a very strong localization signal to search engines about which market you are targeting, and it can lead to improved trust and CTR in your core country.

On the other hand, ccTLDs can dilute your site’s authority, as links will be spread across domains rather than concentrated on the .com.

gTLD With Subdirectories

This is my personal favorite when it comes to international SEO.

These URL structures can look like website.com/en if they’re targeting languages or website.com/en-gb if they’re targeting countries.

This configuration aggregates the authority you gain across your different territories into a single domain, it’s cheaper to maintain, and the .com TLD is widely recognizable by users worldwide.

On the other hand, this setup can look less personalized to people outside the US, who might wonder if you can service their markets.

gTLD With Subdomains

This setup involves placing international content on a subdomain like us.website.com. While once popular, it’s falling out of favor because it doesn’t bring anything unique to the table anymore.

This setup offers a clear signal to users and search engines about the intended audience of a specific subdomain.

However, subdomains often face issues with SEO, as Google tends to view them as separate entities. This separation can dilute link equity, similar to the ccTLD approach but without the geo-targeting advantages.

gTLD With Parameters

This is the setup where you add parameters at the end of the URL to indicate the language of the page, such as website.com/?lang=en.

I strongly advise against this setup, as it can present multiple technical SEO challenges and trust issues.

4. Understand Your Hreflang Setup

In the words of John Mueller: hreflang can be one of the most complex aspects of SEO.

Tweet by John Mueller talking about how hreflang can be one of the more complex aspects of SEO. Screenshot from Twitter, May 2024

Hreflang reminds me of a multilingual form of a canonical tag, where we tell search engines that one document is a version of the other and explain the relationship between them.

I find hreflang implementation very interesting from a technical point of view because development teams mostly manage it, and it can be very much hit or miss.

Often, hreflang is constructed from existing fields in your content management system (CMS) or content database.

You might find that your development team is pulling the HTML lang tag, which follows a different ISO standard than hreflang, leading to a broken implementation.

Other times, there is a field in your CMS that your development team pulls from to build your hreflang setup.

Finding out how your hreflang tags are generated can be extremely helpful in identifying the sources of different issues or mitigating potential risks.

So speak to your engineering team and ask them how you’re currently generating hreflang.

5. Implement Hreflang Without Errors

There are three ways to implement hreflang on your site:

  • On your sitemap.
  • Through your HTTP header.
  • On your HTML head.

The method most of us are familiar with is the HTML head. And while you can use more than one method, they should match each other perfectly; otherwise, you risk confusing search engines.

Here are some basic rules for getting it done correctly:

  • In your hreflang implementation, the URL must include domain and protocol.
  • You must follow the ISO 639-1 language codes – don’t go around making up your own.
  • Hreflang tags must be reciprocal. If the page you’re listing as a language alternative does not list you back, your implementation won’t work.
  • Audit your hreflang regularly. My favorite tool for this, since it added the hreflang cluster analysis and link graphs, is Ahrefs. For the record, Ahrefs is not paying me to say this; it’s a genuine recommendation and has helped me a lot in my work.
  • You should only have one page per language.
  • Your hreflang URLs should be self-canonicalizing and respond with a 200 code.

Follow the above rules, and you’ll avoid the most common hreflang mistakes that SEO pros make.
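To spot-check reciprocity without waiting for a full crawl, you can run a quick script over the alternates each page declares. Here is a minimal, self-contained Python sketch; the URLs are hypothetical, and in practice you would feed it the hreflang annotations extracted by your crawler:

# Map each URL to the alternate URLs it declares via hreflang (hypothetical data).
declared_alternates = {
    "https://website.com/en-gb/": {"https://website.com/en-us/", "https://website.com/fr-fr/"},
    "https://website.com/en-us/": {"https://website.com/en-gb/", "https://website.com/fr-fr/"},
    "https://website.com/fr-fr/": {"https://website.com/en-gb/"},  # missing the en-us return link
}

def find_non_reciprocal(alternates):
    """Return (source, target) pairs where the target does not declare the source back."""
    problems = []
    for source, targets in alternates.items():
        for target in targets:
            if source not in alternates.get(target, set()):
                problems.append((source, target))
    return problems

for source, target in find_non_reciprocal(declared_alternates):
    print(f"{target} does not declare {source} as an alternate")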

And if you’re interested in the technical SEO aspect beyond hreflang, I recommend reading Mind your language by Rob Owen.

Part III: Invest In Content Incrementally

6. Translate Your Top-performing Content Topics

Now that you have the basic commercial and technical knowledge covered, you’re ready to start creating a content strategy.

You likely have a wealth of content in your core market that can be recycled. But you want to focus on translating high-converting topics, not just any topic; otherwise, you might be wasting your budget!

Let’s go step by step.

Cluster Your Website’s Content By Topic

  • Crawl your site using your favorite SEO tool and extract the URL and H1.
  • Use ChatGPT to classify that list of URLs into topics. You might already know what you usually write about, so include those topics in your prompt. You don’t want a classification that’s too granular, so you can prompt ChatGPT to only create groups with a minimum of 10 URLs (adjust this to reflect the size of your website) and classify everything else as “Other.” Here is an example of what your prompt might look like: “I will provide you with a list of article titles and their corresponding URL. Classify this list into the following topics: survey best practices, research and analysis, employee surveys, market research and others. Return this in a table format with the URL, title and group name.”
  • Start a spreadsheet with all your URLs in the first column, titles in the second column, and the group they belong to in the third column.

Measure Your Performance By Topic

  • Export your GSC data and use a =VLOOKUP formula to match your clicks to your URLs.
  • Export your conversion data and use a =VLOOKUP formula to match your conversions (leads, sales, sign-ups, or revenue) to the right URL.
  • You can then copy your topics column onto a new sheet. Remove duplicates and use the =SUMIF formula to aggregate your click data and conversion data by topic (or use the short pandas sketch below to do the same joins and aggregation in code).
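If you prefer scripting over spreadsheet formulas, the same joins and aggregation take a few lines of pandas. A rough sketch, assuming hypothetical CSV exports with “url”, “topic”, “clicks”, and “conversions” columns:

import pandas as pd

# Hypothetical exports; adjust file names and column names to match your own data.
topics = pd.read_csv("urls_with_topics.csv")   # columns: url, title, topic
clicks = pd.read_csv("gsc_clicks.csv")         # columns: url, clicks
conversions = pd.read_csv("conversions.csv")   # columns: url, conversions

merged = (
    topics.merge(clicks, on="url", how="left")
          .merge(conversions, on="url", how="left")
          .fillna({"clicks": 0, "conversions": 0})
)

by_topic = (
    merged.groupby("topic")[["clicks", "conversions"]]
          .sum()
          .sort_values("conversions", ascending=False)
)
print(by_topic)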

Choose What Topics You’ll Be Translating First

Using this data, you can now choose what topics are most likely to drive conversions based on your core market data. Choose how many topics or pieces of content you’ll be translating based on your budget.

Personally, I like translating one topic at a time because I’ve found that generating topical authority on one specific topic makes it easier for me to rank on an adjacent topic that I write about next.

7. Localize Your English Content

Once you’re set up with all your key pages and a few content topics, it’s time to evaluate your investment and see where you could be getting a bigger return.

At this stage, many companies have translated their content into a few different languages and likely copied the US content into their UK and Australian sites. Now that you’ve done some translation, it’s time to work on localization.

If you’ve just copied your US content into your UK and Australian sites, your Google Search Console indexing report might be screaming at you, “Duplicate, Google selected a different canonical than the user.”

A very easy fix that could yield great returns is to localize your English content to the nuances of those English-speaking markets.

You will want to instruct your translation and localization providers to adapt the spellings of certain words, change the choice of words, introduce local expressions, and update any US-specific statistics with their local equivalents.

For example, if I’m targeting a British audience, “analyze” becomes “analyse,” a “stroller” becomes a “pram,” and “soccer” becomes “football.”
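Human linguists should own this work, but a simple script can flag the most obvious Americanisms before a page goes to a localizer. A toy Python sketch with a deliberately tiny, hypothetical word map:

import re

# Tiny, hypothetical US-to-UK glossary; a real one would be far larger.
us_to_uk = {"analyze": "analyse", "stroller": "pram", "soccer": "football"}

def flag_americanisms(text):
    """Return (US term, suggested UK term) pairs found in the text."""
    findings = []
    for us_term, uk_term in us_to_uk.items():
        if re.search(rf"\b{re.escape(us_term)}\b", text, flags=re.IGNORECASE):
            findings.append((us_term, uk_term))
    return findings

sample = "Analyze how parents choose a stroller before soccer season starts."
for us_term, uk_term in flag_americanisms(sample):
    print(f"Consider replacing '{us_term}' with '{uk_term}'")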

8. Invest In In-market Content

Once you’ve got the basics in place, you can start tackling the specific needs of other markets. This strategy is expensive, and you should only use it in your priority markets, but it can really set you apart from your competitors.

For this, you will need to work with a local linguist to identify pain points, use cases, or needs exclusive to your target market.

For example, if France suddenly made it mandatory to run a diversity and inclusion study for companies with over 250 employees, I’d want to know this and create some content on DEI surveys at SurveyMonkey.

9. Integrate With Other Content Workflows

In step six, we evaluated our top-performing content, chose the best articles to translate, and got it all down. But wait. Some of these source articles have been updated. And there is even more content now!

To run a successful international SEO campaign you must integrate with all the other teams publishing content within your organization.

Usually, the teams creating content in an organization are SEO, content, PR, product marketing, demand generation, customer marketing, customer service, customer education, or solutions engineering.

That’s a lot, and you won’t be able to integrate with everyone all at once. Prioritize the teams that create the most revenue-generating content, such as SEO, content, or product marketing.

Working with these teams, you will have to establish a process for what happens when they create a new piece, update some content, or remove an existing piece.

These processes can differ for everyone, but I can tell you what I do with my team and hope it inspires you.

  • When a piece of content that’s already been localized into international markets is updated, we get the content in a queue to be re-localized the next quarter.
  • When they create a new piece of content, we evaluate its performance, and if it’s performing above average, we add it to a localization queue for the next quarter.
  • When they change the URL of a piece of content or delete it, all international sites must follow suit at the same time; due to some technical limitations, not making the change globally would create hreflang issues.

Wrapping Up

International SEO is vast and complex, and no article can cover it all, but many interesting resources have been created by SEO pros across the community for those who want to learn more.

Navigating the complexities of international SEO is no small feat. It’s an intricate dance of aligning commercial strategies with technical precision, cultural insights, and data-driven decisions.

From understanding your company’s core motives for global expansion to meticulously implementing hreflang tags and localizing content, every step plays a crucial role in building a successful international presence.

More resources: 


Featured Image: BritCats Studio/Shutterstock

Early Look at Google’s AI Overviews

AI Overviews, Google’s generative AI search feature, is now live for all users. Google tested it for months, calling it Search Generative Experience.

AI Overviews summarizes search results for some queries, but Google has not disclosed the percentage. The feature has two versions:

  • An expandable AI answer on top of search results, pushing organic search listings further down the page.
  • A “Generate” button that creates an Overview only when clicked.

The latter is less intrusive, but I’ve seen no statistics on which is more frequent.

AI Overviews often contain links referencing the sources. Google claims those links are more “clickable” — i.e., prominent — than conventional organic listings. We cannot verify this information because Google has provided no AI click data in Search Console or elsewhere.

Yet I doubt Google’s claim because frequently the links are not visible without expanding an AI answer. For example, searching “how to choose a career” produces an AI answer but no immediately visible source link.

Content providers can block Google from showing their info in Overviews using the nosnippet or max-snippet robots meta tags, or the data-nosnippet HTML attribute. But any of those could also impact organic search listings. I suggest waiting a bit before deciding, although it’s worth experimenting if you see a drop in overall organic clicks for an important query.

Keep an eye on your key pages for traffic losses via Search Console and Google Analytics 4. On Search Console, identify the queries that bring fewer clicks and then search on them for potential AI Overviews.
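One way to make that monitoring systematic is to compare two Search Console query exports, before and after AI Overviews reached your market, and flag the biggest losers. A rough pandas sketch; the file names and “query”/“clicks” column names are assumptions you would adjust to match your own exports:

import pandas as pd

# Hypothetical exports of the Search Console Queries report for two date ranges.
before = pd.read_csv("queries_before.csv")  # columns: query, clicks
after = pd.read_csv("queries_after.csv")    # columns: query, clicks

compared = (
    before.merge(after, on="query", how="outer", suffixes=("_before", "_after"))
          .fillna(0)
)
compared["click_change"] = compared["clicks_after"] - compared["clicks_before"]

# The 20 queries losing the most clicks: search these manually to check for AI Overviews.
print(compared.sort_values("click_change").head(20)[["query", "click_change"]])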

Despite widespread angst, the traffic impact of AI Overviews is impossible to evaluate at this early stage. Last month I addressed a third-party study of SGE’s impact, as it was then partially public.

Organic search traffic has been declining for years owing to all the new sections in search result pages. AI Overviews will likely continue this trend.

Still, monitoring traffic losses is important.

It’s possible to optimize a page to appear in Overviews. Last year, in “SEO for SGE,” I listed a few basics:

  • Create relevant content addressing all kinds of problems of your target audience.
  • Optimize product pages and categories based on users’ needs.
  • Structure the site to surface popular topics.
  • Obtain external links to key pages. Links drive discovery and the Knowledge Graph, among other things. Co-citation links are especially important: they place your site next to known entities, and your site gradually becomes a recognized entity through those associations.
  • Use Google’s submission tools. Ranking organically is the only way to appear in Gemini and, hence, in AI Overviews.

Indexed and Ranked

My main takeaway is this. AI Overviews rely on current rankings for each query. In that respect, SEO isn’t changing. It is still about getting pages indexed and ranked for relevant queries.

Google Revs Ecommerce SERPs

Google is revving up its product search results, making it easier for consumers to price shop without leaving search engine page results.

Search for an unbranded product such as “buy blue womens sun dress” and scroll past sponsored listings and local results. Below that, on the primary SERP, Google added a grid of tile-like product boxes triggered by purchase-intent queries. Each tile can include a product name, images, price, store name, average star ratings, and review count.

Screenshot of product-grid boxes

Product boxes appear on primary SERPs and can include product names, images, prices, store names, average star ratings, and review counts.

The tiles function differently from conventional organic results. Instead of sending shoppers to a product detail page on an ecommerce site, the tiles link to shopping knowledge panels that load in the SERP. The panels are similar to product detail pages but with one big difference: Google tacks on a merchant list with pricing.

“This is particularly useful for users because they can compare prices much more easily,” says ecommerce SEO consultant Aleyda Solis. But for online stores, it’s yet another hurdle to get the click.

How Google ranks product tiles remains unclear. But they are populated by structured data, such as Schema.org markup. SEO consultants and ecommerce store owners have wrestled for years over which structured data types are worth publishing, since Google wasn’t paying attention to all of them.

But last February, Google expanded support for product structured data, announcing new shipping and returns classes and product variants such as sizes, colors, and materials. This will likely settle the skirmishes about the value of structured data, since visibility in product grids and shopping knowledge panels now depends on it.

Shopping Knowledge Panels

In shopping knowledge panels, the store name on the product tile gets the top ranking on the merchant list. But size, color, and other sort-by options let shoppers reshuffle the merchant list by those variants.

Screenshot of a shopping knowledge panel

Shopping knowledge panels load directly in SERPs and contain sort-by options that reorder the list of merchants.

The sort-by feature will likely incentivize store owners to get their Schema act together or risk disappearing from the merchant list. Shoppers using the feature could unwittingly filter out merchants that ignore product variants.

“If you have technical constraints or don’t have a developer, there are tools that facilitate the implementation of product Schema markup. Wordlift is one. Schema App is another,” says Solis. You can also use ChatGPT to generate product Schema.
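If you simply need valid JSON-LD and have a few lines of Python at hand, you can also template it from your product catalog. A minimal sketch of basic Product markup with a single Offer; the values are placeholders, and you should validate the output with Google’s Rich Results Test before relying on it:

import json

def product_jsonld(name, sku, image_url, price, currency, in_stock):
    """Build a basic schema.org Product JSON-LD block for one product."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "image": image_url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock",
        },
    }
    return '<script type="application/ld+json">' + json.dumps(data, indent=2) + "</script>"

# Placeholder product values.
print(product_jsonld("Blue Sun Dress", "SKU-12345", "https://www.example.com/img/dress.jpg", 59.00, "USD", True))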

For ecommerce merchants, the shopping knowledge panel lessens the importance of unique landing pages. Many searchers will likely go straight from the product grid to the shopping knowledge panel to a merchant’s product detail page.

The development could be a win for Amazon, which will appear in more product knowledge panels due to the breadth and depth of its catalog. Moreover, Amazon could use predatory pricing to undercut smaller ecommerce stores in merchant lists.

Last September, Google’s domain name registrar business was acquired by Squarespace. “Maybe Google thinks we won’t need domains anymore,” speculates Ross Kernez, a digital strategist. “If everything gets converted to SGE [Search Generative Experience] and only ecommerce survives, the top of the funnel will be gone. Transactional queries will still be here, but that means people could need fewer domains,” says Kernez.

Mike King, CEO of marketing agency iPullRank, disagrees. “We’ve heard of the death of websites when mobile apps appeared. People were like, we’re not going to need websites anymore. Everything’s going to be an app. Well, that didn’t happen,” says King.

Diminished Value?

Either way, conventional organic listings are getting pushed further below the fold. With AI results, paid shopping, pay-per-click ads, map packs, forums, image carousels, and now product grids, it is possible to secure top traditional organic rankings and receive less traffic.

With the rise of ChatGPT, the growth of product review search on TikTok and Instagram, and the recent completion of its March core update, Google appears to be reinventing web search and, perhaps, diminishing the value of organic search as a marketing channel.

The result could force marketers to prioritize other traffic sources such as social networks, email marketing, and generative AI optimization.

Google’s enormous audience cannot be ignored. But with so much volatility in the SERPs, diversifying ecommerce traffic sources is becoming increasingly important. I see no evidence of ecommerce merchants shifting resources from organic search to TikTok, ChatGPT, Reddit, and Facebook. But it does appear that relying on organic traffic is getting riskier.

Using Python To Explain Homepage Redirection To C-Suite (Or Any SEO Best Practise) via @sejournal, @artios_io

If you’re an SEO professional, you might be working on a site that redirects the home page to a subdirectory, for example, to a country-specific version of the site. Or the site might host placeholder content on the root that links to the main home page in a subdirectory.

In both those cases you could be struggling to convince your client or colleagues to follow best practices. If so, then this article is for you.

I will show you a way to communicate to C-suite that makes sense to them technically and commercially by providing data-driven examples.

To do this, I will show you how to use Python to calculate the TIPR of all site pages to provide a comparative before-and-after scenario to justify your requested changes.

First of all, let’s discuss why the home page should be merged with the root.

Hosting Placeholder Pages On The Root and Root Redirects

Some sites host a placeholder page on the root URL or, worse, redirect the root to a subdirectory or page. 

Many global brands, especially those in the fashion sector, will operate multi-regional sites where each regional store resides in their regional folder.

For example, if I went to Gucci, I’d expect to find the following stores:

…and so on.

In Gucci’s case, not only is there a regional folder, but there’s also a language folder, which is all very logical.

Because I reside in London, the root folder https://www.gucci.com/ redirects me to the UK store.

A site search for Gucci.com (site:gucci.com) shows that the root folder is indexed and offers a regional store selection menu.

For many sites, the root folder will permanently redirect to their default or most popular regional store.

Why The Home Page Should Be Merged With The Root

Search engines use authority (a measure of a page’s probability of being discovered via hyperlinks) to determine a page’s relative importance on the web. Therefore, the more authority a page has, the higher its rank position potential in the search results (SERPs).

Given most sites accrue the most links to their root URL, this is where the search engine rank power resides.

This isn’t ideal for the site architecture, as it means that all the product listing pages (PLPs) and product description pages (PDPs) are an extra hop away from the home page.

This extra hop sounds small; however, it’s not inconsequential, as we’ll illustrate now and quantify later.

Let’s visualize the link graph of a site that does not set its home page in the root folder.

Below is a real site where its root URL has a page-level authority score (according to Ahrefs) of 40 PR, redirecting to its main English language store /en (21 PR) before linking to all of the PLPs and PDPs.

Root URL has a page-level authority score (according to Ahrefs) of 40. Image from author, April 2024

Naturally, all of the pages (blue) will link via the logo to their regional store home page (for their users) and to the other regional home pages (shown in pink) instead of linking to the root URL, which artificially inflates the value of the regional home pages.

Note that the site pages in site level 2 (which are linked directly from the home page) have a page-level rating of 19 PR, and the other pages in site level 3 have 18 PR.

What also happens is that the pages are one step removed from the root URL and thus don’t receive all of the authority.

Think of the deterioration of musical sound quality when making copies of a copy instead of a copy of the original music.

That’s the experience your site is offering to search engines when they’re trying to evaluate the relative importance of your site content!

If the store pages linked to the root URL instead, that would also be undesirable, as it would create sitewide redirects, further wasting the distribution of authority.

The best practice approach would be to cut out the middle man by merging the root with the home page so that all site pages are one less hop removed, as shown below:

Merging the root with the home page. Image from author, April 2024

Following the merge of the home page and the root URL, the home page PR is now 72, which is much closer to the site’s domain authority of 75 DR, and each of the pages got an additional 1 PR, increasing their potential to rank.

The Struggles Of Communicating The Benefits To Non-SEO Expert Leadership

To a non-SEO expert audience, such as your marketing and IT colleagues, this all sounds rather academic and abstract – and probably quite unbelievable.

Even if you used the diagrams above, they’re naturally more interested in the traffic impact, if not the revenue implications.

They probably have no idea of Google’s PageRank metric for measuring page authority and don’t care unless you provide the numbers.

Using Python To Estimate PageRank Uplift

Fortunately, with the power of data science, we can make those complex calculations in Python to estimate the new PR values following the best practice move to the root URL.

Take the PageRank formula:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

As explained in The Anatomy of a Large-Scale Hypertextual Web Search Engine by the founders of Google:

“We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. … Also C(A) is defined as the number of links going out of page A.

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.”

The main gist of the formula is that the amount of PageRank a URL (A) has is mainly determined by the PageRank, PR(Ti), of the pages linking to it (Ti) and the number of outgoing links on those pages, C(Ti).

The Python version of the PageRank formula may be found here.
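In case that link is unavailable, here is a minimal, illustrative implementation of the iterative calculation over a toy link graph. It follows the formula above directly rather than using a production-grade library:

def pagerank(links, d=0.85, iterations=50):
    """Iteratively apply PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) over a {page: [outlinks]} graph."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            for page in pages
        }
    return pr

# Toy graph: the root links to two listing pages, which link back to the root.
toy_graph = {"root": ["plp-1", "plp-2"], "plp-1": ["root"], "plp-2": ["root"]}
print(pagerank(toy_graph))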

As a thought experiment armed with the knowledge of the above formula, we’d expect:

  • The new home page to benefit from having all pages link to the root URL (PR Ti) shared along with the other outbound internal links C(Ti).
  • All of the site pages to benefit from their more powerful parent URL (the new merged home page in the root URL).

With the above in mind, all we need to do now is recalculate the sitewide impact of merging the /en folder with the root URL on the whole site, which is done in several phases:

  • Calculate TIPR of all site pages: As explained earlier in what data science can do for site architectures, while site auditing software gives the relative PageRank internally, this needs to be combined with the external PageRank from the internet using link intelligence tools like Ahrefs.
  • Calculate the new TIPR of the new home page: i.e. /en merged or migrated with the root URL.
  • Calculate the new TIPR of all subsequent and remaining pages on the website.

As shown in the diagrams above, the best practice configuration yields new TIPR values for all of the pages.

Once the TIPR calculation steps are followed, your next job is to translate the technical benefits of SEO into the commercial impact to secure buy-in from your colleagues.

One outcome metric to model would be the organic search traffic as a function of TIPR. With sufficient data points (say 10,000), this can be achieved using machine learning (ML).

Your input would be the dataset prior to the TIPR recalculation, where you’d feed in the TIPR column and the search clicks (presumably joined from Google Search Console).

The chart below visualizes the relationship between TIPR and clicks.

Relationship between TIPR and clicks: blue line model. Image from author, April 2024

The blue line is an approximate model indicating how many more clicks a page would receive with an increase in unit PageRank.

Mathematicians are fond of saying, “All models are wrong, but some are useful.” However, the science can lend credibility to your forecasted uplifts, which you can generate with your ML model’s predict() function in Python. You can find an example here.
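As an illustration of that modelling step, here is a hedged sketch using scikit-learn. The file names, column names, and choice of model are assumptions, and a real forecast would need proper validation before you present it:

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical dataset: one row per URL with its current TIPR and monthly organic clicks.
current = pd.read_csv("tipr_clicks.csv")              # columns: url, tipr, clicks
model = GradientBoostingRegressor().fit(current[["tipr"]], current["clicks"])

# Hypothetical dataset with the recalculated (post-migration) TIPR per URL.
recalculated = pd.read_csv("tipr_recalculated.csv")   # columns: url, tipr
predicted_clicks = model.predict(recalculated[["tipr"]])

uplift = predicted_clicks.sum() - current["clicks"].sum()
print(f"Forecasted additional monthly organic visits: {uplift:,.0f}")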

In the chart above, we see that up to 20 TIPR, there’s a 0.35 visits-per-month traffic uplift per page, and beyond 20 TIPR, it’s 0.75 visits.

Using A Data-Driven Approach Is More Persuasive To C-Suite

This might not sound like much. However, aggregated across hundreds of thousands of indexable URLs, we forecasted an additional 200,000 visits in monthly traffic for one client.

This forecast gave them the confidence and willingness to finally follow through on the repeated recommendation of setting the home page to root, which the company received from numerous SEO consultants.

The difference is the quantification, both technically and commercially.

By combining TIPR and applying the PageRank formula to simulate a before-and-after scenario for your technical SEO recommendation – in this case, setting the root URL as the home page – your SEO is data-driven and, more importantly, much more persuasive.

Not only technically but also commercially, which will help you implement more of your SEO recommendations and, hopefully, promote your career.

That aside, taking a data-driven approach can also help you sense-check your best practice recommendations based on ideas you’ve read online.

It’s as true today as it was 20 years ago: The best SEO pros constantly test ideas rather than unquestioningly following best practice dogma.

More resources:


Featured Image: BestForBest/Shutterstock