Google Search Console Data & BigQuery For Enhanced Analytics

Google Search Console is a great tool for SEO pros.

But as many of us know, using the interface exclusively comes with some limitations.

In the past, you often had to have specific knowledge or the help of a developer to overcome some of them by pulling the data from the API directly.

Google started offering a native Google Search Console (GSC) to what was Google Data Studio (now Looker Studio) connector around 2018.

This integration allows users to directly pull data from GSC into Looker Studio (Google Data Studio) to create customizable reports and dashboards without needing third-party connectors or additional API configurations.

But then, in February 2023, things got interesting.

Google now allows you to put in place an automated, built-in bulk data export to BigQuery, Google’s data warehouse storage solution.

Let’s get candid for a minute: most of us still rely on the GSC interface to do many of our activities.

This article will dive into why the bulk data export to BigQuery is a big deal.

Be warned: This is not a silver bullet that will solve all of the limitations we face as SEO pros. But it’s a great tool if you know how to set it up and use it properly.

Break Free From Data Constraints With BigQuery Bulk Exports

Initially, the bulk data export was meant for websites that received traffic to tens of thousands of pages and/or from tens of thousands of queries.

Data Volumes

Currently, you have three data export options beyond the BigQuery bulk data export:

  • Most of the reports in GSC allow you to export up to 1,000 rows.
  • You can get up to 50,000 rows via a Looker Studio integration.
  • With the API, you get up to 50,000 rows, enabling you to pull a few more elements beyond the performance data: URL Inspection, sitemaps, and sites’ data.

Daniel Waisberg, Search Advocate at Google, explains it this way:

“The most powerful way to export performance data is the bulk data export, where you can get the biggest amount of data.”

There are no row limits when you use the BigQuery bulk export.

BigQuery’s bulk data export allows you to pull all rows of data available in your GSC account.

This makes BigQuery much more suitable for large websites or SEO analyses requiring a complete dataset.

Data Retention

Google BigQuery enables unlimited data retention, allowing SEO pros to perform historical trend analyses that are not restricted by the 16-month data storage limit in Google Search Console.

Looker Studio and the API do not inherently offer this feature. This means you gain a real capacity to see evolutions over multiple years, and better understand and analyze progressions.

As a storage solution, BigQuery allows you to stock your data for as long as you wish and overcome this limitation.

The ability to retain and access unlimited historical data is a game-changer for SEO professionals for several reasons:

  • Comprehensive long-term analysis: Unlimited data retention means that SEO analysts can conduct trend analyses over extended periods. This is great news for those of us who want a more accurate assessment of how our SEO strategies are performing in the long term.
  • Seasonal and event-driven trends: If your website experiences seasonal fluctuations or events that cause periodic spikes in traffic, the ability to look back at longer historical data will provide invaluable insights.
  • Customized reporting: Having all of your data stored in BigQuery makes it easier to generate custom reports tailored to specific needs. You can create a report to answer virtually any question.
  • Improved troubleshooting: The ability to track performance over time makes it easier to identify issues, understand their root causes, and implement effective fixes.
  • Adaptability: Unlimited data retention gives you the flexibility to adapt your SEO strategies while maintaining a comprehensive historical perspective for context.

Data Caveats

Just like most data tracking tools, you won’t be surprised to learn that there is no retroactivity.

Keep in mind that the GSC bulk data export starts sending data daily to BigQuery only after you set it up. This means that you won’t be able to store and access the data before that.

It’s a “from this point forward” system, meaning you need to plan ahead if you want to make use of historical data later on. And even if you plan ahead, the data exports will start up to 48 hours later.

While the bulk data export does include significant metrics such as site and URL performance data, not all types of data are exported.

For example, coverage reports and other specialized reports available in GSC are not part of what gets sent to BigQuery.

Two primary tables are generated: searchdata_site_impression and searchdata_url_impression. The former aggregates data by property, so if two pages show up for the same query, it counts as one impression.

The latter table provides data aggregated by URL, offering a more granular view. In plain English, when you use Google Search Console’s bulk data export to BigQuery, two main tables are created:

  • searchdata_site_impression: This table gives you an overview of how your entire website is doing in Google Search. For example, if someone searches for “best sausage dog costume” and two pages from your website appear in the results, this table will count it as one “impression” (or one view) for your entire site rather than two separate views for each page.
  • searchdata_url_impression: This table is more detailed and focuses on individual web pages. Using the same example of “best sausage dog costume,” if two pages from your site show up in the search results, this table will count it as two separate impressions, one for each page that appears.

Another important element is that you are dealing with partitioned data tables. The data in BigQuery is organized into partition tables based on dates.

Each day’s data gets an update, and it’s crucial to be mindful of this when formulating your queries, especially if you want to keep your operations efficient.

If this is still a bit obscure for you, just remember that the data comes in daily and that it has an impact on how you go about things when doing data analysis.

Why Set This Up?

There are advantages to setting up BigQuery bulk exports:

Joining GSC Data With Other Data Sources

Getting the Google Search Console out in a data warehouse means that you can enjoy the advantages of joining it with other data sources (either directly in BigQuery or in your own data warehouse).

You could, for instance, blend data from the GSC and Google Analytics 4 and have more insightful information regarding conversions and behaviors driven by organic Google traffic.

Run Complex Calculations/Operations Using SQL

A solution such as BigQuery allows you to query your data in order to run complex calculations and operations to drive your analysis deeper.

Using SQL, you can segment, filter, and run your own formulas.

Anonymized Queries

BigQuery deals with anonymized queries differently from other ETL vendors that access the data via the API.

It aggregates all the metrics for the anonymized queries per site/URL per day.

It doesn’t just omit the rows, which helps analysts get complete sums of impressions and clicks when you aggregate the data.

What’s The Catch?

Unfortunately, no tool or solution is perfect. This new built-in integration has some downfalls. Here are the main ones:

It Means Developing Expertise Beyond SEO

You should get familiar with Google Cloud Platform, BigQuery, and SQL on top of your GSC knowledge.

Starting a bulk data export entails carrying out tasks in GSC but also Google Cloud.

An SQL-Based Platform Requiring Specific Expertise

With BigQuery, you need SQL to access and make the most of your data.

You therefore need to make SQL queries or have someone in-house to do it for you.

The platform also has its own way of functioning.

Using it efficiently requires knowing how to use it, which requires time and experience.

While Looker Studio does allow SQL-like data manipulation, it may not offer the full power and flexibility of SQL for complex analyses.

API data would need to be further processed to achieve similar results.

URL Impressions Contain More Anonymized Queries

“One thing to be mindful of is the difference in anonymized query volume between the  searchdata_url_impression table and the searchdata_site_impression table.

Like the GSC interface, some queries for particular URLs in particular countries might be so infrequent that they could potentially identify the searcher.

As a result, you’ll see a greater portion of anonymized queries in your searchdata_url_impression table than in your searchdata_site_impression table.” Source: Trevor Fox.

Potential Costs

Even though this feature is initially free, it might not be the case forever.

BigQuery is billed based on the amount of data stored in a project and the queries that you run.

The solution has thresholds from where you start to pay potentially each month.

Over time, it might then become costly – but it all depends on the amount of data exported (websites with many pages and queries will probably be heavier in that regard) and the queries you run to access and manipulate it.

How To Get Your GSC Data In BigQuery

1. Create A Google Cloud Project With BigQuery And Billing Enabled

The first step is to create a project in Google Cloud with BigQuery and billing enabled.

Access the Console. On the top left, click on the project you currently are in (or Select a project if you have none), this will open a popup.

Click on NEW PROJECT and follow the steps. Be careful when you choose the region because you will have to pick the same one when you set up the bulk export in the GSC.

This part is not spoken about very often. If you wish to query two datasets like GSC and GA4 data, they need to be in the same region.

“For some areas like Europe and North America, you can query across the wider continental region but in places like Australia you can’t query across Melbourne and Sydney.

Both datasets need to be in the exact same location”

Sarah Crooke, BigQuery Consultant at Melorium, Australia, said:

Once the project is created, go to the Billing section. Use the search bar at the top to find it. Google Cloud does not have the most user-friendly interface without the search bar.

You need to create a billing account. Piece of advice before you proceed: Take the time to investigate if you don’t already have a billing account set up by someone else in the company.

Once that’s done, you can assign the billing account to your project. You need a billing account in order to set up the bulk export.

Please follow the instructions provided by the Google Cloud documentation to do so.

Then, you need to go to the APIs & Services section (again, you can use the search bar to find it).

Look for the Bigquery API. Enable it for the project you created.

One more step: You need to add a user. This will enable Google Search Console to dump the data in BigQuery. Here is the official documentation to do this.

Let’s break it down quickly: 

  • Navigate in the sidebar to IAM and Admin. The page should say Permissions for project .
  • Click + GRANT ACCESS.
  • It will open a panel with Add principals.
  • In New Principals, put search-console-data-export@system.gserviceaccount.com
  • Select two roles: BigQuery Job User and BigQuery Data Editor. You can use the search bar to find them.
  • Save.

Lastly, select your project and copy the Cloud project ID associated with it.

You’re done in Google Cloud!

2. Setup The Bulk Data Export In The GSC Property Of Your Choice

Once the Google Cloud part is completed, you will need to activate the bulk data export to your new Google Cloud project directly in the Google Search Console.

To do so, go to the Settings section of the property you want to export data from and click on Bulk data export.

Paste the Cloud project ID of the project you created before. You can also customize the name of the dataset that the GSC will create in your project (it is “searchconsole” by default).

Lastly, pick the same dataset location that use chose for your Google Cloud project.

Once you are all set, click on Continue. The GSC will let you know if this initial setup is functional or not. The dataset will also be created in your project.

The data exports will start up to 48 hours later.

They are daily and include the data for the day of the setup. While API can be set to do scheduled pulls, it often requires additional programming.

This is why the bulk data export works for many big websites.

Keep in mind that the GSC can run into data export issues after this initial setup, in which case it is supposed to retry an export the following day.

We recommend you query your data in the first days to check if it is being stored properly.

So, What Next?

You can get started querying data now! Here are some things you can analyze that cannot be analyzed easily in another way:

  • Query multiple pages at once: In BigQuery, you can run a single SQL query to get metrics for all pages (or a subset of pages) without having to click through each one individually.
  • Traffic seasonality report: Compare performance metrics by season to identify trends and optimize campaigns accordingly.
  • Bulk analysis across multiple sites: If you manage a brand with more than one website, this allows you to look at clicks across all these sites at once.
  • Click-through rate (CTR) by page and query: Instead of just looking at the average CTR, you could calculate the CTR for each individual page and search query.

In summary

In summary, the built-in bulk data export feature from Google Search Console to Google’s BigQuery offers a more robust solution for data analytics in SEO.

However, there are limitations, such as the need to develop expertise in Google Cloud and SQL, and potential costs associated with BigQuery storage and queries.

More resources: 


Featured Image: Suvit Topaiboon/Shutterstock

New GA4 Shopper Journey Reports Are Powerful

New ecommerce-focused Google Analytics reports could help merchants identify issues with shopper journeys and the checkout process.

The Google Analytics 4 purchase journey report shows the funnel from when a would-be customer starts a web session through purchase. And the newer checkout journey report displays the checkout funnel in four detailed steps.

Both reports show where shoppers abandon the buying journey. The data could identify problems in a store’s navigation or point out opportunities for tests or optimization.

Checkout Journey Report

Found under Reports > Monetization in GA4, the checkout journey report shows the number and percentage of users who start the checkout process on an ecommerce site or app and then complete each subsequent step. 

Screenshot of a GA4 checkout journey report.Screenshot of a GA4 checkout journey report.

GA4 introduced the checkout journey report in September 2023. Click image to enlarge.

It uses a closed funnel method, focusing only on shoppers who started at the “Begin checkout” step.

The report can identify bottlenecks or issues in the checkout flow. For example, a business with a massive drop-off after the shipping step could have a technical or pricing problem with the available delivery options.

Checkout Steps

For the checkout journey report, data is collected from four GA4 events.

  • begin_checkout for the “Begin checkout” step.
  • add_shipping_info for the “Add shipping” step.
  • add_payment_info for the “Add payment” step.
  • purchase for the “Purchase” step.

Merchants will add these events to the checkout flow. Google offers implementation instructions for desktop and mobile devices to ensure the proper GA4 event is specified at each checkout stage.

gtag("event", "begin_checkout", {...

Checkout Dimensions

When its report loads, the checkout journey will show the flow by “device category,” i.e., mobile or desktop. But the report may also include other dimensions. Here is the complete list:

  • Device category,
  • Country,
  • Region,
  • City,
  • Language,
  • Age,
  • Gender,
  • Browser.

Thus marketers can find checkout glitches for a specific group of shoppers.

Screenshot of a table in a checkout journey reportScreenshot of a table in a checkout journey report

The table in the checkout journey report is initially divided by device category, but other dimensions are available. Click image to enlarge.

Purchase Journey Report

The GA4 purchase journey report provides insights into shopper drop-offs at each step of the funnel, although with less detail during the checkout. The two reports can work together to identify optimization opportunities.

The purchase journey report is located in GA4 at Reports > Monetization.

Screenshot of purchase journey reportScreenshot of purchase journey report

The purchase journey report shows buyers’ actions from session start to purchase, although the report for checkout journeys is more detailed. Click image to enlarge.

Purchase Steps

The purchase journey report gathers data from five GA4 events:

  • session_start marks the start of a session.
  • view_item is triggered when a product is viewed.
  • add_to_cart is triggered when an item is added to the cart.
  • begin_checkout for the “Begin checkout” step.
  • purchase or in_app_purchase for the “Purchase” step.

Like the checkout journey report, the GA4 events for collecting purchase journey data will need to be set up and added to the appropriate pages of an ecommerce website. 

Some ecommerce platforms include some or most of these events in their GA4 integration. For example, Shopify’s GA4 integration automatically adds the view_item,  add_to_cartpurchase, and several other events.

Purchase Dimensions

As of October 2023, the purchase journey report had the following analytics dimensions, two fewer (age and gender) than the checkout journey report:

  • Device category,
  • Country,
  • Region,
  • City,
  • Language,
  • Browser.

Using the Reports

The reports and funnels described here are helpful only if ecommerce marketers use them to optimize the buyer’s journey. Here’s an example.

Monitor the funnel. Start by watching the funnel, paying attention to how promotions, days of the week, or holidays impact buyers’ journeys.

Find drop-offs. Any point in the purchase or checkout funnel where shoppers drop off is a potential problem or opportunity.

Investigate. Examine drop-off points and hypothesize on the underlying cause. For example, loads of shoppers leaving at the shipping step could indicate the cost is too expensive relative to the average order value.

Take action. Address the potential cause of a drop-off and implement a solution. This could be a test at the add_shipping_info step that reduces the shipping cost.

Watch and iterate. Did the test work? Are more shoppers arriving at the next step? Measure the results. If the drop-off continues, develop a new hypothesis and solution.

Keep optimizing. Include optimizing the buyer’s journey as a routine marketing activity. 

Can’t Find the Reports?

Add purchase journeys and checkout journeys from the report library if they do not appear under Reports > Monetization