OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model via @sejournal, @martinibuster

Revelations that OpenAI secretly funded and had access to the FrontierMath benchmarking dataset are raising concerns about whether the dataset was used to train its o3 reasoning model, and about the validity of the model’s high scores.

In addition to accessing the benchmarking dataset, OpenAI funded its creation, a fact that was withheld from the mathematicians who contributed to developing FrontierMath. Epoch AI belatedly disclosed OpenAI’s funding only in the final paper published on arXiv.org, which announced the benchmark. Earlier versions of the paper omitted any mention of OpenAI’s involvement.

Screenshot Of FrontierMath Paper

Closeup Of Acknowledgement

Previous Version Of Paper That Lacked Acknowledgement

OpenAI o3 Model Scored Highly On FrontierMath Benchmark

The news of OpenAI’s secret involvement is raising questions about the high scores achieved by the o3 reasoning model and causing disappointment with the FrontierMath project. Epoch AI responded with transparency about what happened and what it’s doing to check whether the o3 model was trained with the FrontierMath dataset.

Giving OpenAI access to the dataset was unexpected because the whole point of a benchmark is to test AI models, which can’t be done if the models know the questions and answers beforehand.

A post in the r/singularity subreddit expressed this disappointment and cited a document that claimed that the mathematicians didn’t know about OpenAI’s involvement:

“Frontier Math, the recent cutting-edge math benchmark, is funded by OpenAI. OpenAI allegedly has access to the problems and solutions. This is disappointing because the benchmark was sold to the public as a means to evaluate frontier models, with support from renowned mathematicians. In reality, Epoch AI is building datasets for OpenAI. They never disclosed any ties with OpenAI before.”

The Reddit discussion cited a publication that revealed OpenAI’s deeper involvement:

“The mathematicians creating the problems for FrontierMath were not (actively)[2] communicated to about funding from OpenAI.

…Now Epoch AI or OpenAI don’t say publicly that OpenAI has access to the exercises or answers or solutions. I have heard second-hand that OpenAI does have access to exercises and answers and that they use them for validation.”

Tamay Besiroglu (LinkedIn Profile), associate director at Epoch AI, acknowledged that OpenAI had access to the datasets but also asserted that there was a “holdout” dataset that OpenAI didn’t have access to.

He wrote in the cited document:

“Tamay from Epoch AI here.

We made a mistake in not being more transparent about OpenAI’s involvement. We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible. Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset. We own this error and are committed to doing better in the future.

Regarding training usage: We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training.

OpenAI has also been fully supportive of our decision to maintain a separate, unseen holdout set—an extra safeguard to prevent overfitting and ensure accurate progress measurement. From day one, FrontierMath was conceived and presented as an evaluation tool, and we believe these arrangements reflect that purpose.”

More Facts About OpenAI & FrontierMath Revealed

Elliot Glazer (LinkedIn profile/Reddit profile), the lead mathematician at Epoch AI, confirmed that OpenAI has the dataset and was allowed to use it to evaluate its o3 large language model, OpenAI’s next state-of-the-art model, referred to as a reasoning AI model. He offered his opinion that the o3 model’s high scores are “legit,” and said that Epoch AI is conducting an independent evaluation to determine whether o3 had access to the FrontierMath dataset for training, which could cast the model’s high scores in a different light.

He wrote:

“Epoch’s lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We haven’t yet independently verified their 25% claim. To do so, we’re currently developing a hold-out dataset and will be able to test their model without them having any prior exposure to these problems.

My personal opinion is that OAI’s score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances. However, we can’t vouch for them until our independent evaluation is complete.”

Glazer had also shared that Epoch AI was going to test o3 using a “holdout” dataset that OpenAI didn’t have access to, saying:

“We’re going to evaluate o3 with OAI having zero prior exposure to the holdout problems. This will be airtight.”

Another post on Reddit by Glazer described how the “holdout set” was created:

“We’ll describe the process more clearly when the holdout set eval is actually done, but we’re choosing the holdout problems at random from a larger set which will be added to FrontierMath. The production process is otherwise identical to how it’s always been.”

Waiting For Answers

That’s where the drama stands until Epoch AI’s evaluation is completed, which should indicate whether OpenAI trained its reasoning model on the dataset or only used it for benchmarking.

Featured Image by Shutterstock/Antonello Marangi

Confirmed: Google Is Requiring JavaScript To Block SEO Tools via @sejournal, @martinibuster

Google has made a change to how its search results are served, which will also help secure them against bots and scrapers. Whether this will have a further effect on SEO tools, or whether they can fall back to a headless Chrome with JavaScript enabled, remains an open question at the moment, but it’s likely that Google is using rate limiting to throttle how many pages can be requested within a set period of time.

Google Search Now Requires JavaScript

Google quietly updated their search box to require all users, including bots, to have JavaScript turned on when searching.

Surfing Google Search without JavaScript turned on results in the following message:

Turn on JavaScript to keep searching
The browser you’re using has JavaScript turned off. To continue your search, turn it on.

Screenshot Of Google Search JavaScript Message

In an email to TechCrunch a Google spokesperson shared the following details:

“Enabling JavaScript allows us to better protect our services and users from bots and evolving forms of abuse and spam, …and to provide the most relevant and up-to-date information.”

JavaScript possibly enables personalization in the search experience, which may be what the spokesperson means by providing the most relevant information. But JavaScript can also be used to block bots.

Using the latest version of Chrome, I copied some of the JavaScript and ran it through ChatGPT to ask what it does. One part of it may relate to restricting abusive requests for documents.

Screenshot Of Chrome Dev Tools

ChatGPT gave me the following feedback:

“Core Functionalities
Randomized Value Generation (rdb)

Generates a random value based on properties (D_d, idc, and p4b) of the input object a, constrained by p7d.
This may be used for rate-limiting, exponential backoff, or similar logic.

Purpose and Context
From its components, the script:

Likely handles request retries or access control for web resources.

Implements a policy enforcement system, where:

Policies determine if requests are valid.

Errors are logged and sometimes retried based on rules.

Randomized delays or limits might control the retry mechanism.

Appears optimized for error handling and resilience in distributed or high-traffic systems, possibly within a Google service or API.”

ChatGPT said that the code may use rate limiting, which is a way to limit the number of actions a user or system can take within a specific time period.

Rate-Limiting:

Used to enforce a limit on the number of actions (e.g., API requests) a user or system can perform within a specific time frame.
In this code, the random values generated by rdb could be used to introduce variability in when or how often requests are allowed, helping to manage traffic effectively.
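The rate-limiting pattern ChatGPT describes maps onto a well-known technique. As a rough illustration only (not Google’s actual code, and all parameter values are assumptions), a token-bucket limiter caps how many requests are allowed per time window while permitting short bursts:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: allows `rate` requests per second,
    with bursts up to `capacity`. A hypothetical sketch for illustration."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
# The first ~10 rapid calls succeed (the burst); later ones are throttled.
```

A server applying something like this would simply refuse or delay requests when `allow()` returns `False`.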

Exponential Backoff:

ChatGPT explained that exponential backoff is a way to limit the amount of retries for a failed action a user or system is allowed to make. The time period between retries for a failed action increases exponentially.
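Exponential backoff with randomized jitter, as ChatGPT describes it, can be sketched as follows. This is illustrative only; the function name and parameter values are assumptions, not anything taken from Google’s script:

```python
import random

def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0):
    """Return randomized ('full jitter') exponential backoff delays.

    Each retry's maximum wait doubles (0.5s, 1s, 2s, 4s, ...) up to `cap`,
    and a random value in that range spreads retries out over time.
    """
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, ceiling))  # jitter
    return delays

delays = backoff_delays(5)
```

The randomness is the point: if many failed clients retried on the same fixed schedule, they would all hammer the server again simultaneously.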

Similar Logic:

ChatGPT explained that random value generation could be used to manage access to resources to prevent abusive requests.

I don’t know for certain that this is what that specific JavaScript does; that’s ChatGPT’s interpretation, but it does match Google’s statement that it is using JavaScript as part of its strategy for blocking bots.

Google Workspace Support: Unclear If Opting Out AI Features Avoids Price Hike via @sejournal, @MattGSouthern

Google has made its AI-powered features in Gmail, Docs, Sheets, and Meet free for all Workspace users, but questions remain around pricing adjustments and feature visibility for specific accounts.

AI Now Included Without Extra Cost

Google announced that its full suite of AI tools, previously available only through the $20-per-user-per-month Gemini for Workspace plan, is now included in its standard offerings at no additional charge.

AI capabilities like automated email summaries, meeting note-taking, spreadsheet design suggestions, and the Gemini chatbot are now accessible to all customers.

However, this announcement comes with a catch: Workspace plans will see a $2 price hike per user per month.

The new pricing structure raises the base cost of the Workspace Business Standard plan from $12 to $14 per user, effective immediately for new customers.

Starting March 17, existing customers will see the change reflected. Small business accounts are currently exempt from this adjustment.

Confusion Over Pricing & Settings

While the price increase has been widely reported, Google Workspace support has offered additional clarification, indicating that it may not apply to all users.

According to support representatives, it’s unclear whether organizations that opt out of AI features will still face the increased costs. Official guidance on this matter has yet to be issued, leaving many customers uncertain.

Screenshot from Google support chat, January 2025.

Chats between Google Workspace reps and the Search Engine Journal development team reveal that opting out of AI features isn’t straightforward.

The settings to turn off AI features like Gemini aren’t visible by default for business accounts.

Administrators must contact Google support to enable access to these settings. For enterprise customers, the settings are accessible directly within the Workspace admin console.

Competitive Push Against Microsoft

Google’s move to bundle AI features into its standard Workspace offerings mirrors Microsoft’s recent decision to integrate its Copilot Pro AI tools into the standard Microsoft 365 subscription.

Both companies aim to attract more users to their AI-powered productivity platforms while simplifying pricing structures.

Key Takeaways

For organizations using Google Workspace, here are the critical points to note:

  1. AI Features Are Enabled by Default: Gemini and other AI tools will be active for most accounts unless explicitly disabled.
  2. Opt-Out Process Is Complicated: Business account holders must contact Google support to access and change the AI feature settings. Enterprise accounts can manage these settings directly.
  3. Pricing Uncertainty: It’s unclear whether the $2 price increase will still apply if you opt out of AI tools. Google has stated that further updates on this issue are forthcoming.

Businesses are advised to monitor their Workspace settings closely and contact Google support for clarification.

Google Causes Global SEO Tool Outages via @sejournal, @martinibuster

Google cracked down on web scrapers that harvest search results data, triggering global outages at many popular rank-tracking tools, like SEMrush, that depend on providing fresh data from search results pages.

What happens if Google’s SERPs are completely blocked? A certain amount of the data provided by tracking services has long been extrapolated by algorithms from a variety of data sources. It’s possible that one way around the current block is to extrapolate the data from other sources.

SERP Scraping Prohibited By Google

Google’s guidelines have long prohibited automated rank checking in the search results, but Google has apparently also tolerated many companies scraping its search results and charging for access to ranking data for the purposes of tracking keywords and rankings.

According to Google’s guidelines:

“Machine-generated traffic (also called automated traffic) refers to the practice of sending automated queries to Google. This includes scraping results for rank-checking purposes or other types of automated access to Google Search conducted without express permission. Machine-generated traffic consumes resources and interferes with our ability to best serve users. Such activities violate our spam policies and the Google Terms of Service.”

Blocking Scrapers Is Complex

It’s highly resource-intensive to block scrapers, especially because they can respond to blocks by changing their IP address and user agent. Another way to block scrapers is to target specific behaviors, such as how many pages a user requests; excessive page requests can trigger a block. The problem with that approach is that keeping track of all the blocked IP addresses, which can quickly number in the millions, is itself resource-intensive.
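As a simplified illustration of behavior-based blocking (not how Google actually implements it), a per-IP sliding-window counter might look like the sketch below. All names and thresholds are hypothetical:

```python
import time
from collections import defaultdict, deque

class RequestThrottle:
    """Blocks a client that exceeds `limit` requests within a sliding
    `window` of seconds. A simplified sketch; a real system must also
    bound the memory used to track millions of client addresses."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        now = time.monotonic()
        q = self.hits[ip]
        while q and now - q[0] > self.window:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: block this request
        q.append(now)
        return True

throttle = RequestThrottle(limit=100, window=60.0)
allowed = [throttle.allow("203.0.113.9") for _ in range(150)]
# First 100 requests in the window are allowed; the remaining 50 are blocked.
```

This also shows the resource problem the article mentions: the `hits` dictionary grows with every distinct IP seen, which is why tracking scrapers at scale is expensive.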

Reports On Social Media

A post in the private SEO Signals Lab Facebook Group announced that Google was striking hard against web scrapers, with one member commenting that the Scrape Owl tool wasn’t working for them, while others noted that SEMrush’s data had not updated.

Another post, this time on LinkedIn, listed multiple tools that weren’t refreshing their data, but also noted that the blocking hasn’t affected all data providers: Sistrix and MonitorRank were still working. Someone from a company called HaloScan reported that they had made adjustments and resumed scraping data from Google, and someone else reported that another tool, MyRankingMetrics, was still reporting data.

So whatever Google is doing, it’s not currently affecting all scrapers. It may be that Google is targeting certain scraping behaviors, learning from the responses, and improving its blocking ability. The coming weeks may reveal whether Google is broadly improving its ability to block scrapers or only targeting the biggest ones.

Another post on LinkedIn speculated that the blocking may result in higher costs being passed on to end users of SaaS SEO tools. They posted:

“This move from Google is making data extraction more challenging and costly. As a result, users may face higher subscription fees.”

Ryan Jones tweeted:

“Google seems to have made an update last night that blocks most scrapers and many APIs.

Google, just give us a paid API for search results. we’ll pay you instead.”

No Announcement By Google

So far there has not been any announcement by Google, but the chatter online may force someone at Google to consider making a statement.

Featured Image by Shutterstock/Krakenimages.com

Google Study: 29% In The U.S. & Canada Used AI Last Year via @sejournal, @MattGSouthern

A new Google-Ipsos report shows AI adoption is increasing globally, especially in emerging markets.

However, the study reveals challenges like regional divides, gender disparities, and slower adoption in developed countries.

Critics, including Nate Hake, founder of Travel Lemming, point out how Google overlooks these challenges in its report coverage.

While optimism around AI is rising, it’s not resonating with everyone.

Here’s a closer look at the report and what the numbers indicate.

AI Is Growing, But Unevenly

Globally, 48% of people used generative AI last year, with countries like Nigeria, Mexico, and South Africa leading adoption. These regions also show the most excitement about AI’s potential to boost economies and improve lives.

Adoption lags at 29% in developed nations like the U.S. and Canada, meaning that 71% of people in these regions haven’t knowingly engaged with generative AI tools.

Screenshot: Google-Ipsos Study ‘Our life with AI: From innovation to application,’ January 2025.

Optimism Outweighs Concerns

Globally, 57% of people are excited about AI, compared to 43% who are concerned—a shift from the year prior, when excitement and concerns were evenly split.

People cite AI’s potential in science (72%) and medicine (71%) as reasons for their optimism. Respondents see opportunities for breakthroughs in healthcare and research.

However, in the U.S., skepticism lingers—only 52% believe AI will directly benefit “people like them,” compared to the global average of 59%.

Gender Gaps Persist

The report highlights a gender gap in AI usage: 55% of global AI users are men compared to 45% women.

The disparity is even bigger in workplace adoption, where 41% of professional AI users are women.

Emerging Markets Are Leading the Way

Emerging markets are using AI more and are more optimistic about its potential.

In regions like Nigeria and South Africa, people are more likely to believe AI will transform their economies.

Meanwhile, developed countries like the U.S. and U.K. remain cautious.

Only 53% of Americans prioritize AI innovation, compared to much higher enthusiasm in emerging markets.

Non-Generative AI

While generative AI tools like chatbots and content generators grab headlines, the public is more appreciative of non-generative AI applications.

These include AI for healthcare, fraud detection, flood forecasting, and other practical, high-impact use cases.

Generative AI, on the other hand, gets mixed reviews.

Writing, summarizing, or customer service applications don’t resonate as strongly with the public as AI’s potential to tackle bigger societal issues.

AI at Work: Young, Affluent, and Male-Dominated

AI is making its way into the workplace. 74% of AI users use it professionally for writing, brainstorming, and problem-solving tasks.

However, workplace AI adoption is skewed toward younger, wealthier, and male workers.

Blue-collar workers and older professionals are catching up—67% of blue-collar AI users and 68% of workers aged 50-74 use AI at work—but the gender gap remains pronounced.

Trust in AI Is Growing

Trust in AI governance is improving, with 61% of people confident their governments can regulate AI responsibly (up from 57% in 2023).

72% support collaboration between governments and companies to manage AI’s risks and maximize its benefits.

Takeaway

AI use is growing worldwide, though many people in North America still see little reason to use it.

To increase AI’s adoption, companies must build trust and clearly communicate the technology’s benefits.

For more details, check out the full report at Google Public Policy.


Featured Image: Stokkete/Shutterstock

Evidence That Google Detects AI-Generated Content via @sejournal, @martinibuster

A sharp-eyed Australian SEO spotted indirect confirmation about Google’s use of AI detection as part of search rankings that was hiding in plain sight for years. Although Google is fairly transparent about content policies, the new data from a Googler’s LinkedIn profile adds a little more detail.

Gagan Ghotra tweeted:

“Important FYI Googler Chris Nelson from Search Quality team his LinkedIn says He manages global team that build ranking solutions as part of Google Search ‘detection and treatment of AI generated content’.”

Googler And AI Content Policy

The Googler, Chris Nelson, works in Google’s Search Ranking department and is listed as a co-author of Google’s guidance on AI-generated content, which makes knowing a little bit about him worthwhile.

The relevant work experience at Google is listed as:

“I manage a large, global team that builds ranking solutions as part of Google Search and direct the following areas:

-Prevent manipulation of ranking signals (e.g., anti-abuse, spam, harm)
-Provide qualitative and quantitative understanding of quality issues (e.g., user interactions, insights)
-Address novel content issues (e.g., detection and treatment of AI-generated content)
-Reward satisfying, helpful content”

There are no search-ranking-related research papers or patents listed under his name, but that’s probably because his educational background is in business administration and economics.

What may be of special interest to publishers and digital marketers are the following two sections:

1. He lists addressing “detection and treatment of AI-generated content”

2. He provides “qualitative and quantitative understanding of quality issues (e.g., user interactions, insights)”

While the user-interactions-and-insights part might seem unrelated to the detection and treatment of AI-generated content, it is in the service of understanding search quality issues, which is related.

His role is defined as the evaluation and analysis of quality issues in Google’s Search Ranking department. “Quantitative understanding” refers to analyzing data, while “qualitative understanding” is the more subjective part of his job, concerned with insights into the “why” and “how” of observed data.

Co-Author Of Google’s AI-Generated Content Policy

Chris Nelson is listed as a co-author of Google’s guidance on AI-generated content. The guidance doesn’t prohibit the use of AI for published content; it only cautions that AI shouldn’t be used to create content that violates Google’s spam guidelines. That may sound contradictory, because AI is virtually synonymous with scaled automated content, which has historically been considered spam by Google.

The answer is in the nuance of Google’s policy, which encourages publishers to prioritize user-first content over a search-engine-first approach. In my opinion, putting a strong focus on writing about the most popular search queries in a topic, instead of writing about the topic itself, can lead to search-engine-first content; that’s a common approach of sites I’ve audited that contained relatively high-quality content but lost rankings in the 2024 Google updates.

Google’s advice (and presumably Chris Nelson’s) for those considering AI-generated content is:

“…however content is produced, those seeking success in Google Search should be looking to produce original, high-quality, people-first content demonstrating qualities E-E-A-T.”

Why Doesn’t Google Ban AI-Generated Content Outright?

Google’s documentation that Chris Nelson co-authored states that automation has always been a part of publishing, such as dynamically inserting sports scores, weather forecasts, scaled meta descriptions and date-dependent content and products related to entertainment.

The documentation states:

“…For example, about 10 years ago, there were understandable concerns about a rise in mass-produced yet human-generated content. No one would have thought it reasonable for us to declare a ban on all human-generated content in response. Instead, it made more sense to improve our systems to reward quality content, as we did.

…Automation has long been used to generate helpful content, such as sports scores, weather forecasts, and transcripts. …Automation has long been used in publishing to create useful content. AI can assist with and generate useful content in exciting new ways.”

Why Does Google Detect AI-Generated Content?

The documentation that Nelson co-authored explicitly states that Google doesn’t differentiate between how low-quality content is generated, which seemingly contradicts his LinkedIn profile, where “detection and treatment of AI-generated content” is listed as part of his job.

The AI-generated content guidance states:

“Poor quality content isn’t a new challenge for Google Search to deal with. We’ve been tackling poor quality content created both by humans and automation for years. We have existing systems to determine the helpfulness of content. …Our systems continue to be regularly improved.”

How do we reconcile the fact that part of his job is detecting AI-generated content with Google’s policy stating that it doesn’t matter how low-quality content is generated?

Context is everything; that’s the answer. Here’s the context of his work profile:

“Address novel content issues (e.g., detection and treatment of AI-generated content)”

The phrase “novel content issues” means content quality issues that Google hasn’t previously encountered. This refers to new types of AI-generated content, presumably spam, and how to detect and “treat” it. Given that the context is “detection and treatment,” it could very well be “low-quality content,” but that wasn’t expressly stated, probably because he didn’t expect his LinkedIn profile to be parsed by SEOs for a better understanding of how Google detects and treats AI-generated content (meta!).

Guidance Authored By Chris Nelson Of Google

A list of articles published by Chris Nelson shows that he may have played a role in many of the most important updates of the past five years, from the helpful content update and site reputation abuse to detecting search-engine-first AI-generated content.

List of Articles Authored By Chris Nelson (LinkedIn Profile)

Updating our site reputation abuse policy

What web creators should know about our March 2024 core update and new spam policies

Google Search’s guidance about AI-generated content

What creators should know about Google’s August 2022 helpful content update

Featured Image by Shutterstock/3rdtimeluckystudio

Google Rejects EU’s Call For Fact-Checking In Search & YouTube via @sejournal, @MattGSouthern

Google has reportedly told the EU it won’t add fact-checking to search results or YouTube videos, nor will it use fact-checks to influence rankings or remove content.

This decision defies new EU rules aimed at tackling disinformation.

Google Says No to EU’s Disinformation Code

In a letter to Renate Nikolay of the European Commission, Google’s global affairs president, Kent Walker, said fact-checking “isn’t appropriate or effective” for Google’s services.

The EU’s updated Disinformation Code, part of the Digital Services Act (DSA), would require platforms to include fact-checks alongside search results and YouTube videos and to bake them into their ranking systems.

Walker argued Google’s current moderation tools—like SynthID watermarking and AI disclosures on YouTube—are already effective.

He pointed to last year’s elections as proof Google can manage misinformation without fact-checking.

Google also confirmed it plans to fully exit all fact-checking commitments in the EU’s voluntary Disinformation Code before it becomes mandatory under the DSA.

Context: Major Elections Ahead

This refusal from Google comes ahead of several key European elections, including:

  • Germany’s Federal Election (Feb. 23)
  • Romania’s Presidential Election (May 4)
  • Poland’s Presidential Election (May 18)
  • Czech Republic’s Parliamentary Elections (Sept.)
  • Norway’s Parliamentary Elections (Sept. 8)

These elections will likely test how well tech platforms handle misinformation without stricter rules.

Tech Giants Backing Away from Fact-Checking

Google’s decision follows a larger trend in the industry.

Last week, Meta announced it would end its fact-checking program on Facebook, Instagram, and Threads and shift to a crowdsourced model like X’s (formerly Twitter) Community Notes.

Elon Musk has drastically reduced moderation efforts on X since buying the platform in 2022.

What It Means

As platforms like Google and Meta move away from active fact-checking, concerns are growing about how misinformation will spread—especially during elections.

While tech companies say transparency tools and user-driven features are enough, critics argue they’re not doing enough to combat disinformation.

Google’s pushback signals a growing divide between regulators and platforms over how to manage harmful content.


Featured Image: Wasan Tita/Shutterstock

.AI Domain Migrated To A More Secure Platform via @sejournal, @martinibuster

The Dot AI domain has migrated to a new domain name registry, giving all registrants of .AI domains stronger security and more stability, with greater protection against outages.

Dot AI Domain

.AI is a country-code top-level domain (ccTLD), which is distinct from a generic top-level domain (gTLD). A ccTLD is a two-letter domain reserved for a specific country or territory; .US, for example, is reserved for the United States of America. .AI is reserved for Anguilla, a British Overseas Territory in the Caribbean.

.AI Is Now Handled By Identity Digital

The .AI domain was previously handled by a small local business named DataHaven.net but has now fully migrated to the Identity Digital platform, making .AI domains available from over 90% of registrars worldwide with a 100% availability guarantee. The migration also provides fast distribution of .AI domain records, in milliseconds, and greater resistance to denial-of-service attacks.

According to the announcement:

“Beginning today, .AI is exclusively being served on the Identity Digital platform, and we couldn’t be more thrilled for what this means for Anguilla.

The quick migration brings important enhancements to the .AI TLD like 24/7 global support, and a growing list of features that will benefit registrars, businesses and entrepreneurs today and in the years to come.”

Read the full announcement:

.ai Completes a Historic Migration to the Identity Digital Platform

Featured Image by Shutterstock/garagestock

Google Shopping Rankings: Key Factors For Retailers via @sejournal, @MattGSouthern

A new study analyzing 5,000 Google Shopping keywords sheds light on the factors that correlate with higher rankings.

The research, conducted by Jeff Oxford, Founder of 180 Marketing, reveals trends that could help ecommerce stores improve their visibility in Google’s free Shopping listings.

Amazon Dominates Google Shopping

Amazon ranks in the #1 position for 52% of Google Shopping searches, outpacing Walmart (6%) and Home Depot (3%).

Beyond Amazon’s dominance, the study found a strong correlation between website authority and rankings, with higher-ranking sites often belonging to well-established brands.

Takeaway: Building your brand and earning trust is vital to ranking well on Google Shopping.

Backlinks, Reviews, & Pricing

The study identified several trends that separate higher-ranking pages from the rest:

  • Referring Domains: Product pages in the top two positions had more backlinks than lower-ranking pages. Interestingly, most product pages analyzed (98%) had no backlinks at all.
  • Customer Reviews: Product pages with customer reviews ranked higher, and stores with star ratings below 3.5 struggled to rank well.
  • Pricing: Lower-priced products tended to rank higher, with top-performing listings often featuring prices below the category average.

Takeaway: Building backlinks, collecting customer reviews, and offering competitive pricing can make a difference.

Meta Descriptions A Top Signal

Among on-page factors, meta descriptions had the strongest correlation with rankings.

Pages that included exact-match keywords in their meta descriptions consistently ranked higher.

While keyword usage in title tags and H1 headers showed some correlation, the impact was much smaller.

Takeaway: Optimize meta descriptions and product copy with target keywords to improve rankings.

Structured Data Findings

Structured data showed mixed results in the study.

Product structured data had little to no correlation with rankings, and Amazon, despite dominating the top spots, doesn’t use structured data on its product pages.

However, pages using review structured data performed better.

Takeaway: Focus on collecting customer reviews and using review structured data, which appears more impactful than product structured data.
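For reference, review structured data of the kind the study measured is typically embedded in a product page as schema.org JSON-LD. A minimal sketch, with hypothetical product values, serialized here via Python:

```python
import json

# Hypothetical product-page values; the AggregateRating shape follows
# schema.org as described in Google's structured data documentation.
review_markup = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.6,
        "reviewCount": 182,
    },
}

# Serialized for embedding in a <script type="application/ld+json"> tag.
json_ld = json.dumps(review_markup, indent=2)
```

The `ratingValue` and `reviewCount` shown are placeholders; the point is simply that the rating data Google surfaces comes from markup like this on the page.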

Shipping & Returns Scores

Google Shopping evaluates stores on shipping, returns, and website quality metrics.

The study found that stores with “Exceptional” or “Great” scores for shipping and returns were more likely to rank higher, especially in the top 10 positions.

Takeaway: Prioritize fast shipping and clear return policies to boost your Google Shopping scores.

What Does This Mean?

According to these findings, success in Google Shopping correlates with strong customer reviews, competitive pricing, and fast service.

Optimizing for traditional SEO—like backlinks and well-written metadata—can benefit both organic search and Shopping rankings.

Retailers should prioritize the customer experience, as Google’s scoring for shipping, returns, and website quality affects visibility.

Lastly, remember that correlation doesn’t equal causation—test changes thoughtfully and focus on delivering value to your customers.