Robots.txt Turns 30: Google Highlights Hidden Strengths via @sejournal, @MattGSouthern

In a recent LinkedIn post, Gary Illyes, Analyst at Google, highlights lesser-known aspects of the robots.txt file as it marks its 30th year.

The robots.txt file, which tells search engine crawlers which parts of a site they may access, has been a mainstay of SEO practices since its inception.

Here’s one of the reasons why it remains useful.

Robust Error Handling

Illyes emphasized the file’s resilience to errors.

“robots.txt is virtually error free,” Illyes stated.

In his post, he explained that robots.txt parsers are designed to ignore most mistakes without compromising functionality.

This means the file will continue operating even if you accidentally include unrelated content or misspell directives.

He elaborated that parsers typically recognize and process key directives such as user-agent, allow, and disallow while overlooking unrecognized content.
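
To see that tolerance in action, here is a minimal sketch using Python’s built-in urllib.robotparser (not the parser Google uses; the domain, paths, and file contents are hypothetical). The misspelled directive and the unrelated line are simply skipped, while the valid rules still apply:

```python
from urllib import robotparser

# A hypothetical robots.txt containing a comment, a misspelled directive,
# and an unrelated line. Parsers skip what they don't recognize.
robots_txt = """\
# Keep crawlers out of the private area (note from the dev team)
User-agent: *
Disalow: /typo/        # misspelled "Disallow" -> ignored
Disallow: /private/
Unrelated-nonsense: 42
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/private/page"))  # False: valid rule applies
print(parser.can_fetch("*", "https://example.com/typo/"))         # True: typo'd rule was skipped
```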

Unexpected Feature: Line Comments

Illyes pointed out the presence of line comments in robots.txt files, a feature he found puzzling given the file’s error-tolerant nature.

He invited the SEO community to speculate on the reasons behind this inclusion.

Responses To Illyes’ Post

The SEO community’s response to Illyes’ post provides additional context on the practical implications of robots.txt’s error tolerance and the use of line comments.

Andrew C., Founder of Optimisey, highlighted the utility of line comments for internal communication, stating:

“When working on websites you can see a line comment as a note from the Dev about what they want that ‘disallow’ line in the file to do.”

Screenshot from LinkedIn, July 2024.

Nima Jafari, an SEO Consultant, emphasized the value of comments in large-scale implementations.

He noted that for extensive robots.txt files, comments can “help developers and the SEO team by providing clues about other lines.”

Screenshot from LinkedIn, July 2024.

Providing historical context, Lyndon NA, a digital marketer, compared robots.txt to HTML specifications and browsers.

He suggested that the file’s error tolerance was likely an intentional design choice, stating:

“Robots.txt parsers were made lax so that content might still be accessed (imagine if G had to ditch a site, because someone borked 1 bit of robots.txt?).”

Screenshot from LinkedIn, July 2024.

Why SEJ Cares

Understanding the nuances of the robots.txt file can help you optimize sites better.

While the file’s error-tolerant nature is generally beneficial, it can lead to overlooked issues if not managed carefully.

What To Do With This Information

  1. Review your robots.txt file: Ensure it contains only necessary directives and is free from potential errors or misconfigurations.
  2. Be cautious with spelling: While parsers may ignore misspellings, this could result in unintended crawling behaviors.
  3. Leverage line comments: Comments can be used to document your robots.txt file for future reference.

Featured Image: sutadism/Shutterstock

WordPress Takes Bite Out Of Plugin Attacks via @sejournal, @martinibuster

WordPress announced over the weekend that it was pausing plugin updates and forcing a reset of plugin author passwords to prevent additional website compromises from the ongoing supply chain attack on WordPress plugins.

Supply Chain Attack

Hackers have been attacking plugins directly at the source using password credentials exposed in previous data breaches (unrelated to WordPress itself). The attackers are targeting plugin authors who reuse the same passwords across multiple websites, including passwords exposed in those breaches.

WordPress Takes Action To Block Attacks

Some plugins have already been compromised, but the WordPress community has rallied to clamp down on further plugin compromises by instituting a forced password reset and encouraging plugin authors to use two-factor authentication.

WordPress also temporarily blocked all new plugin updates at the source unless they received team approval, to make sure plugins were not being updated with malicious backdoors. By Monday, WordPress had updated its post to confirm that plugin releases are no longer paused.

The WordPress announcement on the forced password reset:

“We have begun to force reset passwords for all plugin authors, as well as other users whose information was found by security researchers in data breaches. This will affect some users’ ability to interact with WordPress.org or perform commits until their password is reset.

You will receive an email from the Plugin Directory when it is time for you to reset your password. There is no need to take action before you’re notified.”

A discussion in the comments section between a WordPress community member and the author of the announcement revealed that WordPress did not directly contact plugin authors identified as using “recycled” passwords because some of the users found in the data breach lists had credentials that were in fact safe (false positives). WordPress also discovered that some accounts assumed to be safe were in fact compromised (false negatives). That is what led to the current action of forcing password resets.

Francisco Torres of WordPress answered:

“You’re right that specifically reaching out to those individuals mentioning that their data has been found in data breaches will make them even more sensitive, but unfortunately as I’ve already mentioned that might be inaccurate for some users and there will be others that are missing. What we’ve done since the beginning of this issue is to individually notify those users that we’re certain have been compromised.”

Read the official WordPress announcement:

Password Reset Required for Plugin Authors

Featured Image by Shutterstock/Aleutie

Google Discusses Core Topicality Systems via @sejournal, @martinibuster

Google’s latest Search Off the Record shared a wealth of insights on how Google Search actually works. Google’s John Mueller and Lizzi Sassman spoke with Elizabeth Tucker, Director, Product Management at Google, who shared insights into the many systems that work together to rank web pages, including a mention of a topicality system.

Google And Topicality

In everyday usage, “topicality” refers to how relevant something is to the present moment. In search, however, “topicality” is about matching the topic of a search query with the content on a web page. Machine learning models play a strong role in helping Google understand what users mean.

An example Elizabeth Tucker mentions is BERT (Bidirectional Encoder Representations from Transformers), a language model that helps Google understand a word within the context of the words that come before and after it (that’s a thumbnail explanation; it does more than that).
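
As a rough illustration of topical matching versus keyword matching (using an off-the-shelf open-source model, not any Google system; the query and pages below are made up), a transformer-based sentence encoder can score a page that answers a question higher than a page that merely repeats its words:

```python
# Purely illustrative: score query/page similarity with a small public model
# (sentence-transformers), not Google's production systems.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open-source encoder

query = "how tall is barack obama"
pages = [
    "Barack Obama is 6 feet 1 inch (185 cm) tall.",            # answers the question
    "How tall is the Eiffel Tower? It stands 330 meters tall.",  # repeats words, different topic
]

query_emb = model.encode(query, convert_to_tensor=True)
page_embs = model.encode(pages, convert_to_tensor=True)
scores = util.cos_sim(query_emb, page_embs)[0]

for page, score in zip(pages, scores):
    # The Obama page should score markedly higher despite sharing fewer literal words.
    print(f"{float(score):.2f}  {page}")
```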

Elizabeth explains the importance of matching topically relevant content to a search query within the context of user satisfaction.

Googler Lizzi Sassman asked about user satisfaction, and Tucker responded that search has many dimensions and many systems, citing topical relevance as an example.

Lizzi asked (at about the 4:20 minute mark):

“In terms of the satisfaction bit that you mentioned, are there more granular ways that we’re looking at? What does it mean to be satisfied when you come away from a search?”

Elizabeth answered:

“Absolutely, Lizzi. Inside Search Quality, we think about so many important dimensions of search. We have so many systems. Obviously we want to show content that’s topically relevant to your search. In the early days of Google Search, that was sometimes a challenge.

Our systems have gotten much better, but that is still sometimes, for especially really difficult searches, we can struggle with. People search in so many ways: Everything from, of course, typing in keywords, to speaking to Google and using normal everyday language. I’ve seen amazing searches. “Hey Google, who is that person who, years ago, did this thing, and I don’t remember what it was called.” You know, these long queries that are very vague. And it’s amazing now that we have systems that can even answer some of those.”

Takeaway:

An important takeaway from that exchange is that there are many systems working together, with topicality being just one of them. Many in the search marketing community focus on the importance of one thing, like authority or helpfulness, but in reality there are many “dimensions” to search, and it’s counterproductive to reduce the factors that go into ranking to one, two, or three concepts.

Biases In Search

Google’s John Mueller asked Elizabeth whether biases in search are something Google thinks about. She answered that there are many kinds of biases Google watches for and tries to catch. Tucker then explains the different kinds of search results that may be topically relevant (such as evergreen and fresh content) and the balance Google works to get right.

John asked (at the 05:24 minute mark):

“When you look at the data, I assume biases come up. Is that a topic that we think about as well?”

Elizabeth answered:

“Absolutely. There are all sorts of biases that we worry about when you’re looking for information. Are we disproportionately showing certain types of sites, are we showing more, I don’t know, encyclopedias and evergreen results or are we showing more fresh results with up-to-date information, are we showing results from large institutional sites, are we showing results from small blogs, are we showing results from social media platforms where we have everyday voices?

We want to make sure we have an appropriate mix that we can surface the best of the web in any shape or size, modest goals.”

Core Topicality Systems (And Many Others)

Elizabeth next reiterated that she works with many kinds of systems in search. This is worth keeping in mind because the search community only knows about a few systems when in fact there are many more.

That means it’s important not to focus on just one, two, or three systems when trying to debug a ranking problem, but to keep an open mind that the cause might be something else entirely, not just helpfulness or E-E-A-T or another familiar explanation.

John Mueller asked whether Google Search responds by demoting a site when users complain about certain search results.

Her answer covers multiple points, including that most of the systems she works on have nothing to do with demoting sites. It’s worth underlining that she works with many systems and many signals, not just the handful the search marketing community tends to focus on.

Among the systems she mentions are the core topicality systems. What does that mean? She explains that they are about matching the topic of the search query. She says “core topicality systems,” plural, which probably means multiple systems and algorithms working together.

John asked (at the 11:20 minute mark):

“When people speak up loudly, is the initial step to do some kind of a demotion where you say “Well, this was clearly a bad site that we showed, therefore we should show less of it”? Or how do you balance the positive side of things that maybe we should show more of versus the content we should show less of?”

Elizabeth answered:

“Yeah, that’s a great question. So I work on many different systems. It’s a fun part of my job in Search Quality. We have many signals, many systems, that all need to work together to produce a great search result page.

Some of the systems are by their nature demotative, and webspam would be a great example of this. If we have a problem with, say, malicious download sites, that’s something we would probably want to fix by trying to find out which sites are behaving badly and try to make sure users don’t encounter those sites.

Most of the systems I work with, though, actually are trying to find the good. An example of this: I’ve worked with some of our core topicality systems, so systems that try to match the topic of the query.

This is not so hard if you have a keyword query, but language is difficult overall. We’ve had wonderful breakthroughs in natural language understanding in recent years with ML models, and so we want to leverage a lot of this technology to really make sure we understand people’s searches so that we can find content that matches that. This is a surprisingly hard problem.

And one of the interesting things we found in working on, what we might call, topicality, kind of a nerdy word, is that the better we’re able to do this, the more interesting and difficult searches people will do.”

How Google Is Focused On Topics In Search

Elizabeth returns to discussing topicality, this time referring to it as the “topicality space” and how much effort Google has expended on getting it right. Of particular importance, she highlights how Google used to be very keyword focused, with the clear implication that it no longer is, and explains the importance of topicality.

She discusses it at the 13:16 minute mark:

“So Google used to be very keyword focused. If you just put together some words with prepositions, we were likely to go wrong. Prepositions are very difficult or used to be for our systems. I mean, looking back at this, this is laughable, right?

But, in the old days, people would type in one, two, three keywords. When I started at Google, if a search had more than four words, we considered it long. I mean, nowadays I routinely see long searches that can be 10-20 words or more. When we have those longer searches, understanding what words are important becomes challenging.

For example, this was now years and years ago, maybe close to ten years ago, but we used to be challenged by searches that were questions. A classic example is “how tall is Barack Obama?” Because we wanted pages that would provide the answer, not just match the words how tall, right?

And, in fact, when our featured snippets first came about, it was motivated by this kind of problem. How can we match the answer, not just keyword match on the words in the question? Over the years, we’ve done a lot of work in, what we might call, the topicality space. This is a space that we continue to work in even now.”

The Importance Of Topics And Topicality

There is a lot to unpack in Tucker’s answer. When thinking about Google’s search ranking algorithms, it helps to also consider the core topicality systems, which help Google understand search query topics and match them to web page content. That underlines the importance of thinking in terms of topics rather than focusing narrowly on ranking for keywords.

A common mistake I see among people struggling with ranking is that they are strongly focused on keywords. For many years I’ve encouraged an alternate approach that stresses thinking in terms of topics. Optimizing for keywords is one dimensional; optimizing for a topic is multidimensional and aligns with how Google Search ranks web pages, in that topicality is an important part of ranking.

Listen to the Search Off The Record podcast starting at about the 4:20 minute mark and then fast forward to the 11:20 minute mark:

Featured Image by Shutterstock/dekazigzag

Google’s E-E-A-T & The Myth Of The Perfect Ranking Signal via @sejournal, @MattGSouthern

Few concepts have generated as much buzz and speculation in SEO as E-E-A-T.

Short for Experience, Expertise, Authoritativeness, and Trustworthiness, this framework has been a cornerstone of Google’s Search Quality Evaluator Guidelines for years.

But despite its prominence, more clarity about how E-E-A-T relates to Google’s ranking algorithms is still needed.

In a recent episode of Google’s Search Off The Record podcast, Search Director & Product Manager Elizabeth Tucker addressed this complex topic.

Her comments offer insights into how Google evaluates and ranks content.

No Perfect Match

One key takeaway from Tucker’s discussion of E-E-A-T is that no single ranking signal perfectly aligns with all four elements.

Tucker explained:

“There is no E-E-A-T ranking signal. But this really is for people to remember it’s a shorthand, something that should always be a consideration, although, you know, different types of results arguably need different levels of E-E-A-T.”

This means that while Google’s algorithms do consider factors like expertise, authoritativeness, and trustworthiness when ranking content, there isn’t a one-to-one correspondence between E-E-A-T and any specific signal.

The PageRank Connection

However, Tucker did offer an example of how one classic Google ranking signal – PageRank – aligns with at least one aspect of E-E-A-T.

Tucker said:

“PageRank, one of our classic Google ranking signals, probably is sort of along the lines of authoritativeness. I don’t know that it really matches up necessarily with some of those other letters in there.”

For those unfamiliar, PageRank is an algorithm that measures the importance and authority of a webpage based on the quantity and quality of links pointing to it.

In other words, a page with many high-quality inbound links is seen as more authoritative than one with fewer or lower-quality links.
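
As a toy illustration of that idea (a made-up link graph scored with the open-source networkx library, not Google’s production system), pages that attract more links, and links from well-linked pages, end up with higher scores:

```python
# A toy illustration of how PageRank rewards both the quantity and the
# quality of inbound links. The link graph below is hypothetical.
import networkx as nx

links = [
    ("blogA", "guide"), ("blogB", "guide"), ("news", "guide"),  # three links into "guide"
    ("guide", "blogA"),                                          # one link into "blogA"
    ("blogB", "news"),
]
graph = nx.DiGraph(links)

scores = nx.pagerank(graph, alpha=0.85)  # 0.85 is the classic damping factor
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")

# "guide" wins on link quantity; "blogA" ranks second from a single inbound
# link because that link comes from the highly ranked "guide" page.
```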

Tucker’s comments suggest that while PageRank may be a good proxy for authoritativeness, it doesn’t necessarily capture the other elements of E-E-A-T, like expertise or trustworthiness.

Why SEJ Cares

While it’s clear that E-E-A-T matters, Tucker’s comments underscore that it’s not a silver bullet to ranking well.

Instead of chasing after a mythical “E-E-A-T score,” websites should create content that demonstrates their expertise and builds user trust.

This means investing in factors like:

  • Accurate, up-to-date information
  • Clear sourcing and attribution
  • Author expertise and credentials
  • User-friendly design and navigation
  • Secure, accessible web infrastructure

By prioritizing these elements, websites can send strong signals to users and search engines about the quality and reliability of their content.

The E-E-A-T Evolution

It’s worth noting that E-E-A-T isn’t a static concept.

Tucker explained in the podcast that Google’s understanding of search quality has evolved over the years, and the Search Quality Evaluator Guidelines have grown and changed along with it.

Today, E-E-A-T is just one of the factors that Google considers when evaluating and ranking content.

However, the underlying principles – expertise, authoritativeness, and trustworthiness – will likely remain key pillars of search quality for the foreseeable future.

Listen to the full podcast episode below:


Featured Image: salarko/Shutterstock

Google Warns Of Soft 404 Errors And Their Impact On SEO via @sejournal, @MattGSouthern

In a recent LinkedIn post, Google Analyst Gary Illyes raised awareness about two issues plaguing web crawlers: soft 404s and other “crypto” errors.

These seemingly innocuous mistakes can negatively affect SEO efforts.

Understanding Soft 404s

Soft 404 errors occur when a web server returns a standard “200 OK” HTTP status code for pages that don’t exist or contain error messages. This misleads web crawlers, causing them to waste resources on non-existent or unhelpful content.

Illyes likened the experience to visiting a coffee shop where every item is unavailable despite being listed on the menu. While this scenario might be frustrating for human customers, it poses a more serious problem for web crawlers.

As Illyes explains:

“Crawlers use the status codes to interpret whether a fetch was successful, even if the contents of the page is basically just an error message. They might happily go back to the same page again and again wasting your resources, and if there are many such pages, exponentially more resources.”

The Hidden Costs Of Soft Errors

The consequences of soft 404 errors extend beyond the inefficient use of crawler resources.

According to Illyes, these pages are unlikely to appear in search results because they are filtered out during indexing.

To combat this issue, Illyes advises serving the appropriate HTTP status code when the server or client encounters an error.

This allows crawlers to understand the situation and allocate their resources more effectively.

Illyes also cautioned against rate-limiting crawlers with messages like “TOO MANY REQUESTS SLOW DOWN,” as crawlers cannot interpret such text-based instructions.
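
A minimal sketch of both points, using Flask purely as an example framework (the routes, data, and rate-limit check are hypothetical): missing pages return a real 404, and rate limiting is signaled with a 429 status code and a Retry-After header rather than a plea in the page text.

```python
from flask import Flask, abort

app = Flask(__name__)

ARTICLES = {"robots-txt-turns-30": "Full article text..."}  # hypothetical content store

@app.route("/articles/<slug>")
def article(slug):
    if slug not in ARTICLES:
        abort(404)  # a real 404, not a 200 page that merely says "not found"
    return ARTICLES[slug]

@app.route("/api/search")
def search():
    if request_budget_exceeded():
        # Crawlers understand the status code and header, not prose like
        # "TOO MANY REQUESTS SLOW DOWN" inside a 200 response.
        return "Too Many Requests", 429, {"Retry-After": "120"}
    return "results..."

def request_budget_exceeded():
    return False  # placeholder for a real rate limiter
```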

Why SEJ Cares

Soft 404 errors can impact a website’s crawlability and indexing.

By addressing these issues, crawlers can focus on fetching and indexing pages with valuable content, potentially improving the site’s visibility in search results.

Eliminating soft 404 errors can also lead to more efficient use of server resources, as crawlers won’t waste bandwidth repeatedly visiting error pages.

How This Can Help You

To identify and resolve soft 404 errors on your website, consider the following steps:

  1. Regularly monitor your website’s crawl reports and logs to identify pages returning HTTP 200 status codes despite containing error messages (a rough detection sketch follows this list).
  2. Implement proper error handling on your server to ensure that error pages are served with the appropriate HTTP status codes (e.g., 404 for not found, 410 for permanently removed).
  3. Use tools like Google Search Console to monitor your site’s coverage and identify any pages flagged as soft 404 errors.
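
Here is a rough detection sketch for step 1 (the URLs and error phrases are hypothetical; a real audit would pull URLs from your crawl logs or sitemap). It flags pages that return 200 but whose body reads like an error page:

```python
# A rough soft-404 detection sketch, not an official Google tool.
import requests

URLS = [
    "https://example.com/discontinued-product",
    "https://example.com/contact",
]
ERROR_PHRASES = ("page not found", "no longer available", "0 results found")

for url in URLS:
    resp = requests.get(url, timeout=10)
    body = resp.text.lower()
    # A 200 status combined with error-page wording suggests a soft 404.
    if resp.status_code == 200 and any(phrase in body for phrase in ERROR_PHRASES):
        print(f"Possible soft 404: {url}")
```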

Proactively addressing soft 404 errors can improve your website’s crawlability, indexing, and SEO.


Featured Image: Julia Tim/Shutterstock

WordPress Plugin Supply Chain Attacks Escalate via @sejournal, @martinibuster

WordPress plugins continue to be under attack by hackers using stolen credentials (from other data breaches) to gain direct access to plugin code. What makes these supply chain attacks particularly concerning is that they can sneak in because the compromise appears to users as a normal plugin update.

Supply Chain Attack

The most common type of vulnerability is a software flaw that allows an attacker to inject malicious code or launch some other kind of attack; the flaw is in the code itself. A supply chain attack, by contrast, is when the software itself, or a component of that software (like a third-party script used within it), is directly altered with malicious code. This creates a situation where the software itself delivers the malicious files.

The United States Cybersecurity and Infrastructure Security Agency (CISA) defines a supply chain attack (PDF):

“A software supply chain attack occurs when a cyber threat actor infiltrates a software vendor’s network and employs malicious code to compromise the software before the vendor sends it to their customers. The compromised software then compromises the customer’s data or system.

Newly acquired software may be compromised from the outset, or a compromise may occur through other means like a patch or hotfix. In these cases, the compromise still occurs prior to the patch or hotfix entering the customer’s network. These types of attacks affect all users of the compromised software and can have widespread consequences for government, critical infrastructure, and private sector software customers.”

In this specific attack on WordPress plugins, the attackers are using stolen password credentials to gain access to developer accounts that have direct access to plugin code. They then add malicious code to the plugins that creates administrator-level user accounts on every website using the compromised plugins.

Today, Wordfence announced that additional WordPress plugins have been identified as compromised. More plugins may yet turn out to be affected, so it’s good to understand what is going on and to be proactive about protecting sites under your control.

More WordPress Plugins Attacked

Wordfence issued an advisory that more plugins were compromised, including a highly popular podcasting plugin called PowerPress Podcasting plugin by Blubrry.

These are the newly discovered compromised plugins announced by Wordfence:

  • WP Server Health Stats (wp-server-stats): 1.7.6
    Patched Version: 1.7.8
    10,000 active installations
  • Ad Invalid Click Protector (AICP) (ad-invalid-click-protector): 1.2.9
    Patched Version: 1.2.10
    30,000+ active installations
  • PowerPress Podcasting plugin by Blubrry (powerpress): 11.9.3 – 11.9.4
    Patched Version: 11.9.6
    40,000+ active installations
  • Latest Infection – Seo Optimized Images (seo-optimized-images): 2.1.2
    Patched Version: 2.1.4
    10,000+ active installations
  • Latest Infection – Pods – Custom Content Types and Fields (pods): 3.2.2
    Patched Version: No patched version needed currently.
    100,000+ active installations
  • Latest Infection – Twenty20 Image Before-After (twenty20): 1.6.2, 1.6.3, 1.5.4
    Patched Version: No patched version needed currently.
    20,000+ active installations

These are the first group of compromised plugins:

  • Social Warfare
  • Blaze Widget
  • Wrapper Link Element
  • Contact Form 7 Multi-Step Addon
  • Simply Show Hooks

More information about the WordPress Plugin Supply Chain Attack here.

What To Do If Using A Compromised Plugin

Some of the plugins have been updated to fix the problem, but not all of them. Regardless of whether the compromised plugin has been patched to remove the malicious code and the developer password updated, site owners should check their database to make sure there are no rogue admin accounts that have been added to the WordPress website.

The attack creates administrator accounts with the user names of “Options” or “PluginAuth” so those are the user names to watch for. However, it’s probably a good idea to look for any new admin level user accounts that are unrecognized in case the attack has evolved and the hackers are using different administrator accounts.
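
One way to run that check is to list every administrator account so unrecognized ones stand out. The sketch below is a rough example that assumes direct MySQL access, the default wp_ table prefix, and hypothetical connection details:

```python
# A rough sketch: list administrator accounts so unrecognized ones stand out.
# Assumes direct MySQL access, the default "wp_" table prefix, and
# hypothetical connection credentials.
import pymysql

conn = pymysql.connect(host="localhost", user="wp_db_user",
                       password="secret", database="wordpress")
try:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT u.user_login, u.user_email, u.user_registered
            FROM wp_users u
            JOIN wp_usermeta m ON m.user_id = u.ID
            WHERE m.meta_key = 'wp_capabilities'
              AND m.meta_value LIKE '%administrator%'
        """)
        for login, email, registered in cur.fetchall():
            # The reported attack creates admins named "Options" or "PluginAuth".
            flag = "  <-- suspicious" if login in ("Options", "PluginAuth") else ""
            print(login, email, registered, flag)
finally:
    conn.close()
```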

Site owners using the free or Pro version of the Wordfence WordPress security plugin are notified if a compromised plugin is discovered. Pro-level users receive malware signatures for immediately detecting infected plugins.

The official Wordfence warning announcement about these new infected plugins advises:

“If you have any of these plugins installed, you should consider your installation compromised and immediately go into incident response mode. We recommend checking your WordPress administrative user accounts and deleting any that are unauthorized, along with running a complete malware scan with the Wordfence plugin or Wordfence CLI and removing any malicious code.

Wordfence Premium, Care, and Response users, as well as paid Wordfence CLI users, have malware signatures to detect this malware. Wordfence free users will receive the same detection after a 30 day delay on July 25th, 2024. If you are running a malicious version of one of the plugins, you will be notified by the Wordfence Vulnerability Scanner that you have a vulnerability on your site and you should update the plugin where available or remove it as soon as possible.”

Read more:

WordPress Plugins Compromised At The Source – Supply Chain Attack

3 More Plugins Infected in WordPress.org Supply Chain Attack Due to Compromised Developer Passwords

Featured Image by Shutterstock/Moksha Labs

Google’s Search Dilemma: The Battle With ‘Not’ & Prepositions via @sejournal, @MattGSouthern

While Google has made strides in understanding user intent, Director & Product Manager Elizabeth Tucker says specific queries remain challenging.

In a recent episode of Google’s Search Off The Record podcast, Tucker discussed some lingering pain points in the company’s efforts to match users with the information they seek.

Among the top offenders were searches containing the word “not” and queries involving prepositions, Tucker reveals:

“Prepositions, in general, are another hard one. And one of the really big, exciting breakthroughs was the BERT paper and transformer-based machine learning models when we started to be able to get some of these complicated linguistic issues right in searches.”

BERT, or Bidirectional Encoder Representations from Transformers, is a neural network-based technique for natural language processing that Google began leveraging in search in 2019.

The technology is designed to understand the nuances and context of words in searches rather than treating queries as a bag of individual terms.

‘Not’ There Yet

Despite the promise of BERT and similar advancements, Tucker acknowledged that Google’s ability to parse complex queries is still a work in progress.

Searches with the word “not” remain a thorn in the search engine’s side, Tucker explains:

“It’s really hard to know when ‘not’ means that you don’t want the word there or when it has a different kind of semantic meaning.”

For example, Google’s algorithms could interpret a search like “shoes not made in China” in multiple ways.

Does the user want shoes made in countries other than China, or are they looking for information on why some shoe brands have moved their manufacturing out of China?

This ambiguity poses a challenge for websites trying to rank for such queries. If Google can’t match the searcher’s intent with the content on a page, it may struggle to surface the most relevant results.
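
A toy bag-of-words scorer (purely illustrative, not how Google ranks pages) shows why: plain term overlap gives the highest score to a page about shoes that are made in China, because “not” carries no weight on its own.

```python
# Purely illustrative: plain term overlap cannot capture negation.
query = "shoes not made in china"
pages = {
    "page_a": "running shoes made in china by leading factories",
    "page_b": "brands moving shoe manufacturing out of china",
}

query_terms = set(query.split())
for name, text in pages.items():
    overlap = len(query_terms & set(text.split()))
    print(name, overlap)
# page_a overlaps on "shoes", "made", "in", "china" (score 4);
# page_b overlaps only on "china" (score 1), even though it better matches
# one plausible reading of the query.
```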

The Preposition Problem

Another area where Google’s algorithms can stumble is prepositions, which show the relationship between words in a sentence.

Queries like “restaurants with outdoor seating” or “hotels near the beach” rely on prepositions to convey key information about the user’s needs.

For SEO professionals, this means that optimizing for queries with prepositions may require some extra finesse.

It’s not enough to include the right keywords on a page; the content needs to be structured to communicate the relationships between those keywords.

The Long Tail Challenge

The difficulties Google faces with complex queries are particularly relevant to long-tail searches—those highly specific, often multi-word phrases that make up a significant portion of all search traffic.

Long-tail keywords are often seen as a golden opportunity for SEO, as they tend to have lower competition and can signal a high level of user intent.

However, if Google can’t understand these complex queries, it may be harder for websites to rank for them, even with well-optimized content.

The Road Ahead

Tucker noted that Google is actively improving its handling of these linguistically challenging queries, but a complete solution may still be a way off.

Tucker said:

“I would not say this is a solved problem. We’re still working on it.”

In the meantime, users may need to rephrase their searches or try different query formulations to find the information they’re looking for – a frustrating reality in an age when many have come to expect Google to understand their needs intuitively.

Why SEJ Cares

While BERT and similar advancements have helped Google understand user intent, the search giant’s struggles with “not” queries and prepositions remind us that there’s still plenty of room for improvement.

As Google continues to invest in natural language processing and other AI-driven technologies, it remains to be seen how long these stumbling blocks will hold back the search experience.

What It Means For SEO

So, what can SEO professionals and website owners do in light of this information? Here are a few things to keep in mind:

  1. Focus on clarity and specificity in your content. The more you can communicate the relationships between key concepts and phrases, the easier it will be for Google to understand and rank your pages.
  2. Use structured data and other technical SEO best practices to help search engines parse your content more effectively.
  3. Monitor your search traffic and rankings for complex queries, and be prepared to adjust your strategy if you see drops or inconsistencies.
  4. Monitor Google’s efforts to improve its natural language understanding and be ready to adapt as new algorithms and technologies emerge.

Listen to the full podcast episode below:

Google Completes June 2024 Spam Update Rollout via @sejournal, @MattGSouthern

Google has officially confirmed the completion of its June 2024 spam update, a week-long process aimed at enhancing search result quality by targeting websites that violate the company’s spam policies.

The update began on June 20, 2024, and was announced via Google’s Search Central Twitter account.

Google’s Search Status Dashboard shows the update finished on June 27 at 9:10 PDT.

This spam update is part of Google’s ongoing efforts to combat web spam and improve user experience.

It’s important to note that this is not the algorithmic component of the site reputation abuse update, which Google has clarified is yet to be implemented.

Key Points Of The June 2024 Spam Update

  1. The update targets websites violating Google’s spam policies.
  2. It is separate from the anticipated site reputation abuse algorithmic update.
  3. The rollout process lasted approximately one week.

Google’s spam updates typically focus on eliminating various forms of web spam, including:

  • Automatically generated content aimed solely at improving search rankings
  • Purchased or sold links intended to manipulate rankings
  • Thin, duplicated, or poor-quality content
  • Hidden redirects or other deceptive techniques

This latest update follows Google’s previous spam update in March 2024.

Despite that update’s impact, some AI-generated content performed well in search results.

An analysis by Search Engine Journal’s Roger Montti revealed that certain AI spam sites ranked for over 217,000 queries, with more than 14,900 ranking in the top 10 search results.

The June update is expected to refine Google’s spam detection capabilities further. However, as with previous updates, it may cause fluctuations in website search rankings.

Those engaging in practices that violate Google’s spam policies or heavily relying on AI-generated content may see a decline in their search visibility.

Conversely, legitimate websites adhering to Google’s guidelines may benefit from reduced competition from spammy sites in search results.

SEO professionals and website owners are advised to review their sites for spammy practices and ensure compliance with Google’s Webmaster Guidelines.

For more information about the June 2024 spam update and its potential impact, refer to Google’s official communication channels, including the Google Search Central Twitter account and the Google Search Status Dashboard.


Featured Image: ninefotostudio/Shutterstock

Google Reveals Its Methods For Measuring Search Quality via @sejournal, @MattGSouthern

How does Google know if its search results are improving?

As Google rolls out algorithm updates and claims to reduce “unhelpful” content, many wonder about the true impact of these changes.

In an episode of Google’s Search Off The Record podcast, Google Search Director, Product Management, Elizabeth Tucker discusses how Google measures search quality.

This article explores Tucker’s key revelations, the implications for marketers, and how you can adapt to stay ahead.

Multifaceted Approach To Measurement

Tucker, who transitioned to product management after 15 years as a data scientist at Google, says it’s difficult to determine whether search quality is improving.

“It’s really hard,” she admitted, describing a comprehensive strategy that includes user surveys, human evaluators, and behavioral analysis.

Tucker explained:

“We use a lot of metrics where we sample queries and have human evaluators go through and evaluate the results for things like relevance.”

She also noted that Google analyzes user behavior patterns to infer whether people successfully find the information they seek.

The Moving Target Of User Behavior

Tucker revealed that users make more complex queries as search quality improves.

This creates a constantly shifting landscape for Google’s teams to navigate.

Tucker observed:

“The better we’re able to do this, the more interesting and difficult searches people will do.”

Counterintuitive Metrics

Tucker shared that in the short term, poor search performance might lead to increased search activity as users struggle to find information.

However, this trend reverses long-term, with sustained poor performance resulting in decreased usage.

Tucker cautioned:

“A measurement that can be good in the long term can be misleading in the short term.”

Quantifying Search Quality

To tackle the challenge of quantifying search quality, Google relies on an expansive (and expanding) set of metrics that gauge factors like relevance, accuracy, trustworthiness, and “freshness.”

But numbers don’t always tell the full story, Tucker cautioned:

“I think one important thing that we all have to acknowledge is that not everything important is measurable, and not everything that is measurable is important.”

For relatively straightforward queries, like a search for “Facebook,” delivering relevant results is a comparatively simple task for modern search engines.

However, more niche or complex searches demand rigorous analysis and attention, especially concerning critical health information.

The Human Element

Google aims to surface the most helpful information for searchers’ needs, which are as diverse as they are difficult to pin down at the scale Google operates.

Tucker says:

“Understanding if we’re getting it right, where we’re getting it right, where needs focus out of those billions of queries – man, is that a hard problem.”

As developments in AI and machine learning push the boundaries of what’s possible in search, Tucker sees the “human element” as a key piece of the puzzle.

From the search quality raters who assess real-world results to the engineers and product managers, Google’s approach to quantifying search improvements blends big data with human insight.

Looking Ahead

As long as the web continues to evolve, Google’s work to refine its search quality measurements will be ongoing, Tucker says:

“Technology is constantly changing, websites are constantly changing. If we just stood still, search would get worse.”

What Does This Mean?

These insights can help you align your strategies with Google’s evolving standards.

Key takeaways include:

  1. Quality over quantity: Given Google’s focus on relevance and helpfulness, prioritize creating high-quality, user-centric content rather than aiming for sheer volume.
  2. Embrace complexity: Develop content that addresses more nuanced and specific user needs.
  3. Think long-term: Remember that short-term metrics can be misleading. Focus on sustained performance and user satisfaction rather than quick wins.
  4. Holistic approach: Like Google, adopt a multifaceted approach to measuring your content’s success, combining quantitative metrics with qualitative assessments.
  5. Stay adaptable: Given the constant changes in technology and user behavior, remain flexible and ready to adjust your strategies as needed.
  6. Human-centric: While leveraging AI and data analytics, don’t underestimate the importance of human insight in understanding and meeting user needs.

As Tucker’s insights show, this user-first approach is at the heart of Google’s efforts to improve search quality – and it should be at the center of every marketer’s strategy as well.

Listen to the discussion on measuring search quality in the video below, starting at the 17:39 mark:


Featured Image: Screenshot from YouTube.com/GoogleSearchCentral, June 2024