SEO Rockstar “Proves” You Don’t Need Meta Descriptions via @sejournal, @martinibuster

An SEO shared on social media that his tests proved that omitting meta descriptions resulted in a lift in traffic. Coincidentally, another well-known SEO published an article claiming that SEO tests misunderstand how Google and the internet actually work and lead to the deprioritization of meaningful changes. Who is right?

SEO Says Pages Without Meta Descriptions Received A Traffic Lift

Mark Williams-Cook posted the results of his SEO test on LinkedIn about using and omitting meta descriptions, concluding that pages lacking a meta description received an average traffic lift of approximately 3%.

Here’s some of what he wrote:

“This will get some people’s backs up, but we don’t recommend writing meta descriptions anymore, and that’s based on data and testing.

We have consistently found a small, usually around 3%, but statistically significant uplift to organic traffic on groups of pages with no meta descriptions vs test groups of pages with meta descriptions via SEOTesting.

I’ve come to the conclusion if you’re writing meta descriptions manually, you’re wasting time. If you’re using AI to do it, you’re probably wasting a small amount of time.”

Williams-Cook asserted that Google rewrites around 80% of meta descriptions and that the best meta descriptions are query dependent, meaning the ideal meta description is one that’s custom written for the specific queries the page is ranking for, which is what Google does when the meta description is missing.

He expressed the opinion that omitting the meta description increases the likelihood that Google will step in and inject a query-relevant meta description into the search results which will “outperform” the normal meta description that’s optimized for whatever the page is about.

Although I have reservations about SEO tests in general, his suggestion is intriguing and has the ring of plausibility.

Are SEO Tests Performative Theater?

Coincidentally, Jono Alderson, a technical SEO consultant, published an article last week titled “Stop testing. Start shipping.” in which he discusses his view of SEO tests, calling them “performative theater.”

Alderson writes:

“The idea of SEO testing appeals because it feels scientific. Controlled. Safe…

You tweak one thing, you measure the outcome, you learn, you scale. It works for paid media, so why not here?

Because SEO isn’t a closed system. …It’s architecture, semantics, signals, and systems. And trying to test it like you would test a paid campaign misunderstands how the web – and Google – actually work.

Your site doesn’t exist in a vacuum. Search results are volatile. …Even the weather can influence click-through rates.

Trying to isolate the impact of a single change in that chaos isn’t scientific. It’s theatre.

…A/B testing, as it’s traditionally understood, doesn’t even cleanly work in SEO.

…most SEO A/B testing isn’t remotely scientific. It’s just a best-effort simulation, riddled with assumptions and susceptible to confounding variables. Even the cleanest tests can only hint at causality – and only in narrowly defined environments.”

Jono makes a valid point about the unreliability of tests where the inputs and the outputs are not fully controlled.

Statistical tests are generally done within a closed system where all the data being compared follow the same rules and patterns. But if you compare multiple sets of pages, where some pages target long-tail phrases and others target high-volume queries, then the pages will differ in their potential outcomes. External changes (daily traffic fluctuation, users clicking on the search results) aren’t controllable. As Jono suggested, even the weather can influence click rates.

Although Williams-Cook asserted that he had a control group for testing purposes, it’s extremely difficult to isolate a single variable on live websites because of the uncontrollable external factors Jono points out.

So, even though Williams-Cook asserts that the 3% change he noted is consistent and statistically significant, the unobservable factors within Google’s black-box algorithm that determine the outcome make it difficult to treat that result as a reliable causal finding in the way one could with a truly controlled and observable statistical test.

If it’s not possible to isolate one change, then it’s very difficult to make reliable claims about the results of an SEO test.
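
To make the objection concrete, here is a minimal sketch of the kind of control-versus-test comparison an SEO split test rests on, using made-up page-level click counts; note that nothing in this arithmetic accounts for the uncontrolled factors Jono describes, which is precisely his point.

    # Minimal sketch of a control-vs-test traffic comparison (hypothetical data).
    # A "significant" p-value here still says nothing about confounders such as
    # seasonality, SERP volatility, or intent differences between page groups.
    from scipy import stats

    control_clicks = [120, 135, 128, 119, 142, 130, 125]  # pages with meta descriptions
    test_clicks = [124, 139, 133, 122, 147, 133, 129]     # pages without meta descriptions

    t_stat, p_value = stats.ttest_ind(test_clicks, control_clicks, equal_var=False)
    lift = sum(test_clicks) / sum(control_clicks) - 1
    print(f"Observed lift: {lift:.1%}, p-value: {p_value:.3f}")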

Focus On Meaningful SEO Improvements

Jono’s article calls out the shortcomings of SEO tests, but the point of his essay is to call attention to how what can be tested and measured gets prioritized over the “meaningful” changes that should be made but aren’t, because they cannot be measured. He argues that it’s important to focus on the things that matter in today’s search environment, which are related to content and a better user experience.

And that’s where we circle back to Williams-Cook, because even if SEO A/B tests are “theatre,” as Jono suggests, it doesn’t mean that Williams-Cook’s suggestion is wrong. He may actually be correct that it’s better to omit meta descriptions and let Google rewrite them.

SEO is subjective which means what’s good for one might not be a priority for someone else. So the question remains, is removing all meta descriptions a meaningful change?

Featured Image by Shutterstock/baranq

Are You Still Optimizing for Rankings? AI Search May Not Care. [Webinar] via @sejournal, @hethr_campbell

No ranking data. No impression data. 

So, how do you measure success when AI-generated answers appear and disappear, prompt by prompt?

With these significant changes to how we optimize for search, many brands are seeking to understand how to achieve SEO success.

Some Brands Are Winning in Search. Others? Invisible.

If your content isn’t appearing in AI-generated responses, like AI Overviews, ChatGPT, or Perplexity, you’re already losing ground to competitors.

👉 RSVP: Learn from the brands still dominating SERPs through AI search

In This Free Webinar, You’ll Learn:

  • Data-backed insights on what drives visibility and performance in AI search
  • A proven framework to drive results in AI search, and why this approach works
  • Purpose-built content strategies for driving success in Answer Engine Optimization (AEO) and Generative Engine Optimization (GEO).

This webinar helps enterprise SEOs and executives move from “I don’t know what’s happening in AI search” to “I have a data-driven strategy to compete and win.”

This session is designed for:

  • Marketing managers and SEO strategists looking to stay ahead.
  • Brand leaders managing performance visibility across platforms.
  • Content teams building for modern search behaviors.

You’ll walk away with a usable playbook and a better understanding of how to optimize for the answer, not the query.

Learn from what today’s winning brands are doing right.

Secure your spot, plus get the recording sent straight to your inbox if you can’t make it live.

New Google AI Mode: Everything You Need To Know & What To Do Next via @sejournal, @lorenbaker

Is your SEO strategy ready for Google’s new AI Mode?

Is your 2025 SERP strategy in danger?

What’s changed between traditional search mode and AI mode? 

Will Google’s New AI Mode Hurt Your Traffic?

Watch our webinar on-demand as we explored early implications for click-through rates, organic visibility, and content performance so you can:

  • Spot AIO SERP triggers: Identify search types most likely to spark AI Overviews.
  • Analyze impact: Find out which industries are being hit hardest.
  • Audit AIO brand mentions: See which domains are dominating AI-generated answers.
  • Optimize visibility: Update your SEO strategy to stay competitive.
  • Accurately track AI traffic: Measure shifts in click-through rates, visibility, and content performance.

In this session, Nick Gallagher, SEO Lead at Conductor, gave actionable SEO guidance for this new era of search engine results pages (SERPs).

Get recommendations for optimizing content to stay competitive as AI-generated answers grow in prominence.

Google’s New AI Mode: Learn To Analyze, Adapt & Optimize

Don’t wait for the SERPs to leave you behind.

Watch on-demand to uncover if AI Mode will hurt your traffic, and what to do about it.

View the slides below or check out the full webinar for all the details.

Join Us For Our Next Webinar!

The Data Reveals: What It Takes To Win In AI Search

Register now to learn which modern SEO strategies don’t work – and how to steer clear of them.

LLM Visibility Tools: Do SEOs Agree On How To Use Them? via @sejournal, @martinibuster

A discussion on LinkedIn about LLM visibility and the tools for tracking it explored how SEOs are approaching optimization for LLM-based search. The answers provided suggest that tools for LLM-focused SEO are gaining maturity, though there is some disagreement about what exactly should be tracked.

Joe Hall (LinkedIn profile) raised a series of questions on LinkedIn about the usefulness of tools that track LLM visibility. He didn’t explicitly say that the tools lacked utility, but his questions appeared intended to open a conversation.

He wrote:

“I don’t understand how these systems that claim to track LLM visibility work. LLM responses are highly subjective to context. They are not static like traditional SERPs are. Even if you could track them, how can you reasonably connect performance to business objectives? How can you do forecasting, or even build a strategy with that data? I understand the value of it from a superficial level, but it doesn’t really seem good for anything other than selling a service to consultants that don’t really know what they are doing.”

Joshua Levenson (LinkedIn profile) answered, saying that today’s SEO tools are out of date, remarking:

“People are using the old paradigm to measure a new tech.”

Joe Hall responded with “Bingo!”

LLM SEO: “Not As Easy As Add This Keyword”

Lily Ray (LinkedIn profile) responded to say that the entities that LLMs fall back on are a key element to focus on.

She explained:

“If you ask an LLM the same question thousands of times per day, you’ll be able to average the entities it mentions in its responses. And then repeat that every day. It’s not perfect but it’s something.”

Hall asked her how that’s helpful to clients and Lily answered:

“Well, there are plenty of actionable recommendations that can be gleaned from the data. But that’s obviously the hard part. It’s not as easy as “add this keyword to your title tag.”
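
As a rough illustration of the approach Lily Ray describes, here is a minimal sketch that re-runs one prompt against an OpenAI-style chat API and tallies how often a set of brand names shows up in the answers. The model name, prompt, and brand list are placeholders, and a real tracker would use proper entity extraction rather than simple string matching and run at far greater scale.

    # Minimal sketch: repeat one prompt and count brand/entity mentions in the answers.
    # Assumes the OpenAI Python client; model, prompt, and brand list are placeholders.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = "What are the best CRM solutions for small businesses?"
    brands = ["Salesforce", "HubSpot", "Zoho", "Pipedrive"]
    runs = 20

    mentions = Counter()
    for _ in range(runs):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content.lower()
        for brand in brands:
            if brand.lower() in answer:
                mentions[brand] += 1

    for brand, count in mentions.most_common():
        print(f"{brand}: mentioned in {count}/{runs} responses")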

Tools For LLM SEO

Dixon Jones (LinkedIn profile) responded with a brief comment to introduce Waikay, which stands for What AI Knows About You. He said that his tool uses entity and topic extraction, and bases its recommendations and actions on gap analysis.

Ryan Jones (LinkedIn profile) responded to discuss how his product SERPRecon works:

“There’s 2 ways to do it. one – the way I’m doing it on SERPrecon is to use the APIs to monitor responses to the queries and then like LIly said, extract the entities, topics, etc from it. this is the cheaper/easier way but is easiest to focus on what you care about. The focus isn’t on the exact wording but the topics and themes it keeps mentioning – so you can go optimize for those.

The other way is to monitor ISP data and see how many real user queries you actually showed up for. This is super expensive.

Any other method doesn’t make much sense.”

And in another post followed up with more information:

“AI doesn’t tell you how it fanned out or what other queries it did. people keep finding clever ways in the network tab of chrome to see it, but they keep changing it just as fast.

The AI Overview tool in my tool tries to reverse engineer them using the same logic/math as their patents, but it can never be 100%.”

Then he explained how it helps clients:

“It helps us in the context of, if I enter 25 queries I want to see who IS showing up there, and what topics they’re mentioning so that I can try to make sure I’m showing up there if I’m not. That’s about it. The people measuring sentiment of the AI responses annoy the hell out of me.”

Ten Blue Links Were Never Static

Although Hall stated that the “traditional” search results were static, in contrast to LLM-based search results, it must be pointed out that the old search results were in a constant state of change, especially after the Hummingbird update which enabled Google to add fresh search results when the query required it or when new or updated web pages were introduced to the web. Also, the traditional search results tended to have more than one intent, often as many as three, resulting in fluctuations in what’s ranking.

LLMs also show diversity in their search results, but in the case of AI Overviews, Google shows a few results for the query and then does the “fan-out” thing to anticipate the follow-up questions that naturally follow as part of discovering a topic.

Billy Peery (LinkedIn profile) offered an interesting insight into LLM search results, suggesting that the output exhibits a degree of stability and isn’t as volatile as commonly believed.

He offered this truly interesting insight:

“I guess I disagree with the idea that the SERPs were ever static.

With LLMs, we’re able to better understand which sources they’re pulling from to answer questions. So, even if the specific words change, the model’s likelihood of pulling from sources and mentioning brands is significantly more static.

I think the people who are saying that LLMs are too volatile for optimization are too focused on the exact wording, as opposed to the sources and brand mentions.”

Peery makes an excellent point by noting that some SEOs may be getting hung up on the exact keyword matching (“exact wording”) and that perhaps the more important thing to focus on is whether the LLM is linking to and mentioning specific websites and brands.

Takeaway

Awareness of LLM tools for tracking visibility is growing. Marketers are reaching some agreement on what should be tracked and how it benefits clients. While some question the strategic value of these tools, others use them to identify which brands and themes are mentioned, adding that data to their SEO mix.

Featured Image by Shutterstock/TierneyMJ

Google’s John Mueller: Core Updates Build On Long-Term Data via @sejournal, @MattGSouthern

Google Search Advocate John Mueller says core updates rely on longer-term patterns rather than recent site changes or link spam attacks.

The comment was made during a public discussion on Bluesky, where SEO professionals debated whether a recent wave of spammy backlinks could impact rankings during a core update.

Mueller’s comment offers timely clarification as Google rolls out its June core update.

Core Updates Aren’t Influenced By Recent Links

Asked directly whether recent link spam would be factored into core update evaluations, Mueller said:

“Off-hand, I can’t think of how these links would play a role with the core updates. It’s possible there’s some interaction that I’m not aware of, but it seems really unlikely to me.

Also, core updates generally build on longer-term data, so something really recent wouldn’t play a role.”

For those concerned about negative SEO tactics, Mueller’s statement suggests recent spam links are unlikely to affect how Google evaluates a site during a core update.

Link Spam & Visibility Concerns

The conversation began with SEO consultant Martin McGarry, who shared traffic data suggesting spam attacks were impacting sites targeting high-value keywords.

In a post linking to a recent SEJ article, McGarry wrote:

“This is traffic up in a high value keyword and the blue line is spammers attacking it… as you can see traffic disappears as clear as day.”

Mark Williams-Cook responded by referencing earlier commentary from a Google representative at the SEOFOMO event, where it was suggested that in most cases, links were not the root cause of visibility loss, even when the timing seemed suspicious.

This aligns with a broader theme in recent SEO discussions: it’s often difficult to prove that link-based attacks are directly responsible for ranking drops, especially during major algorithm updates.

Google’s Position On The Disavow Tool

As the discussion turned to mitigation strategies, Mueller reminded the community that Google’s disavow tool remains available, though it’s not always necessary.

Mueller said:

“You can also use the domain: directive in the disavow file to cover a whole TLD, if you’re +/- certain that there are no good links for your site there.”

He added that the tool is often misunderstood or overused:

“It’s a tool that does what it says; almost nobody needs it, but if you think your case is exceptional, feel free.

Pushing it as a service to everyone says a bit about the SEO though.”

That final remark drew pushback from McGarry, who clarified that he doesn’t sell cleanup services and only uses the disavow tool in carefully reviewed edge cases.

Community Calls For More Transparency

Alan Bleiweiss joined the conversation by calling for Google to share more data about how many domains are already ignored algorithmically:

“That would be the best way to put site owners at ease, I think. There’s a psychology to all this cat & mouse wording without backing it up with data.”

His comment reflects a broader sentiment. Many professionals still feel in the dark about how Google handles potentially manipulative or low-quality links at scale.

What This Means

Mueller’s comments offer guidance for anyone evaluating ranking changes during a core update:

  • Recent link spam is unlikely to influence a core update.
  • Core updates are based on long-term patterns, not short-term changes.
  • The disavow tool is still available but rarely needed in most cases.
  • Google’s systems may already discount low-quality links automatically.

If your site has seen changes in visibility since the start of the June core update, these insights suggest looking beyond recent link activity. Instead, focus on broader, long-term signals, such as content quality, site structure, and overall trust.

Keywords Are Dead, But The Keyword Universe Isn’t via @sejournal, @Kevin_Indig

Today’s Memo is a full refresh of one of the most important frameworks I use with clients – and one I’ve updated heavily based on how AI is reshaping search behavior…

…I’m talking about the keyword universe. 🪐

In this issue, I’m digging into:

  • Why the old way of doing keyword research doesn’t cut it anymore.
  • How to build a keyword pipeline that compounds over time.
  • A scoring system for prioritizing keywords that actually convert.
  • How to handle keyword chaos with structure and clarity.
  • A simple keyword universe tracker I designed that will save you hours of trial and error (for premium subscribers).

Initiating liftoff … we’re heading into search space. 🧑‍🚀🛸

Boost your skills with Growth Memo’s weekly expert insights. Subscribe for free!

A single keyword no longer represents a single intent or SERP outcome. In today’s AI-driven search landscape, we need scalable structures that map and evolve with intent … not just “rank.”

Therefore, the classic approach to keyword research is outdated.

In fact, despite all the boy-who-cried-wolf “SEO is dead!” claims across the web, I’d argue that keyword-based SEO is actually dead, which I wrote about in Death of the Keyword.

And it has been for a while.

But the SEO keyword universe is not. And I’ll explain why.

What A Keyword Universe Is – And Why You Need It

A keyword universe is a big pool of the language your target audience uses when they search, and it helps them find you.

It surfaces the most important queries and phrases (i.e., keywords) at the top and lives in a spreadsheet or database, like BigQuery.

Instead of hyperfocusing on specific keywords or doing a keyword sprint every so often, you need to build a keyword universe that you’ll explore and conquer across your site over time.

One problem I tried to solve with the keyword universe is that keyword and intent research is often static.

It happens maybe every month or quarter, and it’s very manual. A keyword universe is both static and dynamic. While that might sound counterintuitive, here’s what I mean:

The keyword universe is like a pool that you can fill with water whenever you want. You can update it daily, monthly, quarterly – whenever. It always surfaces the most important intents at the top.

For the majority of brands, some keyword-universe-building tasks only need to be done once (or once on product/service launch), while other tasks might be ongoing. More on this below.

Within your database, you’ll assign weighted scores to prioritize content creation, but that scoring system might shift over time based on changes in initiatives, product/feature launches, and discovering topics with high conversion rates.

Image Credit: Kevin Indig

To Infinity And Beyond

The goal in building your keyword universe is to create a keyword pipeline for content creation – one that you prioritize by business impact.

Keyword universes elevate the most impactful topics to the top of a list, which allows you to focus on planning capacity, like:

  • The number of published articles needed to comprehensively cover core topics.
  • Resources needed to cover essential topics in a competitive timeframe.
  • Roadmapping content formats and angles (e.g., long-form guides, comparison tables, videos, etc.).
Image Credit: Kevin Indig

A big problem in SEO is knowing which keywords convert to customers before targeting them.

One big advantage of the keyword universe (compared to research sprints) is that new keywords automatically fall into a natural prioritization.

And with the advent of AI in search, like AI Overviews/Google’s AI Mode, this is more important than ever.

The keyword universe mitigates that problem through a clever sorting system.

SEO pros can continuously research and launch new keywords into the universe, while writers can pick keywords off the list at any time.

Think fluid collaboration.

Image Credit: Kevin Indig

Keyword universes are mostly relevant for companies that have to create content themselves instead of leaning on users or products. I call them integrators.

Typical integrator culprits are SaaS, DTC, or publishing businesses, which often have no predetermined, product-led SEO structure for keyword prioritization.

The opposite is aggregators, which scale organic traffic through user-generated content (UGC) or product inventory. (Examples include sites like TripAdvisor, Uber Eats, TikTok, and Yelp.)

The keyword path for aggregators is defined by their page types. And the target topics come out of the product.

Yelp, for example, knows that “near me keywords” and query patterns like “{business} in {city}” are important because that’s the main use case for their local listing pages.

Integrators don’t have that luxury. They need to use other signals to prioritize keywords for business impact.

Ready To Take On The Galaxy? Build Your Keyword Universe

Creating your keyword universe is a three-step process.

And I’ll bet it’s likely you have old spreadsheets of keywords littered throughout your shared drives, collecting dust.

Guess what? You can add them to this process and make good use of them, too. (Finally.)

Step 1: Mine For Queries

Keyword mining is the science of building a large list of keywords and a bread-and-butter workflow in SEO.

The classic way is to use a list of seed keywords and throw them into third-party rank trackers (like Semrush or Ahrefs) to get related terms and other suggestions.

That’s a good start, but that’s what your competitors are doing too.

You need to look for fresh ideas that are unique to your brand – data that no one else has…

…so start with customer conversations.

Dig into:

  • Sales calls.
  • Support requests.
  • Customer and/or target audience interviews.
  • Social media comments on branded accounts.
  • Product or business reviews.

And then extract key phrasing, questions, and terms your audience actually uses.

But don’t ignore other valuable sources of keyword ideas:

  • SERP features, like AIOs, PAAs, and Google Suggest.
  • Search Console: keywords Google tries to rank your site for.
  • Competitor ranks and paid search keywords.
  • Conversational prompts your target audience is likely to use.
  • Reddit threads, YouTube comments, podcast scripts, etc.
Semrush’s list of paid keywords a site bids on (Image Credit: Kevin Indig)

The goal of the first step is to grow our universe with as many keywords as we can find.

(Don’t obsess over relevance. That’s Step 2.)

During this phase, there are some keyword universe research tasks that will be one-time-only, and some that will likely need refreshing or repeating over time.

Here’s a quick list to distinguish between repeat and one-time tasks:

  1. Audience-based research: Repeat and refresh over time – quarterly is often sufficient. Pay attention to what pops up seasonally.
  2. Product-focused research: Complete for the initial launch of a new product or feature.
  3. Competitor-focused research: Complete once for both business and SEO competitors. Refresh/update when there’s a new feature, product/service, or competitor.
  4. Location-focused research: Do this once per geographic location serviced, and again when you expand into new service locations.

Step 2: Sort And Align

Step 2, sorting the long list of mined queries, is the linchpin of keyword universes.

If you get this right, you’ll be installing a powerful SEO prioritization system for your company.

Getting it wrong is just wasting time.

Anyone can create a large list of keywords, but creating strong filters and sorting mechanisms is hard.

The old school way to go about prioritization is by search volume.

Throw that classic view out the window: We can do better than that.

Most times, keywords with higher search volume actually convert less well – or get no real traffic at all due to AIOs.

As I mentioned in Death of the Keyword:

A couple of months ago, I rewrote my guide to inhouse SEO and started ranking in position one. But the joke was on me. I didn’t get a single dirty click for that keyword. Over 200 people search for “in house seo” but not a single person clicks on a search result.

By the way, Google Analytics only shows 10 clicks from organic search over the last 3 months. So, what’s going on? The 10 clicks I actually got are not reported in GSC (privacy… I guess?), but the majority of searchers likely click on one of the People Also Asked features that show up right below my search result.

Keeping that in mind about search volume, since we don’t know which keywords are most important for the business before targeting them – and we don’t want to make decisions by volume alone – we need sorting parameters based on strong signals.

We can summarize several signals for each keyword and sort the list by total score.

That’s exactly what I’ve done with clients like Ramp, the fastest-growing fintech startup in history, to prioritize content strategy.

Image Credit: Kevin Indig

Sorting is about defining an initial set of signals and then refining it with feedback.

You’ll start by giving each signal a weight based on your best guess – and then refine it over time.

When you build your keyword universe, you’ll want to define an automated logic (say, in Google Sheets or BigQuery).

Your logic could be a simple “if this then that,” like “if keyword is mentioned by customer, assign 10 points.”

Potential signals (not all need to be used):

  • Keyword is mentioned in customer conversation.
  • Keyword is part of a topic that converts well.
  • Topic is sharply related to direct offering or pain point your brand solves.
  • Monthly search volume (MSV).
  • Keyword difficulty (KD)/competitiveness.
  • (MSV * KD) / CPC → I like to use this simple formula to balance search demand with competitiveness and potential conversion value.
  • Traffic potential.
  • Conversions from paid search or other channels.
  • Growing or shrinking MSV.
  • Query modifier indicates users are ready to take action, like “buy” or “download.”

You should give each signal a weight from 0-10 or 0-3, with the highest number being strongest and zero being weakest.

Your scoring will be unique to you based on business goals.

Let’s pause here for a moment: I created a simple tool that will make this work way easier, saving a lot of time and trial + error. (It’s below!) Premium subscribers get full access to tools like this one, along with additional content and deep dives.

But let’s say you’re prioritizing building content around essential topics and have goals set around growing topical authority. And let’s say you’re using the 0-10 scale. Your scoring might look something like:

  • Keyword is mentioned in customer conversation: 10.
  • Keyword is part of a topic that converts well: 10.
  • Topic is sharply related to direct offering or pain point your brand solves: 10.
  • MSV: 3.
  • KD/competitiveness: 6.
  • (MSV * KD) / CPC → I like to use this simple formula to balance search demand with competitiveness and potential conversion value: 5.
  • Traffic potential: 3.
  • Conversions from paid search or other channels: 6.
  • Growing or shrinking MSV: 4.
  • Query modifier indicates users are ready to take action, like “buy” or “download”: 7.

The sum of all scores for each query in your universe then determines the priority sorting of the list.

Keywords with the highest total score land at the top and vice versa.

New keywords on the list fall into a natural prioritization.
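
Here is a minimal sketch of that weighted scoring, assuming hypothetical signal names, weights, and sample keywords; the actual signals and weights would be tuned to your own business, as described above.

    # Minimal sketch of weighted keyword scoring; all fields, weights, and keywords
    # are illustrative placeholders.
    keywords = [
        {"query": "crm for startups", "customer_mention": True, "converting_topic": True,
         "msv": 1200, "kd": 35, "cpc": 8.0, "action_modifier": False},
        {"query": "what is a crm", "customer_mention": False, "converting_topic": False,
         "msv": 40000, "kd": 80, "cpc": 3.0, "action_modifier": False},
    ]

    def score(kw):
        total = 0.0
        total += 10 if kw["customer_mention"] else 0   # mentioned in customer conversations
        total += 10 if kw["converting_topic"] else 0   # part of a topic that converts well
        total += 7 if kw["action_modifier"] else 0     # query modifier like "buy" or "download"
        # (MSV * KD) / CPC, scaled down so it stays roughly in the 0-10 range
        total += min(10, (kw["msv"] * kw["kd"]) / (kw["cpc"] * 100000))
        return round(total, 1)

    for kw in sorted(keywords, key=score, reverse=True):
        print(kw["query"], score(kw))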

Important note: If your research shows that sales are connected to queries related to current events, news, updates in research reports, etc., those should be addressed as soon as possible.

(Example: If your company sells home solar batteries and recent weather news increases demand due to a specific weather event, make sure to prioritize that in your universe ASAP.)

Amanda’s thoughts: I might get some hate for this stance, but if you’re a new brand or site just beginning to build a content library and you fall into the integrator category, focus on building trust first by securing visibility in organic search results where you can as quickly as you can.

I know, I know: What about conversions? Conversion-focused content is crucial to the long-term success of the org.

But to set yourself apart, you need to actually create the content that no one is making about the questions, pain points, and specific needs your target audience is voicing.

If your sales team repeatedly hears a version of the same question, it’s likely there’s no easy-to-find answer to the question – or the current answers out there aren’t trustworthy. Trust is the most important currency in the era of AI-based search. Start building it ASAP. Conversions will follow.

Step 3: Refine

Models get good by improving over time.

Like a large language model that learns from fine-tuning, we need to adjust our signal weighting based on the results we see.

We can go about fine-tuning in two ways:

1. Anecdotally, conversions should increase as we build new content (or update existing content) based on the keyword universe prioritization scoring.

Otherwise, sorting signals have the wrong weight, and we need to adjust.

2. Another way to test the system is a snapshot analysis.

To do so, you’ll run a comparison of two sets of data: the keywords that attract the most organic visibility and the pages that drive the most conversions, side-by-side with the keywords at the top of the universe.

Ideally, they overlap. If they don’t, aim to adjust your sorting signals until they come close.
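
A minimal sketch of that snapshot comparison, assuming the lists have been exported as plain keyword sets (the keywords here are placeholders):

    # Minimal sketch of a snapshot analysis: how much of the universe's top tier
    # overlaps with what already earns visibility and conversions.
    top_of_universe = {"crm for startups", "crm pricing", "best crm for sales teams"}
    top_visibility = {"what is a crm", "crm for startups", "crm pricing"}
    top_converting_pages = {"crm pricing", "crm demo"}

    overlap = top_of_universe & (top_visibility | top_converting_pages)
    print(f"Overlap: {len(overlap)}/{len(top_of_universe)} -> {sorted(overlap)}")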

Tips For Maintaining Your Keyword Universe

Look, there’s no point in doing all this work unless you’re going to maintain the hygiene of this data over time.

This is what you need to keep in mind:

1. Once you’ve created a page that targets a keyword in your list, move it to a second tab on the spreadsheet or another table in the database.

That way, you don’t lose track and end up with writers creating duplicate content.

2. Build custom click curves for each page type (blog article, landing page, calculator, etc.) when including traffic and revenue projections.

Assign each step in the conversion funnel a conversion rate – like visit ➡️ newsletter sign-up, visit ➡️ demo, visit ➡️ purchase – and multiply search volume by the estimated CTR for your position on the custom click curve, the conversion rates, and lifetime value. (Fine-tune regularly.)

Here’s an example: MSV * CTR (pos 1) * CVRs * Lifetime value = Revenue prediction
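
A minimal worked example of that projection, with made-up inputs for the click curve, funnel conversion rates, and lifetime value:

    # Minimal sketch of the projection: MSV * CTR(position) * CVRs * lifetime value.
    # All numbers are made-up placeholders; build a click curve per page type.
    msv = 2400                  # monthly search volume
    ctr_position_1 = 0.22       # estimated CTR at position 1 from the custom click curve
    visit_to_signup = 0.05      # visit -> newsletter sign-up
    signup_to_purchase = 0.10   # sign-up -> purchase
    lifetime_value = 600.0      # revenue per customer

    revenue_prediction = msv * ctr_position_1 * visit_to_signup * signup_to_purchase * lifetime_value
    print(f"Predicted monthly revenue: ${revenue_prediction:,.2f}")  # $1,584.00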

3. GPT for Sheets or the Meaning Cloud extension for Google Sheets can speed up assigning each keyword to a topic.

Meaning Cloud allows us to easily train an LLM by uploading a spreadsheet with a few tagged keywords.

GPT for Sheets connects Google Sheets with the OpenAI API so we can give prompts like “Which of the following topics would this keyword best fit? Category 1, category 2, category 3, etc.”

LLMs like ChatGPT, Claude, or Gemini have become good enough that you can easily use them to assign topics as well. Just prompt for consistency!
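
If you would rather do the tagging outside of Sheets, here is a minimal sketch of the same idea with the OpenAI Python client; the model name, categories, and keywords are placeholders, and, as noted above, you should prompt for consistency.

    # Minimal sketch: ask an LLM to assign each keyword to one predefined topic.
    # Model name, categories, and keywords are placeholders.
    from openai import OpenAI

    client = OpenAI()
    categories = ["pricing", "integrations", "getting started"]
    keywords = ["crm pricing comparison", "connect crm to slack"]

    for kw in keywords:
        prompt = (f"Which of the following topics does the keyword '{kw}' best fit? "
                  f"{', '.join(categories)}. Answer with the topic name only.")
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        print(kw, "->", response.choices[0].message.content.strip())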

4. Categorize the keywords by intent, and then group or sort your sheet by intent. Check out Query Fan Out to learn why.

5. Don’t build a keyword universe so granular and expansive that you can’t activate it.

If you have a team of in-house strategists and three part-time freelancers, expecting a 3,000 keyword universe to feel doable and attainable is … an unmet expectation.

Your Keyword Universe Is Designed To Explore

The old way of doing SEO – chasing high-volume keywords and hoping for conversions – isn’t built for today’s search reality.

Trust is hard to earn. (And traffic is hard to come by.)

The keyword universe gives you a living, breathing SEO operating system. One that can evolve based on your custom scoring and prioritization.

Prioritizing what’s important (sorting) allows us to literally filter through the noise (distractions, offers, shiny objects) and bring us to where we want to be.

So, start with your old keyword docs. (Or toss them out if they’re irrelevant, aged poorly, or simply hyper-focused on volume.)

Then, dig into what your customers are really asking. Build smart signals. Assign weights. And refine as you go.

This isn’t about perfection. It’s about building a system that actually works for you.

And speaking of building a system…

Keyword Universe Tracker (For Premium Subscribers)

For premium Growth Memo subscribers, we’ve got a tool that will help save you time and score queries by unique priority weights that you set.

Image Credit: Kevin Indig


Featured Image: Paulo Bobita/Search Engine Journal

Google’s Trust Ranking Patent Shows How User Behavior Is A Signal via @sejournal, @martinibuster

Google long ago filed a patent for ranking search results by trust. The groundbreaking idea behind the patent is that user behavior can be used as a starting point for developing a ranking signal.

The big idea behind the patent is that the Internet is full of websites all linking to and commenting about each other. But which sites are trustworthy? Google’s solution is to utilize user behavior to indicate which sites are trusted and then use the linking and content on those sites to reveal more sites that are trustworthy for any given topic.

PageRank is basically the same thing only it begins and ends with one website linking to another website. The innovation of Google’s trust ranking patent is to put the user at the start of that trust chain like this:

User trusts X Websites > X Websites trust Other Sites > This feeds into Google as a ranking signal

The trust originates from the user and flows to trusted sites, which themselves provide anchor text, lists of other sites, and commentary about other sites.

That, in a nutshell, is what Google’s trust-based ranking algorithm is about.

The deeper insight is that it reveals Google’s groundbreaking approach to letting users be a signal of what’s trustworthy. You know how Google keeps saying to create websites for users? This is what the trust patent is all about, putting the user in the front seat of the ranking algorithm.

Google’s Trust And Ranking Patent

The patent was coincidentally filed around the same period that Yahoo and Stanford University published a Trust Rank research paper which is focused on identifying spam pages.

Google’s patent is not about finding spam. It’s focused on doing the opposite, identifying trustworthy web pages that satisfy the user’s intent for a search query.

How Trust Factors Are Used

The first part of any patent is an Abstract section that offers a very general description of the invention, and that’s what this patent does as well.

The patent abstract asserts:

  • That trust factors are used to rank web pages.
  • The trust factors are generated from “entities” (which are later described as the users themselves, experts, expert web pages, and forum members) that link to or comment about other web pages.
  • Those trust factors are then used to re-rank web pages.
  • Re-ranking web pages kicks in after the normal ranking algorithm has done its thing with links, etc.

Here’s what the Abstract says:

“A search engine system provides search results that are ranked according to a measure of the trust associated with entities that have provided labels for the documents in the search results.

A search engine receives a query and selects documents relevant to the query.

The search engine also determines labels associated with selected documents, and the trust ranks of the entities that provided the labels.

The trust ranks are used to determine trust factors for the respective documents. The trust factors are used to adjust information retrieval scores of the documents. The search results are then ranked based on the adjusted information retrieval scores.”

As you can see, the Abstract does not say who the “entities” are nor does it say what the labels are yet, but it will.

Field Of The Invention

The next part is called the Field Of The Invention. Its purpose is to describe the technical domain of the invention (information retrieval) and its focus (using trust relationships between users to rank web pages).

Here’s what it says:

“The present invention relates to search engines, and more specifically to search engines that use information indicative of trust relationship between users to rank search results.”

Now we move on to the next section, the Background, which describes the problem this invention solves.

Background Of The Invention

This section describes why search engines fall short of answering user queries (the problem) and why the invention solves the problem.

The main problems described are:

  • Search engines are essentially guessing (inferring) what the user’s intent is when they rely only on the search query.
  • Users rely on expert-labeled content from trusted sites (called vertical knowledge sites) to tell them which web pages are trustworthy.
  • The content labeled as relevant or trustworthy is important, but search engines ignore it.
  • It’s important to remember that this patent came out before the BERT algorithm and other natural language approaches that are now used to better understand search queries.

This is how the patent explains it:

“An inherent problem in the design of search engines is that the relevance of search results to a particular user depends on factors that are highly dependent on the user’s intent in conducting the search—that is why they are conducting the search—as well as the user’s circumstances, the facts pertaining to the user’s information need.

Thus, given the same query by two different users, a given set of search results can be relevant to one user and irrelevant to another, entirely because of the different intent and information needs.”

Next it goes on to explain that users trust certain websites that provide information about certain topics:

“…In part because of the inability of contemporary search engines to consistently find information that satisfies the user’s information need, and not merely the user’s query terms, users frequently turn to websites that offer additional analysis or understanding of content available on the Internet.”

Websites Are The Entities

The rest of the Background section names forums, review sites, blogs, and news websites as places that users turn to for their information needs, calling them vertical knowledge sites. Vertical Knowledge sites, it’s explained later, can be any kind of website.

The patent explains that trust is why users turn to those sites:

“This degree of trust is valuable to users as a way of evaluating the often bewildering array of information that is available on the Internet.”

To recap, the “Background” section explains that the trust relationships between users and entities like forums, review sites, and blogs can be used to influence the ranking of search results. As we go deeper into the patent we’ll see that the entities are not limited to the above kinds of sites, they can be any kind of site.

Patent Summary Section

This part of the patent is interesting because it brings together all of the concepts into one place, but in a general high-level manner, and throws in some legal paragraphs that explain that the patent can apply to a wider scope than is set out in the patent.

The Summary section appears to have four parts:

  • The first section explains that a search engine ranks web pages that are trusted by entities (like forums, news sites, blogs, etc.) and that the system maintains information about the labels those entities apply to trusted web pages.
  • The second section offers a general description of the work of the entities (like forums, news sites, blogs, etc.).
  • The third offers a general description of how the system works, beginning with the query, the assorted hand waving that goes on at the search engine with regard to the entity labels, and then the search results.
  • The fourth part is a legal explanation that the patent is not limited to the descriptions and that the invention applies to a wider scope. This is important because it enables Google to use a non-existent thing, even something as nutty as a “trust button” that a user selects to identify a site as trustworthy, as an example. A non-existent “trust button” like that can be a stand-in for something else, like navigational queries or Navboost or anything else that is a signal that a user trusts a website.

Here’s a nutshell explanation of how the system works:

  • The user visits sites that they trust and click a “trust button” that tells the search engine that this is a trusted site.
  • The trusted site “labels” other sites as trusted for certain topics (the label could be a topic like “symptoms”).
  • A user asks a question at a search engine (a query) and uses a label (like “symptoms”).
  • The search engine ranks websites according to the usual manner then it looks for sites that users trust and sees if any of those sites have used labels about other sites.
  • Google ranks those other sites that have had labels assigned to them by the trusted sites.

Here’s an abbreviated version of the third part of the Summary that gives an idea of the inner workings of the invention:

“A user provides a query to the system…The system retrieves a set of search results… The system determines which query labels are applicable to which of the search result documents. … determines for each document an overall trust factor to apply… adjusts the …retrieval score… and reranks the results.”

Here’s that same section in its entirety:

  • “A user provides a query to the system; the query contains at least one query term and optionally includes one or more labels of interest to the user.
  • The system retrieves a set of search results comprising documents that are relevant to the query term(s).
  • The system determines which query labels are applicable to which of the search result documents.
  • The system determines for each document an overall trust factor to apply to the document based on the trust ranks of those entities that provided the labels that match the query labels.
  • Applying the trust factor to the document adjusts the document’s information retrieval score, to provide a trust adjusted information retrieval score.
  • The system reranks the search result documents based at on the trust adjusted information retrieval scores.”

The above is a general description of the invention.
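
To make the flow easier to picture, here is a minimal sketch of trust-adjusted re-ranking as the Summary describes it. The documents, labels, trust values, and the way the trust factor adjusts the retrieval score are all illustrative assumptions; the patent does not disclose the actual math.

    # Minimal sketch of trust-adjusted re-ranking (illustrative assumptions only).
    results = [  # documents with a baseline information-retrieval score
        {"url": "siteA.com/cancer-symptoms", "ir_score": 0.81, "labels": {"symptoms"}},
        {"url": "siteB.com/cancer-overview", "ir_score": 0.90, "labels": set()},
    ]
    # Trust rank of the entity that applied each label (derived from user behavior)
    label_trust = {("siteA.com/cancer-symptoms", "symptoms"): 0.7}
    query_labels = {"symptoms"}  # e.g., from a query like "cancer label:symptoms"

    def trust_adjusted(doc):
        matching = doc["labels"] & query_labels
        trust_factor = sum(label_trust.get((doc["url"], label), 0.0) for label in matching)
        return doc["ir_score"] * (1 + trust_factor)  # assumed form of the adjustment

    for doc in sorted(results, key=trust_adjusted, reverse=True):
        print(doc["url"], round(trust_adjusted(doc), 3))

In this toy example, the labeled page overtakes the page with the higher baseline score because a trusted entity labeled it for the query’s topic.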

The next section, called Detailed Description, deep dives into the details. At this point it’s becoming increasingly evident that the patent is highly nuanced and cannot be reduced to simple advice like: “optimize your site like this to earn trust.”

A large part of the patent hinges on a trust button and a “label:” advanced search query.

Neither the trust button nor the “label:” advanced search query has ever existed. As you’ll see, they are quite probably stand-ins for techniques that Google doesn’t want to explicitly reveal.

Detailed Description In Four Parts

The details of this patent are located in four sections within the Detailed Description section of the patent. This patent is not as simple as 99% of SEOs say it is.

These are the four sections:

  1. System Overview
  2. Obtaining and Storing Trust Information
  3. Obtaining and Storing Label Information
  4. Generated Trust Ranked Search Results

The System Overview is where the patent deep dives into the specifics. The following is an overview to make it easy to understand.

System Overview

1. Explains how the invention (a search engine system) ranks search results based on trust relationships between users and the user-trusted entities who label web content.

2. The patent describes a “trust button” that a user can click that tells Google that a user trusts a website or trusts the website for a specific topic or topics.

3. The patent says a trust related score is assigned to a website when a user clicks a trust button on a website.

4. The trust button information is stored in a trust database that’s referred to as #190.

Here’s what it says about assigning a trust rank score based on the trust button:

“The trust information provided by the users with respect to others is used to determine a trust rank for each user, which is measure of the overall degree of trust that users have in the particular entity.”

Trust Rank Button

The patent refers to the “trust rank” of the user-trusted websites. That trust rank is based on a trust button that a user clicks to indicate that they trust a given website, assigning a trust rank score.

The patent says:

“…the user can click on a “trust button” on a web page belonging to the entity, which causes a corresponding record for a trust relationship to be recorded in the trust database 190.

In general any type of input from the user indicating that such as trust relationship exists can be used.”

The trust button has never existed and the patent quietly acknowledges this by stating that any type of input can be used to indicate the trust relationship.

So what is it? I believe that the “trust button” is a stand-in for user behavior metrics in general, and site visitor data in particular. The patent Claims section does not mention trust buttons at all but does mention user visitor data as an indicator of trust.

Here are several passages that mention site visits as a way to understand if a user trusts a website:

“The system can also examine web visitation patterns of the user and can infer from the web visitation patterns which entities the user trusts. For example, the system can infer that a particular user trust a particular entity when the user visits the entity’s web page with a certain frequency.”

The same thing is stated in the Claims section of the patent, it’s the very first claim they make for the invention:

“A method performed by data processing apparatus, the method comprising:
determining, based on web visitation patterns of a user, one or more trust relationships indicating that the user trusts one or more entities;”

It may very well be that site visitation patterns and other user behaviors are what is meant by the “trust button” references.

Labels Generated By Trusted Sites

The patent defines trusted entities as news sites, blogs, forums, and review sites, but it isn’t limited to those kinds of sites; a trusted entity could be any other kind of website.

Trusted websites create references to other sites and in that reference they label those other sites as being relevant to a particular topic. That label could be an anchor text. But it could be something else.

The patent explicitly mentions anchor text only once:

“In some cases, an entity may simply create a link from its site to a particular item of web content (e.g., a document) and provide a label 107 as the anchor text of the link.”

Although the patent only explicitly mentions anchor text once, there are other passages where anchor text is strongly implied. For example, it offers a general description of labels as describing or categorizing the content found on another site:

“…labels are words, phrases, markers or other indicia that have been associated with certain web content (pages, sites, documents, media, etc.) by others as descriptive or categorical identifiers.”

Labels And Annotations

Trusted sites link out to web pages with labels and links. The combination of a label and a link is called an annotation.

This is how it’s described:

“An annotation 106 includes a label 107 and a URL pattern associated with the label; the URL pattern can be specific to an individual web page or to any portion of a web site or pages therein.”

Labels Used In Search Queries

Users can also search with “labels” in their queries by using a non-existent “label:” advanced search query. Those kinds of queries are then used to match the labels that a website page is associated with.

This is how it’s explained:

“For example, a query “cancer label:symptoms” includes the query term “cancel” and a query label “symptoms”, and thus is a request for documents relevant to cancer, and that have been labeled as relating to “symptoms.”

Labels such as these can be associated with documents from any entity, whether the entity created the document, or is a third party. The entity that has labeled a document has some degree of trust, as further described below.”

What is that label in the search query? It could simply be certain descriptive keywords, but there aren’t any clues to speculate further than that.

The patent puts it all together like this:

“Using the annotation information and trust information from the trust database 190, the search engine 180 determines a trust factor for each document.”

Takeaway:

A user’s trust is in a website. That user-trusted website is not necessarily the one that’s ranked, it’s the website that’s linking/trusting another relevant web page. The web page that is ranked can be the one that the trusted site has labeled as relevant for a specific topic and it could be a web page in the trusted site itself. The purpose of the user signals is to provide a starting point, so to speak, from which to identify trustworthy sites.

Experts Are Trusted

Vertical Knowledge Sites, sites that users trust, can host the commentary of experts. The expert could be the publisher of the trusted site as well. Experts are important because links from expert sites are used as part of the ranking process.

Experts are defined as those who publish a deep level of content on the topic:

“These and other vertical knowledge sites may also host the analysis and comments of experts or others with knowledge, expertise, or a point of view in particular fields, who again can comment on content found on the Internet.

For example, a website operated by a digital camera expert and devoted to digital cameras typically includes product reviews, guidance on how to purchase a digital camera, as well as links to camera manufacturer’s sites, new products announcements, technical articles, additional reviews, or other sources of content.

To assist the user, the expert may include comments on the linked content, such as labeling a particular technical article as “expert level,” or a particular review as “negative professional review,” or a new product announcement as ‘new 10MP digital SLR.’”

Links From Expert Sites

Links and annotations from user-trusted expert sites are described as sources of trust information:

“For example, Expert may create an annotation 106 including the label 107 “Professional review” for a review 114 of Canon digital SLR camera on a web site “www.digitalcameraworld.com”, a label 107 of “Jazz music” for a CD 115 on the site “www.jazzworld.com”, a label 107 of “Classic Drama” for the movie 116 “North by Northwest” listed on website “www.movierental.com”, and a label 107 of “Symptoms” for a group of pages describing the symptoms of colon cancer on a website 117 “www.yourhealth.com”.

Note that labels 107 can also include numerical values (not shown), indicating a rating or degree of significance that the entity attaches to the labeled document.

Expert’s web site 105 can also include trust information. More specifically, Expert’s web site 105 can include a trust list 109 of entities whom Expert trusts. This list may be in the form of a list of entity names, the URLs of such entities’ web pages, or by other identifying information. Expert’s web site 105 may also include a vanity list 111 listing entities who trust Expert; again this may be in the form of a list of entity names, URLs, or other identifying information.”

Inferred Trust

The patent describes additional signals that can be used to infer trust. These are more traditional signals, like links, a list of trusted web pages (maybe a resources page?), and a list of sites that trust the website.

These are the inferred trust signals:

“(1) links from the user’s web page to web pages belonging to trusted entities;
(2) a trust list that identifies entities that the user trusts; or
(3) a vanity list which identifies users who trust the owner of the vanity page.”

Another kind of trust signal can be inferred from the sites that a user tends to visit.

The patent explains:

“The system can also examine web visitation patterns of the user and can infer from the web visitation patterns which entities the user trusts. For example, the system can infer that a particular user trusts a particular entity when the user visits the entity’s web page with a certain frequency.”

Takeaway:

That’s a pretty big signal and I believe that it suggests that promotional activities that encourage potential site visitors to discover a site and then become loyal site visitors can be helpful. For example, that kind of signal can be tracked with branded search queries. It could be that Google is only looking at site visit information but I think that branded queries are an equally trustworthy signal, especially when those queries are accompanied by labels… ding, ding, ding!

The patent also lists some somewhat out-there examples of inferred trust, like contact/chat list data. It doesn’t say social media, just contact/chat lists.

Trust Can Decay or Increase

Another interesting feature of trust rank is that it can decay or increase over time.

The patent is straightforward about this part:

“Note that trust relationships can change. For example, the system can increase (or decrease) the strength of a trust relationship for a trusted entity. The search engine system 100 can also cause the strength of a trust relationship to decay over time if the trust relationship is not affirmed by the user, for example by visiting the entity’s web site and activating the trust button 112.”

Trust Relationship Editor User Interface

Directly after the above paragraph is a section about enabling users to edit their trust relationships through a user interface. There has never been such a thing, just like the non-existent trust button.

This is possibly a stand-in for something else. Could this trusted sites dashboard be Chrome browser bookmarks or sites that are followed in Discover? This is a matter for speculation.

Here’s what the patent says:

“The search engine system 100 may also expose a user interface to the trust database 190 by which the user can edit the user trust relationships, including adding or removing trust relationships with selected entities.

The trust information in the trust database 190 is also periodically updated by crawling of web sites, including sites of entities with trust information (e.g., trust lists, vanity lists); trust ranks are recomputed based on the updated trust information.”

What Google’s Trust Patent Is About

Google’s Search Result Ranking Based On Trust patent describes a way of leveraging user-behavior signals to understand which sites are trustworthy. The system then identifies sites that are trusted by the user-trusted sites and uses that information as a ranking signal. There is no actual trust rank metric, but there are ranking signals related to what users trust. Those signals can decay or increase based on factors like whether a user still visits those sites.

The larger takeaway is that this patent is an example of how Google focuses on user signals as a ranking source, so it can feed those signals back into ranking sites that meet users’ needs. This means that instead of doing things because “this is what Google likes,” it’s better to go a level deeper and do things because users like them. That preference will feed back to Google through these kinds of algorithms that measure user behavior patterns, something we all know Google uses.

Featured Image by Shutterstock/samsulalam

How Co-Citations Drive AI SEO

“Co-citations” in academia occur when a single research document cites two or more sources; the cited sources are said to be co-cited. Web pages contain co-citations, too. Search engine optimizers have long suspected that Google relies on co-citations to identify similar sites.

We see evidence of co-citations on Google’s entity-based search results, such as lists of vendors and service providers.
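
The underlying inference is easy to sketch: count how often two brands appear on the same page or list. The example below is illustrative only, with made-up data.

from itertools import combinations
from collections import Counter

# Illustrative data: brands mentioned on each listicle/page.
pages = [
    {"BrandA", "BrandB", "BrandC"},
    {"BrandA", "BrandC"},
    {"BrandB", "BrandC", "BrandD"},
]

# Count co-citations: pairs of brands cited together on the same page.
co_citations = Counter()
for brands in pages:
    for pair in combinations(sorted(brands), 2):
        co_citations[pair] += 1

print(co_citations.most_common(3))
# e.g., [(('BrandA', 'BrandC'), 2), (('BrandB', 'BrandC'), 2), (('BrandA', 'BrandB'), 1)]

The most frequently co-cited pairs hint at which sites or brands a search engine might treat as similar.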

Co-Citation and AI

With the launch of AI answers, co-citation is critical because all AI platforms — AI Mode, ChatGPT, Gemini, and others — rely heavily on lists for brand and product recommendations. For the search “best CRM solutions,” for example, AI Overviews cite five sources. All are lists.

AI platforms rely on external lists for brand and product recommendations, such as this example of “top CRM solutions” in Google’s AI Overviews.

The sources do not have to link to their recommendations for large language models to cite them. Hence, for AI optimization, co-occurrences (i.e., unlinked mentions) are as important as co-citations (i.e., linked mentions).
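
As an illustration, here is one way to separate the two on a single page once its HTML has been fetched. BeautifulSoup is used purely as an example parser, and the brand name is hypothetical.

from bs4 import BeautifulSoup  # pip install beautifulsoup4

def classify_mentions(html: str, brand: str):
    """Count linked (co-citation) vs. unlinked (co-occurrence) mentions of a brand."""
    soup = BeautifulSoup(html, "html.parser")
    linked = sum(brand.lower() in a.get_text(" ").lower() for a in soup.find_all("a"))
    total = soup.get_text(" ").lower().count(brand.lower())
    return {"linked": linked, "unlinked": max(total - linked, 0)}

html = '<p>Top picks: <a href="https://example.com">AcmeCRM</a> and AcmeCRM alternatives.</p>'
print(classify_mentions(html, "AcmeCRM"))  # {'linked': 1, 'unlinked': 1}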

The less your brand appears on external websites, the lower its visibility in AI answers (and search). And the brands most commonly listed alongside yours — linked or not — define its relevance and visibility.

Search engines and LLMs may source different sites. Gauge your site’s visibility by finding, say, 20 listicles that reference your competitors. Check:

  • Organic search results,
  • Sources in AI Mode and AI Overviews,
  • References in ChatGPT.

Many of these may overlap. But examining 20 lists will reveal your business’s relative visibility.

Several tools can help identify co-citation opportunities.

InTheMix.ai

InTheMix.ai runs related prompts in Gemini based on an initial user-generated query, analyzes the answers, and lists the sources in them. The tool, which is free, displays the number of answers for each URL (to identify the most popular).

InTheMix.ai runs prompts in Gemini and displays the number of answers for each source.

Otterly.ai

Otterly.ai is a premium tool (with a free trial) that pulls citations for any prompt from Google’s AI Overviews, ChatGPT, and Perplexity. It also provides weekly tracking of those citations to discover opportunities.

Otterly.ai pulls citations for any prompt from Google’s AI Overviews, ChatGPT, and Perplexity.

Reddit

Reddit, a top-cited source in Google and ChatGPT, is handy for researching which subreddits mention your brand and your competitors in the same threads.

Use AI Brand Rank’s Reddit section to analyze citations of well-established competitors.

AI Brand Rank displays citations on multiple platforms, including Reddit, shown here for mentions of “Udemy.”

Gauge Visibility

Access top platforms to compare mentions of your business or product with those of competitors. This will provide insight into your brand’s visibility in AI training data.

Google’s New MUVERA Algorithm Improves Search via @sejournal, @martinibuster

Google announced a new multi-vector retrieval algorithm called MUVERA that speeds up retrieval and ranking, and improves accuracy. The algorithm can be used for search, recommender systems (like YouTube), and for natural language processing (NLP).

Although the announcement did not explicitly say that it is being used in search, the research paper makes it clear that MUVERA enables efficient multi-vector retrieval at web scale, particularly by making it compatible with existing infrastructure (via MIPS) and reducing latency and memory footprint.

Vector Embedding In Search

A vector embedding is a multidimensional representation of the relationships between words, topics, and phrases. It enables machines to understand similarity through patterns, such as words that appear within the same context or phrases that mean the same thing. Related words and phrases occupy positions that are closer to each other.

  • The words “King Lear” will be close to the phrase “Shakespeare tragedy.”
  • The words “A Midsummer Night’s Dream” will occupy a space close to “Shakespeare comedy.”
  • Both “King Lear” and “A Midsummer Night’s Dream” will be located in a space close to Shakespeare.

The distances between words, phrases and concepts (technically a mathematical similarity measure) define how closely related each one is to the other. These patterns enable a machine to infer similarities between them.
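
A small sketch of the idea, using invented three-dimensional vectors and cosine similarity (real embeddings have hundreds or thousands of dimensions; these values are only for illustration):

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "embeddings" with invented values, for illustration only.
embeddings = {
    "King Lear":                 [0.90, 0.80, 0.10],
    "Shakespeare tragedy":       [0.85, 0.75, 0.15],
    "A Midsummer Night's Dream": [0.20, 0.80, 0.90],
    "Shakespeare comedy":        [0.25, 0.75, 0.85],
}

print(round(cosine(embeddings["King Lear"], embeddings["Shakespeare tragedy"]), 3))  # close to 1.0 (related)
print(round(cosine(embeddings["King Lear"], embeddings["Shakespeare comedy"]), 3))   # noticeably lower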

MUVERA Solves Inherent Problem Of Multi-Vector Embeddings

The MUVERA research paper states that neural embeddings have been a feature of information retrieval for ten years and cites the 2020 ColBERT multi-vector model research paper as a breakthrough, but notes that it suffers from a bottleneck that makes it less than ideal.

“Recently, beginning with the landmark ColBERT paper, multi-vector models, which produce a set of embeddings per data point, have achieved markedly superior performance for IR tasks. Unfortunately, using these models for IR is computationally expensive due to the increased complexity of multi-vector retrieval and scoring.”

Google’s announcement of MUVERA echoes those downsides:

“… recent advances, particularly the introduction of multi-vector models like ColBERT, have demonstrated significantly improved performance in IR tasks. While this multi-vector approach boosts accuracy and enables retrieving more relevant documents, it introduces substantial computational challenges. In particular, the increased number of embeddings and the complexity of multi-vector similarity scoring make retrieval significantly more expensive.”

Could Be A Successor To Google’s RankEmbed Technology?

The United States Department of Justice (DOJ) antitrust lawsuit resulted in testimony that revealed that one of the signals used to create the search engine results pages (SERPs) is called RankEmbed, which was described like this:

“RankEmbed is a dual encoder model that embeds both query and document into embedding space. Embedding space considers semantic properties of query and document in addition to other signals. Retrieval and ranking are then a dot product (distance measure in the embedding space)… Extremely fast; high quality on common queries but can perform poorly for tail queries…”

MUVERA is a technical advancement that addresses the performance and scaling limitations of multi-vector systems, which themselves are a step beyond dual-encoder models (like RankEmbed), providing greater semantic depth and better handling of tail queries.

The breakthrough is a technique called Fixed Dimensional Encoding (FDE), which divides the embedding space into sections and combines the vectors that fall into each section to create a single, fixed-length vector, making it faster to search than comparing multiple vectors. This allows multi-vector models to be used efficiently at scale, improving retrieval speed without sacrificing the accuracy that comes from richer semantic representation.
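
The rough mechanics can be sketched as follows. The bucketing below uses random hyperplanes (SimHash-style) as a stand-in for the paper’s partitioning scheme, so treat it as a simplified illustration of the idea rather than Google’s implementation.

import numpy as np

rng = np.random.default_rng(0)
DIM, N_PLANES = 128, 4                    # 4 hyperplanes -> 2**4 = 16 buckets
planes = rng.normal(size=(N_PLANES, DIM))

def fixed_dimensional_encoding(token_vectors: np.ndarray) -> np.ndarray:
    """Collapse a (num_tokens, DIM) multi-vector set into one fixed-length vector.

    Each token vector is assigned to a bucket by the signs of its projections
    onto the random hyperplanes; vectors in the same bucket are summed, and the
    per-bucket sums are concatenated into a single 16 * DIM vector.
    """
    bits = (token_vectors @ planes.T > 0).astype(int)
    bucket_ids = bits @ (2 ** np.arange(N_PLANES))
    buckets = np.zeros((2 ** N_PLANES, DIM))
    for vec, b in zip(token_vectors, bucket_ids):
        buckets[b] += vec
    return buckets.ravel()                # shape: (16 * DIM,), regardless of num_tokens

doc = rng.normal(size=(37, DIM))          # a "document" with 37 token embeddings
query = rng.normal(size=(5, DIM))         # a "query" with 5 token embeddings
print(fixed_dimensional_encoding(doc).shape, fixed_dimensional_encoding(query).shape)
# (2048,) (2048,)

Because every document and query now maps to a vector of the same fixed length, a standard MIPS index can compare them with a single dot product.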

According to the announcement:

“Unlike single-vector embeddings, multi-vector models represent each data point with a set of embeddings, and leverage more sophisticated similarity functions that can capture richer relationships between datapoints.

While this multi-vector approach boosts accuracy and enables retrieving more relevant documents, it introduces substantial computational challenges. In particular, the increased number of embeddings and the complexity of multi-vector similarity scoring make retrieval significantly more expensive.

In ‘MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings’, we introduce a novel multi-vector retrieval algorithm designed to bridge the efficiency gap between single- and multi-vector retrieval.

…This new approach allows us to leverage the highly-optimized MIPS algorithms to retrieve an initial set of candidates that can then be re-ranked with the exact multi-vector similarity, thereby enabling efficient multi-vector retrieval without sacrificing accuracy.”

Multi-vector models can provide more accurate answers than dual-encoder models, but that accuracy comes at the cost of intensive compute demands. MUVERA solves the complexity issues of multi-vector models, creating a way to achieve the greater accuracy of multi-vector approaches without the high computing demands.
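
Putting the two stages together, the flow looks roughly like the sketch below, which continues the hypothetical fixed_dimensional_encoding function above: a fast single-vector search over FDEs produces candidates, which are then re-scored with an exact multi-vector (Chamfer/MaxSim-style) similarity.

def chamfer_similarity(query_vectors: np.ndarray, doc_vectors: np.ndarray) -> float:
    """Sum, over query tokens, of each token's best dot product with any doc token."""
    return float((query_vectors @ doc_vectors.T).max(axis=1).sum())

def search(query_vectors, docs, fde_index, k_candidates=100, k_final=10):
    # Stage 1: cheap single-vector MIPS over the fixed-length FDEs.
    scores = fde_index @ fixed_dimensional_encoding(query_vectors)
    candidates = np.argsort(scores)[::-1][:k_candidates]
    # Stage 2: exact multi-vector re-ranking of the small candidate set.
    reranked = sorted(candidates,
                      key=lambda i: chamfer_similarity(query_vectors, docs[i]),
                      reverse=True)
    return reranked[:k_final]

# Hypothetical corpus: 1,000 documents, each a set of token embeddings.
docs = [rng.normal(size=(rng.integers(20, 60), DIM)) for _ in range(1000)]
fde_index = np.stack([fixed_dimensional_encoding(d) for d in docs])
print(search(query, docs, fde_index)[:3])   # indices of the top-ranked documents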

What Does This Mean For SEO?

MUVERA shows how modern search ranking increasingly depends on similarity judgments rather than old-fashioned keyword signals that SEO tools and SEOs are often focused on. SEOs and publishers may wish to shift their attention from exact phrase matching toward aligning with the overall context and intent of the query. For example, when someone searches for “corduroy jackets men’s medium,” a system using MUVERA-like retrieval is more likely to rank pages that actually offer those products, not pages that simply mention “corduroy jackets” and include the word “medium” in an attempt to match the query.

Read Google’s announcement:

MUVERA: Making multi-vector retrieval as fast as single-vector search

Featured Image by Shutterstock/bluestork