The Classifier Layer: Spam, Safety, Intent, Trust Stand Between You And The Answer via @sejournal, @DuaneForrester

Most people still think visibility is a ranking problem. That worked when discovery lived in 10 blue links. It breaks down when discovery happens inside an answer layer.

Answer engines have to filter aggressively. They are assembling responses, not returning a list. They are also carrying more risk. A bad result can become harmful advice, a scam recommendation, or a confident lie delivered in a friendly tone. So the systems that power search and LLM experiences rely on classification gates long before they decide what to rank or what to cite.

If you want to be visible in the answer layer, you need to clear those gates.

SSIT is a simple way to name what’s happening. Spam, Safety, Intent, Trust. Four classifier jobs sit between your content and the output a user sees. They sort, route, and filter long before retrieval, ranking, or citation.


Spam: The Manipulation Gate

Spam classifiers exist to catch scaled manipulation. They are upstream and unforgiving, and if you trip them, you can be suppressed before relevance even enters the conversation.

Google is explicit that it uses automated systems to detect spam and keep it out of search results. It also describes how those systems evolve over time and how manual review can complement automation.

Google has also named a system directly in its spam update documentation. SpamBrain is described as an AI-based spam prevention system that it continually improves to catch new spam patterns.

For SEOs, spam detection behaves like pattern recognition at scale. Your site gets judged as a population of pages, not a set of one-offs. Templates, footprints, link patterns, duplication, and scaling behavior all become signals. That’s why spam hits often feel unfair. Single pages look fine; the aggregate looks engineered.

If you publish a hundred pages that share the same structure, phrasing, internal links, and thin promise, classifiers see the pattern.

Google’s spam policies are a useful map of what the spam gate tries to prevent. Read them like a spec for failure modes, then connect each policy category to a real pattern on your site that you can remove.

Manual actions remain part of this ecosystem. Google documents that manual actions can be applied when a human reviewer determines a site violates its spam policies.

There is an uncomfortable SEO truth hiding in this. If your growth play relies on behaviors that resemble manipulation, you are betting your business on a classifier not noticing, not learning, and not adapting. That is not a stable bet.

Safety: The Harm And Fraud Gate

Safety classifiers are about user protection. They focus on harm, deception, and fraud. They do not care if your keyword targeting is perfect, but they do care if your experience looks risky.

Google has made public claims about major improvements in scam detection using AI, including catching more scam pages and reducing specific forms of impersonation scams.

Even if you ignore the exact numbers, the direction is clear. Safety classification is a core product priority, and it shapes visibility hardest where users can be harmed financially, medically, or emotionally.

This is where many legitimate sites accidentally look suspicious. Safety classifiers are conservative, and they work at the level of pattern and context. Monetization-heavy layouts, thin lead gen pages, confusing ownership, aggressive outbound pushes, and inflated claims can all resemble common scam patterns when they show up at scale.

If you operate in affiliate, lead gen, local services, finance, health, or any category where scams are common, you should assume the safety gate is strict. Then build your site so it reads as legitimate without effort.

That comes down to basic trust hygiene.

Make ownership obvious. Use consistent brand identifiers across the site. Provide clear contact paths. Be transparent about monetization. Avoid claims that cannot be defended. Include constraints and caveats in the content itself, not hidden in a footer.

If your site has ever been compromised, or if you operate in a neighborhood of risky outbound links, you also inherit risk. Safety classifiers treat proximity as a signal because threat actors cluster. Cleaning up your link ecosystem and site security is no longer only a technical responsibility; it’s visibility defense.

Intent: The Routing Gate

Intent classification determines what the system believes the user is trying to accomplish. That decision shapes the retrieval path, the ranking behavior, the format of the answer, and which sources get pulled into the response.

This matters more as search shifts from browsing sessions to decision sessions. In a list-based system, the user can correct the system by clicking a different result. In an answer system, the system makes more choices on the user’s behalf.

Intent classification is also broader than the old SEO debates about informational versus transactional. Modern systems try to identify local intent, freshness intent, comparative intent, procedural intent, and high-stakes intent. These intent classes change what the system considers helpful and safe. In fact, if you deep-dive into “intents,” you’ll find many more that don’t fit neatly into our crisply defined, marketing-designed boxes. Most marketers build for maybe three to four intents. The systems you’re trying to win in often operate with more, and research taxonomies already show how intent explodes into dozens of types when you measure real tasks instead of neat categories.

If you want consistent visibility, make intent alignment obvious and commit each page to a primary task.

  • If a page is a “how to,” make it procedural. Lead with the outcome. Present steps. Include requirements and failure modes early.
  • If a page is a “best options” piece, make it comparative. Define your criteria. Explain who each option fits and who it does not.
  • If a page is local, behave like a local result. Include real local proof and service boundaries. Remove generic filler that makes the page look like a template.
  • If a page is high-stakes, be disciplined. Avoid sweeping guarantees. Include evidence trails. Use precise language. Make boundaries explicit.

Intent clarity also helps across classic ranking systems, and it can help reduce pogo behavior and improve satisfaction signals. More importantly for the answer layer, it gives the system clean blocks to retrieve and use.

Trust: The “Should We Use This” Gate

Trust is the gate that decides whether content is used, how much it is used, and whether it is cited. You can be retrieved and still not make the cut. You can be used and still not be cited. You can show up one day and disappear the next because the system saw slightly different context and made different selections.

Trust sits at the intersection of source reputation, content quality, and risk.

At the source level, trust is shaped by history. Domain behavior over time, link graph context, brand footprint, author identity, consistency, and how often the site is associated with reliable information.

At the content level, trust is shaped by how safe it is to quote. Specificity matters. Internal consistency matters. Clear definitions matter. Evidence trails matter. So does writing that makes it hard to misinterpret.

LLM products also make classification gates explicit in their developer tooling. OpenAI’s moderation guide documents classification of text and images for safety purposes, so developers can filter or intervene.
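
The moderation endpoint is a concrete example of a pre-output gate. Below is a minimal sketch of how a developer might screen a draft before surfacing it, using OpenAI’s documented moderation endpoint (it assumes the official openai Python SDK and an API key in the environment; the draft text is made up):

from openai import OpenAI

client = OpenAI()

draft = "Guaranteed 300% returns in 30 days with zero risk. Act now!"

# Ask the moderation endpoint to classify the draft against its safety categories.
result = client.moderations.create(
    model="omni-moderation-latest",
    input=draft,
)

flags = result.results[0]
if flags.flagged:
    # Route flagged drafts to a human editor instead of publishing them.
    print("Draft flagged for review:", flags.categories)
else:
    print("Draft passed the moderation check.")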

Even if you are not building with APIs, the existence of this tooling reflects the reality of modern systems. Classification happens before output, and policy compliance influences what can be surfaced. For SEOs, the trust gate is where most AI optimization advice gets exposed. Sounding authoritative is easy, but being safe to use takes precision, boundaries, evidence, and plain language.

It also comes in blocks that can stand alone.

Answer engines extract. They reassemble, and they summarize. That means your best asset is a self-contained unit that still makes sense when it is pulled out of the page and placed into a response.

A good self-contained block typically includes a clear statement, a short explanation, a boundary condition, and either an example or a source reference. When your content has those blocks, it becomes easier for the system to use it without introducing risk.

How SSIT Flows Together In The Real World

In practice, the gates stack.

First, the system evaluates whether a site and its pages look spammy or manipulative. This can affect crawl frequency, indexing behavior, and ranking potential. Next, it evaluates whether the content or experience looks risky. In some categories, safety checks can suppress visibility even when relevance is high. Then it evaluates intent. It decides what the user wants and routes retrieval accordingly. If your page does not match the intent class cleanly, it becomes less likely to be selected.

Finally, it evaluates trust for usage. That is where decisions get made about quoting, citing, summarizing, or ignoring. The key point for AI optimization is not that you should try to game these gates. The point is that you should avoid failing them.

Most brands lose visibility in the answer layer for boring reasons. They look like scaled templates. They hide important legitimacy signals. They publish vague content that is hard to quote safely. They try to cover five intents in one page and satisfy none of them fully.

If you address those issues, you are doing better “AI optimization” than most teams chasing prompt hacks.

Where “Classifiers Inside The Model” Fit, Without Turning This Into A Computer Science Lecture

Some classification happens inside model architectures as routing decisions. Mixture of Experts approaches are a common example, where a routing mechanism selects which experts process a given input to improve efficiency and capability. NVIDIA also provides a plain-language overview of Mixture of Experts as a concept.
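
For readers who want a feel for the routing idea without the math, here is a toy sketch of top-k gating (purely illustrative; real systems learn the router weights and run neural-network experts, not Python functions):

import numpy as np

rng = np.random.default_rng(0)
num_experts, hidden_dim, top_k = 8, 16, 2

token = rng.normal(size=hidden_dim)                # one token's input representation
router_weights = rng.normal(size=(hidden_dim, num_experts))

logits = token @ router_weights                    # score every expert for this token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                               # softmax over experts

chosen = np.argsort(probs)[-top_k:]                # only the top-k experts process the token
print("Experts selected:", chosen, "with gate weights", probs[chosen].round(3))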

This matters because it reinforces the broader mental model. Modern AI systems rely on routing and gating at multiple layers. Not every gate is directly actionable for SEO, but the presence of gates is the point. If you want predictable visibility, you build for the gates you can influence.

What To Do With This, Practical Moves For SEOs

Start by treating SSIT like a diagnostic framework. When visibility drops in an answer engine, do not jump straight to “ranking.” Ask where you might have failed in the chain.

Spam Hygiene Improvements

Audit at the template level. Look for scaled patterns that resemble manipulation when aggregated. Remove doorway clusters and near-duplicate pages that do not add unique value. Reduce internal link patterns that exist only to sculpt anchors. Identify pages that exist only to rank and cannot defend their existence as a user outcome.

Use Google’s spam policy categories as the baseline for this audit, because they map to common classifier objectives.
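
One way to make the template-level audit concrete is to flag near-duplicate pages programmatically. A minimal sketch (it assumes you already have each page’s extracted body text keyed by URL; the 0.9 threshold is an arbitrary starting point, not a Google value):

from difflib import SequenceMatcher
from itertools import combinations

# Body text per URL, gathered from your crawler or CMS export (sample values are made up).
pages = {
    "/best-plumber-austin": "Looking for the best plumber in Austin? Our team ...",
    "/best-plumber-dallas": "Looking for the best plumber in Dallas? Our team ...",
    "/water-heater-guide": "How to diagnose a leaking water heater before calling a pro ...",
}

# Compare every pair of pages and flag the ones that read like the same template.
for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
    similarity = SequenceMatcher(None, text_a, text_b).ratio()
    if similarity > 0.9:
        print(f"Near-duplicate pair ({similarity:.0%}): {url_a} <-> {url_b}")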

Safety Hygiene Improvements

Assume conservative filtering in categories where scams are common. Strengthen legitimacy signals on every page that asks for money, personal data, a phone call, or a lead. Make ownership and contact information easy to find. Use transparent disclosures. Avoid inflated claims. Include constraints inside the content.

If you publish in YMYL-adjacent categories, tighten your editorial standards. Add sourcing. Track updates. Remove stale advice. Safety gates punish stale content because it can become harmful.

Intent Hygiene Improvements

Choose the primary job of the page and make it obvious in the first screen. Align the structure to the task. A procedural page should read like a procedure. A comparison page should read like a comparison. A local page should prove locality.

Do not rely on headers and keywords to communicate this. Make it obvious in sentences that a system can extract.

Trust Hygiene Improvements

Build citeable blocks that stand on their own. Use tight definitions. Provide evidence trails. Include boundaries and constraints. Avoid vague, sweeping statements that cannot be defended. If your content is opinion-led, label it as such and support it with rationale. If your content is claim-led, cite sources or provide measurable examples.

This is also where authorship and brand footprint matter. Trust is not only on-page. It is the broader set of signals that tell systems you exist in the world as a real entity.

SSIT As A Measurement Mindset

If you are building or buying “AI visibility” reporting, SSIT changes how you interpret what you see.

  • A drop in presence can mean a spam classifier dampened you.
  • A drop in citations can mean a trust classifier avoided quoting you.
  • A mismatch between retrieval and usage can mean intent misalignment.
  • Category-level invisibility can mean safety gating.

That diagnostic framing matters because it leads to fixes you can execute. It also stops teams from thrashing, rewriting everything, and hoping the next version sticks.

SSIT also keeps you grounded. It is tempting to treat AI optimization as a new discipline with new hacks. Most of it is not hacks. It is hygiene, clarity, and trust-building, applied to systems that filter harder than the old web did. That’s the real shift.

The answer layer is not only ranking content; it is also selecting content. That selection happens through classifiers that are trained to reduce risk and improve usefulness. When you plan for Spam, Safety, Intent, and Trust, you stop guessing. You start designing content and experiences that survive the gates.

That is how you earn a place in the answer layer, and keep it.

This post was originally published on Duane Forrester Decodes.


Featured Image: Olga_TG/Shutterstock

From Performance SEO To Demand SEO via @sejournal, @TaylorDanRW

AI is fundamentally changing what doing SEO means. Not just in how results are presented, but in how brands are discovered, understood, and trusted inside the very systems people now rely on to learn, evaluate, and make decisions. This forces a reassessment of our role as SEOs, the tools and frameworks we use, and the way success is measured beyond legacy reporting models that were built for a very different search environment.

Continuing to rely on vanity metrics rooted in clicks and rankings no longer reflects reality, particularly as people increasingly encounter and learn about brands without ever visiting a website.

For most of its history, SEO focused on helping people find you within a static list of results. Keywords, content, and links existed primarily to earn a click from someone who already recognized a need and was actively searching for a solution.

AI disrupts that model by moving discovery into the answer itself, returning a single synthesized response that references only a small number of brands, which naturally reduces overall clicks while simultaneously increasing the number of brand touchpoints and moments of exposure that shape perception and preference. This is not a traffic loss problem, but a demand creation opportunity. Every time a brand appears inside an AI-generated answer, it is placed directly into the buyer’s mental shortlist, building mental availability even when the user has never encountered the brand before.

Why AI Visibility Creates Demand, Not Just Traffic

Traditional SEO excelled at capturing existing demand by supporting users as they moved through a sequence of searches that refined and clarified a problem before leading them towards a solution.

AI now operates much earlier in that journey, shaping how people understand categories, options, and tradeoffs before they ever begin comparing vendors, effectively pulling what we used to think of as middle and bottom-of-funnel activity further upstream. People increasingly use AI to explore unfamiliar spaces, weigh alternatives, and design solutions that fit their specific context, which means that when a brand is repeatedly named, explained, or referenced, it begins to influence how the market defines what good looks like.

This repeated exposure builds familiarity over time, so that when a decision moment eventually arrives, the brand feels known and credible rather than new and untested, which is demand generation playing out inside the systems people already trust and use daily.

Unlike above-the-line advertising, this familiarity is built natively within tools that have become deeply embedded in everyday life through smartphones, assistants, and other connected devices, making this shift not only technical but behavioral, rooted in how people now access and process information.

How This Changes The Role Of SEO

As AI systems increasingly summarize, filter, and recommend on behalf of users, SEO has to move beyond optimizing individual pages and instead focus on making a brand easy for machines to understand, trust, and reuse across different contexts and queries.

This shift is most clearly reflected in the long-running move from keywords to entities, where keywords still matter but are no longer the primary organizing principle, because AI systems care more about who a brand is, what it does, where it operates, and which problems it solves.

That pushes modern SEO towards clearly defined and consistently expressed brand boundaries, where category, use cases, and differentiation are explicit across the web, even when that creates tension with highly optimized commercial landing pages.

AI systems rely heavily on trust signals such as citations, consensus, reviews, and verifiable facts, which means traditional ranking factors still play a role, but increasingly as proof points that an AI system can safely rely on when constructing answers. When an AI cannot confidently answer basic questions about a brand, it hesitates to recommend it, whereas when it can, that brand becomes a dependable component it can repeatedly draw upon.

This changes the questions SEO teams need to ask, shifting focus away from rankings alone and toward whether content genuinely shapes category understanding, whether trusted publishers reference the brand, and whether information about the brand remains consistent wherever it appears.

Narrative control also changes, because where brands once shaped their story through pages in a list of results, AI now tells the story itself, requiring SEOs to work far more closely with brand and communication teams to reinforce simple, consistent language and a small number of clear value propositions that AI systems can easily compress into accurate summaries.

What Brands Need To Do Differently

Brands need to stop starting their strategies with keywords and instead begin by assessing their strength and clarity as an entity, looking at what search engines and other systems already understand about them and how consistent that understanding really is.

The most valuable AI moments occur long before a buyer is ready to compare vendors, at the point where they are still forming opinions about the problem space, which means appearing by name in those early exploratory questions allows a brand to influence how the problem itself is framed and to build mental availability before any shortlist exists.

Achieving that requires focus rather than breadth, because trying to appear in every possible conversation dilutes clarity, whereas deliberately choosing which problems and perspectives to own creates stronger and more coherent signals for AI systems to work with.

This represents a move away from chasing as many keywords as possible in favor of standardizing a simple brand story that uses clear language everywhere, so that what you do, who it is for, and why it matters can be expressed in one clean, repeatable sentence.

This shift also demands a fundamental change in how SEO success is measured and reported, because if performance continues to be judged primarily through rankings and clicks, AI visibility will always look underwhelming, even though its real impact happens upstream by shaping preference and intent over time.

Instead, teams need to look at patterns across branded search growth, direct traffic, lead quality, and customer outcomes, because when reporting reflects that broader reality, it becomes clear that as AI visibility grows, demand follows, repositioning SEO from a purely tactical channel into a strategic lever for long-term growth.

Featured Image: Roman Samborskyi/Shutterstock

What The Data Shows About Local Rankings In 2026 [Webinar] via @sejournal, @hethr_campbell

Reputation Signals Now Matter More Than Reviews Alone

Positive reviews are no longer the primary fast path to the top of local search results. 

As Google Local Pack and Maps continue to evolve, reputation signals are playing a much larger role in how businesses earn visibility. At the same time, AI tools are emerging as a new entry point for local discovery, changing how brands are cited, mentioned, and recommended.

Join Alexia Platenburg, Senior Product Marketing Manager at GatherUp, for a data-driven look at the local SEO signals shaping visibility today. In this session, she will break down how modern reputation signals influence rankings and what scalable, defensible reputation programs look like for local SEO agencies and multi-location brands.

You will walk away with a clear framework for using reputation as a true visibility and ranking lever, not just a step toward conversion. The session connects reviews, owner responses, and broader reputation signals to measurable outcomes across Google Local Pack, Maps, and AI-powered discovery.

What You’ll Learn

  • How review volume, velocity, ratings, and owner responses influence Local Pack and Maps rankings
  • The reputation signals AI tools use to cite or mention local businesses
  • How to protect your brand from fake reviews before they impact trust at scale

Why Attend?

This webinar offers a practical, evidence-based view of how reputation management is shaping local visibility in 2026. You will gain clear guidance on what matters now, what to prioritize, and how to build trust signals that support long-term local growth.

Register now to learn how reputation is driving local visibility, trust, and growth in 2026.

🛑 Can’t attend live? Register anyway, and we’ll send you the on-demand recording after the webinar.

Ask an Expert: Should Merchants Block AI Bots?

“Ask an Expert” is an occasional series where we pose questions to seasoned ecommerce pros. For this installment, we’ve turned to Scot Wingo, a serial ecommerce entrepreneur most recently of ReFiBuy, a generative engine optimization platform, and before that, ChannelAdvisor, the marketplace management firm.

He addresses tactics for managing genAI bots.

Practical Ecommerce: Should ecommerce merchants monitor and even block AI agents that crawl their sites?

Scot Wingo: It’s a nuanced and strategic decision essential to every merchant.


The four agentic commerce experiences — ChatGPT (Instant Checkout, Agentic Commerce Protocol), Google Gemini (Universal Commerce Protocol), Microsoft Copilot (Copilot Checkout, ACP), and Perplexity (PayPal, Instant Buy) — have nearly 1 billion combined monthly active users. With Google transitioning from traditional search to AI Mode, that number will dramatically increase.

For merchants, the opportunity is as big or bigger than Amazon or any other marketplace.

Merchants should embrace AI agents and ensure access to the entire product catalog.

But genAI models need more than access. Agentic commerce thrives not just on extensive attributes but also on the products’ applications and use cases. Merchants should expand attributes beyond what’s shown on product detail pages and provide essential context via a deep and wide question-and-answer section that includes common shopper queries. It enables the models to match consumer prompts with relevant recommendations, driving sales to those merchants.

The time for action is now. Gemini’s shift to AI Mode means zero-click searches will increase, likely producing 20-30% fewer clicks (and revenue) in 2026.

Synthetic Personas For Better Prompt Tracking via @sejournal, @Kevin_Indig

Boost your skills with Growth Memo’s weekly expert insights. Subscribe for free!

We all know prompt tracking is directional. The most effective way to reduce noise is to track prompts based on personas.

This week, I’m covering:

  • Why AI personalization makes traditional “track the SERP” models incomplete, and how synthetic personas fill the gap.
  • The Stanford validation data showing 85% accuracy at one-third the cost, and how Bain cut research time by 50-70%.
  • The five-field persona card structure and how to generate 15-30 trackable prompts per segment across intent levels.
The best way to make your prompt tracking much more accurate is to base it on personas. Synthetic Personas speed you up at a fraction of the price. (Image Credit: Kevin Indig)

A big difference between classic and AI search is that the latter delivers highly personalized results.

  • Every user gets different answers based on their context, history, and inferred intent.
  • The average AI prompt is ~5x longer than classic search keywords (23 words vs. 4.2 words), conveying much richer intent signals that AI models use for personalization.
  • Personalization creates a tracking problem: You can’t monitor “the” AI response anymore because each prompt is essentially unique, shaped by individual user context.

Traditional persona research solves this – you map different user segments and track responses for each – but it creates new problems. It takes weeks to conduct interviews and synthesize findings.

By the time you finish, the AI models have changed. Personas become stale documentation that never gets used for actual prompt tracking.

Synthetic personas fill the gap by building user profiles from behavioral and profiling data: analytics, CRM records, support tickets, review sites. You can spin up hundreds of micro-segment variants and interact with them in natural language to test how they’d phrase questions.

Most importantly: They are the key to more accurate prompt tracking because they simulate actual information needs and constraints.

The shift: Traditional personas are descriptive (who the user is), synthetic personas are predictive (how the user behaves). One documents a segment, the other simulates it.

Image Credit: Kevin Indig

Example: Enterprise IT buyer persona with job-to-be-done “evaluate security compliance” and constraint “need audit trail for procurement” will prompt differently than an individual user with the job “find cheapest option” and constraint “need decision in 24 hours.”

  • First prompt: “enterprise project management tools SOC 2 compliance audit logs.”
  • Second prompt: “best free project management app.”
  • Same product category, completely different prompts. You need both personas to track both prompt patterns.

Build Personas With 85% Accuracy For One-Third Of The Price

Stanford and Google DeepMind trained synthetic personas on two-hour interview transcripts, then tested whether the AI personas could predict how those same real people would answer survey questions later.

  • The method: Researchers conducted follow-up surveys with the original interview participants, asking them new questions. The synthetic personas answered the same questions.
  • Result: 85% accuracy. The synthetic personas replicated what the actual study participants said.
  • For context, that’s comparable to human test-retest consistency. If you ask the same person the same question two weeks apart, they’re about 85% consistent with themselves.

The Stanford study also measured how well synthetic personas predicted social behavior patterns in controlled experiments – things like who would cooperate in trust games, who would follow social norms, and who would share resources fairly.

The correlation between synthetic persona predictions and actual participant behavior was 98%. This means the AI personas didn’t just memorize interview answers; they captured underlying behavioral tendencies that predicted how people would act in new situations.

Bain & Company ran a separate pilot that showed comparable insight quality at one-third the cost and one-half the time of traditional research methods. Their findings: 50-70% time reduction (days instead of weeks) and 60-70% cost savings (no recruiting fees, incentives, transcription services).

The catch: These results depend entirely on input data quality. The Stanford study used rich, two-hour interview transcripts. If you train on shallow data (just pageviews or basic demographics), you get shallow personas. Garbage in, garbage out.

How To Build Synthetic Personas For Better Prompt Tracking

Building a synthetic persona has three parts:

  1. Feed it with data from multiple sources about your real users: call transcripts, interviews, message logs, organic search data.
  2. Fill out the Persona Card – the five fields that capture how someone thinks and searches.
  3. Add metadata to track the persona’s quality and when it needs updating.

The mistake most teams make: trying to build personas from prompts. This is circular logic – you need personas to understand what prompts to track, but you’re using prompts to build personas. Instead, start with user information needs, then let the persona translate those needs into likely prompts.

Data Sources To Feed Synthetic Personas

The goal is to understand what users are trying to accomplish and the language they naturally use:

  1. Support tickets and community forums: Exact language customers use when describing problems. Unfiltered, high-intent signal.
  2. CRM and sales call transcripts: Questions they ask, objections they raise, use cases that close deals. Shows the decision-making process.
  3. Customer interviews and surveys: Direct voice-of-customer on information needs and research behavior.
  4. Review sites (G2, Trustpilot, etc.): What they wish they’d known before buying. Gap between expectation and reality.
  5. Search Console query data: Questions they ask Google. Use regex to filter for question-type queries (a minimal filtering sketch follows this list):
    (?i)^(who|what|why|how|when|where|which|can|does|is|are|should|guide|tutorial|course|learn|examples?|definition|meaning|checklist|framework|template|tips?|ideas?|best|top|lists?|comparison|vs|difference|benefits|advantages|alternatives)\b.*

    (I like to use the last 28 days, segmented by target country.)
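
Applied to a Search Console export, that filter might look like the following minimal sketch (it assumes a CSV export with a “Query” column; adjust the filename and column name to match your export):

import csv
import re

QUESTION_PATTERN = re.compile(
    r"(?i)^(who|what|why|how|when|where|which|can|does|is|are|should|guide|"
    r"tutorial|course|learn|examples?|definition|meaning|checklist|framework|"
    r"template|tips?|ideas?|best|top|lists?|comparison|vs|difference|benefits|"
    r"advantages|alternatives)\b.*"
)

with open("queries.csv", newline="", encoding="utf-8") as f:
    question_queries = [
        row["Query"] for row in csv.DictReader(f) if QUESTION_PATTERN.match(row["Query"])
    ]

# These question-type queries feed the persona's vocabulary and job-to-be-done fields.
print(f"{len(question_queries)} question-type queries found")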

Persona card structure (five fields only – more creates maintenance debt):

These five fields capture everything needed to simulate how someone would prompt an AI system. They’re minimal by design. You can always add more later, but starting simple keeps personas maintainable.

  1. Job-to-be-done: What’s the real-world task they’re trying to accomplish? Not “learn about X” but “decide whether to buy X” or “fix problem Y.”
  2. Constraints: What are their time pressures, risk tolerance levels, compliance requirements, budget limits, and tooling restrictions? These shape how they search and what proof they need.
  3. Success metric: How do they judge “good enough?” Executives want directional confidence. Engineers want reproducible specifics.
  4. Decision criteria: What proof, structure, and level of detail do they require before they trust information and act on it?
  5. Vocabulary: What are the terms and phrases they naturally use? Not “churn mitigation” but “keeping customers.” Not “UX optimization” but “making the site easier to use.”

Specification Requirements

This is the metadata that makes synthetic personas trustworthy; it prevents the “black box” problem.

When someone questions a persona’s outputs, you can trace back to the evidence.

These requirements form the backbone of continuous persona development. They keep track of changes, sources, and confidence in the weighting.

  • Provenance: Which data sources, date ranges, and sample sizes were used (e.g., “Q3 2024 Support Tickets + G2 Reviews”).
  • Confidence score per field: A High/Medium/Low rating for each of the five Persona Card fields, backed by evidence counts. (e.g., “Decision Criteria: HIGH confidence, based on 47 sales calls vs. Vocabulary: LOW confidence, based on 3 internal emails”).
  • Coverage notes: Explicitly state what the data misses (e.g., “Overrepresents enterprise buyers, completely misses users who churned before contacting support”).
  • Validation benchmarks: Three to five reality checks against known business truths to spot hallucinations. (e.g., “If the persona claims ‘price’ is the top constraint, does that match our actual deal cycle data?”).
  • Regeneration triggers: Pre-defined signals that it’s time to re-run the script and refresh the persona (e.g., a new competitor enters the market, or vocabulary in support tickets shifts significantly).
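
Taken together, the persona card and its specification metadata can live in one small, auditable structure. A minimal sketch, assuming Python dataclasses (all field values are hypothetical):

from dataclasses import dataclass, field


@dataclass
class PersonaCard:
    # The five persona card fields.
    job_to_be_done: str
    constraints: list[str]
    success_metric: str
    decision_criteria: list[str]
    vocabulary: list[str]
    # Specification metadata that keeps the persona auditable.
    provenance: str = ""
    confidence: dict[str, str] = field(default_factory=dict)  # field name -> HIGH/MEDIUM/LOW
    coverage_notes: str = ""


enterprise_it_buyer = PersonaCard(
    job_to_be_done="Evaluate project management tools for security compliance",
    constraints=["needs an audit trail for procurement", "SOC 2 required"],
    success_metric="A shortlist that passes the internal security review",
    decision_criteria=["vendor documentation", "third-party attestations"],
    vocabulary=["audit logs", "SOC 2 compliance", "role-based access"],
    provenance="Q3 sales call transcripts + support tickets",
    confidence={"decision_criteria": "HIGH", "vocabulary": "LOW"},
    coverage_notes="Overrepresents enterprise buyers",
)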

Where Synthetic Personas Work Best

Before you build synthetic personas, understand where they add value and where they fall short.

High-Value Use Cases

  • Prompt design for AI tracking: Simulate how different user segments would phrase questions to AI search engines (the core use case covered in this article).
  • Early-stage concept testing: Test 20 messaging variations, narrow to the top five before spending money on real research.
  • Micro-segment exploration: Understand behavior across dozens of different user job functions (enterprise admin vs. individual contributor vs. executive buyer) or use cases without interviewing each one.
  • Hard-to-reach segments: Test ideas with executive buyers or technical evaluators without needing their time.
  • Continuous iteration: Update personas as new support tickets, reviews, and sales calls come in.

Crucial Limitations Of Synthetic Personas You Need To Understand

  • Sycophancy bias: AI personas are overly positive. Real users say, “I started the course but didn’t finish.” Synthetic personas say, “I completed the course.” They want to please.
  • Missing friction: They’re more rational and consistent than real people. If your training data includes support tickets describing frustrations or reviews mentioning pain points, the persona can reference these patterns when asked – it just won’t spontaneously experience new friction you haven’t seen before.
  • Shallow prioritization: Ask what matters, and they’ll list 10 factors as equally important. Real users have a clear hierarchy (price matters 10x more than UI color).
  • Inherited bias: Training data biases flow through. If your CRM underrepresents small business buyers, your personas will too.
  • False confidence risk: The biggest danger. Synthetic personas always have coherent answers. This makes teams overconfident and skip real validation.

Operating rule: Use synthetic personas for exploration and filtering, not for final decisions. They narrow your option set. Real users make the final call.

Solving The Cold Start Problem For Prompt Tracking

Synthetic personas are a filter tool, not a decision tool. They narrow your option set from 20 ideas to five finalists. Then, you validate those five with real users before shipping.

For AI prompt tracking specifically, synthetic personas solve the cold-start problem. You can’t wait to accumulate six months of real prompt volume before you start optimizing. Synthetic personas let you simulate prompt behavior across user segments immediately, then refine as real data comes in.

Where they’ll cause you to fail is if you use them as an excuse to skip real validation. Teams love synthetic personas because they’re fast and always give answers. That’s also what makes them dangerous. Don’t skip the validation step with real customers.


Featured Image: Paulo Bobita/Search Engine Journal

Google Can Now Monitor Search For Your Government IDs via @sejournal, @MattGSouthern
  • Google’s “Results about you” tool now lets you find and request removal of search results containing government-issued IDs.
  • This includes IDs like passports, driver’s licenses, and Social Security numbers.
  • The expansion is rolling out in the U.S. over the coming days, with additional regions planned.

Google’s Results about you tool now monitors Search results for government-issued IDs like passports, driver’s licenses, and Social Security numbers.

New Data Shows Googlebot’s 2 MB Crawl Limit Is Enough via @sejournal, @martinibuster

New data based on real-world actual web pages demonstrates that Googlebot’s crawl limit of two megabytes is more than adequate. New SEO tools provide an easy way to check how much the HTML of a web page weighs.

Data Shows 2 Megabytes Is Plenty

Raw HTML is basically just a text file. For a text file to reach two megabytes, it would need over two million characters.

The HTTPArchive explains what’s in the HTML weight measurement:

“HTML bytes refers to the pure textual weight of all the markup on the page. Typically it will include the document definition and commonly used on page tags. However it also contains inline elements such as the contents of script tags or styling added to other tags. This can rapidly lead to bloating of the HTML doc.”

That is the same thing Googlebot downloads as HTML: just the on-page markup, not the external JavaScript or CSS files it links to.

According to the HTTPArchive’s latest report, the real-world median size of raw HTML is 33 kilobytes. At the 90th percentile, page HTML weighs 155 kilobytes, meaning that the HTML for 90% of sites is roughly 155 kilobytes or less. Only at the 100th percentile does HTML size explode far beyond two megabytes, which means that pages weighing two megabytes or more are extreme outliers.

The HTTPArchive report explains:

“HTML size remained uniform between device types for the 10th and 25th percentiles. Starting at the 50th percentile, desktop HTML was slightly larger.

Not until the 100th percentile is a meaningful difference when desktop reached 401.6 MB and mobile came in at 389.2 MB.”

The data separates home page measurements from inner page measurements and, surprisingly, shows little difference between the two. The report explains:

“There is little disparity between inner pages and the home page for HTML size, only really becoming apparent at the 75th and above percentile.

At the 100th percentile, the disparity is significant. Inner page HTML reached an astounding 624.4 MB—375% larger than home page HTML at 166.5 MB.”

Mobile And Desktop HTML Sizes Are Similar

Interestingly, the page sizes between mobile and desktop versions were remarkably similar, regardless of whether HTTPArchive was measuring the home page or one of the inner pages.

HTTPArchive explains:

“The size difference between mobile and desktop is extremely minor, this implies that most websites are serving the same page to both mobile and desktop users.

This approach dramatically reduces the amount of maintenance for developers but does mean that overall page weight is likely to be higher as effectively two versions of the site are deployed into one page.”

Though the overall page weight might be higher since the mobile and desktop HTML exists simultaneously in the code, as noted earlier, the actual weight is still far below the two-megabyte threshold all the way up until the 100th percentile.

Given that it takes about two million characters to push the website HTML to two megabytes and that the HTTPArchive data based on actual websites shows that the vast majority of sites are well under Googlebot’s 2 MB limit, it’s safe to say it’s okay to scratch off HTML size from the list of SEO things to worry about.

Tame The Bots

Dave Smart of Tame The Bots recently posted that the tool has been updated so that it now stops crawling at the two-megabyte limit, showing owners of extreme-outlier sites the point at which Googlebot would stop crawling a page.

Smart posted:

“At the risk of overselling how much of a real world issue this is (it really isn’t for 99.99% of sites I’d imagine), I added functionality to tamethebots.com/tools/fetch-… to cap text based files to 2 MB to simulate this.”

Screenshot Of Tame The Bots Interface

The tool will show what the page will look like to Google if the crawl is limited to two megabytes of HTML. But it doesn’t show whether the tested page exceeds two megabytes, nor does it show how much the web page weighs. For that, there are other tools.

Tools That Check Web Page Size

There are a few tool sites that show the HTML size but here are two that just show the web page size. I tested the same page on each tool and they both showed roughly the same page weight, give or take a few kilobytes.

Toolsaday Web Page Size Checker

The interestingly named Toolsaday web page size checker enables users to test one URL at a time. This specific tool does just the one thing, making it easy to get a quick reading of how much a web page weighs in kilobytes (or megabytes, if the page is a 100th-percentile outlier).

Screenshot Of Toolsaday Test Results

Small SEO Tools Website Page Size Checker

The Small SEO Tools Website Page Size Checker differs from the Toolsaday tool in that Small SEO Tools enables users to test ten URLs at a time.

Not Something To Worry About

The bottom line about the two megabyte Googlebot crawl limit is that it’s not something the average SEO needs to worry about. It literally affects a very small percentage of outliers. But if it makes you feel better, give one of the above SEO tools a try to reassure yourself or your clients.
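
Or skip the third-party tools entirely and measure the raw HTML weight yourself. A minimal sketch (it assumes the Python requests library is installed; the URL is a placeholder):

import requests

url = "https://www.example.com/"
response = requests.get(url, timeout=10)

html_bytes = len(response.content)  # the raw HTML only, not linked JavaScript or CSS files
print(f"{url} -> {html_bytes / 1024:.1f} KB of HTML")
print("Over Googlebot's 2 MB limit" if html_bytes > 2 * 1024 * 1024 else "Well under the 2 MB limit")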

Featured Image by Shutterstock/Fathur Kiwon

Traffic Impact of Google Discover Update

Google Discover has become a reliable traffic source for some publications. Last week, Google launched a core update to Discover in the U.S., with the global rollout coming.

Google’s Search Central blog has included “Get on Discover” guidelines since 2019, explaining its content quality requirements and traffic recovery strategies. Google revised the guidelines last week, alongside the core update.

Some requirements have not changed:

  • Titles and headlines must clearly “capture the essence of the content.”
  • Include “compelling, high-quality images,” especially those 1,200 pixels wide.
  • Address “current interests [that] tells a story well, or provides unique insights.”

Yet two requirements — clickbait avoidance and page experience — are new.

New Guidelines

Avoid clickbait

The previous guideline versions warned against “misleading or exaggerated details in preview content.” The revision moved this recommendation to the top, presumably to emphasize its importance as reflected in the core update.

The guidelines state that “clickbait” can prevent would-be readers from understanding the content and manipulate them into clicking a link.

The guidelines separately warn publishers against using “sensationalism tactics… by catering to morbid curiosity, titillation, or outrage.”

Page experience

“Provide a great page experience” is new, although it’s in keeping with Google’s traditional search algorithm, which rewards sites with strong user engagement.

Google collects page experience metrics from its Chrome browser and retains them only for high-traffic pages. Search Console shows no Core Web Vitals data for sites with little traffic.

Sites with 50% or more losses in Discover traffic should audit the user experience:

  • In Search Console, look for URLs marked “poor” in the Core Web Vitals report.
  • Evaluate how those pages load, especially on mobile devices. The headings and body text should load first, allowing users to start reading immediately.
  • Look for elements, such as ads or pop-ups, that block the content.

Traffic Impact

The revised guidelines do not address “topic authority,” yet Google’s announcement of Discover’s core update does:

Since many sites demonstrate deep knowledge across a wide range of subjects, our systems are designed to identify expertise on a topic-by-topic basis.

The focus on topical expertise suggests the update will elevate niche, authoritative sites.

Finally, the announcement states that Discover will show more local and personalized content.

Nonetheless, most ecommerce blogs have modest Discover traffic and will therefore experience little (if any) impact from the core update. Still, keep an eye on the Discover section in Search Console; switch to “weekly” stats for a current overview.

Screenshot of the Discover section in Search Console

In Search Console’s Discover section, switch to “weekly” stats for a current overview.

7 Insights From Washington Post’s Strategy To Win Back Traffic via @sejournal, @martinibuster

The Washington Post’s recent announcement of staffing cuts is a story with heroes, villains, and victims, but buried beneath the headlines is the reality of a big brand publisher confronting the same changes with Google Search that SEOs, publishers, and ecommerce stores are struggling with. The following are insights into their strategy to claw back traffic and income that could be useful for everyone seeking to stabilize traffic and grow.

Disclaimer

The Washington Post is proposing the following strategies in response to steep drops in search traffic, the rise of multi-modal content consumption, and many other factors that are fragmenting online audiences. The strategies have yet to be proven.

The value lies in analyzing what they are doing and understanding if there are any useful ideas for others.

Problem That Is Being Solved

The reasons given for the announced changes are similar to what SEOs, online stores, and publishers are going through right now because of the decline of search and the hyper-diversification of sources of information.

The memo explains:

“Platforms like Search that shaped the previous era of digital news, and which once helped The Post thrive, are in serious decline. Our organic search has fallen by nearly half in the last three years.

And we are still in the early days of AI-generated content, which is drastically reshaping user experiences and expectations.”

Those problems are the exact same ones affecting virtually all online businesses. This makes The Washington Post’s solution of interest to everyone beyond just news sites.

Problems Specific To The Washington Post

Recent reporting on The Washington Post has tended to frame the story narrowly, in the context of politics, concerns about the concentration of wealth, and the impact on coverage of sports, international news, and the performing arts, in addition to the hundreds of staff and reporters who lost their jobs.

The job cuts in particular are a highly specific solution applied by The Washington Post, and they are highly controversial. An argument can be made that cutting some of the lower-performing topics removes the very things that differentiate the website. As you will see next, Executive Editor Matt Murray justifies the cuts as listening to readers’ signals.

Challenges Affecting Everyone

If you zoom out, there is a larger pattern of how many organizations are struggling to understand where the audience has gone and how best to bring them back.

Shared Industry Challenges

  • Changes in content consumption habits
  • Decline of search
  • Rise of the creator economy
  • Growth of podcasts and video shows
  • Social media competing for audience attention
  • Rise of AI search and chat

A recent podcast interview (link to Spotify) with the executive editor of The Washington Post, Matt Murray, revealed a years-long struggle to restructure the organization’s workflow into one that:

  • Was responsive to audience signals
  • Could react in real time instead of the rigid print-based news schedule
  • Explored emerging content formats so as to evolve alongside readers
  • Produced content that is perceived as indispensable

The issues affecting the Washington Post are similar to issues affecting everyone else from recipe bloggers to big brand review sites. A key point Murray made was the changes were driven by audience signals.

Matt Murray said the following about reader signals:

“Readers in today’s world tell you what they want and what they don’t want. They have more power. …And we weren’t picking up enough of the reader signals.”

Then a little later on he again emphasized the importance of understanding reader signals:

“…we are living in a different kind of a world that is a data reader centric world. Readers send us signals on what they want. We have to meet them more where they are. That is going to drive a lot of our success.”

Whether listening to audience signals justifies cutting staff or ends up removing the things that differentiate The Washington Post remains to be seen.

For example, I used to subscribe to the print edition of The New Yorker for the articles, not for the restaurant or theater reviews, yet the reviews were still of interest to me because I liked to keep track of trends in live theater and dining. The New Yorker cartoons rarely had anything to do with the article topics, and yet they were a value add. Would something like that show up in audience signals?

Build A Base Then Adapt

The memo paints what they’re doing as a foundation for building a strategy that is still evolving, not as a proven strategy. In my opinion that reflects the uncertainty introduced by the rapid decline of classic search and the knowledge that there are no proven strategies.

That uncertainty makes it more interesting to examine what a big brand organization like The Washington Post is doing to create a base strategy to start from and adapt it based on outcomes. That, in itself, is a strategy for coping with a lack of proven tactics.

Three concrete goals they are focusing on are:

  1. Attract readers.
  2. Create content that leads to subscriptions.
  3. Increase engagement.

They write:

“From this foundation, we aim to build on what is working, and grow with discipline and intent, to experiment, to measure and deepen what resonates with customers.”

In the podcast interview, Murray also described the stability of a foundation as a way to nurture growth, explaining that it creates the conditions for talent to do its best work. He explains that building the foundation gives the staff the space to focus on things that work.

He explained:

“One of the reasons I wanted to get to stability, as I want room for that talent to thrive and flourish.

I also want us to develop it in a more modern multi-modal way with those that we’ve been able to do.”

A Path To Becoming Indispensable

The Washington Post memo offered insights into the strategy, stating the goal that the brand must become indispensable to readers and naming three criteria that articles must be validated against.

According to the memo:

“We can’t be everything to everyone. But we must be indispensable where we compete. That means continually asking why a story matters, who it serves and how it gives people a clearer understanding of the world and an advantage in navigating it.”

Three Criteria For Content

  1. Content must matter to site visitors.
  2. Content must have an identifiable audience.
  3. Content must provide understanding and also be applicable (useful).

Content Must Matter
Regardless of whether the content is about a product, a service, or is purely informational, the Washington Post’s strategy states that content must strongly fulfill a specific need. For SEOs, creators, ecommerce stores, and informational content publishers, “mattering” is one of the pillars that support making a business indispensable to a site visitor and providing an advantage.

Identifiable Audience
Information doesn’t exist in a vacuum, but traditional SEO has strongly focused on keyword volume and keyword relevance, essentially treating information as existing in a space devoid of human relevance. Keyword relevance is not the same as human relevance. Keyword relevance is relevance to a keyword phrase, not relevance to a human.

This point matters because AI Chat and Search destroy the concept of keywords: people are no longer typing in keyword phrases but are instead engaging in goal-oriented discussions.

When SEOs talk about keyword relevance, they are talking about relevance to an algorithm. Put another way, they are essentially defining the audience as an algorithm.

So, point two is really about stepping back and asking, “Why does a person need this information?”

Provide Understanding And Be Applicable
Point three states that it’s not enough for content to provide an understanding of what happened (facts). It requires that the information must make the world around the reader navigable (application of the facts).

This is perhaps the most interesting pillar of the strategy because it acknowledges that information vomit is not enough. It must be information that is utilitarian. Utilitarian in this context means that content must have some practical use.

In my opinion, an example of this principle in the context of an ecommerce site is product data. The other day I was on a fishing lure site, and the site assumed that the consumer understood how each lure is supposed to be used. It just had the name of the lure and a photo. In every case, the name of the lure was abstract and gave no indication of how the lure was to be used, under what circumstances, and what tactic it was for.

Another example is a clothing site where clothing is described as small, medium, large, and extra large, which are subjective measurements because every retailer defines small and large differently. One brand I shop at consistently labels objectively small-sized jackets as medium. Fortunately, that same retailer also provides chest, shoulder, and length measurements, which enable a user to understand exactly whether that clothing fits.

I think that’s part of what the Washington Post memo means when it says that the information should provide understanding but also be applicable. It’s that last part that makes the understanding part useful.

Three Pillars To Thriving In A Post-Search Information Economy

All three criteria are pillars that support the mandate to be indispensable and provide an advantage. Satisfying those goals helps differentiate content from information vomit and AI slop. The strategy supports becoming a navigational entity, a destination that users specifically seek out, and it helps publishers, ecommerce stores, and SEOs build an audience to claw back what classic search no longer provides.

Featured Image by Shutterstock/Roman Samborskyi

Google Discover for Ecommerce

As AI Overviews and shopping agents divert clicks away from traditional search results, Google Discover may provide a new and growing source of organic traffic for ecommerce merchants.

Discover is Google’s personalized, query-less content feed similar to those on X and Facebook. The Discover feed appears in Google’s mobile applications and on the main screens of Android devices. It shows articles, videos, and content that presumably interests users.

How Google selects a given article or video to appear in the Discover feed is something of a mystery, with some marketers stating that Google Discover Optimization — GDO, if you need another three-letter acronym — is significantly different from traditional organic search.

Google Discover web page

Discover is a personalized, query-less content feed similar to those on X and Facebook. Image: Google. 

Core Update

Google’s February 2026 Discover Core Update marks the first time the search engine giant changed its algorithm for Discover alone.

Google says the update improved quality. It aimed to reduce the presence of clickbait and low-value content while surfacing more in-depth, original, and timely material from sites with demonstrated expertise.

Some published reports speculated that the update devalued AI-generated content, yet Google’s concern is probably not artificial intelligence per se. Rather, it is scaled, thin, or risky AI-generated content that degrades trust.

Discover’s content is not in response to a query. Google chooses what to show folks. That choice raises the bar for accuracy, usefulness, and credibility in ways that differ from classic search results.

In a sense, the Discover update is less about ranking tweaks and more about editorial standards. Google may be limiting sensational, misleading, or mass-produced content to protect the tool’s long-term viability.

Therein lies the content marketing opportunity.

Discover’s Future

Discover launched in 2018. Until recently, it has been, for most marketers, a secondary way to boost traffic.

News publishers in particular could see significant traffic spikes when an item made its way into the feed. But optimizing for Discover did not compare to the steady, regular flow of traffic that organic search could deliver.

As AI Overviews have siphoned off that traffic, some marketers have emphasized Discover.

Google’s apparent focus has prompted widespread speculation about Discover’s future.

Discover as a home feed. Discover could become a personalized home feed for the Google ecosystem. Imagine something akin to an individualized MSN or Yahoo home page.

This home feed might include articles, videos, social content, and even data from other Google products, such as Gmail or Docs. The goal might be to keep users engaged across Google properties.

What’s more, both MSN and Yahoo have shown that such pages can drive significant ad revenue.

Personal and local experience. In its February update, Google noted that Discover would favor local or regional content. Users in the United States will see content from domestic publishers.

That could benefit retailers with physical stores, as very local content might beat out similar articles from nationwide competitors.

Multi-format, creator-centric. The Discover feed has recently featured relatively more video and creator content, especially from YouTube and social platforms.

While publishers often frame this as competition, ecommerce marketers could benefit. Product explainers, buying guides, and similar content already perform well in video and visual formats. Discover’s expansion beyond text may favor brands and retailers that invest in rich, creator-led content.

Yet merchants without creators can mimic the style and potentially win on Discover.

An interest graph, not just a feed. Some have suggested that Google treats Discover as part of a broader interest graph that informs search, recommendations, and AI-assisted experiences.

Thus content that performs well in Discover may shape Google’s understanding of user intent over time beyond the feed itself.

Discover could be upstream from traditional and AI-driven search. GDO may precede and inform SEO, GEO (generative engine optimization), and AEO (answer engine optimization).

Optimize

Google Discover deserves attention if it’s becoming a meaningful traffic channel.

Start with Google’s recommendations, which include descriptive headlines, large images, and “people-first” content. From there, marketers can experiment.

A practical approach is a testing framework. Publish consistently and track Discover performance separately in Search Console. Over time, look for editorial traits, formats, or topics that predictably earn Discover visibility and thus inform a long-term strategy.