Wix Introduces Harmony AI Website Builder via @sejournal, @martinibuster

Wix announced the launch of Wix Harmony, a new AI-powered website builder that combines natural language site creation with Wix’s existing visual editor. The company introduced the product in New York and said it will begin rolling out in English to users in the coming weeks. Wix positions Harmony as a tool designed to produce fully functional, production-ready websites rather than quick demos, addressing the common tradeoff between fast site creation and deployment-ready results.

Wix has steadily expanded its use of artificial intelligence across its product line, aiming to streamline the process of maintaining an online presence while keeping users in control of how their sites look and function. Wix Harmony represents the company’s most direct effort to integrate AI into the core website-building workflow rather than treating it as a separate feature.

Aria Is Wix Harmony’s Interface

Wix Harmony’s interface is Aria, an AI agent that responds to natural language instructions and applies changes directly within the Wix editor. Users can ask Aria to perform tasks ranging from visual changes such as adjusting colors or layouts, to adding commerce features or redesigning entire pages. Because Aria operates within Wix’s existing architecture, Wix says changes made in one area of a site will not disrupt other sections or introduce unintended behavior.

Switch Between Manual And AI Workflows

An interesting feature of Wix Harmony is that it enables users to move back and forth between AI-assisted creation and manual editing without rebuilding elements from scratch. A user can generate a page or section through a prompt, then manually fine-tune spacing, layout, and content using drag-and-drop controls. Wix describes this as a way to speed up site creation while keeping design decisions in human control, rather than locking users into AI-generated outputs. This is a thoughtful implementation of AI that stays flexible to how users may want to work with it.

Delivers Sites That Are Ready To Deploy

Wix is positioning Harmony as a solution that is capable of delivering websites that are ready to deploy to a live environment. By running Harmony sites on the same infrastructure as all Wix websites, the company says it can support live traffic, ongoing updates, and business operations without requiring users to migrate to a different platform.

Websites created with Wix Harmony include access to Wix’s existing business features, including online commerce, scheduling, transactions, and payments. Wix also says these sites include built-in accessibility monitoring, search optimization tools, performance support, and privacy-focused infrastructure designed to meet regulatory requirements such as GDPR. These capabilities are intended to let users launch and operate websites without adding third-party services to cover basic operational needs.

Wix co-founder and CEO Avishai Abrahami said:

“Our focus is on combining the best new technologies with modern design, and this is the power of Wix. With Wix Harmony, now anyone can create a beautiful website, design easily with prompts and natural language without sacrificing scalability, security, reliability and performance. This is the benchmark of what a website builder should be.”

Harmony is part of Wix’s thoughtful integration of AI, providing tools that businesses can use to their benefit. Read more about Wix Harmony.

Featured Image by Wix

How Recommender Systems Like Google Discover May Work via @sejournal, @martinibuster

Google Discover is largely a mystery to publishers and the search marketing community, even though Google has published official guidance about what it is and what publishers should know about it. It’s so mysterious that it’s generally not even discussed as a recommender system, yet that is exactly what it is. This is a review of a classic research paper that shows how to scale a recommender system. Although the paper is about YouTube, it’s not hard to imagine how this kind of system could be adapted to Google Discover.

Recommender Systems

Google Discover belongs to the class of systems known as recommender systems. A classic recommender system I remember is MovieLens, from way back in 1997. It was a university research project that let users rate movies, then used those ratings to recommend movies to watch. The logic was that people who tend to like certain kinds of movies also tend to like certain other kinds of movies. But these kinds of algorithms have limitations that make them fall short of the scale necessary to personalize recommendations for YouTube or Google Discover.
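The collaborative-filtering idea behind a system like MovieLens can be sketched in a few lines of Python. This is a toy illustration with made-up ratings, not MovieLens’s actual algorithm:

```python
# Toy user-based collaborative filtering: recommend movies liked by
# the other user whose ratings most agree with yours.

ratings = {
    "alice": {"Alien": 5, "Blade Runner": 5, "Notting Hill": 1},
    "bob":   {"Alien": 4, "Blade Runner": 5, "Heat": 4},
    "carol": {"Notting Hill": 5, "You've Got Mail": 4},
}

def similarity(a, b):
    """Count co-rated movies where both users rated 4 or higher."""
    shared = set(ratings[a]) & set(ratings[b])
    return sum(1 for m in shared if ratings[a][m] >= 4 and ratings[b][m] >= 4)

def recommend(user):
    """Suggest unseen movies liked by the most similar other user."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: similarity(user, u))
    return [m for m, r in ratings[nearest].items()
            if r >= 4 and m not in ratings[user]]

print(recommend("alice"))  # → ['Heat']
```

The weakness is visible even at this scale: similarity only exists where rating histories overlap, which is part of why this style of algorithm struggles at YouTube or Discover scale.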

Two-Tower Recommender System Model

The modern style of recommender system is sometimes referred to as the Two-Tower architecture or the Two-Tower model. The Two-Tower model came about as a solution for YouTube, even though the original research paper (Deep Neural Networks for YouTube Recommendations) does not use this term.

It may seem counterintuitive to look to YouTube to understand how the Google Discover algorithm works, but the fact is that the system Google developed for YouTube became the foundation for how to scale a recommender system for an environment where massive amounts of content are generated every hour of the day, 24 hours a day.

It’s called the Two-Tower architecture because there are two representations that are matched against each other, like two towers.

In this model, which handles the initial “retrieval” of content from the database, a neural network processes user information to produce a user embedding, while content items are represented by their own embeddings. These two representations are matched using similarity scoring rather than being combined inside a single network.

To repeat: the research paper does not refer to the architecture as a Two-Tower architecture; the label was created later to describe this kind of approach. So, while the paper doesn’t use the word tower, I’m going to continue using it because it makes it easier to visualize what’s going on in this kind of recommender system.

User Tower
The User Tower processes things like a user’s watch history, search tokens, location, and basic demographics. It uses this data to create a vector representation that maps the user’s specific interests in a mathematical space.

Item Tower
The Item Tower represents content using learned embedding vectors. In the original YouTube implementation, these were trained alongside the user model and stored for fast retrieval. This allows the system to compare a user’s “coordinates” against millions of video “coordinates” instantly, without having to run a complex analysis on every single video each time you refresh your feed.
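Putting the two towers together, the retrieval step reduces to scoring one user vector against many precomputed item vectors. The following Python sketch uses toy three-dimensional embeddings and a plain dot product; real systems use learned, high-dimensional vectors and approximate nearest-neighbor search:

```python
# Illustrative Two-Tower retrieval (not Google's actual code): match a
# user embedding against precomputed item embeddings by similarity score.

def dot(u, v):
    """Dot-product similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v))

# User tower output: one vector summarizing watch history, searches, etc.
user_embedding = [0.9, 0.1, 0.4]

# Item tower output: embeddings precomputed and stored for fast lookup.
item_embeddings = {
    "video_a": [0.8, 0.0, 0.5],  # close to the user's interests
    "video_b": [0.1, 0.9, 0.0],  # far from the user's interests
    "video_c": [0.7, 0.2, 0.3],
}

def retrieve(user_vec, items, k=2):
    """Return the k items whose embeddings score highest for this user."""
    ranked = sorted(items, key=lambda name: dot(user_vec, items[name]),
                    reverse=True)
    return ranked[:k]

print(retrieve(user_embedding, item_embeddings))  # → ['video_a', 'video_c']
```

Because the item embeddings are computed ahead of time, only the cheap similarity comparison happens when a user refreshes their feed.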

The Fresh Content Problem

Google’s research paper offers an interesting take on freshness. The problem of freshness is described as a tradeoff between exploitation and exploration. The YouTube recommendation system has to balance between showing users content that is already known to be popular (exploitation) versus exposing them to new and unproven content (exploration). What motivates Google to show new but unproven content, at least for the context of YouTube, is that users show a strong preference for new and fresh content.
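One standard way to illustrate the exploitation-versus-exploration tradeoff is an epsilon-greedy policy, sketched below in Python. This is a textbook illustration, not the method the paper describes:

```python
# Epsilon-greedy: mostly serve the known-popular video (exploitation),
# but with probability epsilon serve fresh, unproven content (exploration).

import random

def pick_video(known_popular, fresh_unproven, epsilon=0.1, rng=random.random):
    """With probability epsilon, explore fresh content; otherwise exploit."""
    if rng() < epsilon:
        return random.choice(fresh_unproven)  # exploration
    return known_popular[0]                   # exploitation

popular = ["proven_hit"]
fresh = ["new_upload_1", "new_upload_2"]

# Force each branch with a fixed rng to show the behavior:
print(pick_video(popular, fresh, rng=lambda: 0.99))  # exploits
print(pick_video(popular, fresh, rng=lambda: 0.0))   # explores
```

Tuning epsilon is the balancing act the paper describes: too low and fresh uploads never surface, too high and relevance suffers.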

The research paper explains why fresh content is important:

“Many hours worth of videos are uploaded each second to YouTube. Recommending this recently uploaded (“fresh”) content is extremely important for YouTube as a product. We consistently observe that users prefer fresh content, though not at the expense of relevance.”

This tendency to show fresh content seems to hold true for Google Discover, where Google tends to show fresh content on topics that individual users are currently engaged with. Have you ever noticed how Google Discover tends to favor fresh content? The insights the researchers had about user preferences probably carry over to the Google Discover recommendation system. The takeaway is that producing content on a regular basis could help get web pages surfaced in Google Discover.

An interesting insight in this research paper, and I don’t know if it’s still true but it’s still interesting, is that the researchers state that machine learning algorithms show an implicit bias toward older existing content because they are trained on historical data.

They explain:

“Machine learning systems often exhibit an implicit bias towards the past because they are trained to predict future behavior from historical examples.”

The neural network is trained on past videos and learns that content from one or two days ago was popular, which creates a bias toward things that happened in the past. They solved the freshness issue by including the age of each training example as a feature and then, when the system is recommending videos to a user (serving), setting this time-based feature to zero days ago (or slightly negative). This signals to the model that it is making a prediction at the very end of the training window, essentially forcing it to predict what is popular right now rather than what was popular on average in the past.
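A minimal sketch of that idea, assuming a purely illustrative feature-dictionary shape: the example’s age is an input feature during training, and at serving time it is pinned to zero.

```python
# Sketch of the "example age" trick (assumed shapes, not the paper's code).

def build_features(video, example_age_days):
    """Assemble the feature row the model would consume."""
    return {"video_id": video, "example_age": example_age_days}

# Training: the age feature reflects when each example was logged.
training_rows = [
    build_features("video_a", example_age_days=14.0),
    build_features("video_b", example_age_days=1.0),
]

# Serving: age is pinned to zero (or slightly negative), telling the
# model to predict popularity "right now", at the end of the window.
serving_row = build_features("video_c", example_age_days=0.0)
print(serving_row)
```

The model still learns time-dependent popularity from the training rows; zeroing the feature at serving time simply asks it for the present-moment prediction.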

Accuracy Of Click Data

Google’s foundational research paper also provides insights about implicit user feedback signals, which is a reference to click data. The researchers say that this kind of data rarely provides accurate user satisfaction information.

The researchers write:

“Noise: Historical user behavior on YouTube is inherently difficult to predict due to sparsity and a variety of unobservable external factors. We rarely obtain the ground truth of user satisfaction and instead model noisy implicit feedback signals. Furthermore, metadata associated with content is poorly structured without a well defined ontology. Our algorithms need to be robust to these particular characteristics of our training data.”

The researchers conclude the paper by stating that this approach to recommender systems helped increase user watch time and proved to be more effective than other systems.

They write:

“We have described our deep neural network architecture for recommending YouTube videos, split into two distinct problems: candidate generation and ranking.
Our deep collaborative filtering model is able to effectively assimilate many signals and model their interaction with layers of depth, outperforming previous matrix factorization approaches used at YouTube.

We demonstrated that using the age of the training example as an input feature removes an inherent bias towards the past and allows the model to represent the time-dependent behavior of popular videos. This improved offline holdout precision results and increased the watch time dramatically on recently uploaded videos in A/B testing.

Ranking is a more classical machine learning problem yet our deep learning approach outperformed previous linear and tree-based methods for watch time prediction. Recommendation systems in particular benefit from specialized features describing past user behavior with items. Deep neural networks require special representations of categorical and continuous features which we transform with embeddings and quantile normalization, respectively.”

Although this research paper is ten years old, it still offers insights into how recommender systems work and takes a little of the mystery out of systems like Google Discover. Read the original research paper: Deep Neural Networks for YouTube Recommendations.

Featured Image by Shutterstock/Andrii Iemelianenko

NotificationX WordPress WooCommerce Plugin Vulnerabilities Impact 40k Sites via @sejournal, @martinibuster

A vulnerability advisory was published for the NotificationX FOMO plugin for WordPress and WooCommerce sites, affecting more than 40,000 websites. The vulnerability, which is rated at a 7.2 (High) severity level, enables unauthenticated attackers to inject malicious JavaScript that can execute in a visitor’s browser when specific conditions are met.

NotificationX – FOMO Plugin

The NotificationX FOMO plugin is used by WordPress and WooCommerce site owners to display notification bars, popups, and real-time alerts such as recent sales, announcements, and promotional messages. The plugin is commonly deployed on marketing and e-commerce sites to create urgency and draw visitor attention through notifications.

Exposure Level

Exploiting the vulnerability does not require authentication or any user role. Attackers do not need a WordPress account or any prior access to the site to trigger the vulnerability. Exploitation relies on getting a victim to visit a specially crafted page that interacts with the vulnerable site.

Root Cause Of The Vulnerability

The issue is a DOM-based Cross-Site Scripting (XSS) vulnerability tied to how the plugin processes preview data. In the context of a WordPress plugin, a DOM-based XSS vulnerability happens when the plugin contains client-side JavaScript that processes data from an untrusted source in an unsafe way, usually by writing the data to the web page (the “sink”).

In the context of the NotificationX plugin, the vulnerability exists because the plugin’s scripts accept input through the nx-preview POST parameter but do not properly sanitize the input or escape the output before it is rendered in the browser. Security checks that should ensure user-supplied data is treated as plain text are missing. This allows an attacker to create a malicious web page that automatically submits a form to the victim’s site, forcing the victim’s browser to execute harmful scripts injected via that parameter.

The end result is that an attacker-controlled input can be interpreted as executable JavaScript instead of harmless preview content.
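The mechanics of source and sink can be shown in a few lines. This sketch uses Python’s html.escape for brevity rather than the plugin’s actual JavaScript, and the payload is a generic example:

```python
# Why escaping matters: untrusted input written to a page verbatim
# becomes executable markup; escaped input stays plain text.

from html import escape

# Attacker-controlled value, e.g. submitted via a POST parameter.
untrusted = '<img src=x onerror="alert(document.cookie)">'

unsafe_html = f"<div class='preview'>{untrusted}</div>"          # XSS sink
safe_html   = f"<div class='preview'>{escape(untrusted)}</div>"  # rendered as text

print(unsafe_html)  # browser would execute the onerror handler
print(safe_html)    # browser displays the markup as literal text
```

The patched behavior corresponds to the second line: the preview data is encoded so the browser never interprets it as script.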

What Attackers Can Do

If exploited, the vulnerability enables attackers to execute arbitrary JavaScript in the context of the affected site. The injected script executes when a user visits a malicious page that automatically submits a form to the vulnerable NotificationX site.

This can allow attackers to:

  • Hijack logged-in administrator or editor sessions
  • Perform actions on behalf of authenticated users
  • Redirect visitors to malicious or fraudulent websites
  • Access sensitive information available through the browser

The official Wordfence advisory explains:

“The NotificationX – FOMO, Live Sales Notification, WooCommerce Sales Popup, GDPR, Social Proof, Announcement Banner & Floating Notification Bar plugin for WordPress is vulnerable to DOM-Based Cross-Site Scripting via the ‘nx-preview’ POST parameter in all versions up to, and including, 3.2.0. This is due to insufficient input sanitization and output escaping when processing preview data. This makes it possible for unauthenticated attackers to inject arbitrary web scripts in pages that execute when a user visits a malicious page that auto-submits a form to the vulnerable site.”

Affected Versions

All versions of NotificationX up to and including 3.2.0 are vulnerable. A patch is available and the vulnerability was addressed in NotificationX version 3.2.1, which includes security enhancements related to this issue.

Recommended Action

Site owners using NotificationX should update the plugin immediately to version 3.2.1 or later. Sites that cannot update should disable the plugin until the patched version can be applied. Leaving vulnerable versions active exposes visitors and logged-in users to client-side attacks that can be difficult to detect and mitigate.

One More Vulnerability

This plugin has another vulnerability, rated at a 4.3 (Medium) severity level. The Wordfence advisory for this one describes it like this:

“The NotificationX plugin for WordPress is vulnerable to unauthorized modification of data due to a missing capability check on the ‘regenerate’ and ‘reset’ REST API endpoints in all versions up to, and including, 3.1.11. This makes it possible for authenticated attackers, with Contributor-level access and above, to reset analytics for any NotificationX campaign, regardless of ownership.”

The NotificationX WordPress plugin includes two REST API endpoints called “regenerate” and “reset.” These endpoints are used to manage campaign analytics, such as resetting or rebuilding the stats that show how a notification is performing.

The problem is that these endpoints do not properly check user permissions for modifying data. In this case, the plugin only checks whether a user is logged in with Contributor-level access or higher, not whether they are actually allowed to perform the action. Even though users with the Contributor level role normally have very limited permissions, this flaw lets them perform actions they should not be able to do.
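The missing check can be sketched abstractly. The following Python is an illustration of the vulnerable-versus-patched authorization pattern, not the plugin’s actual PHP code:

```python
# Illustrative authorization patterns for a "reset analytics" endpoint.

def reset_analytics(user, campaign):
    # Vulnerable pattern: only checks that the user is logged in
    # (Contributor access or above), not that the action is permitted.
    if not user["logged_in"]:
        raise PermissionError("must be logged in")
    campaign["stats"] = {}

def reset_analytics_fixed(user, campaign):
    # Patched pattern: also checks a capability tied to the action
    # and ownership of the campaign being modified.
    if not user["logged_in"]:
        raise PermissionError("must be logged in")
    if not user.get("can_manage_campaigns") or campaign["owner"] != user["id"]:
        raise PermissionError("not allowed to modify this campaign")
    campaign["stats"] = {}
```

In WordPress terms, the fix amounts to a capability and ownership check on the REST endpoint, rather than accepting any authenticated request.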

In this case, the damage an attacker can do is limited; for example, an attacker can’t take over a site. Updating to version 3.2.1 or higher (the same update that fixes the other vulnerability) will patch this vulnerability.

An attacker can:

  • Reset analytics for any NotificationX campaign
  • Do this even if they did not create or own the campaign
  • Repeatedly wipe or regenerate campaign statistics

Featured Image by Shutterstock/Art Furnace

WordPress Advanced Custom Fields Extended Plugin Vulnerability via @sejournal, @martinibuster

An advisory was published about a vulnerability in the popular Advanced Custom Fields: Extended WordPress plugin that is rated at 9.8 severity, affecting up to 100,000 installations.

The flaw enables unauthenticated attackers to register themselves with administrator privileges and gain full control of a website and all settings.

Advanced Custom Fields: Extended Plugin

The Advanced Custom Fields: Extended plugin is an add-on to the popular Advanced Custom Fields Pro plugin. It is used by WordPress site owners and developers to extend how custom fields work, manage front-end forms, create options pages, define custom post types and taxonomies, and customize the WordPress admin experience.

The plugin is widely used, with more than 100,000 active installations, and is commonly deployed on sites that rely on front-end forms and advanced content management workflows.

Who Can Exploit This Vulnerability

This vulnerability can be exploited by unauthenticated attackers, which means there is no barrier of first having to attain a higher permission level before launching an attack. If the affected version of the plugin is present with a specific configuration in place, anyone on the internet can attempt to exploit the flaw. That kind of exposure significantly increases risk because it removes the need for compromised credentials or insider access.

Privilege Escalation Exposure

The vulnerability is a privilege escalation flaw caused by missing role restrictions during user registration.

Specifically, the plugin’s insert_user function does not limit which user roles can be assigned when anyone creates a new user account. Under normal circumstances, WordPress should strictly control which roles users can select or be assigned during registration.

Because this check is missing, an attacker can submit a registration request that explicitly assigns the administrator role to the new account.

This issue only occurs when the site’s form configuration maps a custom field directly to the WordPress role field. When that condition is met, the plugin accepts the supplied role value without verifying that it is safe or permitted.

The flaw appears to be due to insufficient server-side validation of the form field “Choices.” The plugin seems to have relied on the HTML form to restrict which roles a user could select. For example, a developer could create a user sign-up form with only the “subscriber” role as an option, but there was no back-end verification that the role a user signed up with matched the roles the form was supposed to allow.

What was probably happening is that an unauthenticated attacker could inspect the form’s HTML, see the field responsible for the user role, and intercept the HTTP request so that, for example, instead of sending role=subscriber, the attacker could change the value to role=administrator. The code responsible for the insert_user action took this input and passed it directly to WordPress user creation functions. It did not check if “administrator” was actually one of the allowed options in the field’s “Choices” list.
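That probable flow can be sketched as follows. The Python below is illustrative; the plugin itself is written in PHP, and the function names here are invented:

```python
# Illustrative sketch: trusting a submitted role vs. validating it
# against the form's configured "Choices".

ALLOWED_ROLE_CHOICES = ["subscriber"]  # roles the form is meant to offer

def register_user_vulnerable(submitted_role):
    # Trusts whatever role value the HTTP request contains.
    return {"role": submitted_role}

def register_user_patched(submitted_role):
    # Validates the submitted value against the field's "Choices"
    # before creating the account.
    if submitted_role not in ALLOWED_ROLE_CHOICES:
        raise ValueError("role not permitted by form configuration")
    return {"role": submitted_role}

# An attacker intercepts the request and swaps role=subscriber for
# role=administrator:
print(register_user_vulnerable("administrator"))  # account created as admin
```

The patched version corresponds to the changelog entry about enforcing front-end field validation against the “Choices” settings.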

The Changelog for the plugin lists the following entry as one of the patches to the plugin:

“Enforced front-end fields validation against their respective “Choices” settings.”

That entry in the changelog means the plugin now actively checks front-end form submissions to ensure the submitted value matches the field’s defined “Choices”, rather than trusting whatever value is posted.

There is also this entry in the changelog:

“Module: Forms – Added security measure for forms allowing user role selection”

This entry means the plugin added server-side protections to prevent abuse when a front-end form is allowed to set or select a WordPress user role.

Overall, the patches added stronger validation controls for front-end forms and made them more configurable.

What Attackers Can Gain

If successfully exploited, the attacker gains administrator-level access to the WordPress site.

That level of access allows attackers to:

  • Install or modify plugins and themes
  • Inject malicious code
  • Create backdoor administrator accounts
  • Steal or manipulate site data
  • Redirect visitors or distribute malware

Gaining administrator access is a full site takeover.

The Wordfence advisory describes the issue as follows:

“The Advanced Custom Fields: Extended plugin for WordPress is vulnerable to Privilege Escalation in all versions up to, and including, 0.9.2.1. This is due to the ‘insert_user’ function not restricting the roles with which a user can register. This makes it possible for unauthenticated attackers to supply the ‘administrator’ role during registration and gain administrator access to the site.”

As Wordfence describes, the plugin trusts user-supplied input for account roles when it should not. That trust allows attackers to bypass WordPress’s normal protections and grant themselves the highest possible permission level.

Wordfence also reports having blocked active exploitation attempts targeting this vulnerability, indicating that attackers are already probing sites for exposure.

Conditions Required For Exploitation

The vulnerability is not automatically exploitable on every site running the plugin.

Exploitation requires that:

  • The site uses a front-end form provided by the plugin
  • The form maps a custom field directly to the WordPress user role

Patch Status and What Site Owners Should Do

The vulnerability affects all versions up to and including 0.9.2.1. The issue is addressed in version 0.9.2.2, which introduces additional validation and security checks around front-end forms and user role handling.

The entries in the official changelog for ACF Extended Basic 0.9.2.2:

  • Module: Forms – Enforced front-end fields validation against their respective “Choices” settings
  • Module: Forms – Added security measure for forms allowing user role selection
  • Module: Forms – Added acfe/form/validate_value hook to validate fields individually on front
  • Module: Forms – Added acfe/form/pre_validate_value hook to bypass enforced validation

Site owners using this plugin should update immediately to the latest patched version. If updating is not possible, the plugin should be disabled until the fix can be applied.

Given the severity of the flaw and the lack of authentication required to exploit it, delaying action leaves affected sites exposed to a complete takeover.

Featured Image by Shutterstock/Art Furnace

OpenAI Search Crawler Passes 55% Coverage In Hostinger Study via @sejournal, @MattGSouthern

Hostinger analyzed 66 billion bot requests across more than 5 million websites and found that AI crawlers are following two different paths.

LLM training bots are losing access to the web as more sites block them. Meanwhile, AI assistant bots that power search tools like ChatGPT are expanding their reach.

The analysis draws on anonymized server logs from three 6-day windows, with bot classification mapped to AI.txt project classifications.

Training Bots Are Getting Blocked

The starkest finding involves OpenAI’s GPTBot, which collects data for model training. Its website coverage dropped from 84% to 12% over the study period.

Meta’s ExternalAgent was the largest training-category crawler by request volume in Hostinger’s data. Hostinger says this training-bot group shows the strongest declines overall, driven in part by sites blocking AI training crawlers.

These numbers align with patterns I’ve tracked through multiple studies. BuzzStream found that 79% of top news publishers now block at least one training bot. Cloudflare’s Year in Review showed GPTBot, ClaudeBot, and CCBot had the highest number of full disallow directives across top domains.

The data quantifies what those studies suggested. Hostinger interprets the drop in training-bot coverage as a sign that more sites are blocking those crawlers, even when request volumes remain high.

Assistant Bots Tell a Different Story

While training bots face resistance, the bots that power AI search tools are expanding access.

OpenAI’s OAI-SearchBot, which fetches content for ChatGPT’s search feature, reached 55.67% average coverage. TikTok’s bot grew to 25.67% coverage with 1.4 billion requests. Apple’s bot reached 24.33% coverage.

These assistant crawls are user-triggered and more targeted. They serve users directly rather than collecting training data, which may explain why sites treat them differently.

Classic Search Remains Stable

Traditional search engine crawlers held steady throughout the study. Googlebot maintained 72% average coverage with 14.7 billion requests. Bingbot stayed at 57.67% coverage.

The stability contrasts with changes in the AI category. Google’s main crawler faces a unique position since blocking it affects search visibility.

SEO Tools Show Decline

SEO and marketing crawlers saw declining coverage. Ahrefs maintained the largest footprint at 60% coverage, but the category overall shrank. Hostinger attributes this to two factors: these tools increasingly focus on sites actively doing SEO work, and website owners are blocking resource-intensive crawlers.

I reported on the resource concerns when Vercel data showed GPTBot generating 569 million requests in a single month. For some publishers, the bandwidth costs became a business problem.

Why This Matters

The data confirms a pattern that’s been building over the past year. Site operators are drawing a line between AI crawlers they’ll allow and those they won’t.

The decision comes down to function. Training bots collect content to improve models without sending traffic back. Assistant bots fetch content to answer specific user questions, which means they can surface your content in AI search results.

Hostinger suggests a middle path: block training bots while allowing assistant bots that drive discovery. This lets you participate in AI search without contributing to model training.

Looking Ahead

OpenAI recommends allowing OAI-SearchBot if you want your site to appear in ChatGPT search results, even if you block GPTBot.

OpenAI’s documentation clarifies the difference. OAI-SearchBot controls inclusion in ChatGPT search results and respects robots.txt. ChatGPT-User handles user-initiated browsing and may not be governed by robots.txt in the same way.
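Based on those documented user-agent tokens, a robots.txt that blocks model training while opting into ChatGPT search results might look like this:

```
# Block model-training crawls
User-agent: GPTBot
Disallow: /

# Allow inclusion in ChatGPT search results
User-agent: OAI-SearchBot
Allow: /
```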

Hostinger recommends checking server logs to see what’s actually hitting your site, then making blocking decisions based on your goals. If you’re concerned about server load, you can use CDN-level blocking. If you want to potentially increase your AI visibility, review current AI crawler user agents and allow only the specific bots that support your strategy.
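A first pass at checking your logs can be as simple as counting user-agent matches. This Python sketch assumes combined-format access-log lines, and the crawler token list is illustrative rather than exhaustive:

```python
# Tally requests per known AI crawler from raw access-log lines.

from collections import Counter

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "CCBot"]

def count_ai_hits(log_lines):
    """Count log lines whose user-agent string mentions a known AI crawler."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[bot] += 1
    return hits

sample_log = [
    '1.2.3.4 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 ... GPTBot/1.0"',
    '5.6.7.8 - - "GET /post HTTP/1.1" 200 "... OAI-SearchBot/1.0 ..."',
    '9.9.9.9 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0)"',
]

print(count_ai_hits(sample_log))
```

From a tally like this you can decide, bot by bot, which crawlers to allow in robots.txt or block at the CDN level.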


Featured Image: BestForBest/Shutterstock

Perplexity AI Interview Explains How AI Search Works via @sejournal, @martinibuster

I recently spoke with Jesse Dwyer of Perplexity about what SEOs should be focusing on when optimizing for AI search. His answers offered useful guidance about what publishers and SEOs should prioritize right now.

AI Search Today

An important takeaway that Jesse shared is that personalization is completely changing how search results are generated:

“I’d have to say the biggest/simplest thing to remember about AEO vs SEO is it’s no longer a zero sum game. Two people with the same query can get a different answer on commercial search, if the AI tool they’re using loads personal memory into the context window (Perplexity, ChatGPT).

A lot of this comes down to the technology of the index (why there actually is a difference between GEO and AEO). But yes, it is currently accurate to say (most) traditional SEO best practices still apply.”

The takeaway from Dwyer’s response is that search visibility is no longer about a single consistent search result. Because personal context plays a role in AI answers, two users can receive significantly different answers to the same query, possibly drawn from different underlying content sources.

While the underlying infrastructure is still a classic search index, SEO still plays a role in determining whether content is eligible to be retrieved at all. Perplexity AI is said to use a form of PageRank, which is a link-based method of determining the popularity and relevance of websites, so that provides a hint about some of what SEOs should be focusing on.

However, as you’ll see, what is retrieved is vastly different than in classic search.

I followed up with the following question:

So what you’re saying (and correct me if I’m wrong or slightly off) is that Classic Search tends to reliably show the same ten sites for a given query. But for AI search, because of the contextual nature of AI conversations, they’re more likely to provide a different answer for each user.

Jesse answered:

“That’s accurate yes.”

Sub-document Processing: Why AI Search Is Different

Jesse continued his answer by talking about what goes on behind the scenes to generate an answer in AI search.

He continued:

“As for the index technology, the biggest difference in AI search right now comes down to whole-document vs. “sub-document” processing.

Traditional search engines index at the whole document level. They look at a webpage, score it, and file it.

When you use an AI tool built on this architecture (like ChatGPT web search), it essentially performs a classic search, grabs the top 10–50 documents, then asks the LLM to generate a summary. That’s why GPT search gets described as “4 Bing searches in a trenchcoat” —the joke is directionally accurate, because the model is generating an output based on standard search results.

This is why we call the optimization strategy for this GEO (Generative Engine Optimization). That whole-document search is essentially still algorithmic search, not AI, since the data in the index is all the normal page scoring we’re used to in SEO. The AI-first approach is known as “sub-document processing.”

Instead of indexing whole pages, the engine indexes specific, granular snippets (not to be confused with what SEO’s know as “featured snippets”). A snippet, in AI parlance, is about 5-7 tokens, or 2-4 words, except the text has been converted into numbers, (by the fundamental AI process known as a “transformer”, which is the T in GPT). When you query a sub-document system, it doesn’t retrieve 50 documents; it retrieves about 130,000 tokens of the most relevant snippets (about 26K snippets) to feed the AI.

Those numbers aren’t precise, though. The actual number of snippets always equals a total number of tokens that matches the full capacity of the specific LLM’s context window. (Currently they average about 130K tokens). The goal is to completely fill the AI model’s context window with the most relevant information, because when you saturate that window, you leave the model no room to ‘hallucinate’ or make things up.

In other words, it stops being a creative generator and delivers a more accurate answer. This sub-document method is where the industry is moving, and why it is more accurate to be called AEO (Answer Engine Optimization).

Obviously this description is a bit of an oversimplification. But the personal context that makes each search no longer a universal result for every user is because the LLM can take everything it knows about the searcher and use that to help fill out the full context window. Which is a lot more info than a Google user profile.

The competitive differentiation of a company like Perplexity, or any other AI search company that moves to sub-document processing, takes place in the technology between the index and the 26K snippets. With techniques like modulating compute, query reformulation, and proprietary models that run across the index itself, we can get those snippets to be more relevant to the query, which is the biggest lever for getting a better, richer answer.

Btw, this is less relevant to SEO’s, but this whole concept is also why Perplexity’s search API is so legit. For devs building search into any product, the difference is night and day.”

Dwyer contrasts two fundamentally different indexing and retrieval approaches:

  • Whole-document indexing, where pages are retrieved and ranked as complete units.
  • Sub-document indexing, where meaning is stored and retrieved as granular fragments.

In the first version, AI sits on top of traditional search and summarizes ranked pages. In the second, the AI system retrieves fragments directly and never reasons over full documents at all.

He also explained that answer quality is constrained by context-window saturation: accuracy emerges from filling the model’s entire context window with relevant fragments. When retrieval succeeds at saturating that window, the model has little capacity to invent facts or hallucinate.
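The saturation idea can be sketched in code. What follows is a hypothetical toy, not Perplexity’s actual implementation: snippets are ranked by a toy cosine-similarity score over tiny vectors, then greedily packed until the token budget (the context window) is full. A real engine would use learned embeddings and a vector index.

```python
# Toy sketch of sub-document retrieval with context-window saturation:
# rank pre-indexed snippets by relevance to the query, then greedily
# pack the most relevant ones until the model's context window is full.
import math

CONTEXT_WINDOW_TOKENS = 130_000  # rough average cited in the article


def cosine(a, b):
    """Cosine similarity between two vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def saturate_context(query_vec, snippets, budget=CONTEXT_WINDOW_TOKENS):
    """snippets: list of (vector, token_count, text) tuples.
    Returns (packed_texts, tokens_used)."""
    ranked = sorted(snippets, key=lambda s: cosine(query_vec, s[0]), reverse=True)
    packed, used = [], 0
    for vec, tokens, text in ranked:
        if used + tokens > budget:
            continue  # skip snippets that would overflow the window
        packed.append(text)
        used += tokens
    return packed, used
```

The greedy loop is the key point: retrieval stops being “top 10 documents” and becomes “as many relevant fragments as the window can hold.”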

Lastly, he says that “modulating compute, query reformulation, and proprietary models” are part of the secret sauce for retrieving snippets that are highly relevant to the search query.

Featured Image by Shutterstock/Summit Art Creations

Why Agentic AI May Flatten Brand Differentiators via @sejournal, @martinibuster

James LePage, Director of AI Engineering at Automattic and co-lead of the WordPress AI Team, described the future of the Agentic AI Web, in which websites become interactive interfaces and data sources and the value add that any individual site offers becomes flattened. Although he describes a way out of brand and voice getting flattened, the outcome for informational, service, and media sites may be “complex.”

Evolution To Autonomy

One of the points that LePage makes concerns agentic autonomy and how it will impact what it means to have an online presence. He maintains that humans will still be in the loop, but at a higher and less granular level: agentic AI interactions with websites happen at the tree level, dealing with the details, while humans stand at the forest level, dictating the outcome they’re looking for.

LePage writes:

“Instead of approving every action, users set guidelines and review outcomes.”

He sees agentic AI progressing on an evolutionary course toward greater freedom with less external control, also known as autonomy. This evolution is in three stages.

He describes the three levels of autonomy:

  1. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions.
  2. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority.
  3. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

AI Agents May Turn Sites Into Data Sources

LePage sees the web in terms of control, with agentic AI experiences taking control of how the data is presented to the user. The user experience and branding are removed, and the experience itself is refashioned by the AI agent.

He writes:

“When an agent visits your website, that control diminishes. The agent extracts the information it needs and moves on. It synthesizes your content according to its own logic. It represents you to its user based on what it found, not necessarily how you’d want to be represented.

This is a real shift. The entity that creates the content loses some control over how that content is presented and interpreted. The agent becomes the interface between you and the user.

Your website becomes a data source rather than an experience.”

Does it sound problematic that websites will turn into data sources? As you’ll see in the next paragraph, LePage’s answer for that situation is to double down on interactions and personalization via AI, so that users can interact with the data in ways that are not possible with a static website.

These are important insights because they’re coming from the person who is the director of AI engineering at Automattic and co-leads the team in charge of coordinating AI integration within the WordPress core.

AI Will Redefine Website Interactions

LePage, who is the co-lead of WordPress’s AI Team, which coordinates AI-related contributions to the WordPress core, said that AI will enable websites to offer increasingly personalized and immersive experiences. Users will be able to interact with the website as a source of data refined and personalized for the individual’s goals, with website-side AI becoming the differentiator.

He explained:

“Humans who visit directly still want visual presentation. In fact, they’ll likely expect something more than just content now. AI actually unlocks this.

Sites can create more immersive and personalized experiences without needing a developer for every variation. Interactive data visualizations, product configurators, personalized content flows. The bar for what a “visit” should feel like is rising.

When AI handles the informational layer, the experiential layer becomes a differentiator.”

That’s an important point right there because it means that if AI can deliver the information anywhere (in an agent user interface, an AI generated comparison tool, a synthesized interactive application), then information alone stops separating you from everyone else.

In this kind of future, what becomes the differentiator, your value add, is the website experience itself.

How AI Agents May Negatively Impact Websites

LePage says that agentic AI is a good fit for commercial websites because agents are able to do comparisons and price checks and zip through the checkout. He says that it’s a different story for informational sites, calling it “more complex.”

Regarding the phrase “more complex,” I think that’s a euphemism that engineers use instead of what they really mean: “You’re probably screwed.”

Judge for yourself. Here’s how LePage explains how websites lose control over the user experience:

“When an agent visits your website, that control diminishes. The agent extracts the information it needs and moves on. It synthesizes your content according to its own logic. It represents you to its user based on what it found, not necessarily how you’d want to be represented.

This is a real shift. The entity that creates the content loses some control over how that content is presented and interpreted. The agent becomes the interface between you and the user. Your website becomes a data source rather than an experience.

For media and services, it’s more complex. Your brand, your voice, your perspective, the things that differentiate you from competitors, these get flattened when an agent summarizes your content alongside everyone else’s.”

For informational websites, the website experience can be the value add, but that advantage is eliminated by agentic AI. And unlike ecommerce transactions, where sales are the value exchange, there is zero value exchange for informational sites, since nobody is clicking on ads, much less viewing them.

Alternative To Flattened Branding

LePage goes on to present an alternative to brand flattening by imagining a scenario where websites themselves wield AI Agents so that users can interact with the information in ways that are helpful, engaging, and useful. This is an interesting thought because it represents what may be the biggest evolutionary step in website presence since responsive design made websites engaging regardless of device and browser.

He explains how this new paradigm may work:

“If agents are going to represent you to users, you might need your own agent to represent you to them.

Instead of just exposing static content and hoping the visiting agent interprets it well, the site could present a delegate of its own. Something that understands your content, your capabilities, your constraints, and your preferences. Something that can interact with the visiting agent, answer its questions, present information in the most effective way, and even negotiate.

The web evolves from a collection of static documents to a network of interacting agents, each representing the interests of their principal. The visiting agent represents the user. The site agent represents the entity. They communicate, they exchange information, they reach outcomes.

This isn’t science fiction. The protocols are being built. MCP is now under the Linux Foundation with support from Anthropic, OpenAI, Google, Microsoft, and others. Agent2Agent is being developed for agent-to-agent communication. The infrastructure for this kind of web is emerging.”

What do you think about the part where a site’s AI agent talks to a visitor’s AI agent and communicates “your capabilities, your constraints, and your preferences,” as well as how your information will be presented? There might be something here, and depending on how this is worked out, it may be something that benefits publishers and keeps them from becoming just a data source.

AI Agents May Force A Decision: Adaptation Versus Obsolescence

LePage insists that the publishers (which he calls entities) that evolve along with the agentic AI revolution will be the ones able to have the most effective agent-to-agent interactions, while those that stay behind will become data waiting to be scraped.

He paints a bleak future for sites that decline to move forward with agent-to-agent interactions:

“The ones that don’t will still exist on the web. But they’ll be data to be scraped rather than participants in the conversation.”

What LePage describes is a future in which product and professional service sites can extract value from agent-to-agent interactions. But the same is not necessarily true for informational sites that users depend on for expert reviews, opinions, and news. The future for them looks “complex.”

Head Of WordPress AI Team Explains SEO For AI Agents via @sejournal, @martinibuster

James LePage, Director of AI Engineering at Automattic and founder and co-lead of the WordPress Core AI Team, which is tasked with coordinating AI-related projects within WordPress, including how AI agents will interact within the WordPress ecosystem, shared his insights into what’s coming to the web in the context of AI agents and some of the implications for SEO.

AI Agents And Infrastructure

The first observation he made was that AI agents will use the same web infrastructure as search engines. His main point is that the data agents use comes from the regular, classic search indexes.

He writes, somewhat provocatively:

“Agents will use the same infrastructure the web already has.

  • Search to discover relevant entities.
  • “Domain authority” and trust signals to evaluate sources.
  • Links to traverse between entities.
  • Content to understand what each entity offers.

I find it interesting how much money is flowing into AIO and GEO startups when the underlying way agents retrieve information is by using existing search indexes. ChatGPT uses Bing. Anthropic uses Brave. Google uses Google. The mechanics of the web don’t change. What changes is who’s doing the traversing.”

AI SEO = Longtail Optimization

LePage also said that schema structured data, semantic density, and interlinking between pages are essential for optimizing for AI agents. Notably, he said that the AI optimization that AIO and GEO companies are doing is just basic long-tail query optimization.

He explained:

“AI intermediaries doing synthesis need structured, accessible content. Clear schemas, semantic density, good interlinking. This is the challenge most publishers are grappling with now. In fact there’s a bit of FUD in this industry. Billions of dollars flowing into AIO and GEO when much of what AI optimization really is is simply long-tail keyword search optimization.”
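As a concrete instance of the “clear schemas” LePage mentions, pages commonly carry schema.org markup as JSON-LD in the document head. Below is a minimal Python sketch that builds a schema.org Article block; the field values and the function name are placeholders of my own, not anything prescribed by LePage or WordPress.

```python
# Build a minimal schema.org Article block as JSON-LD, the format
# Google's structured data documentation recommends embedding in a
# <script type="application/ld+json"> tag in the page head.
import json


def article_jsonld(headline, author, date_published, description):
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "description": description,
    }
    # Wrap the JSON in the script tag a page template would emit.
    return '<script type="application/ld+json">%s</script>' % json.dumps(data)
```

The same approach extends to other schema.org types (Product, FAQPage, and so on); the point is that an agent parsing the page gets typed fields rather than prose to reverse-engineer.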

What Optimized Content Looks Like For AI Agents

LePage, who is involved in AI within the WordPress ecosystem, said that content should be organized in an “intentional” manner for agent consumption, by which he means structured markdown, semantic markup, and content that’s easy to understand.

A little further he explains what he believes content should look like for AI agent consumption:

“Presentations of content that prioritize what matters most. Rankings that signal which information is authoritative versus supplementary. Representations that progressively disclose detail, giving agents the summary first with clear paths to depth. All of this still static, not conversational, not dynamic, but shaped with agent traversal in mind.

Think of it as the difference between a pile of documents and a well-organized briefing. Both contain the same information. One is far more useful to someone trying to quickly understand what you offer.”

A little later in the article, he offers a seemingly contradictory prediction about the role of content in an agentic AI future, reversing today’s formula of a well-organized briefing over a pile of documents: agentic AI will not need a website, just the content, a pile of documents.

Nevertheless, he recommends that content have structure, so that the information is well organized at the page level, with a clear hierarchy, and at the site level as well, where interlinking makes the relationships between documents clearer. He emphasizes that the content must communicate what it’s for.

He then adds that in the future websites will have AI agents that communicate with external AI agents, which gets into the paradigm he mentioned of content being split off from the website so that the data can be displayed in ways that make sense for a user, completely separated from today’s concept of visiting a website.

He writes:

“Think of this as a progression. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

The progression is toward more autonomy, but that doesn’t mean humans disappear from the loop. It means the loop gets wider. Instead of approving every action, users set guidelines and review outcomes.

…Before full site delegates exist, there’s a middle ground that matters right now.

The content an agent has access to can be presented in a way that makes sense for how agents work today. Currently, that means structured markdown, clean semantic markup, content that’s easy to parse and understand. But even within static content, there’s room to be intentional about how information is organized for agent consumption.”

His article, titled Agents & The New Internet (3/5), provides useful ideas of how to prepare for the agentic AI future.

Featured Image by Shutterstock/Blessed Stock

Google’s Mueller: Free Subdomain Hosting Makes SEO Harder via @sejournal, @MattGSouthern

Google’s John Mueller warns that free subdomain hosting services create unnecessary SEO challenges, even for sites doing everything else right.

The advice came in response to a Reddit post from a publisher whose site shows up in Google but doesn’t appear in normal search results. The site uses Digitalplat Domains, a free subdomain service on the Public Suffix List.

What’s Happening

Mueller told the site owner that they likely aren’t making technical mistakes. The problem is the environment they chose to publish in.

He wrote:

“A free subdomain hosting service attracts a lot of spam & low-effort content. It’s a lot of work to maintain a high quality bar for a website, which is hard to qualify if nobody’s getting paid to do that.”

The issue comes down to association. Sites on free hosting platforms share infrastructure with whatever else gets published there. Search engines struggle to differentiate quality content from the noise surrounding it.

Mueller added:

“For you, this means you’re basically opening up shop on a site that’s filled with – potentially – problematic ‘flatmates’. This makes it harder for search engines & co to understand the overall value of the site – is it just like the others, or does it stand out in a positive way?”

He also cautioned against cheap TLDs for similar reasons. The same dynamics apply when entire domain extensions become overrun with low-quality content.

Beyond domain choice, Mueller pointed to content competition as a factor. The site in question publishes on a topic already covered extensively by established publishers with years of work behind them.

“You’re publishing content on a topic that’s already been extremely well covered. There are sooo many sites out there which offer similar things. Why should search engines show yours?”

Why This Matters

Mueller’s advice here fits a pattern I’ve covered repeatedly over the years. Previously, Google’s Gary Illyes warned against cheap TLDs for the same reason. Illyes put it bluntly at the time, telling publishers that when a TLD is overrun by spam, search engines might not want to pick up sitemaps from those domains.

The free subdomain situation creates a unique problem. While the Public Suffix List theoretically tells Google to treat these subdomains as separate sites, the neighborhood signal remains strong. If the vast majority of subdomains on that host are spam, Google’s systems may struggle to identify your site as the one diamond in the rough.

This matters for anyone considering free hosting as a way to test an idea before investing in a real domain. The test environment itself becomes the test. Search engines evaluate your site in the context of everything else published under that same domain.

The competitive angle also deserves attention. New sites on well-covered topics face a high bar regardless of domain choice. Mueller’s point about established publishers having years of work behind them is a reality check about where the effort needs to go.

Looking Ahead

Mueller suggested that search visibility shouldn’t be the first priority for new publishers.

“If you love making pages with content like this, and if you’re sure that it hits what other people are looking for, then I’d let others know about your site, and build up a community around it directly. Being visible in popular search results is not the first step to becoming a useful & popular web presence, and of course not all sites need to be popular.”

For publishers starting out, focus on building direct traffic through promotion and community engagement. Search visibility tends to follow after a site establishes itself through other channels.


Featured Image: Jozef Micic/Shutterstock

Google On Phantom Noindex Errors In Search Console via @sejournal, @martinibuster

Google’s John Mueller recently answered a question about phantom noindex errors reported in Google Search Console. Mueller asserted that these reports may be real.

Noindex In Google Search Console

A noindex robots directive is one of the few commands that Google must obey, and one of the few ways that a site owner can exercise control over Googlebot, Google’s crawler.

And yet it’s not totally uncommon for Search Console to report being unable to index a page because of a noindex directive when the page seemingly does not have a noindex directive on it, at least none that is visible in the HTML code.

When Google Search Console (GSC) reports “Submitted URL marked ‘noindex’,” it is reporting a seemingly contradictory situation:

  • The site asked Google to index the page via an entry in a Sitemap.
  • The page sent Google a signal not to index it (via a noindex directive).

It’s a confusing message from Search Console: a page is preventing Google from indexing it, yet the publisher or SEO can’t observe anything at the code level that would cause it.

The person asking the question posted on Bluesky:

“For the past 4 months, the website has been experiencing a noindex error (in ‘robots’ meta tag) that refuses to disappear from Search Console. There is no noindex anywhere on the website nor robots.txt. We’ve already looked into this… What could be causing this error?”

Noindex Shows Only For Google

Google’s John Mueller answered the question, sharing that there was always a noindex showing to Google on the pages he has examined where this kind of thing was happening.

Mueller responded:

“The cases I’ve seen in the past were where there was actually a noindex, just sometimes only shown to Google (which can still be very hard to debug). That said, feel free to DM me some example URLs.”

While Mueller didn’t elaborate on what could be going on, there are ways to troubleshoot this issue and find out.

How To Troubleshoot Phantom Noindex Errors

It’s possible that there is code somewhere causing a noindex to show just for Google. For example, a page may at one time have had a noindex on it, and a server-side cache (like a caching plugin) or a CDN (like Cloudflare) cached the HTTP headers from that time, which in turn would cause the old noindex header to be shown to Googlebot (because it frequently visits the site) while a fresh version is served to the site owner.

Checking the HTTP headers is easy; there are many HTTP header checkers, like this one at KeyCDN or this one at SecurityHeaders.com.
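If you’d rather script the check than use a web-based tool, a short standard-library sketch can fetch a URL and look for a noindex directive in the X-Robots-Tag response header. The function names here are my own; the header itself is the documented mechanism for header-level noindex.

```python
# Fetch a URL and report whether the response carries an X-Robots-Tag
# header containing a noindex directive.
import urllib.request


def header_has_noindex(headers):
    """headers: mapping of response header names to values.
    Header names are matched case-insensitively."""
    value = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    return "noindex" in value.lower()


def check_url(url, user_agent="Mozilla/5.0"):
    """Returns (status_code, noindex_found) for the given URL."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.status, header_has_noindex(dict(resp.headers))
```

Running the same script with different User-Agent values is a quick way to reproduce the “shown only to Google” scenario discussed below.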

A 520 server header response code is one that’s sent by Cloudflare when it’s blocking a user agent.

Screenshot: 520 Cloudflare Response Code


Below is a screenshot of a 200 server response code generated by Cloudflare:

Screenshot: 200 Server Response Code

I checked the same URL using two different header checkers, with one returning a 520 (blocked) server response code and the other returning a 200 (OK) response code. That shows how differently Cloudflare can respond to something like a header checker. Ideally, check with several header checkers to see if there’s a consistent 520 response from Cloudflare.

In the situation where a web page is showing something exclusively to Google that is otherwise not visible to someone looking at the code, what you need to do is to get Google to look at the page for you using an actual Google crawler and from a Google IP address. The way to do this is by dropping the URL into Google’s Rich Results Test. Google will dispatch a crawler from a Google IP address and if there’s something on the server (or a CDN) that’s showing a noindex, this will catch it. In addition to the structured data, the Rich Results test will also provide the HTTP response and a snapshot of the web page showing exactly what the server shows to Google.

When you run a URL through the Google Rich Results Test, the request:

  • Originates from Google’s Data Centers: The bot uses an actual Google IP address.
  • Passes Reverse DNS Checks: If the server, security plugin, or CDN checks the IP, it will resolve back to googlebot.com or google.com.
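The reverse DNS check in the second bullet can be reproduced with Python’s standard library. This is a generic sketch of the verification procedure Google documents for confirming Googlebot: reverse-resolve the IP to a hostname, check the hostname suffix, then forward-resolve the hostname to confirm it maps back to the same IP (anyone can fake reverse DNS alone, so the forward confirmation is the step that matters).

```python
# Verify that an IP address really belongs to Google's crawlers:
# reverse DNS lookup, suffix check, then forward-confirm.
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """True if the hostname ends in a Google-owned crawler domain."""
    return hostname.endswith(GOOGLE_SUFFIXES)


def verify_google_ip(ip):
    """Reverse-resolve, check the suffix, then forward-resolve to
    confirm the hostname maps back to the original IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not hostname_is_google(hostname):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward confirm
        return ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False
```

Server-side security plugins and CDNs run essentially this check when deciding whether a request claiming to be Googlebot is genuine.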

If the page is blocked by noindex, the tool will be unable to provide any structured data results. It should provide a status saying “Page not eligible” or “Crawl failed.” If you see that, click the “View Details” link or expand the error section. It should show something like “Robots meta tag: noindex” or “‘noindex’ detected in ‘robots’ meta tag.”

This approach does not send the Googlebot user agent; it uses the Google-InspectionTool/1.0 user agent string. That means if the server is blocking by IP address, this method will catch it.

Another angle covers the situation where a rogue noindex tag is specifically written to block Googlebot. You can spoof (mimic) the Googlebot user agent string with Google’s own User Agent Switcher extension for Chrome, or configure an app like Screaming Frog to identify itself with the Googlebot user agent; either should catch it.

Screenshot: Chrome User Agent Switcher
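If you’d rather script the spoof than use a browser extension, a rough standard-library sketch can fetch the page while identifying as Googlebot and scan the HTML for a robots meta tag carrying noindex. The user agent string below is Googlebot’s published identifier; the regex is deliberately simple (it assumes the name attribute precedes content), so treat this as a quick probe, not a parser.

```python
# Request a page while identifying as Googlebot, then scan the HTML
# for a robots meta tag containing a noindex directive.
import re
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

# Simplified pattern: assumes name="robots" appears before content="...".
META_ROBOTS = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def html_has_noindex(html):
    """True if the HTML contains a robots meta tag with noindex."""
    match = META_ROBOTS.search(html)
    return bool(match and "noindex" in match.group(1).lower())


def fetch_as_googlebot(url):
    """Fetch the URL with the Googlebot user agent and check for noindex."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req) as resp:
        return html_has_noindex(resp.read().decode("utf-8", errors="replace"))
```

Comparing the result of this fetch against a fetch with a normal browser user agent will expose a noindex that is served only to Googlebot by user agent (though not one served only to Google IP addresses, which is what the Rich Results Test catches).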

Phantom Noindex Errors In Search Console

These kinds of errors can feel like a pain to diagnose, but before you throw your hands up in the air, take some time to see if any of the steps outlined here help identify the hidden reason that’s responsible for this issue.

Featured Image by Shutterstock/AYO Production