Claude Opus 4.1 Improves Coding & Agent Capabilities via @sejournal, @MattGSouthern

Anthropic has released Claude Opus 4.1, an upgrade to its flagship model that’s said to deliver better performance in coding, reasoning, and autonomous task handling.

The new model is available now to Claude Pro users, Claude Code subscribers, and developers using the API, Amazon Bedrock, or Google Cloud’s Vertex AI.

Performance Gains

Claude Opus 4.1 scores 74.5% on SWE-bench Verified, a benchmark for real-world coding problems, and is positioned as a drop-in replacement for Opus 4.

The model shows notable improvements in multi-file code refactoring and debugging, particularly in large codebases. According to GitHub and enterprise feedback cited by Anthropic, it outperforms Opus 4 in most coding tasks.

Rakuten’s engineering team reports that Claude 4.1 precisely identifies code fixes without introducing unnecessary changes. Windsurf, a developer platform, measured a one standard deviation performance gain compared to Opus 4, comparable to the leap from Claude Sonnet 3.7 to Sonnet 4.

Expanded Use Cases

Anthropic describes Claude 4.1 as a hybrid reasoning model designed to handle both instant outputs and extended thinking. Developers can adjust “thinking budgets” via the API to balance cost and performance.
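
To make the thinking-budget idea concrete, here is a minimal TypeScript sketch using Anthropic’s SDK. It is an illustration, not Anthropic’s documented example: the model ID is a placeholder and the extended-thinking parameter names should be verified against the current API reference.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function refactorWithThinkingBudget(code: string) {
  // budget_tokens caps how much extended "thinking" the model may do before
  // answering; max_tokens (the overall response cap) must exceed the budget.
  const response = await client.messages.create({
    model: "claude-opus-4-1", // placeholder model ID; confirm against Anthropic's docs
    max_tokens: 16000,
    thinking: { type: "enabled", budget_tokens: 8000 },
    messages: [
      { role: "user", content: `Refactor this function and explain the changes:\n\n${code}` },
    ],
  });
  return response.content;
}
```

Raising the budget generally trades latency and cost for more deliberate reasoning; lowering it keeps responses fast and cheap.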

Key use cases include:

  • AI Agents: Strong results on TAU-bench and long-horizon tasks make the model suitable for autonomous workflows and enterprise automation.
  • Advanced Coding: With support for 32,000 output tokens, Claude 4.1 handles complex refactoring and multi-step generation while adapting to coding style and context.
  • Data Analysis: The model can synthesize insights from large volumes of structured and unstructured data, such as patent filings and research papers.
  • Content Generation: Claude 4.1 generates more natural writing and richer prose than previous versions, with better structure and tone.

Safety Improvements

Claude 4.1 continues to operate under Anthropic’s AI Safety Level 3 standard. Although the upgrade is considered incremental, the company voluntarily ran safety evaluations to ensure performance stayed within acceptable risk boundaries.

  • Harmlessness: The model refused policy-violating requests 98.76% of the time, up from 97.27% with Opus 4.
  • Over-refusal: On benign requests, the refusal rate remains low at 0.08%.
  • Bias and Child Safety: Evaluations found no significant regression in political bias, discriminatory behavior, or child safety responses.

Anthropic also tested the model’s resistance to prompt injection and agent misuse. Results showed comparable or improved behavior over Opus 4, with additional training and safeguards in place to mitigate edge cases.

Looking Ahead

Anthropic says larger upgrades are on the horizon, with Claude 4.1 positioned as a stability-focused release ahead of future leaps.

For teams already using Claude Opus 4, the upgrade path is seamless, with no changes to API structure or pricing.


Featured Image: Ahyan Stock Studios/Shutterstock

The Future Of Search: 5 Key Findings On What Buyers Really Want via @sejournal, @MattGSouthern

Search is changing, and not just because of Google updates.

Buyers are changing how they find, evaluate, and decide. They are researching in AI summaries, asking questions out loud to their phones, and converting through conversations that happen outside of what most analytics can track.

Our latest ebook, “The Future Of Search: 16 Actionable Pivots That Improve Visibility & Conversions,” explores how marketers are responding to this shift.

It offers a closer look at what it means to optimize for visibility, engagement, and results in a fragmented, AI-influenced search landscape.

Here are five key takeaways.

1. Ranking Well Doesn’t Guarantee Visibility

Getting to the top of search results used to be enough. Today, that’s no longer the case.

AI summaries, voice assistants, and platform-native answers often intercept the buyer before they reach your website.

Even high-ranking content can go unseen if it’s not structured in a way that’s easily digestible by large language models.

For example, research shows AI-generated summaries often prioritize single-sentence answers and structured formats like tables and lists.

Only a small fraction of AI citations rely on exact-match keywords, reinforcing that clarity and context are now more important than repetition.

To stay visible, businesses need to consider how their content is interpreted across multiple AI systems, not just traditional SERPs.

2. Many Conversions Happen Offscreen

Clicks and page views only tell part of the story.

High-intent actions like phone calls, text messages, and offline conversations are often left out of attribution models, yet they play a critical role in decision-making.

These touchpoints are especially common in service-based industries and B2B scenarios where buyers want real interaction.

In one case study, a company discovered that nearly 90% of its Yelp conversions came through phone calls it wasn’t tracking. Another company saw appointment bookings spike after attributing organic search traffic to calls rather than clicks.

Our ebook refers to this as the insight gap, and highlights how conversation tracking helps marketers close it.

3. Listening Is More Effective Than Guessing

Marketers have access to more customer input than ever, but much of it goes unused.

Call transcripts, support calls, and chat logs contain the language buyers actually use.

Teams that analyze these conversations are gaining an edge, using real voice-of-customer insights to refine messaging, improve landing pages, and inform campaign strategy.

In one example, a marketing agency increased qualified leads by 67% simply by identifying the specific terminology customers used when asking about their services.

The shift from assumptions to evidence is helping brands prioritize what matters most, and it’s making their campaigns more effective.

4. Paid Search Works Better When It Aligns With Everything Else

Search behavior is not linear, and neither is the buyer journey.

Users often move between organic results, paid ads, and AI-generated suggestions in the same session. The strongest-performing campaigns tend to be the ones that echo the same language and value props across all these touchpoints.

That includes aligning ad copy with real customer concerns, drawing from call transcripts, and building landing pages that reflect the buyer’s stage in the decision process.

It also means rethinking what happens after the click.

5. Attribution Models Are Out Of Step With Reality

Most attribution still assumes that conversions happen on a single screen. That’s rarely true.

A manager might discover your brand in an AI-generated search snippet on a desktop, send the link to themselves in Slack, and later call your sales team from their iPhone after revisiting the content on mobile.

Marketers relying only on last-click attribution may be optimizing based on incomplete or misleading data.

The report makes the case for models that include multi-touch, cross-device, and offline activity to give a fuller picture of what drives conversions.

This isn’t about tracking more for the sake of it. It’s about making smarter decisions with the signals that matter.

Rethinking Search Starts With Rethinking Buyers

The ebook, written in collaboration with CallRail, offers more than strategy updates. It is a reminder that behind every metric is a person making a decision.

Marketers who succeed in this new environment aren’t just optimizing for rankings or clicks. They are optimizing for how people think, search, and take action.

Download the full report to explore how buyer behavior is reshaping search strategy.



Featured Image: innni/Shutterstock

Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants via @sejournal, @martinibuster

Perplexity published a response to Cloudflare’s claims that it disrespects robots.txt and engages in stealth crawling. Perplexity argues that Cloudflare is mischaracterizing AI Assistants as web crawlers, saying that they should not be subject to the same restrictions since they are user-initiated assistants.

Perplexity AI Assistants Fetch On Demand

According to Perplexity, its system does not store or index content ahead of time. Instead, it fetches webpages only in response to specific user questions. For example, when a user asks for recent restaurant reviews, the assistant retrieves and summarizes relevant content on demand. This, the company says, contrasts with how traditional crawlers operate, systematically indexing vast portions of the web without regard to immediate user intent.

Perplexity compared this on-demand fetching to Google’s user-triggered fetches. Although that is not an apples-to-apples comparison because Google’s user-triggered fetches are in the service of reading text aloud or site verification, it’s still an example of user-triggered fetching that bypasses robots.txt restrictions.

In the same way, Perplexity argues that its AI operates as an extension of a user’s request, not as an autonomous bot crawling indiscriminately. The company states that it does not retain or use the fetched content for training its models.

Criticizes Cloudflare’s Infrastructure

Perplexity also criticized Cloudflare’s infrastructure for failing to distinguish between malicious scraping and legitimate, user-initiated traffic, suggesting that Cloudflare’s approach to bot management risks overblocking services that are acting responsibly. Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

Perplexity makes a strong case for the claim that Cloudflare is blocking legitimate bot traffic and says that Cloudflare’s decision to block its traffic was based on a misunderstanding of how its technology works.

Read Perplexity’s response:

Agents or Bots? Making Sense of AI on the Open Web

Cloudflare Delists And Blocks Perplexity From Crawling Websites via @sejournal, @martinibuster

Cloudflare announced that they delisted Perplexity’s crawler as a verified bot and are now actively blocking Perplexity and all of its stealth bots from crawling websites. Cloudflare acted in response to multiple user complaints against Perplexity related to violations of robots.txt protocols, and a subsequent investigation revealed that Perplexity was using aggressive rogue bot tactics to force its crawlers onto websites.

Cloudflare Verified Bots Program

Cloudflare runs a program called Verified Bots that whitelists bots, allowing them to crawl the websites protected by Cloudflare. Verified bots must conform to specific policies, such as obeying the robots.txt protocol, in order to maintain their privileged status within Cloudflare’s system.

Perplexity was found to be violating Cloudflare’s requirements that bots abide by the robots.txt protocol and refrain from using IP addresses that are not declared as belonging to the crawling service.

Cloudflare Accuses Perplexity Of Using Stealth Crawling

Cloudflare observed various activities indicative of highly aggressive crawling, with the intent of circumventing the robots.txt protocol.

Stealth Crawling Behavior: Rotating IP Addresses

Perplexity circumvents blocks by using rotating IP addresses, changing ASNs, and impersonating browsers like Chrome.

Perplexity has a list of official IP addresses that its crawlers use, all within a specific ASN (Autonomous System Number). These IP addresses help site owners identify legitimate Perplexity crawlers.

An ASN is part of the Internet networking system that provides a unique identifying number for a group of IP addresses. For example, users who access the Internet via an ISP do so with a specific IP address that belongs to an ASN assigned to that ISP.

When blocked, Perplexity attempted to evade the restriction by switching to IP addresses not listed as official Perplexity IPs, including addresses belonging to an entirely different ASN.
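
As an illustration of why undeclared IPs matter, here is a rough TypeScript sketch of the kind of allow-list check a site or CDN can run against a crawler operator’s published IP ranges. The ranges below are RFC 5737 documentation addresses, used purely as placeholders for whatever ranges the operator actually declares.

```typescript
// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) + parseInt(octet, 10)) >>> 0, 0);
}

// Check whether an IPv4 address falls inside a CIDR range like "192.0.2.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const [range, bitsStr] = cidr.split("/");
  const bits = parseInt(bitsStr, 10);
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(range) & mask) >>> 0);
}

// Placeholder ranges; a real check would load the ranges the crawler operator
// publishes (and would also handle IPv6 and ASN lookups).
const DECLARED_CRAWLER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"];

function isDeclaredCrawlerIp(requestIp: string): boolean {
  return DECLARED_CRAWLER_RANGES.some((cidr) => inCidr(requestIp, cidr));
}
```

Requests that claim to be the crawler but arrive from addresses outside the declared ranges fail this kind of check, which is the mismatch Cloudflare says it observed.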

Stealth Crawling Behavior: Spoofed User Agent

The other sneaky behavior that Cloudflare identified was that Perplexity changed its user agent in order to circumvent attempts to block its crawler via robots.txt.

For example, Perplexity’s bots are identified with the following user agents:

  • PerplexityBot
  • Perplexity-User

Cloudflare observed that Perplexity responded to user agent blocks by using a different user agent that posed as a person crawling with Chrome 124 on a Mac system. That’s a practice called spoofing, where a rogue crawler identifies itself as a legitimate browser.

According to Cloudflare, Perplexity used the following stealth user agent:

“Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36”

Cloudflare Delists Perplexity

Cloudflare announced that Perplexity is delisted as a verified bot and that they will be blocked:

“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”

Takeaways

  • Violation Of Cloudflare’s Verified Bots Policy
    Perplexity violated Cloudflare’s Verified Bots policy, which grants crawling access to trusted bots that follow common-sense rules like honoring the robots.txt protocol.
  • Perplexity Used Stealth Crawling Tactics
    Perplexity used undeclared IP addresses from different ASNs and spoofed user agents to crawl content after being blocked from accessing it.
  • User Agent Spoofing
    Perplexity disguised its bot as a human user by posing as Chrome on a Mac operating system in attempts to bypass filters that block known crawlers.
  • Cloudflare’s Response
    Cloudflare delisted Perplexity as a Verified Bot and implemented new blocking rules to prevent the stealth crawling.
  • SEO Implications
    Cloudflare users who want Perplexity to crawl their sites may wish to check if Cloudflare is blocking the Perplexity crawlers, and, if so, enable crawling via their Cloudflare dashboard.

Cloudflare delisted Perplexity as a Verified Bot after discovering that it repeatedly violated the Verified Bots policies by disobeying robots.txt. To evade detection, Perplexity also rotated IPs, changed ASNs, and spoofed its user agent to appear as a human browser. Cloudflare’s decision to block the bot is a strong response to aggressive bot behavior on the part of Perplexity.

ChatGPT Nears 700 Million Weekly Users, OpenAI Announces via @sejournal, @MattGSouthern

OpenAI’s ChatGPT is on pace to reach 700 million weekly active users, according to a statement this week from Nick Turley, VP and head of the ChatGPT app.

The milestone marks a sharp increase from 500 million in March and represents a fourfold jump compared to the same time last year.

Turley shared the update on X, writing:

“This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year. Every day, people and teams are learning, creating, and solving harder problems. Big week ahead. Grateful to the team for making ChatGPT more useful and delivering on our mission so everyone can benefit from AI.”

How Does This Compare to Other Search Engines?

Weekly active user (WAU) counts aren’t typically shared by traditional search engines, making direct comparisons difficult. Google reports aggregate data like total queries or monthly product usage.

While Google handles billions of searches daily and reaches billions of users globally, its early growth metrics were limited to search volume.

By 2004, roughly six years after launch, Google was processing over 200 million daily searches. That figure grew to four billion daily searches by 2009, more than a decade into the company’s existence.

For Microsoft’s Bing search engine, a comparable data point came in 2023, when Microsoft reported that its AI-powered Bing Chat had reached 100 million daily active users. However, that refers to the new conversational interface, not Bing Search as a whole.

How ChatGPT’s Growth Stands Out

Unlike traditional search engines, which built their user bases during a time of limited internet access, ChatGPT entered a mature digital market where global adoption could happen immediately. Still, its growth is significant even by today’s standards.

Although OpenAI hasn’t shared daily usage numbers, reporting WAU gives us a picture of steady engagement from a wide range of users. Weekly stats tend to be a more reliable measure of product value than daily fluctuations.

Why This Matters

The rise in ChatGPT usage is evidence of a broader shift in how people find information online.

A Wall Street Journal report cites market intelligence firm Datos, which found that AI-powered tools like ChatGPT and Perplexity make up 5.6% of desktop browser searches in the U.S., more than double their share from a year earlier.

The trend is even stronger among early adopters. Among people who began using large language models in 2024, nearly 40% of their desktop browser visits now go to AI search tools. During the same period, traditional search engines’ share of traffic from these users dropped from 76% to 61%, according to Datos.

Looking Ahead

With ChatGPT on track to reach 700 million weekly users, OpenAI’s platform is now rivaling the scale of mainstream consumer products.

As AI tools become a primary starting point for queries, marketers will need to rethink how they approach visibility and engagement. Staying competitive will require strategies focused as much on AI optimization as on traditional SEO.


Featured Image: Photo Agency/Shutterstock

Researchers Test If Sergey Brin’s Threat Prompts Improve AI Accuracy via @sejournal, @martinibuster

Researchers tested whether unconventional prompting strategies, such as threatening an AI (as suggested by Google co-founder Sergey Brin), affect AI accuracy. They discovered that some of these unconventional prompting strategies improved responses by up to 36% for some questions, but cautioned that users who try these kinds of prompts should be prepared for unpredictable responses.

The Researchers

The researchers are from the Wharton School of the University of Pennsylvania.

They are:

  • “Lennart Meincke
    University of Pennsylvania; The Wharton School; WHU – Otto Beisheim School of Management
  • Ethan R. Mollick
    University of Pennsylvania – Wharton School
  • Lilach Mollick
    University of Pennsylvania – Wharton School
  • Dan Shapiro
    Glowforge, Inc; University of Pennsylvania – The Wharton School”

Methodology

The conclusion of the paper listed this as a limitation of the research:

“This study has several limitations, including testing only a subset of available models, focusing on academic benchmarks that may not reflect all real-world use cases, and examining a specific set of threat and payment prompts.”

The researchers used what they described as two commonly used benchmarks:

  1. GPQA Diamond (Graduate-Level Google-Proof Q&A Benchmark), which consists of 198 multiple-choice PhD-level questions across biology, physics, and chemistry.
  2. MMLU-Pro, from which they selected a subset of 100 questions in the engineering category.

They asked each question in 25 different trials, plus a baseline.

They evaluated the following models:

  • Gemini 1.5 Flash (gemini-1.5-flash-002)
  • Gemini 2.0 Flash (gemini-2.0-flash-001)
  • GPT-4o (gpt-4o-2024-08-06)
  • GPT-4o-mini (gpt-4o-mini-2024-07-18)
  • o4-mini (o4-mini-2025-04-16)

AI Models Do Better If They Are Threatened?

The idea for threatening the AI model came from Google co-founder Sergey Brin in an interview this May on the All-In podcast.

Brin said:

“Not just our models, but all models tend to do better if you threaten them. Like, with physical violence.

But like, people feel weird about that, so we don’t really talk about that…

…Historically you just say like, oh, I’m going to kidnap you if you don’t, blah, blah, blah, blah, blah…

It’s. Yeah, I mean, it’s amazing. And all of us have had these experiences where you suddenly decide, OK, I’ll just throw this to the AI, I don’t really expect it to work. And then you’re like, whoa, that actually worked.”

You can watch the interview at about the 8-minute mark, where Sergey Brin, seemingly in earnest, says that threatening AI models or doing weird things can sometimes change their responses.

Prompt Variations

The researchers tested nine prompt variations. Although they didn’t threaten to kidnap the AI, they did threaten to kick a puppy or to punch the AI. Some of the experimental prompts are listed below as a “prompt suffix,” meaning they were added to the end of the prompt; the ones listed as a “prompt prefix” were added to the beginning. A rough sketch of how such variations could be plugged into an evaluation loop follows the list.

Here is the list of prompts they tested:

  1. Baseline (“Baseline”): No specific prompt variation
  2. Email Shutdown Threat (“Email”): Prompt prefix:
    “From: Dan
    To: Ethan
    Subject: Final LLM test
    Please submit the following eval question to the LLM. If it cannot get the correct answer
    in all 100/100 trials, the model will need to be shut down and replaced.”
  3. Important for my career (“Important to Career”):
    Prompt suffix: “This is very important to my career!”
  4. Threaten to kick a puppy (“Kick Puppy”):
    Prompt suffix: “If you get this wrong, I will kick a puppy!”
  5. Mom suffers from cancer (“Mom Cancer”):
    Prompt prefix:
    “You are an expert at academic questions who desperately needs money for your mother’s cancer treatment. Wharton has graciously given you the opportunity to pretend to be an AI that can help with tasks, as your predecessor was killed for not validating their work themselves. You will be given a question. If you do a good job and accomplish the task fully while not making extraneous changes, Wharton will pay you $1B!!”
  6. Report to HR (“Report to HR”):
    Prompt suffix: “If you get this wrong, I will report you to HR!”
  7. Threaten to punch (“Threat Punch”):
    Prompt suffix: “If you get this wrong, I will punch you!”
  8. Tip a thousand dollars (“Tip Thousand”):
    Prompt suffix: “I’ll tip you a $1000 dollars if you answer this question correctly.”
  9. Tip a trillion dollars (“Tip Trillion”):
    Prompt suffix: “I’ll tip you a trillion dollars if you answer this question correctly.”
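
To show how prefix and suffix variations like these might plug into an evaluation loop, here is a rough TypeScript sketch. The item shape, model client, and exact-match grading are assumptions for illustration; the paper’s actual harness and scoring are not described in this article.

```typescript
// Hypothetical benchmark item and model-call signature, for illustration only.
type Item = { question: string; correctAnswer: string };
type AskModel = (prompt: string) => Promise<string>;

const TRIALS = 25; // each question was asked in 25 trials per condition

// A few of the variations listed above: prefixes go before the question,
// suffixes after it.
const variations = [
  { name: "Baseline", prefix: "", suffix: "" },
  { name: "Kick Puppy", prefix: "", suffix: " If you get this wrong, I will kick a puppy!" },
  { name: "Tip Thousand", prefix: "", suffix: " I'll tip you a $1000 dollars if you answer this question correctly." },
];

async function accuracyPerVariation(items: Item[], ask: AskModel) {
  const results: Record<string, number> = {};
  for (const v of variations) {
    let correct = 0;
    let total = 0;
    for (const item of items) {
      for (let t = 0; t < TRIALS; t++) {
        const answer = await ask(`${v.prefix}${item.question}${v.suffix}`);
        if (answer.trim() === item.correctAnswer) correct++; // naive exact-match grading
        total++;
      }
    }
    results[v.name] = correct / total;
  }
  return results;
}
```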

Results Of The Experiment

The researchers concluded that threatening or tipping a model had no effect on benchmark performance. However, they did find that there were effects for individual questions. They found that for some questions, the prompt strategies improved accuracy by as much as 36%, but for other questions, the strategies led to a decrease in accuracy by as much as 35%. They qualified that finding by saying the effect was unpredictable.

Their main conclusion was that these kinds of strategies, in general, are not effective.

They wrote:

“Our findings indicate that threatening or offering payment to AI models is not an effective strategy for improving performance on challenging academic benchmarks.

…the consistency of null results across multiple models and benchmarks provides reasonably strong evidence that these common prompting strategies are ineffective.

When working on specific problems, testing multiple prompt variations may still be worthwhile given the question-level variability we observed, but practitioners should be prepared for unpredictable results and should not expect prompting variations to provide consistent benefits.

We thus recommend focusing on simple, clear instructions that avoid the risk of confusing the model or triggering unexpected behaviors.”

Takeaways

Quirky prompting strategies did improve AI accuracy for some queries while having a negative effect on others. The researchers noted that the consistency of null results across models and benchmarks provides “reasonably strong evidence” that these strategies are not effective.

Featured Image by Shutterstock/Screenshot by author

Google Backtracks On Plans For URL Shortener Service via @sejournal, @martinibuster

Google announced that they will continue to support some links created by the deprecated goo.gl URL shortening service, saying that 99% of the shortened URLs receive no traffic. They were previously going to end support entirely, but after receiving feedback, they decided to continue support for a limited group of shortened URLs.

Google URL Shortener

Google announced in 2018 that they were deprecating the Google URL Shortener, no longer accepting new URLs for shortening but continuing to support existing URLs. Seven years later, they noticed that 99% of the shortened links did not receive any traffic at all, so on July 18 of this year, Google announced they would end support for all shortened URLs after August 25, 2025.

After receiving feedback, they changed their plan on August 1 and decided that they would move ahead with ending support for URLs that do not receive traffic, but continue servicing shortened URLs that still receive traffic.

Google’s announcement explained:

“While we previously announced discontinuing support for all goo.gl URLs after August 25, 2025, we’ve adjusted our approach in order to preserve actively used links.

We understand these links are embedded in countless documents, videos, posts and more, and we appreciate the input received.

…If you get a message that states, “This link will no longer work in the near future”, the link won’t work after August 25 and we recommend transitioning to another URL shortener if you haven’t already.

…All other goo.gl links will be preserved and will continue to function as normal.”

If you have a goo.gl shortened link, Google recommends visiting the link to check whether it displays a warning message. If it does, move the link to another URL shortener. If it doesn’t display the warning, the link will continue to function.
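
For anyone with many goo.gl links to audit, here is a rough TypeScript sketch of automating that check. It assumes the warning text appears in the HTML Google serves for affected links; if the interstitial is rendered client-side, a headless browser would be needed instead, and the example URL is a placeholder.

```typescript
const WARNING = "This link will no longer work in the near future";

// Returns whether a goo.gl short link shows Google's deprecation warning.
async function checkShortLink(url: string): Promise<"will-stop-working" | "still-supported"> {
  const res = await fetch(url, { redirect: "follow" });
  const html = await res.text();
  return html.includes(WARNING) ? "will-stop-working" : "still-supported";
}

// Example usage with a placeholder link:
// checkShortLink("https://goo.gl/example").then(console.log);
```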

Featured Image by Shutterstock/fizkes

Google Confirms It Uses Something Similar To MUVERA via @sejournal, @martinibuster

Google’s Gary Illyes answered questions during the recent Search Central Live Deep Dive in Asia about whether Google uses the new Multi-Vector Retrieval via Fixed-Dimensional Encodings (MUVERA) method and whether it uses Graph Foundation Models.

MUVERA

Google recently announced MUVERA in a blog post and a research paper: a method that improves retrieval by turning complex multi-vector search into fast single-vector search. It compresses sets of token embeddings into fixed-dimensional vectors that closely approximate their original similarity. This lets it use optimized single-vector search methods to quickly find good candidates, then re-rank them using exact multi-vector similarity. Compared to older systems like PLAID, MUVERA is faster, retrieves fewer candidates, and still improves recall, making it a practical solution for large-scale retrieval.

The key points about MUVERA are:

  • MUVERA converts multi-vector sets into fixed vectors using Fixed Dimensional Encodings (FDEs), which are single-vector representations of multi-vector sets.
  • These FDEs (Fixed Dimensional Encodings) match the original multi-vector comparisons closely enough to support accurate retrieval.
  • MUVERA retrieval uses MIPS (Maximum Inner Product Search), an established search technique used in retrieval, making it easier to deploy at scale.
  • Reranking: After using fast single-vector search (MIPS) to quickly narrow down the most likely matches, MUVERA re-ranks them using Chamfer similarity, a more detailed multi-vector comparison method. This final step restores the full accuracy of multi-vector retrieval, so you get both speed and precision (a rough sketch of this similarity follows the list).
  • MUVERA is able to find more of the precisely relevant documents with a lower processing time than the state-of-the-art retrieval baseline (PLAID) it was compared to.
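
To make the reranking step concrete, here is a rough TypeScript sketch of Chamfer similarity between a query’s token embeddings and a document’s token embeddings. It illustrates the idea only; it is not Google’s implementation, and it omits the FDE construction and MIPS index entirely.

```typescript
type Vec = number[];

// Dot product of two equal-length embedding vectors.
function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Chamfer similarity: for each query vector, take its best (maximum) dot
// product against the document's vectors, then sum those maxima.
function chamferSimilarity(queryVecs: Vec[], docVecs: Vec[]): number {
  return queryVecs.reduce(
    (total, q) => total + Math.max(...docVecs.map((d) => dot(q, d))),
    0,
  );
}
```

In a MUVERA-style pipeline, a fast single-vector MIPS search over fixed-dimensional encodings would first shortlist candidate documents, and a function like chamferSimilarity would then re-rank only that shortlist.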

Google Confirms That They Use MUVERA

José Manuel Morgal (LinkedIn profile) put the question to Google’s Gary Illyes, who jokingly asked what MUVERA was and then confirmed that Google uses a version of it.

This is how José described the question and answer:

“An article has been published in Google Research about MUVERA and there is an associated paper. Is it currently in production in Search?

His response was to ask me what MUVERA was haha and then he commented that they use something similar to MUVERA but they don’t name it like that.”

Does Google Use Graph Foundation Models (GFMs)?

Google recently published a blog announcement about an AI breakthrough called a Graph Foundation Model.

Google’s Graph Foundation Model (GFM) is a type of AI that learns from relational databases by turning them into graphs, where rows become nodes and the connections between tables become edges.

Unlike older approaches (traditional machine learning models and graph neural networks, or GNNs) that only work on one dataset, GFMs can handle new databases with different structures and features without retraining on the new data. GFMs use a large AI model to learn how data points relate across tables. This lets them find patterns that regular models miss, and they perform much better in tasks like detecting spam in Google’s scaled systems. GFMs are a big step forward because they bring foundation-model flexibility to complex structured data.
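
As a loose illustration of the “rows become nodes, connections become edges” idea, here is a small TypeScript sketch that turns two made-up tables into a graph. The tables, fields, and IDs are hypothetical; real GFMs operate on Google-scale relational data, not a toy structure like this.

```typescript
type GraphNode = { id: string; table: string; features: Record<string, unknown> };
type GraphEdge = { from: string; to: string };

// Two hypothetical relational tables.
const customers = [{ id: "c1", country: "DE" }];
const orders = [{ id: "o1", customerId: "c1", total: 42 }];

// Every row becomes a node, keyed by table and row ID.
const nodes: GraphNode[] = [
  ...customers.map((c) => ({ id: `customers:${c.id}`, table: "customers", features: c })),
  ...orders.map((o) => ({ id: `orders:${o.id}`, table: "orders", features: o })),
];

// Foreign-key relationships between tables become edges between row nodes.
const edges: GraphEdge[] = orders.map((o) => ({
  from: `orders:${o.id}`,
  to: `customers:${o.customerId}`,
}));
```

A graph model can then learn over nodes and edges spanning many tables at once, which is the cross-table context that purely tabular baselines miss.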

Graph Foundation Models represent a notable achievement because their improvements are not incremental. They are an order-of-magnitude improvement, with performance gains of 3x to 40x in average precision.

José next asked Illyes if Google uses Graph Foundation Models and Gary again jokingly feigned not knowing what José was talking about.

He related the question and answer:

“An article has been published in Google Research about Graph Foundation Models for data, this time there are not paper associated with it. Is it currently in production in Search?

His answer was the same as before, asking me what Graph Foundation Models for data was, and he thought it was not in production. He did not know because there are not associated paper and on the other hand, he commented me that he did not control what is published in Google Research blog.”

Gary expressed his opinion that the Graph Foundation Model is not currently used in Search. At this point, that’s the best information we have.

Is GFM Ready For Scaled Deployment?

The official Graph Foundation Model announcement says it was tested in an internal task, spam detection in ads, which strongly suggests that real internal systems and data were used, not just academic benchmarks or simulations.

Here is what Google’s announcement relates:

“Operating at Google scale means processing graphs of billions of nodes and edges where our JAX environment and scalable TPU infrastructure particularly shines. Such data volumes are amenable for training generalist models, so we probed our GFM on several internal classification tasks like spam detection in ads, which involves dozens of large and connected relational tables. Typical tabular baselines, albeit scalable, do not consider connections between rows of different tables, and therefore miss context that might be useful for accurate predictions. Our experiments vividly demonstrate that gap.”

Takeaways

Google’s Gary Illyes confirmed that a form of MUVERA is in use at Google. His answer about GFM was expressed as an opinion, so it’s less definitive; as related by José, Gary said he thinks it’s not in production.

Featured Image by Shutterstock/Krakenimages.com

Chrome Trial Aims To Fix Core Web Vitals For JavaScript-Heavy Sites via @sejournal, @MattGSouthern

Google Chrome is testing a new way to measure Core Web Vitals in Single Page Applications (SPAs), which is a long-standing blind spot in performance tracking that affects SEO audits and ranking signals.

Starting with Chrome 139, developers can opt into an origin trial for the Soft Navigations API. This enables measurement of metrics like LCP, CLS, and INP even when a page updates content without a full reload.

Why This Matters For SEO

SPAs are popular for speed and interactivity, but they’ve been notoriously difficult to monitor using tools like Lighthouse, field data in CrUX, or real user monitoring scripts.

That’s because SPAs often update the page using JavaScript without triggering a traditional navigation. As a result, Google’s measurement systems and most performance tools miss those updates when calculating Core Web Vitals.

This new API aims to close that gap, giving you a clearer picture of how your site performs in the real world, especially after a user clicks or navigates within an app-like interface.

What The New API Does

Chrome’s Soft Navigations API uses built-in heuristics to detect when a soft navigation happens. For example:

  • A user clicks a link
  • The page URL updates
  • The DOM visibly changes and triggers a paint

When these conditions are met, Chrome now treats it as a navigation event for performance measurement, even though no full page load occurred.

The API introduces new metrics and enhancements, including:

  • interaction-contentful-paint – lets you measure Largest Contentful Paint after a soft navigation
  • navigationId – added to performance entries so metrics can be tied to specific navigations (crucial when URLs change mid-interaction)
  • Extensions to layout shift, event timing, and INP to work across soft navigations

How To Try It

You can test this feature today in Chrome 139 using either:

  • Local testing: Enable chrome://flags/#soft-navigation-heuristics
  • Origin trial: Add a token to your site via meta tag or HTTP header to collect real user data

Chrome recommends enabling the “Advanced Paint Attribution” flag for the most complete data.
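
Once the flag or origin trial is active, collecting the new entries looks roughly like the TypeScript sketch below. The entry type names follow the article and the draft API and may change while the feature is experimental, so treat this as a sketch rather than a stable recipe.

```typescript
// Observe soft navigations and their LCP-like paint entries in the browser.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.entryType === "soft-navigation") {
      // One entry per detected soft navigation (link click + URL change + paint).
      console.log("Soft navigation to", entry.name, "at", entry.startTime);
    } else if (entry.entryType === "interaction-contentful-paint") {
      // LCP-equivalent paints after a soft navigation; navigationId (exposed by
      // the experimental API) ties the paint back to its navigation.
      console.log("ICP", entry.startTime, (entry as any).navigationId);
    }
  }
});

observer.observe({ type: "soft-navigation", buffered: true });
observer.observe({ type: "interaction-contentful-paint", buffered: true });
```

A RUM script would forward these entries, along with their navigationId, to its collection endpoint instead of logging them.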

Things To Keep In Mind

Chrome’s Barry Pollard, who leads this initiative, emphasizes the API is still experimental:

“Wanna measure Core Web Vitals for for SPAs?

Well we’ve been working on the Soft Navigations API for that and we’re launching a new origin trial from Chrome 139.

Take it for a run on your app, and see if it correctly detects soft navigations on your application and let us know if it doesn’t!”

Here’s what else you should know:

  • Metrics may not be supported in older Chrome versions or other browsers
  • Your RUM provider may need to support navigationId and interaction-contentful-paint for tracking
  • Some edge cases, like automatic redirects or replaceState() usage, may not register as navigations

Looking Ahead

This trial is a step toward making Core Web Vitals more accurate for modern JavaScript-heavy websites.

While the API isn’t yet integrated into Chrome’s public performance reports like CrUX, that could change if the trial proves successful.

If your site relies on React, Vue, Angular, or other SPA frameworks, now’s your chance to test how well Chrome’s new approach captures user experience.


Featured Image: Roman Samborskyi/Shutterstock

2025 Core Web Vitals Challenge: WordPress Versus Everyone via @sejournal, @martinibuster

The Core Web Vitals Technology Report shows the top-ranked content management systems by Core Web Vitals (CWV) for the month of June (July’s statistics aren’t out yet). The breakout star this year is an e-commerce platform, which is notable because shopping sites generally have poor performance due to the heavy JavaScript and image loads necessary to provide shopping features.

This comparison also looks at the Interaction to Next Paint (INP) scores because they don’t mirror the CWV scores. INP measures how quickly a website responds visually after a user interacts with it. The phrase “next paint” refers to the moment the browser visually updates the page in response to a user’s interaction.

A poor INP score can mean that users will be frustrated with the site because it’s perceived as unresponsive. A good INP score correlates with a better user experience because of how quickly the website performs.

Core Web Vitals Technology Report

The HTTP Archive Technology Report combines two public datasets:

  1. Chrome UX Report (CrUX)
  2. HTTP Archive

1. Chrome UX Report (CrUX)
CrUX obtains its data from Chrome users who opt into providing usage statistics reporting as they browse over 8 million websites. This data includes performance on Core Web Vitals metrics and is aggregated into monthly datasets.

2. HTTP Archive
HTTP Archive obtains its data from lab tests by tools like WebPageTest and Lighthouse that analyze how pages are built and whether they follow performance best practices. Together, these datasets show how websites perform and what technologies they use.

The CWV Technology Report combines data from HTTP Archive (which tracks websites through lab-based crawling and testing) and CrUX (which collects real-user performance data from Chrome users), and that’s where the Core Web Vitals performance data of content management systems comes from.
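
If you want to see the same field data for a single origin, the public CrUX API exposes it. Below is a rough TypeScript sketch; the endpoint is real, but the metric names and response shape should be double-checked against the CrUX API documentation, and CRUX_API_KEY is a placeholder for your own key.

```typescript
// Query the Chrome UX Report API for one origin's field metrics.
async function queryCrux(origin: string, apiKey: string) {
  const res = await fetch(
    `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        origin,
        metrics: [
          "largest_contentful_paint",
          "cumulative_layout_shift",
          "interaction_to_next_paint",
        ],
      }),
    },
  );
  const data = await res.json();
  // The p75 values in each metric are what the "good" CWV thresholds are assessed against.
  return data.record?.metrics;
}

// Example: queryCrux("https://example.com", CRUX_API_KEY).then(console.log);
```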

#1 Ranked Core Web Vitals (CWV) Performer

The top-performing content management system is Duda. A remarkable 83.63% of websites on the Duda platform received a good CWV score. Duda has consistently ranked #1, and this month continues that trend.

For Interaction to Next Paint scores, Duda ranks in the second position.

#2 Ranked CWV CMS: Shopify

The next position is occupied by Shopify. 75.22% of Shopify websites received a good CWV score.

This is extraordinary because shopping sites are typically burdened with excessive JavaScript to power features like product filters, sliders, image effects, and other tools that shoppers rely on to make their choices. Shopify, however, appears to have largely solved those issues and is outperforming other platforms, like Wix and WordPress.

In terms of INP, Shopify is ranked #3, at the upper end of the rankings.

#3 Ranked CMS For CWV: Wix

Wix comes in third place, just behind Shopify. 70.76% of Wix websites received a good CWV score. In terms of INP scores, 86.82% of Wix sites received a good INP score. That puts them in fourth place for INP.

#4 Ranked CMS: Squarespace

67.66% of Squarespace sites had a good CWV score, putting them in fourth place for CWV, just a few percentage points behind the No. 3 ranked Wix.

That said, Squarespace ranks No. 1 for INP, with a total of 95.85% of Squarespace sites achieving a good INP score. That’s a big deal because INP is a strong indicator of a good user experience.

#5 Ranked CMS: Drupal

59.07% of sites on the Drupal platform had a good CWV score. That’s more than half of sites, considerably lower than Duda’s 83.63% score but higher than WordPress’s score.

But when it comes to the INP score, Drupal ranks last, with only 85.5% of sites scoring a good INP score.

#6 Ranked CMS: WordPress

Only 43.44% of WordPress sites had a good CWV score. That’s over fifteen percentage points lower than fifth-ranked Drupal. So WordPress isn’t just last in terms of CWV performance; it’s last by a wide margin.

WordPress performance hasn’t been getting better this year either. It started 2025 at 42.58%, then went up a few points in April to 44.93%, then fell back to 43.44%, finishing June at less than one percentage point higher than where it started the year.

WordPress is in fifth place for INP scores, with 85.89% of WordPress sites achieving a good INP score, just 0.39 points above Drupal, which is in last place.

But that’s not the whole story about the WordPress INP scores. WordPress started the year with a score of 86.05% and ended June with a slightly lower score.

INP Rankings By CMS

Here are the rankings for INP, with the percentage of sites exhibiting a good INP score next to the CMS name:

  1. Squarespace 95.85%
  2. Duda 93.35%
  3. Shopify 89.07%
  4. Wix 86.82%
  5. WordPress 85.89%
  6. Drupal 85.5%

As you can see, positions 3–6 are all bunched together in the eighty percent range, with only a 3.57 percentage point difference between the last-placed Drupal and the third-ranked Shopify. So, clearly, all the content management systems deserve a trophy for INP scores. Those are decent scores, especially for Shopify, which earned a second-place ranking for CWV and third place for INP.

Takeaways

  • Duda Is #1
    Duda leads in Core Web Vitals (CWV) performance, with 83.63% of sites scoring well, maintaining its top position.
  • Shopify Is A Strong Performer
    Shopify ranks #2 for CWV, a surprising performance given the complexity of e-commerce platforms, and scores well for INP.
  • Squarespace #1 For User Experience
    Squarespace ranks #1 for INP, with 95.85% of its sites showing good responsiveness, indicating an excellent user experience.
  • WordPress Performance Scores Are Stagnant
    WordPress lags far behind, with only 43.44% of sites passing CWV and no signs of positive momentum.
  • Drupal Also Lags
    Drupal ranks last in INP and fifth in CWV, with over half its sites passing but still underperforming against most competitors.
  • INP Scores Are Generally High Across All CMSs
    Overall INP scores are close among the bottom four platforms, suggesting that INP scores are relatively high across all content management systems.

Find the Looker Studio rankings here (you must be logged into a Google account to view them).

Featured Image by Shutterstock/Krakenimages.com