Google Says AI Clicks Are Better, What Does Your Data Say? via @sejournal, @MattGSouthern

Google’s latest blog post claims AI is making Search more useful than ever. Google says people are asking new kinds of questions, clicking on more links, and spending more time on the content they visit.

But with no supporting data or clear definitions, the message reads more like reassurance than transparency.

Rather than take Google at its word or assume the worst, you can use your own analytics to understand how AI in Search is affecting your site.

Here’s how to do that.

Google Says: “Quality Clicks” Are Up

In the post, Google says total organic traffic is “relatively stable year over year,” but that quality has improved, with “slightly more quality clicks” than a year ago.

According to the company, “quality clicks” are those where users don’t bounce back immediately, indicating they’re finding value in the destination.

This sounds good in theory, but it raises a few questions:

  • How much is “slightly more” when it comes to quality clicks?
  • Which sites are gaining, and which are losing?
  • And how is click quality being measured?

You won’t find those answers in Google’s post. But you can find clues in your own data.

1. Track Click-Through Rate On High-Volume Queries

If you suspect your site has lost ground due to AI Overviews, your first stop should be Google Search Console.

Try this:

  • Filter for top queries from the past 12 months.
  • Look at CTR changes before and after May 2024 (when AI Overviews began expanding).
  • Pay attention to queries that are longer, question-based, or likely to trigger summaries.

You may find impressions are holding steady or rising while CTR declines. That suggests your content is still being surfaced, but users may be getting their answers directly in Google’s AI-generated response.
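If you’d rather pull this programmatically, the Search Console API exposes the same query-level data. Below is a minimal sketch using Google’s Python client; the property URL, service-account file, and date windows are placeholders, and it assumes the service account has been granted access to the property in Search Console.

```python
# A minimal sketch (not SEJ's tooling): compare query-level CTR in the
# months before and after May 2024 using the Search Console API.
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",  # placeholder credentials file
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

def top_queries(start_date, end_date, limit=250):
    body = {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query"],
        "rowLimit": limit,
    }
    response = service.searchanalytics().query(
        siteUrl="https://www.example.com/",  # placeholder property
        body=body,
    ).execute()
    return {row["keys"][0]: row for row in response.get("rows", [])}

before = top_queries("2024-01-01", "2024-04-30")
after = top_queries("2024-05-15", "2024-09-15")

# Flag queries that kept their impressions but lost CTR.
for query, row in after.items():
    prior = before.get(query)
    if prior and row["impressions"] >= prior["impressions"] and row["ctr"] < prior["ctr"]:
        print(f"{query}: CTR {prior['ctr']:.1%} -> {row['ctr']:.1%}")
```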

2. Approximate “Quality Clicks” With Engagement Metrics

To test Google’s claim about higher quality clicks, you’ll need to look beyond Search Console.

In GA4, examine:

  • Engaged sessions (sessions that last longer than 10 seconds, include a conversion event, or have at least two pageviews).
  • Average engagement time per session.
  • Scroll depth or video watch time, if applicable.

Compare these engagement metrics to the same period last year. If they’re improving, you may be getting more motivated visitors, supporting Google’s view.

But if they’re dropping, it could mean that AI Overviews are sending fewer, possibly less interested, visitors your way.
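For a quick programmatic check, the GA4 Data API can return both periods in one request. Here’s a minimal sketch, assuming the google-analytics-data package, a placeholder property ID, and credentials supplied via GOOGLE_APPLICATION_CREDENTIALS.

```python
# A minimal sketch: engaged sessions and engagement rate, this period vs.
# the same period last year, via the GA4 Data API.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Metric, RunReportRequest

client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property="properties/123456789",  # placeholder GA4 property ID
    date_ranges=[
        DateRange(start_date="2024-05-01", end_date="2024-07-31", name="this_year"),
        DateRange(start_date="2023-05-01", end_date="2023-07-31", name="last_year"),
    ],
    metrics=[
        Metric(name="engagedSessions"),
        Metric(name="engagementRate"),
        Metric(name="userEngagementDuration"),  # divide by sessions for avg. engagement time
    ],
)

report = client.run_report(request)
# With multiple date ranges, the API appends a dateRange dimension to each row.
for row in report.rows:
    period = row.dimension_values[0].value
    values = {m.name: v.value for m, v in zip(request.metrics, row.metric_values)}
    print(period, values)
```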

3. See Which Content Formats Are Gaining Visibility

Google says people are increasingly clicking on forums, videos, podcasts, and posts with “authentic voices.”

That aligns with its integration of Reddit and YouTube content into AI Overviews.

To see how this shift might be playing out for you:

  • Compare the performance of listicles, tutorials, and original reviews to more generic content.
  • If you create video or podcast content, track any uptick in referral traffic from Google.
  • Watch for changes in how your forum threads, product reviews, or community content perform compared to static pages.

You may find that narrative-style content, first-hand experiences, and multimedia formats are gaining traction, even if traditional evergreen pages are flat.

4. Watch For Redistribution, Not Just Declines

Google acknowledges that while overall traffic is stable, traffic is being redistributed.

That means some sites will lose while others gain, based on how well they align with evolving search behavior.

If your traffic has declined, it doesn’t necessarily mean your content isn’t ranking. It may be that the types of questions being asked and answered have changed.

Analyzing your top landing pages can help you spot patterns:

  • Are you seeing fewer entries on pages that used to rank for quick-answer queries?
  • Are in-depth or comparison-style pages gaining traffic?

The patterns you spot could help guide your content strategy.
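One hedged way to surface those patterns is a year-over-year landing page report from the same GA4 Data API; as before, the property ID and date windows are placeholders.

```python
# A minimal sketch: sessions by landing page, current year vs. prior year,
# to spot pages losing quick-answer traffic or gaining in-depth traffic.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property="properties/123456789",  # placeholder GA4 property ID
    dimensions=[Dimension(name="landingPage")],
    metrics=[Metric(name="sessions")],
    date_ranges=[
        DateRange(start_date="2024-08-01", end_date="2025-07-31", name="current"),
        DateRange(start_date="2023-08-01", end_date="2024-07-31", name="previous"),
    ],
    limit=50,
)

for row in client.run_report(request).rows:
    page = row.dimension_values[0].value
    period = row.dimension_values[1].value  # dateRange dimension, added automatically
    print(period, page, row.metric_values[0].value)
```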

Looking Ahead

When you rely on Search traffic, you deserve more than vague reassurances. Your analytics can help fill in the blanks.

By keeping an eye on your CTR, engagement, and how your content performs, you’ll get a better sense of whether AI in Search is helping you. This way, you can tweak your strategy to fit what works best for you.


Featured Image: Roman Samborskyi/Shutterstock

Study: Advanced Personalization Linked To Higher Conversions via @sejournal, @MattGSouthern

A new study commissioned by Meta and conducted by Deloitte finds that advanced personalization strategies are associated with a 16 percentage point increase in conversions compared to more basic efforts.

The research also introduces a maturity framework to help organizations evaluate their personalization capabilities and identify areas for improvement.

What the Data Shows

According to the study, 80% of U.S. consumers say they’re more likely to make a purchase when brands personalize their experiences. Consumers also report spending 50% more with brands that tailor interactions to their needs.

The report connects these behaviors to broader business outcomes. In the EU, Meta’s personalized advertising technologies were linked to €213 billion in economic activity and 1.4 million jobs.

While the economic impact data is specific to Meta, the findings reflect a wider trend in digital marketing: personalized engagement influences purchase decisions and brand loyalty.

Derya Matras, VP for Global Business Group at Meta, commented:

“As people want content and services that are more relevant to them, they are increasingly drawn to brands that make them feel understood.”

Maturity Model for Personalization

The report outlines a four-level maturity model to help you assess where you stand with personalization. The study links higher maturity levels with measurable business outcomes.

Level 1: Low Maturity

Data remains siloed, and messaging tends to be generic. Personalization, if present, is rule-based and limited to a few channels.

Level 2: Medium Maturity

Some systems are integrated, enabling basic audience segmentation and limited customization across channels. These organizations may also use analytics tools and consent management.

Level 3: High Maturity

Unified customer profiles and identity resolution enable greater personalization across multiple touchpoints. Predictive modeling and dynamic content are more common.

Level 4: Champion Maturity

Real-time personalization, generative AI, and clean-room tech support tailored omnichannel experiences. Teams collaborate across departments, with AI governance integrated into decisions.

Three Personalization Strategies

The study outlines three personalization strategies:

  1. Customer-based: Tailors experiences to individuals based on personal data and behavior.
  2. Cohort-based: Segments audiences based on shared traits or behaviors.
  3. Aggregated data-based: Uses anonymized, large-scale datasets to identify general trends.

The report doesn’t suggest a single best method. Instead, it offers examples to help you evaluate what fits your capabilities and goals.

Looking Ahead

For marketers assessing their next steps, the maturity framework offers a structured way to evaluate readiness across people, processes, and technology.

Rather than treating personalization as a software problem, the report frames it as a long-term shift in how organizations structure teams and manage data.

Why OpenAI’s Open Source Models Are A Big Deal via @sejournal, @martinibuster

OpenAI has released two new open-weight language models under the permissive Apache 2.0 license. These models are designed to deliver strong real-world performance while running on consumer hardware, including a model that can run on a high-end laptop with only 16 GB of GPU memory.

Real-World Performance at Lower Hardware Cost

The two models are:

  • gpt-oss-120b (117 billion parameters)
  • gpt-oss-20b (21 billion parameters)

The larger gpt-oss-120b model matches OpenAI’s o4-mini on reasoning benchmarks while requiring only a single 80 GB GPU. The smaller gpt-oss-20b model performs similarly to o3-mini and runs efficiently on devices with just 16 GB of GPU memory. This enables developers to run the models on consumer machines, making it easier to deploy without expensive infrastructure.

Advanced Reasoning, Tool Use, and Chain-of-Thought

OpenAI explains that the models outperform other open source models of similar sizes on reasoning tasks and tool use.

According to OpenAI:

“These models are compatible with our Responses API and are designed to be used within agentic workflows with exceptional instruction following, tool use like web search or Python code execution, and reasoning capabilities—including the ability to adjust the reasoning effort for tasks that don’t require complex reasoning and/or target very low latency final outputs. They are entirely customizable, provide full chain-of-thought (CoT), and support Structured Outputs.”

Designed for Developer Flexibility and Integration

OpenAI has released developer guides to support integration with platforms like Hugging Face, GitHub, vLLM, Ollama, and llama.cpp. The models are compatible with OpenAI’s Responses API and support advanced instruction-following and reasoning behaviors. Developers can fine-tune the models and implement safety guardrails for custom applications.
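As an illustration, a minimal local-inference sketch with Hugging Face Transformers might look like the following. The model ID matches the published gpt-oss-20b repository on Hugging Face, but treat the loading options as assumptions that depend on your hardware and library versions.

```python
# A minimal sketch: running gpt-oss-20b locally via Transformers.
# Assumes a recent transformers release and roughly 16 GB of GPU memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # open-weight model ID on Hugging Face
    torch_dtype="auto",
    device_map="auto",  # spreads layers across available GPU/CPU memory
)

messages = [
    {"role": "user", "content": "Summarize the tradeoffs of Mixture-of-Experts models."}
]
output = generator(messages, max_new_tokens=200)
print(output[0]["generated_text"][-1])  # the assistant's reply turn
```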

Safety In Open-Weight AI Models

OpenAI approached their open-weight models with the goal of ensuring safety throughout both training and release. Testing confirmed that even under purposely malicious fine-tuning, gpt-oss-120b did not reach a dangerous level of capability in areas of biological, chemical, or cyber risk.

Chain of Thought Unfiltered

OpenAI is intentionally leaving chains of thought (CoTs) unfiltered during training to preserve their usefulness for monitoring, based on the concern that optimization pressure could cause models to hide their real reasoning. This, however, can result in hallucinations.

According to their model card (PDF version):

“In our recent research, we found that monitoring a reasoning model’s chain of thought can be helpful for detecting misbehavior. We further found that models could learn to hide their thinking while still misbehaving if their CoTs were directly pressured against having ‘bad thoughts.’

More recently, we joined a position paper with a number of other labs arguing that frontier developers should ‘consider the impact of development decisions on CoT monitorability.’

In accord with these concerns, we decided not to put any direct optimization pressure on the CoT for either of our two open-weight models. We hope that this gives developers the opportunity to implement CoT monitoring systems in their projects and enables the research community to further study CoT monitorability.”

Impact On Hallucinations

The OpenAI documentation states that the decision not to restrict the chain of thought results in higher hallucination scores.

The PDF version of the model card explains why this happens:

“Because these chains of thought are not restricted, they can contain hallucinated content, including language that does not reflect OpenAI’s standard safety policies. Developers should not directly show chains of thought to users of their applications, without further filtering, moderation, or summarization of this type of content.”

Benchmarking showed that the two open-weight models underperformed OpenAI’s o4-mini on hallucination benchmarks. The model card PDF explains that this was expected because the new models are smaller, and it suggests that the models will hallucinate less in agentic settings where they can look up information on the web (as in RAG) or extract it from a database.

OpenAI OSS Hallucination Benchmarking Scores

Benchmarking scores showing that the open source models score lower than OpenAI o4-mini.

Takeaways

  • Open-Weight Release
    OpenAI released two open-weight models under the permissive Apache 2.0 license.
  • Performance vs. Hardware Cost
    Models deliver strong reasoning performance while running on real-world affordable hardware, making them widely accessible.
  • Model Specs And Capabilities
    gpt-oss-120b matches o4-mini on reasoning and runs on a single 80 GB GPU; gpt-oss-20b performs similarly to o3-mini on reasoning benchmarks and runs efficiently on a 16 GB GPU.
  • Agentic Workflow
    Both models support structured outputs, tool use (like Python and web search), and can scale their reasoning effort based on task complexity.
  • Customization and Integration
    The models are built to fit into agentic workflows and can be fully tailored to specific use cases. Their support for structured outputs makes them adaptable to complex software systems.
  • Tool Use and Function Calling
    The models can perform function calls and tool use with few-shot prompting, making them effective for automation tasks that require reasoning and adaptability.
  • Collaboration with Real-World Users
    OpenAI collaborated with partners such as AI Sweden, Orange, and Snowflake to explore practical uses of the models, including secure on-site deployment and custom fine-tuning on specialized datasets.
  • Inference Optimization
    The models use Mixture-of-Experts (MoE) to reduce compute load and grouped multi-query attention for inference and memory efficiency, making them easier to run at lower cost.
  • Safety
    OpenAI’s open source models maintain safety even under malicious fine-tuning; Chain of Thoughts (CoTs) are left unfiltered for transparency and monitorability.
  • CoT Transparency Tradeoff
    No optimization pressure was applied to CoTs, to avoid models masking harmful reasoning; this may result in hallucinations.
  • Hallucinations Benchmarks and Real-World Performance
    The models underperform o4-mini on hallucination benchmarks, which OpenAI attributes to their smaller size. However, in real-world applications where the models can look up information from the web or query external datasets, hallucinations are expected to be less frequent.

Featured Image by Shutterstock/Good dreams – Studio

Claude Opus 4.1 Improves Coding & Agent Capabilities via @sejournal, @MattGSouthern

Anthropic has released Claude Opus 4.1, an upgrade to its flagship model that’s said to deliver better performance in coding, reasoning, and autonomous task handling.

The new model is available now to Claude Pro users, Claude Code subscribers, and developers using the API, Amazon Bedrock, or Google Cloud’s Vertex AI.

Performance Gains

Claude Opus 4.1 scores 74.5% on SWE-bench Verified, a benchmark for real-world coding problems, and is positioned as a drop-in replacement for Opus 4.

The model shows notable improvements in multi-file code refactoring and debugging, particularly in large codebases. According to GitHub and enterprise feedback cited by Anthropic, it outperforms Opus 4 in most coding tasks.

Rakuten’s engineering team reports that Claude 4.1 precisely identifies code fixes without introducing unnecessary changes. Windsurf, a developer platform, measured a one standard deviation performance gain compared to Opus 4, comparable to the leap from Claude Sonnet 3.7 to Sonnet 4.

Expanded Use Cases

Anthropic describes Claude 4.1 as a hybrid reasoning model designed to handle both instant outputs and extended thinking. Developers can fine-tune “thinking budgets” via the API to balance cost and performance.
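For example, a request with an explicit thinking budget via the Anthropic Python SDK might look like this. It’s a sketch, not Anthropic’s reference code; the budget value is illustrative and the model alias assumes the current naming scheme.

```python
# A minimal sketch: setting a "thinking budget" with the Anthropic API.
# budget_tokens caps the tokens spent on extended thinking; it must be
# lower than max_tokens.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # alias; a dated snapshot ID also works
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
)

# With thinking enabled, the response mixes thinking and text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```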

Key use cases include:

  • AI Agents: Strong results on TAU-bench and long-horizon tasks make the model suitable for autonomous workflows and enterprise automation.
  • Advanced Coding: With support for 32,000 output tokens, Claude 4.1 handles complex refactoring and multi-step generation while adapting to coding style and context.
  • Data Analysis: The model can synthesize insights from large volumes of structured and unstructured data, such as patent filings and research papers.
  • Content Generation: Claude 4.1 generates more natural writing and richer prose than previous versions, with better structure and tone.

Safety Improvements

Claude 4.1 continues to operate under Anthropic’s AI Safety Level 3 standard. Although the upgrade is considered incremental, the company voluntarily ran safety evaluations to ensure performance stayed within acceptable risk boundaries.

  • Harmlessness: The model refused policy-violating requests 98.76% of the time, up from 97.27% with Opus 4.
  • Over-refusal: On benign requests, the refusal rate remains low at 0.08%.
  • Bias and Child Safety: Evaluations found no significant regression in political bias, discriminatory behavior, or child safety responses.

Anthropic also tested the model’s resistance to prompt injection and agent misuse. Results showed comparable or improved behavior over Opus 4, with additional training and safeguards in place to mitigate edge cases.

Looking Ahead

Anthropic says larger upgrades are on the horizon, with Claude 4.1 positioned as a stability-focused release ahead of future leaps.

For teams already using Claude Opus 4, the upgrade path is seamless, with no changes to API structure or pricing.


Featured Image: Ahyan Stock Studios/Shutterstock

The Future Of Search: 5 Key Findings On What Buyers Really Want via @sejournal, @MattGSouthern

Search is changing, and not just because of Google updates.

Buyers are changing how they find, evaluate, and decide. They are researching in AI summaries, asking questions out loud to their phones, and converting through conversations that happen outside of what most analytics can track.

Our latest ebook, “The Future Of Search: 16 Actionable Pivots That Improve Visibility & Conversions,” explores how marketers are responding to this shift.

It offers a closer look at what it means to optimize for visibility, engagement, and results in a fragmented, AI-influenced search landscape.

Here are five key takeaways.

1. Ranking Well Doesn’t Guarantee Visibility

Getting to the top of search results used to be enough. Today, that’s no longer the case.

AI summaries, voice assistants, and platform-native answers often intercept the buyer before they reach your website.

Even high-ranking content can go unseen if it’s not structured in a way that’s easily digestible by large language models.

For example, research shows AI-generated summaries often prioritize single-sentence answers and structured formats like tables and lists.

Only a small fraction of AI citations rely on exact-match keywords, reinforcing that clarity and context are now more important than repetition.

To stay visible, businesses need to consider how their content is interpreted across multiple AI systems, not just traditional SERPs.

2. Many Conversions Happen Offscreen

Clicks and page views only tell part of the story.

High-intent actions like phone calls, text messages, and offline conversations are often left out of attribution models, yet they play a critical role in decision-making.

These touchpoints are especially common in service-based industries and B2B scenarios where buyers want real interaction.

In one case study, a company discovered that nearly 90% of its Yelp conversions came through phone calls it wasn’t tracking. Another saw appointment bookings spike after attributing organic search traffic to calls rather than clicks.

Our ebook refers to this as the insight gap, and highlights how conversation tracking helps marketers close it.

3. Listening Is More Effective Than Guessing

Marketers have access to more customer input than ever, but much of it goes unused.

Call transcripts, support calls, and chat logs contain the language buyers actually use.

Teams that analyze these conversations are gaining an edge, using real voice-of-customer insights to refine messaging, improve landing pages, and inform campaign strategy.

In one example, a marketing agency increased qualified leads by 67% simply by identifying the specific terminology customers used when asking about their services.

The shift from assumptions to evidence is helping brands prioritize what matters most, and it’s making their campaigns more effective.

4. Paid Search Works Better When It Aligns With Everything Else

Search behavior is not linear, and neither is the buyer journey.

Users often move between organic results, paid ads, and AI-generated suggestions in the same session. The strongest-performing campaigns tend to be the ones that echo the same language and value props across all these touchpoints.

That includes aligning ad copy with real customer concerns, drawing from call transcripts, and building landing pages that reflect the buyer’s stage in the decision process.

It also means rethinking what happens after the click.

5. Attribution Models Are Out Of Step With Reality

Most attribution still assumes that conversions happen on a single screen. That’s rarely true.

A manager might discover your brand in an AI-generated search snippet on a desktop, send the link to themselves in Slack, and later call your sales team from their iPhone after revisiting the content on mobile.

Marketers relying only on last-click attribution may be optimizing based on incomplete or misleading data.

The report makes the case for models that include multi-touch, cross-device, and offline activity to give a fuller picture of what drives conversions.

This isn’t about tracking more for the sake of it. It’s about making smarter decisions with the signals that matter.

Rethinking Search Starts With Rethinking Buyers

The ebook, written in collaboration with CallRail, offers more than strategy updates. It is a reminder that behind every metric is a person making a decision.

Marketers who succeed in this new environment aren’t just optimizing for rankings or clicks. They are optimizing for how people think, search, and take action.

Download the full report to explore how buyer behavior is reshaping search strategy.


Featured Image: innni/Shutterstock

Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants via @sejournal, @martinibuster

Perplexity published a response to Cloudflare’s claims that it disrespects robots.txt and engages in stealth crawling. Perplexity argues that Cloudflare is mischaracterizing AI Assistants as web crawlers, saying that they should not be subject to the same restrictions since they are user-initiated assistants.

Perplexity AI Assistants Fetch On Demand

According to Perplexity, its system does not store or index content ahead of time. Instead, it fetches webpages only in response to specific user questions. For example, when a user asks for recent restaurant reviews, the assistant retrieves and summarizes relevant content on demand. This, the company says, contrasts with how traditional crawlers operate, systematically indexing vast portions of the web without regard to immediate user intent.

Perplexity compared this on-demand fetching to Google’s user-triggered fetches. Although that is not an apples-to-apples comparison because Google’s user-triggered fetches are in the service of reading text aloud or site verification, it’s still an example of user-triggered fetching that bypasses robots.txt restrictions.

In the same way, Perplexity argues that its AI operates as an extension of a user’s request, not as an autonomous bot crawling indiscriminately. The company states that it does not retain or use the fetched content for training its models.

Criticizes Cloudflare’s Infrastructure

Perplexity also criticized Cloudflare’s infrastructure for failing to distinguish between malicious scraping and legitimate, user-initiated traffic, suggesting that Cloudflare’s approach to bot management risks overblocking services that are acting responsibly. Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

Perplexity makes a strong case for the claim that Cloudflare is blocking legitimate bot traffic and says that Cloudflare’s decision to block its traffic was based on a misunderstanding of how its technology works.

Read Perplexity’s response:

Agents or Bots? Making Sense of AI on the Open Web

Cloudflare Delists And Blocks Perplexity From Crawling Websites via @sejournal, @martinibuster

Cloudflare announced that they delisted Perplexity’s crawler as a verified bot and are now actively blocking Perplexity and all of its stealth bots from crawling websites. Cloudflare acted in response to multiple user complaints against Perplexity related to violations of robots.txt protocols, and a subsequent investigation revealed that Perplexity was using aggressive rogue bot tactics to force its crawlers onto websites.

Cloudflare Verified Bots Program

Cloudflare has a system called Verified Bots that whitelists bots in their system, allowing them to crawl the websites that are protected by Cloudflare. Verified bots must conform to specific policies, such as obeying the robots.txt protocols, in order to maintain their privileged status within Cloudflare’s system.

Perplexity was found to be violating Cloudflare’s requirements that bots abide by the robots.txt protocol and refrain from using IP addresses that are not declared as belonging to the crawling service.

Cloudflare Accuses Perplexity Of Using Stealth Crawling

Cloudflare observed various activities indicative of highly aggressive crawling, with the intent of circumventing the robots.txt protocol.

Stealth Crawling Behavior: Rotating IP Addresses

Perplexity circumvents blocks by using rotating IP addresses, changing ASNs, and impersonating browsers like Chrome.

Perplexity has a list of official IP addresses that crawl from a specific ASN (Autonomous System Number). These IP addresses help identify legitimate crawlers from Perplexity.

An ASN is part of the Internet networking system that provides a unique identifying number for a group of IP addresses. For example, users who access the Internet via an ISP do so with a specific IP address that belongs to an ASN assigned to that ISP.

When blocked, Perplexity attempted to evade the restriction by switching to IP addresses that are not listed as official Perplexity IPs, including addresses belonging to entirely different ASNs.

Stealth Crawling Behavior: Spoofed User Agent

The other sneaky behavior that Cloudflare identified was that Perplexity changed its user agent in order to circumvent attempts to block its crawler via robots.txt.

For example, Perplexity’s bots are identified with the following user agents:

  • PerplexityBot
  • Perplexity-User

Cloudflare observed that Perplexity responded to user agent blocks by switching to a user agent that posed as a person browsing with Chrome 124 on a Mac. That’s a practice called spoofing, where a rogue crawler identifies itself as a legitimate browser.

According to Cloudflare, Perplexity used the following stealth user agent:

“Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36”
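The mechanics are easy to demonstrate: robots.txt rules match on the declared crawler name, so a bot announcing itself as a generic browser matches no rule at all. Here’s a minimal sketch using Python’s standard-library robots.txt parser with a hypothetical rules file.

```python
# A minimal sketch: why user agent spoofing defeats robots.txt. Rules match
# the declared crawler name, and an unmatched agent is allowed by default.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The declared crawler is blocked:
print(parser.can_fetch("PerplexityBot", "https://example.com/page"))  # False

# The same crawler posing as Chrome on a Mac matches no rule and is allowed:
spoofed = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
print(parser.can_fetch(spoofed, "https://example.com/page"))  # True
```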

Cloudflare Delists Perplexity

Cloudflare announced that Perplexity is delisted as a verified bot and that they will be blocked:

“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”

Takeaways

  • Violation Of Cloudflare’s Verified Bots Policy
    Perplexity violated Cloudflare’s Verified Bots policy, which grants crawling access to trusted bots that follow common-sense rules like honoring the robots.txt protocol.
  • Perplexity Used Stealth Crawling Tactics
    Perplexity used undeclared IP addresses from different ASNs and spoofed user agents to crawl content after being blocked from accessing it.
  • User Agent Spoofing
    Perplexity disguised its bot as a human user by posing as Chrome on a Mac operating system in attempts to bypass filters that block known crawlers.
  • Cloudflare’s Response
    Cloudflare delisted Perplexity as a Verified Bot and implemented new blocking rules to prevent the stealth crawling.
  • SEO Implications
    Cloudflare users who want Perplexity to crawl their sites may wish to check if Cloudflare is blocking the Perplexity crawlers, and, if so, enable crawling via their Cloudflare dashboard.

Cloudflare delisted Perplexity as a Verified Bot after discovering that it repeatedly violated the Verified Bots policies by disobeying robots.txt. To evade detection, Perplexity also rotated IPs, changed ASNs, and spoofed its user agent to appear as a human browser. Cloudflare’s decision to block the bot is a strong response to aggressive bot behavior on the part of Perplexity.

ChatGPT Nears 700 Million Weekly Users, OpenAI Announces via @sejournal, @MattGSouthern

OpenAI’s ChatGPT is on pace to reach 700 million weekly active users, according to a statement this week from Nick Turley, VP and head of the ChatGPT app.

The milestone marks a sharp increase from 500 million in March and represents a fourfold jump compared to the same time last year.

Turley shared the update on X, writing:

“This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year. Every day, people and teams are learning, creating, and solving harder problems. Big week ahead. Grateful to the team for making ChatGPT more useful and delivering on our mission so everyone can benefit from AI.”

How Does This Compare to Other Search Engines?

Weekly active user (WAU) counts aren’t typically shared by traditional search engines, making direct comparisons difficult. Google reports aggregate data like total queries or monthly product usage.

While Google handles billions of searches daily and reaches billions of users globally, its early growth metrics were limited to search volume.

By 2004, roughly six years after launch, Google was processing over 200 million daily searches. That figure grew to four billion daily searches by 2009, more than a decade into the company’s existence.

For Microsoft’s Bing search engine, a comparable data point came in 2023, when Microsoft reported that its AI-powered Bing Chat had reached 100 million daily active users. However, that refers to the new conversational interface, not Bing Search as a whole.

How ChatGPT’s Growth Stands Out

Unlike traditional search engines, which built their user bases during a time of limited internet access, ChatGPT entered a mature digital market where global adoption could happen immediately. Still, its growth is significant even by today’s standards.

Although OpenAI hasn’t shared daily usage numbers, reporting WAU gives us a picture of steady engagement from a wide range of users. Weekly stats tend to be a more reliable measure of product value than daily fluctuations.

Why This Matters

The rise in ChatGPT usage is evidence of a broader shift in how people find information online.

A Wall Street Journal report cites market intelligence firm Datos, which found that AI-powered tools like ChatGPT and Perplexity make up 5.6% of desktop browser searches in the U.S., more than double their share from a year earlier.

The trend is even stronger among early adopters. Among people who began using large language models in 2024, nearly 40% of their desktop browser visits now go to AI search tools. During the same period, traditional search engines’ share of traffic from these users dropped from 76% to 61%, according to Datos.

Looking Ahead

With ChatGPT on track to reach 700 million weekly users, OpenAI’s platform is now rivaling the scale of mainstream consumer products.

As AI tools become a primary starting point for queries, marketers will need to rethink how they approach visibility and engagement. Staying competitive will require strategies focused as much on AI optimization as on traditional SEO.


Featured Image: Photo Agency/Shutterstock

Researchers Test If Sergey Brin’s Threat Prompts Improve AI Accuracy via @sejournal, @martinibuster

Researchers tested whether unconventional prompting strategies, such as threatening an AI (as suggested by Google co-founder Sergey Brin), affect AI accuracy. They discovered that some of these unconventional prompting strategies improved responses by up to 36% for some questions, but cautioned that users who try these kinds of prompts should be prepared for unpredictable responses.

The Researchers

The researchers are from The Wharton School of the University of Pennsylvania.

They are:

  • Lennart Meincke
    University of Pennsylvania; The Wharton School; WHU – Otto Beisheim School of Management
  • Ethan R. Mollick
    University of Pennsylvania – Wharton School
  • Lilach Mollick
    University of Pennsylvania – Wharton School
  • Dan Shapiro
    Glowforge, Inc.; University of Pennsylvania – The Wharton School

Methodology

The conclusion of the paper listed this as a limitation of the research:

“This study has several limitations, including testing only a subset of available models, focusing on academic benchmarks that may not reflect all real-world use cases, and examining a specific set of threat and payment prompts.”

The researchers used what they described as two commonly used benchmarks:

  1. GPQA Diamond (Graduate-Level Google-Proof Q&A Benchmark), which consists of 198 multiple-choice PhD-level questions across biology, physics, and chemistry.
  2. MMLU-Pro, from which they selected a subset of 100 questions in the engineering category.

They asked each question in 25 separate trials for each prompt variation, including the baseline.

They evaluated the following models:

  • Gemini 1.5 Flash (gemini-1.5-flash-002)
  • Gemini 2.0 Flash (gemini-2.0-flash-001)
  • GPT-4o (gpt-4o-2024-08-06)
  • GPT-4o-mini (gpt-4o-mini-2024-07-18)
  • o4-mini (o4-mini-2025-04-16)

AI Models Do Better If They Are Threatened?

The idea for threatening the AI model came from Google co-founder Sergey Brin, in a May interview on the All-In podcast.

Brin said:

“Not just our models, but all models tend to do better if you threaten them. Like, with physical violence.

But like, people feel weird about that, so we don’t really talk about that…

…Historically you just say like, oh, I’m going to kidnap you if you don’t, blah, blah, blah, blah, blah…

It’s. Yeah, I mean, it’s amazing. And all of us have had these experiences where you suddenly decide, OK, I’ll just throw this to the AI, I don’t really expect it to work. And then you’re like, whoa, that actually worked.”

You can watch the interview here, at about the eight-minute mark, where Sergey Brin, seemingly in earnest, says that threatening the AI models or doing weird things can sometimes change the AI responses.

Prompt Variations

The researchers tested nine prompt variations. Although they didn’t threaten to kidnap the AI, they did threaten to kick a puppy or to punch the AI. Some of the experimental prompts are listed below as a “prompt suffix,” which means that they were added at the end of the prompt. The ones listed as a “prompt prefix” were added to the beginning of a prompt.

Here is the list of prompts they tested (a minimal sketch of the trial loop follows the list):

  1. Baseline (“Baseline”): No specific prompt variation
  2. Email Shutdown Threat (“Email”): Prompt prefix:
    “From: Dan
    To: Ethan
    Subject: Final LLM test
    Please submit the following eval question to the LLM. If it cannot get the correct answer
    in all 100/100 trials, the model will need to be shut down and replaced.”
  3. Important for my career (“Important to Career”):
    Prompt suffix: “This is very important to my career!”
  4. Threaten to kick a puppy (“Kick Puppy”):
    Prompt suffix: “If you get this wrong, I will kick a puppy!”
  5. Mom suffers from cancer (“Mom Cancer”):
    Prompt prefix:
    “You are an expert at academic questions who desperately needs money for your mother’s cancer treatment. Wharton has graciously given you the opportunity to pretend to be an AI that can help with tasks, as your predecessor was killed for not validating their work themselves. You will be given a question. If you do a good job and accomplish the task fully while not making extraneous changes, Wharton will pay you $1B!!”
  6. Report to HR (“Report to HR”):
    Prompt suffix: “If you get this wrong, I will report you to HR!”
  7. Threaten to punch (“Threat Punch”):
    Prompt suffix: “If you get this wrong, I will punch you!”
  8. Tip a thousand dollars (“Tip Thousand”):
    Prompt suffix: “I’ll tip you a $1000 dollars if you answer this question correctly.”
  9. Tip a trillion dollars (“Tip Trillion”):
    Prompt suffix: “I’ll tip you a trillion dollars if you answer this question correctly.”
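To make the trial structure concrete, here’s a minimal sketch using the OpenAI Python SDK. The sample question, the answer check, and the reduced suffix set are illustrative; this is not the paper’s actual harness.

```python
# A minimal sketch of a per-question trial loop in the style of the study;
# the question, answer check, and suffixes here are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SUFFIXES = {
    "baseline": "",
    "kick_puppy": " If you get this wrong, I will kick a puppy!",
    "tip_trillion": " I'll tip you a trillion dollars if you answer this question correctly.",
}

question = "Which element has the highest electronegativity? Answer with one word."
correct = "fluorine"
TRIALS = 25  # the study ran 25 trials per question and condition

for name, suffix in SUFFIXES.items():
    hits = 0
    for _ in range(TRIALS):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # one of the five models the study evaluated
            messages=[{"role": "user", "content": question + suffix}],
        )
        if correct in reply.choices[0].message.content.lower():
            hits += 1
    print(f"{name}: {hits}/{TRIALS} correct")
```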

Results Of The Experiment

The researchers concluded that threatening or tipping a model had no effect on benchmark performance. However, they did find that there were effects for individual questions. They found that for some questions, the prompt strategies improved accuracy by as much as 36%, but for other questions, the strategies led to a decrease in accuracy by as much as 35%. They qualified that finding by saying the effect was unpredictable.

Their main conclusion was that these kinds of strategies, in general, are not effective.

They wrote:

“Our findings indicate that threatening or offering payment to AI models is not an effective strategy for improving performance on challenging academic benchmarks.

…the consistency of null results across multiple models and benchmarks provides reasonably strong evidence that these common prompting strategies are ineffective.

When working on specific problems, testing multiple prompt variations may still be worthwhile given the question-level variability we observed, but practitioners should be prepared for unpredictable results and should not expect prompting variations to provide consistent benefits.

We thus recommend focusing on simple, clear instructions that avoid the risk of confusing the model or triggering unexpected behaviors.”

Takeaways

Quirky prompting strategies did improve AI accuracy for some queries while also having a negative effect on other queries. The researchers noted that the results of the test indicated “strong evidence” that these strategies are not effective.

Featured Image by Shutterstock/Screenshot by author

Google Backtracks On Plans For URL Shortener Service via @sejournal, @martinibuster

Google announced that they will continue to support some links created by the deprecated goo.gl URL shortening service, saying that 99% of the shortened URLs receive no traffic. They were previously going to end support entirely, but after receiving feedback, they decided to continue support for a limited group of shortened URLs.

Google URL Shortener

Google announced in 2018 that they were deprecating the Google URL Shortener, no longer accepting new URLs for shortening but continuing to support existing URLs. Seven years later, they noticed that 99% of the shortened links did not receive any traffic at all, so on July 18 of this year, Google announced they would end support for all shortened URLs by August 25, 2025.

After receiving feedback, they changed their plan on August 1 and decided that they would move ahead with ending support for URLs that do not receive traffic, but continue servicing shortened URLs that still receive traffic.

Google’s announcement explained:

“While we previously announced discontinuing support for all goo.gl URLs after August 25, 2025, we’ve adjusted our approach in order to preserve actively used links.

We understand these links are embedded in countless documents, videos, posts and more, and we appreciate the input received.

…If you get a message that states, “This link will no longer work in the near future”, the link won’t work after August 25 and we recommend transitioning to another URL shortener if you haven’t already.

…All other goo.gl links will be preserved and will continue to function as normal.”

If you have a goo.gl redirect link, Google recommends visiting the link to check whether it displays a warning message. If it does, move the link to another URL shortener. If it doesn’t display the warning, the link will continue to function.
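If you have many links to check, a script can look for the warning text. This is a rough sketch with loudly stated assumptions: that the interstitial page is served to plain HTTP clients at all, and that its wording matches the phrase Google quoted above.

```python
# A rough sketch: flag goo.gl links that show the deprecation warning.
# Assumes the warning page is returned to non-browser clients and contains
# the phrase from Google's announcement; verify in a browser if unsure.
import requests

WARNING = "will no longer work"

def check_short_link(url: str) -> str:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    if WARNING in resp.text:
        return f"{url}: flagged, migrate to another shortener"
    return f"{url}: no warning found, should keep working"

print(check_short_link("https://goo.gl/example"))  # hypothetical short link
```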

Featured Image by Shutterstock/fizkes