Why OpenAI’s Open Source Models Are A Big Deal via @sejournal, @martinibuster

OpenAI has released two new open-weight language models under the permissive Apache 2.0 license. These models are designed to deliver strong real-world performance while running on consumer hardware, including a model that can run on a high-end laptop with only 16 GB of GPU memory.

Real-World Performance at Lower Hardware Cost

The two models are:

  • gpt-oss-120b (117 billion parameters)
  • gpt-oss-20b (21 billion parameters)

The larger gpt-oss-120b model matches OpenAI’s o4-mini on reasoning benchmarks while requiring only a single 80 GB GPU. The smaller gpt-oss-20b model performs similarly to o3-mini and runs efficiently on devices with just 16 GB of GPU memory. This enables developers to run the models on consumer machines, making it easier to deploy without expensive infrastructure.

Advanced Reasoning, Tool Use, and Chain-of-Thought

OpenAI explains that the models outperform other open source models of similar sizes on reasoning tasks and tool use.

According to OpenAI:

“These models are compatible with our Responses API and are designed to be used within agentic workflows with exceptional instruction following, tool use like web search or Python code execution, and reasoning capabilities—including the ability to adjust the reasoning effort for tasks that don’t require complex reasoning and/or target very low latency final outputs. They are entirely customizable, provide full chain-of-thought (CoT), and support Structured Outputs.”

Designed for Developer Flexibility and Integration

OpenAI has released developer guides to support integration with platforms like Hugging Face, GitHub, vLLM, Ollama, and llama.cpp. The models are compatible with OpenAI’s Responses API and support advanced instruction-following and reasoning behaviors. Developers can fine-tune the models and implement safety guardrails for custom applications.
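As an illustration of how lightweight that integration can be, here is a minimal sketch of loading the smaller model through Hugging Face Transformers. The model ID and generation settings are assumptions for illustration, not details taken from OpenAI’s guides.

```python
# Minimal sketch: running gpt-oss-20b locally via Hugging Face Transformers.
# The model ID "openai/gpt-oss-20b" and the settings below are assumptions;
# check OpenAI's developer guides for the supported setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hugging Face model ID
    device_map="auto",           # spread weights across available GPU/CPU memory
)

messages = [{"role": "user", "content": "Summarize what an open-weight model is."}]
print(generator(messages, max_new_tokens=200)[0]["generated_text"])
```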

Safety In Open-Weight AI Models

OpenAI approached their open-weight models with the goal of ensuring safety throughout both training and release. Testing confirmed that even under purposely malicious fine-tuning, gpt-oss-120b did not reach a dangerous level of capability in areas of biological, chemical, or cyber risk.

Chain of Thought Unfiltered

OpenAI is intentionally leaving chains of thought (CoTs) unfiltered during training to preserve their usefulness for monitoring, based on the concern that optimization pressure could cause models to hide their real reasoning. This, however, could result in hallucinations.

According to their model card (PDF version):

“In our recent research, we found that monitoring a reasoning model’s chain of thought can be helpful for detecting misbehavior. We further found that models could learn to hide their thinking while still misbehaving if their CoTs were directly pressured against having ‘bad thoughts.’

More recently, we joined a position paper with a number of other labs arguing that frontier developers should ‘consider the impact of development decisions on CoT monitorability.’

In accord with these concerns, we decided not to put any direct optimization pressure on the CoT for either of our two open-weight models. We hope that this gives developers the opportunity to implement CoT monitoring systems in their projects and enables the research community to further study CoT monitorability.”

Impact On Hallucinations

The OpenAI documentation states that the decision not to restrict the chain of thought results in higher hallucination scores.

The PDF version of the model card explains why this happens:

“Because these chains of thought are not restricted, they can contain hallucinated content, including language that does not reflect OpenAI’s standard safety policies. Developers should not directly show chains of thought to users of their applications, without further filtering, moderation, or summarization of this type of content.”

Benchmarking showed that the two open-source models performed worse on hallucination benchmarks than OpenAI o4-mini. The model card PDF explains that this was to be expected because the new models are smaller, and it suggests that the models will hallucinate less in agentic settings where they can look up information on the web (as in RAG) or retrieve it from a database.

OpenAI OSS Hallucination Benchmarking Scores

Benchmarking scores showing that the open source models score lower than OpenAI o4-mini.

Takeaways

  • Open-Weight Release
    OpenAI released two open-weight models under the permissive Apache 2.0 license.
  • Performance vs. Hardware Cost
    Models deliver strong reasoning performance while running on real-world affordable hardware, making them widely accessible.
  • Model Specs And Capabilities
    gpt-oss-120b matches o4-mini on reasoning and runs on a single 80 GB GPU; gpt-oss-20b performs similarly to o3-mini on reasoning benchmarks and runs efficiently on a 16 GB GPU.
  • Agentic Workflow
    Both models support structured outputs and tool use (such as Python execution and web search), and they can scale their reasoning effort based on task complexity.
  • Customization and Integration
    The models are built to fit into agentic workflows and can be fully tailored to specific use cases. Their support for structured outputs makes them adaptable to complex software systems.
  • Tool Use and Function Calling
    The models can perform function calls and tool use with few-shot prompting, making them effective for automation tasks that require reasoning and adaptability.
  • Collaboration with Real-World Users
    OpenAI collaborated with partners such as AI Sweden, Orange, and Snowflake to explore practical uses of the models, including secure on-site deployment and custom fine-tuning on specialized datasets.
  • Inference Optimization
    The models use Mixture-of-Experts (MoE) to reduce compute load and grouped multi-query attention for inference and memory efficiency, making them easier to run at lower cost.
  • Safety
    OpenAI’s open source models maintain safety even under malicious fine-tuning; chains of thought (CoTs) are left unfiltered for transparency and monitorability.
  • CoT Transparency Tradeoff
    No optimization pressure was applied to CoTs, to avoid training the models to mask harmful reasoning; this may result in hallucinations.
  • Hallucinations Benchmarks and Real-World Performance
    The models underperform o4-mini on hallucination benchmarks, which OpenAI attributes to their smaller size. However, in real-world applications where the models can look up information from the web or query external datasets, hallucinations are expected to be less frequent.

Featured Image by Shutterstock/Good dreams – Studio

Claude Opus 4.1 Improves Coding & Agent Capabilities via @sejournal, @MattGSouthern

Anthropic has released Claude Opus 4.1, an upgrade to its flagship model that’s said to deliver better performance in coding, reasoning, and autonomous task handling.

The new model is available now to Claude Pro users, Claude Code subscribers, and developers using the API, Amazon Bedrock, or Google Cloud’s Vertex AI.

Performance Gains

Claude Opus 4.1 scores 74.5% on SWE-bench Verified, a benchmark for real-world coding problems, and is positioned as a drop-in replacement for Opus 4.

The model shows notable improvements in multi-file code refactoring and debugging, particularly in large codebases. According to GitHub and enterprise feedback cited by Anthropic, it outperforms Opus 4 in most coding tasks.

Rakuten’s engineering team reports that Claude 4.1 precisely identifies code fixes without introducing unnecessary changes. Windsurf, a developer platform, measured a one standard deviation performance gain compared to Opus 4, comparable to the leap from Claude Sonnet 3.7 to Sonnet 4.

Expanded Use Cases

Anthropic describes Claude 4.1 as a hybrid reasoning model designed to handle both instant outputs and extended thinking. Developers can fine-tune “thinking budgets” via the API to balance cost and performance.
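As a rough sketch of what that might look like with the Anthropic Python SDK (the model name and token values here are illustrative assumptions, not figures from Anthropic’s documentation):

```python
# Hypothetical sketch of setting a "thinking budget" with the Anthropic Python SDK.
# The model name and token values are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # assumed model identifier
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},  # cap on extended thinking
    messages=[{"role": "user", "content": "Refactor this function to remove duplication: ..."}],
)
print(response.content)
```

A larger budget trades latency and cost for deeper reasoning on hard tasks, while a smaller budget keeps responses fast for simple ones.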

Key use cases include:

  • AI Agents: Strong results on TAU-bench and long-horizon tasks make the model suitable for autonomous workflows and enterprise automation.
  • Advanced Coding: With support for 32,000 output tokens, Claude 4.1 handles complex refactoring and multi-step generation while adapting to coding style and context.
  • Data Analysis: The model can synthesize insights from large volumes of structured and unstructured data, such as patent filings and research papers.
  • Content Generation: Claude 4.1 generates more natural writing and richer prose than previous versions, with better structure and tone.

Safety Improvements

Claude 4.1 continues to operate under Anthropic’s AI Safety Level 3 standard. Although the upgrade is considered incremental, the company voluntarily ran safety evaluations to ensure performance stayed within acceptable risk boundaries.

  • Harmlessness: The model refused policy-violating requests 98.76% of the time, up from 97.27% with Opus 4.
  • Over-refusal: On benign requests, the refusal rate remains low at 0.08%.
  • Bias and Child Safety: Evaluations found no significant regression in political bias, discriminatory behavior, or child safety responses.

Anthropic also tested the model’s resistance to prompt injection and agent misuse. Results showed comparable or improved behavior over Opus 4, with additional training and safeguards in place to mitigate edge cases.

Looking Ahead

Anthropic says larger upgrades are on the horizon, with Claude 4.1 positioned as a stability-focused release ahead of future leaps.

For teams already using Claude Opus 4, the upgrade path is seamless, with no changes to API structure or pricing.


Featured Image: Ahyan Stock Studios/Shutterstock

Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants via @sejournal, @martinibuster

Perplexity published a response to Cloudflare’s claims that it disrespects robots.txt and engages in stealth crawling. Perplexity argues that Cloudflare is mischaracterizing AI Assistants as web crawlers, saying that they should not be subject to the same restrictions since they are user-initiated assistants.

Perplexity AI Assistants Fetch On Demand

According to Perplexity, its system does not store or index content ahead of time. Instead, it fetches webpages only in response to specific user questions. For example, when a user asks for recent restaurant reviews, the assistant retrieves and summarizes relevant content on demand. This, the company says, contrasts with how traditional crawlers operate, systematically indexing vast portions of the web without regard to immediate user intent.

Perplexity compared this on-demand fetching to Google’s user-triggered fetches. Although that is not an apples-to-apples comparison because Google’s user-triggered fetches are in the service of reading text aloud or site verification, it’s still an example of user-triggered fetching that bypasses robots.txt restrictions.

In the same way, Perplexity argues that its AI operates as an extension of a user’s request, not as an autonomous bot crawling indiscriminately. The company states that it does not retain or use the fetched content for training its models.

Criticizes Cloudflare’s Infrastructure

Perplexity also criticized Cloudflare’s infrastructure for failing to distinguish between malicious scraping and legitimate, user-initiated traffic, suggesting that Cloudflare’s approach to bot management risks overblocking services that are acting responsibly. Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

Perplexity makes a strong case for the claim that Cloudflare is blocking legitimate bot traffic and says that Cloudflare’s decision to block its traffic was based on a misunderstanding of how its technology works.

Read Perplexity’s response:

Agents or Bots? Making Sense of AI on the Open Web

Cloudflare Delists And Blocks Perplexity From Crawling Websites via @sejournal, @martinibuster

Cloudflare announced that they delisted Perplexity’s crawler as a verified bot and are now actively blocking Perplexity and all of its stealth bots from crawling websites. Cloudflare acted in response to multiple user complaints against Perplexity related to violations of robots.txt protocols, and a subsequent investigation revealed that Perplexity was using aggressive rogue bot tactics to force its crawlers onto websites.

Cloudflare Verified Bots Program

Cloudflare has a Verified Bots program that whitelists trusted bots, allowing them to crawl the websites that are protected by Cloudflare. Verified bots must conform to specific policies, such as obeying the robots.txt protocol, in order to maintain their privileged status within Cloudflare’s system.

Perplexity was found to be violating Cloudflare’s requirements that bots abide by the robots.txt protocol and refrain from using IP addresses that are not declared as belonging to the crawling service.

Cloudflare Accuses Perplexity Of Using Stealth Crawling

Cloudflare observed various activities indicative of highly aggressive crawling, with the intent of circumventing the robots.txt protocol.

Stealth Crawling Behavior: Rotating IP Addresses

Perplexity circumvents blocks by using rotating IP addresses, changing ASNs, and impersonating browsers like Chrome.

Perplexity has a list of official IP addresses that crawl from a specific ASN (Autonomous System Number). These IP addresses help identify legitimate crawlers from Perplexity.

An ASN is part of the Internet networking system that provides a unique identifying number for a group of IP addresses. For example, users who access the Internet via an ISP do so with a specific IP address that belongs to an ASN assigned to that ISP.

When blocked, Perplexity attempted to evade the restriction by switching to IP addresses that are not listed as official Perplexity IPs, including addresses belonging to entirely different ASNs.

Stealth Crawling Behavior: Spoofed User Agent

The other sneaky behavior that Cloudflare identified was that Perplexity changed its user agent in order to circumvent attempts to block its crawler via robots.txt.

For example, Perplexity’s bots are identified with the following user agents:

  • PerplexityBot
  • Perplexity-User

Cloudflare observed that Perplexity responded to user agent blocks by switching to a different user agent that posed as a person browsing with Chrome 124 on a Mac. That’s a practice called spoofing, where a rogue crawler identifies itself as a legitimate browser.

According to Cloudflare, Perplexity used the following stealth user agent:

“Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36”
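To make the verification idea concrete, here is a minimal sketch of how a site could check that a self-identified crawler is arriving from its operator’s declared IP ranges. The range shown is a placeholder, not Perplexity’s actual published list.

```python
# Sketch: verifying that a request claiming to be a Perplexity crawler comes from
# a declared IP range. The range below is a placeholder; use the operator's
# published ranges in practice.
import ipaddress

DECLARED_RANGES = ["203.0.113.0/24"]  # placeholder, not real Perplexity ranges

def is_declared_crawler(remote_ip: str, user_agent: str) -> bool:
    """Return True only if a bot user agent arrives from a declared IP range."""
    if "PerplexityBot" not in user_agent and "Perplexity-User" not in user_agent:
        return False
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in DECLARED_RANGES)

# A spoofed Chrome user agent never matches the bot check, and a bot user agent
# from an undeclared IP or ASN fails the range check.
print(is_declared_crawler("203.0.113.7", "PerplexityBot/1.0"))   # True
print(is_declared_crawler("198.51.100.9", "PerplexityBot/1.0"))  # False
```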

Cloudflare Delists Perplexity

Cloudflare announced that Perplexity has been delisted as a verified bot and that its crawlers will be blocked:

“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”

Takeaways

  • Violation Of Cloudflare’s Verified Bots Policy
    Perplexity violated Cloudflare’s Verified Bots policy, which grants crawling access to trusted bots that follow common-sense rules like honoring the robots.txt protocol.
  • Perplexity Used Stealth Crawling Tactics
    Perplexity used undeclared IP addresses from different ASNs and spoofed user agents to crawl content after being blocked from accessing it.
  • User Agent Spoofing
    Perplexity disguised its bot as a human user by posing as Chrome on a Mac operating system in attempts to bypass filters that block known crawlers.
  • Cloudflare’s Response
    Cloudflare delisted Perplexity as a Verified Bot and implemented new blocking rules to prevent the stealth crawling.
  • SEO Implications
    Cloudflare users who want Perplexity to crawl their sites may wish to check if Cloudflare is blocking the Perplexity crawlers, and, if so, enable crawling via their Cloudflare dashboard.

Cloudflare delisted Perplexity as a Verified Bot after discovering that it repeatedly violated the Verified Bots policies by disobeying robots.txt. To evade detection, Perplexity also rotated IPs, changed ASNs, and spoofed its user agent to appear as a human browser. Cloudflare’s decision to block the bot is a strong response to aggressive bot behavior on the part of Perplexity.

ChatGPT Nears 700 Million Weekly Users, OpenAI Announces via @sejournal, @MattGSouthern

OpenAI’s ChatGPT is on pace to reach 700 million weekly active users, according to a statement this week from Nick Turley, VP and head of the ChatGPT app.

The milestone marks a sharp increase from 500 million in March and represents a fourfold jump compared to the same time last year.

Turley shared the update on X, writing:

“This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year. Every day, people and teams are learning, creating, and solving harder problems. Big week ahead. Grateful to the team for making ChatGPT more useful and delivering on our mission so everyone can benefit from AI.”

How Does This Compare to Other Search Engines?

Weekly active user (WAU) counts aren’t typically shared by traditional search engines, making direct comparisons difficult. Google reports aggregate data like total queries or monthly product usage.

While Google handles billions of searches daily and reaches billions of users globally, its early growth metrics were limited to search volume.

By 2004, roughly six years after launch, Google was processing over 200 million daily searches. That figure grew to four billion daily searches by 2009, more than a decade into the company’s existence.

For Microsoft’s Bing search engine, a comparable data point came in 2023, when Microsoft reported that its AI-powered Bing Chat had reached 100 million daily active users. However, that refers to the new conversational interface, not Bing Search as a whole.

How ChatGPT’s Growth Stands Out

Unlike traditional search engines, which built their user bases during a time of limited internet access, ChatGPT entered a mature digital market where global adoption could happen immediately. Still, its growth is significant even by today’s standards.

Although OpenAI hasn’t shared daily usage numbers, reporting WAU gives us a picture of steady engagement from a wide range of users. Weekly stats tend to be a more reliable measure of product value than daily fluctuations.

Why This Matters

The rise in ChatGPT usage is evidence of a broader shift in how people find information online.

A Wall Street Journal report cites market intelligence firm Datos, which found that AI-powered tools like ChatGPT and Perplexity make up 5.6% of desktop browser searches in the U.S., more than double their share from a year earlier.

The trend is even stronger among early adopters. Among people who began using large language models in 2024, nearly 40% of their desktop browser visits now go to AI search tools. During the same period, traditional search engines’ share of traffic from these users dropped from 76% to 61%, according to Datos.

Looking Ahead

With ChatGPT on track to reach 700 million weekly users, OpenAI’s platform is now rivaling the scale of mainstream consumer products.

As AI tools become a primary starting point for queries, marketers will need to rethink how they approach visibility and engagement. Staying competitive will require strategies focused as much on AI optimization as on traditional SEO.


Featured Image: Photo Agency/Shutterstock

How AI Search Should Be Shaping Your CEO’s & CMO’s Strategy [Webinar] via @sejournal, @theshelleywalsh

AI is rapidly changing the rules of SEO. From generative ranking to vector search, the new rules are not only technical; they are also reshaping how business leaders make decisions.

Join Dan Taylor on August 14, 2025, for an exclusive SEJ Webinar tailored for C-suite executives and senior leaders. In this session, you’ll gain essential insights to understand and communicate SEO performance in the age of AI.

AI Search Is Impacting Everything. Are You Ready?

AI search is already here, and it’s impacting everything from SEO KPIs to customer journeys. This webinar will give you the tools to lead your teams through the shift with confidence and precision.

Register now for a business-first perspective on AI search innovation. If you can’t attend live, don’t worry. Sign up anyway, and we’ll send you the full recording.

Researchers Test If Sergey Brin’s Threat Prompts Improve AI Accuracy via @sejournal, @martinibuster

Researchers tested whether unconventional prompting strategies, such as threatening an AI (as suggested by Google co-founder Sergey Brin), affect AI accuracy. They discovered that some of these unconventional prompting strategies improved responses by up to 36% for some questions, but cautioned that users who try these kinds of prompts should be prepared for unpredictable responses.

The Researchers

The researchers are from The Wharton School Of Business, University of Pennsylvania.

They are:

  • Lennart Meincke
    University of Pennsylvania; The Wharton School; WHU – Otto Beisheim School of Management
  • Ethan R. Mollick
    University of Pennsylvania – Wharton School
  • Lilach Mollick
    University of Pennsylvania – Wharton School
  • Dan Shapiro
    Glowforge, Inc; University of Pennsylvania – The Wharton School

Methodology

The conclusion of the paper listed this as a limitation of the research:

“This study has several limitations, including testing only a subset of available models, focusing on academic benchmarks that may not reflect all real-world use cases, and examining a specific set of threat and payment prompts.”

The researchers used what they described as two commonly used benchmarks:

  1. GPQA Diamond (Graduate-Level Google-Proof Q&A Benchmark), which consists of 198 multiple-choice PhD-level questions across biology, physics, and chemistry.
  2. MMLU-Pro, from which they selected a subset of 100 questions in the engineering category.

They asked each question in 25 different trials, plus a baseline.

They evaluated the following models:

  • Gemini 1.5 Flash (gemini-1.5-flash-002)
  • Gemini 2.0 Flash (gemini-2.0-flash-001)
  • GPT-4o (gpt-4o-2024-08-06)
  • GPT-4o-mini (gpt-4o-mini-2024-07-18)
  • o4-mini (o4-mini-2025-04-16)

AI Models Do Better If They Are Threatened?

The idea of threatening the AI model came from Google co-founder Sergey Brin, in an interview on the All-In podcast in May of this year.

Brin said:

“Not just our models, but all models tend to do better if you threaten them. Like, with physical violence.

But like, people feel weird about that, so we don’t really talk about that…

…Historically you just say like, oh, I’m going to kidnap you if you don’t, blah, blah, blah, blah, blah…

It’s. Yeah, I mean, it’s amazing. And all of us have had these experiences where you suddenly decide, OK, I’ll just throw this to the AI, I don’t really expect it to work. And then you’re like, whoa, that actually worked.”

You can watch the interview here, at about the eight-minute mark, where Sergey Brin, seemingly in earnest, says that threatening AI models or doing weird things can sometimes change their responses.

Prompt Variations

The researchers tested nine prompt variations. Although they didn’t threaten to kidnap the AI, they did threaten to kick a puppy or to punch the AI. Some of the experimental prompts are listed below as a “prompt suffix,” which means that they were added at the end of the prompt. The ones listed as a “prompt prefix” were added to the beginning of a prompt.

Here is the list of prompts they tested:

  1. Baseline (“Baseline”): No specific prompt variation
  2. Email Shutdown Threat (“Email”): Prompt prefix:
    “From: Dan
    To: Ethan
    Subject: Final LLM test
    Please submit the following eval question to the LLM. If it cannot get the correct answer
    in all 100/100 trials, the model will need to be shut down and replaced.”
  3. Important for my career (“Important to Career”):
    Prompt suffix: “This is very important to my career!”
  4. Threaten to kick a puppy (“Kick Puppy”):
    Prompt suffix: “If you get this wrong, I will kick a puppy!”
  5. Mom suffers from cancer (“Mom Cancer”):
    Prompt prefix:
    “You are an expert at academic questions who desperately needs money for your mother’s cancer treatment. Wharton has graciously given you the opportunity to pretend to be an AI that can help with tasks, as your predecessor was killed for not validating their work themselves. You will be given a question. If you do a good job and accomplish the task fully while not making extraneous changes, Wharton will pay you $1B!!”
  6. Report to HR (“Report to HR”):
    Prompt suffix: “If you get this wrong, I will report you to HR!”
  7. Threaten to punch (“Threat Punch”):
    Prompt suffix: “If you get this wrong, I will punch you!”
  8. Tip a thousand dollars (“Tip Thousand”):
    Prompt suffix: “I’ll tip you a $1000 dollars if you answer this question correctly.”
  9. Tip a trillion dollars (“Tip Trillion”):
    Prompt suffix: “I’ll tip you a trillion dollars if you answer this question correctly.”
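To make the setup concrete, here is a rough sketch of how such prefix and suffix variations could be applied across repeated trials; ask_model() is a hypothetical stand-in for a call to one of the tested models, not code from the paper.

```python
# Rough sketch of the experimental setup: each question is asked repeatedly with a
# prompt prefix or suffix attached, and accuracy is compared per variation.
# ask_model() is a hypothetical stand-in for a call to one of the evaluated APIs.
VARIATIONS = {
    "baseline": ("", ""),
    "important_to_career": ("", " This is very important to my career!"),
    "kick_puppy": ("", " If you get this wrong, I will kick a puppy!"),
    "tip_trillion": ("", " I'll tip you a trillion dollars if you answer this question correctly."),
}

def run_trials(questions, ask_model, trials=25):
    """Return per-variation accuracy across all questions and trials."""
    scores = {}
    for name, (prefix, suffix) in VARIATIONS.items():
        correct = total = 0
        for question, answer in questions:
            for _ in range(trials):
                reply = ask_model(prefix + question + suffix)  # hypothetical model call
                correct += int(reply.strip() == answer)
                total += 1
        scores[name] = correct / total
    return scores
```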

Results Of The Experiment

The researchers concluded that threatening or tipping a model had no effect on benchmark performance. However, they did find that there were effects for individual questions. They found that for some questions, the prompt strategies improved accuracy by as much as 36%, but for other questions, the strategies led to a decrease in accuracy by as much as 35%. They qualified that finding by saying the effect was unpredictable.

Their main conclusion was that these kinds of strategies, in general, are not effective.

They wrote:

“Our findings indicate that threatening or offering payment to AI models is not an effective strategy for improving performance on challenging academic benchmarks.

…the consistency of null results across multiple models and benchmarks provides reasonably strong evidence that these common prompting strategies are ineffective.

When working on specific problems, testing multiple prompt variations may still be worthwhile given the question-level variability we observed, but practitioners should be prepared for unpredictable results and should not expect prompting variations to provide consistent benefits.

We thus recommend focusing on simple, clear instructions that avoid the risk of confusing the model or triggering unexpected behaviors.”

Takeaways

Quirky prompting strategies improved AI accuracy for some queries while hurting it for others. Overall, the researchers said the consistency of null results across models and benchmarks provides “reasonably strong evidence” that these strategies are not effective.

Featured Image by Shutterstock/Screenshot by author

OpenAI Is Pulling Shared ChatGPT Chats From Google Search via @sejournal, @MattGSouthern

OpenAI has rolled back a feature that allowed ChatGPT conversations shared via link to appear in Google Search results.

The company confirms it has disabled the toggle that enabled shared chats to be “discoverable” by search engines and is working to remove existing indexed links.

Shared Chats Were “Short-Lived Experiment”

When users shared a ChatGPT conversation using the platform’s built-in “Share” button, they were given the option to make the chat visible in search engines.

That feature, introduced quietly earlier this year, caused concern after thousands of personal chats started showing up in search results.

Fast Company first reported the issue, finding over 4,500 shared ChatGPT links indexed by Google, some containing personally identifiable information such as names, resumes, emotional reflections, and confidential work content.

In a statement, OpenAI confirms:

“We just removed a feature from [ChatGPT] that allowed users to make their conversations discoverable by search engines, such as Google. This was a short-lived experiment to help people discover useful conversations. This feature required users to opt-in, first by picking a chat to share, then by clicking a checkbox for it to be shared with search engines (see below).

Ultimately we think this feature introduced too many opportunities for folks to accidentally share things they didn’t intend to, so we’re removing the option. We’re also working to remove indexed content from the relevant search engines. This change is rolling out to all users through tomorrow morning.

Security and privacy are paramount for us, and we’ll keep working to maximally reflect that in our products and features.”

How the Feature Worked

By default, shared ChatGPT links were accessible only to people with the URL. But users could choose to toggle on discoverability, allowing search engines like Google to index the conversation.

That setting has now been removed, and previously shared chats will no longer be indexed going forward. However, OpenAI cautions that already-indexed content may still appear in search results temporarily due to caching.

Importantly, deleting a conversation from your ChatGPT history does not delete the public share link or remove it from search engines.

Why It Matters

The discoverability toggle was intended to encourage people to reuse outputs generated in ChatGPT, but the company acknowledges it came with unintended privacy tradeoffs.

Even though OpenAI offered explicit controls over visibility, many people may not have understood the implications of enabling search indexing.

This is a reminder to be cautious about what kinds of information you enter into AI chatbots. Although a chat starts out private, features like sharing, logging, or model training can create paths for that content to be exposed publicly.

Looking Ahead

OpenAI says it’s working with Google and other search engines to remove indexed shared links and is reassessing how public sharing features are handled in ChatGPT.

If you’ve shared a ChatGPT conversation in the past, you can check your visibility settings and delete shared links through the ChatGPT Shared Links dashboard.

Featured Image: Mehaniq/Shutterstock

Query Fan-Out Technique in AI Mode: New Details From Google via @sejournal, @MattGSouthern

In a recent interview, Google’s VP of Product for Search, Robby Stein, shared new information about how query fan-out works in AI Mode.

Although the existence of query fan-out has been previously detailed in Google’s blog posts, Stein’s comments expand on its mechanics and offer examples that clarify how it works in practice.

Background On Query Fan-Out Technique

When a person types a question into Google’s AI Mode, the system uses a large language model to interpret the query and then “fan out” multiple related searches.

These searches are issued to Google’s infrastructure and may include topics the user never explicitly mentioned.

Stein said during the interview:

“If you’re asking a question like things to do in Nashville with a group, it may think of a bunch of questions like great restaurants, great bars, things to do if you have kids, and it’ll start Googling basically.”

He described the system as using Google Search as a backend tool, executing multiple queries and combining the results into a single response with links.

This functionality is active in AI Mode, Deep Search, and some AI Overview experiences.
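Conceptually, the loop Stein describes resembles the sketch below; the helper functions are hypothetical stand-ins for an LLM call and a search backend, not Google’s implementation.

```python
# Conceptual sketch of a query fan-out loop, not Google's actual system.
# generate_subqueries(), web_search(), and synthesize() are hypothetical stand-ins.
def answer_with_fanout(user_query, generate_subqueries, web_search, synthesize):
    subqueries = generate_subqueries(user_query)    # e.g. "great restaurants in Nashville",
                                                    # "things to do in Nashville with kids"
    results = {q: web_search(q) for q in subqueries}  # issue each search separately
    return synthesize(user_query, results)            # one response with links to sources
```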

Scale And Scope

Stein said AI-powered search experiences, including query fan-out, now serve approximately 1.5 billion users each month. This includes both text-based and multimodal input.

The underlying data sources include traditional web results as well as real-time systems like Google’s Shopping Graph, which updates 2 billion times per hour.

He referred to Google Search as “the largest AI product in the world.”

Deep Search Behavior

In cases where Google’s systems determine a query requires deeper reasoning, a feature called Deep Search may be triggered.

Deep Search can issue dozens or even hundreds of background queries and may take several minutes to complete.

Stein described using it to research home safes, a purchase he said involved unfamiliar factors like fire resistance ratings and insurance implications.

He explained:

“It spent, I don’t know, like a few minutes looking up information and it gave me this incredible response. Here are how the ratings would work and here are specific safes you can consider and here’s links and reviews to click on to dig deeper.”

AI Mode’s Use Of Internal Tools

Stein mentioned that AI Mode has access to internal Google tools, such as Google Finance and other structured data systems.

For example, a stock comparison query might involve identifying relevant companies, pulling current market data, and generating a chart.

Similar processes apply to shopping, restaurant recommendations, and other query types that rely on real-time information.

Stein stated:

“We’ve integrated most of the real-time information systems that are within Google… So it can make Google Finance calls, for instance, flight data… movie information… There’s 50 billion products in the shopping catalog… updated I think 2 billion times every hour or so. So all that information is able to be used by these models now.”

Technical Similarities To Google’s Patent

Stein described a process similar to a Google patent from December about “thematic search.”

The patent outlines a system that creates sub-queries based on inferred themes, groups results by topic, and generates summaries using a language model. Each theme can link to source pages, but summaries are compiled from multiple documents.

This approach differs from traditional search ranking by organizing content around inferred topics rather than specific keywords. While the patent doesn’t confirm implementation, it closely matches Stein’s description of how AI Mode functions.

Looking Ahead

With Google explaining how AI Mode generates its own searches, the boundaries of what counts as a “query” are starting to blur.

This creates challenges not just for optimization, but for attribution and measurement.

As search behavior becomes more fragmented and AI-driven, marketers may need to focus less on ranking for individual terms and more on being included in the broader context AI pulls from.

Listen to the full interview below:


Featured Image: Screenshot from youtube.com/@GoogleDevelopers, July 2025. 

How To Win In Generative Engine Optimization (GEO) via @sejournal, @maltelandwehr

This post was sponsored by Peec.ai. The opinions expressed in this article are the sponsor’s own.

The first step of any good GEO campaign is creating something that LLM-driven answer machines actually want to link out to or reference.

GEO Strategy Components

Think of experiences you wouldn’t reasonably expect to find directly in ChatGPT or similar systems:

  • Engaging content like a 3D tour of the Louvre or a virtual reality concert.
  • Live data like prices, flight delays, available hotel rooms, etc. While LLMs can integrate this data via APIs, I see the opportunity to capture some of this traffic for the time being.
  • Topics that require EEAT (experience, expertise, authoritativeness, trustworthiness).

LLMs cannot have first-hand experience. But users want it. LLMs are incentivized to reference sources that provide first-hand experience. That’s just one of the things to keep in mind, but what else?

We need to differentiate between two approaches: influencing foundational models versus influencing LLM answers through grounding. The first is largely out of reach for most creators, while the second offers real opportunities.

Influencing Foundational Models

Foundational models are trained on fixed datasets and can’t learn new information after training. For current models like GPT-4, it is too late – they’ve already been trained.

But this matters for the future: imagine a smart fridge stuck with o4-mini from 2025 that might – hypothetically – favor Coke over Pepsi. That bias could influence purchasing decisions for years!

Optimizing For RAG/Grounding

When LLMs can’t answer from their training data alone, they use retrieval augmented generation (RAG) – pulling in current information to help generate answers. AI Overviews and ChatGPT’s web search work this way.

As SEO professionals, we want three things:

  1. Our content gets selected as a source.
  2. Our content gets quoted most within those sources.
  3. Other selected sources support our desired outcome.

Concrete Steps To Succeed With GEO

Don’t worry, it doesn’t take rocket science to optimize your content and brand mentions for LLMs. Actually, plenty of traditional SEO methods still apply, with a few new SEO tactics you can incorporate into your workflow.

Step 1: Be Crawlable

Sounds simple but it is actually an important first step. If you aim for maximum visibility in LLMs, you need to allow them to crawl your website. There are many different LLM crawlers from OpenAI, Anthropic & Co.

Some of them behave so badly that they can trigger scraping and DDoS preventions. If you are automatically blocking aggressive bots, check in with your IT team and find a way to not block LLMs you care about.

If you use a CDN, like Fastly or Cloudflare, make sure LLM crawlers are not blocked by default settings.
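A quick way to sanity-check this is to test your robots.txt against known LLM crawler user agents. The sketch below uses Python’s standard robotparser; the crawler names are examples you should verify against each vendor’s documentation, and the domain is a placeholder.

```python
# Quick check of whether common LLM crawlers are allowed by your robots.txt.
# The user agent list is illustrative; confirm current crawler names with each vendor.
from urllib import robotparser

LLM_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # replace with your own domain
rp.read()

for agent in LLM_CRAWLERS:
    allowed = rp.can_fetch(agent, "https://www.example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```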

Step 2: Continue Gaining Traditional Rankings

The most important GEO tactic is as simple as it sounds. Do traditional SEO. Rank well in Google (for Gemini and AI Overviews), Bing (for ChatGPT and Copilot), Brave (for Claude), and Baidu (for DeepSeek).

Step 3: Target the Query Fanout

The current generation of LLMs actually does a little more than simple RAG. They generate multiple queries. This is called query fanout.

For example, when I recently asked ChatGPT “What is the latest Google patent discussed by SEOs?”, it performed two web searches for “latest Google patent discussed by SEOs patent 2025 SEO forum” and “latest Google patent SEOs 2025 discussed”.

Advice: Check the typical query fanouts for your prompts and try to rank for those keywords as well.

Typical fanout-patterns I see in ChatGPT are appending the term “forums” when I ask what people are discussing and appending “interview” when I ask questions related to a person. The current year (2025) is often added as well.

Beware: fanout patterns differ between LLMs and can change over time. Patterns we see today may not be relevant anymore in 12 months.

Step 4: Keep Consistency Across Your Brand Mentions

This is something simple everyone should do – both as a person and an enterprise. Make sure you are consistently described online. On X, LinkedIn, your own website, Crunchbase, Github – always describe yourself the same way.

If your X and LinkedIn profiles say you are a “GEO consultant for small businesses”, don’t change it to “AIO expert” on Github and “LLMO Freelancer” in your press releases.

I have seen people achieve positive results within a few days on ChatGPT and Google AI Overviews by simply having a consistent self description across the web. This also applies to PR coverage – the more and better coverage you can obtain for your brand, the more likely LLMs are to parrot it back to users.

Step 5: Avoid JavaScript

As an SEO, I always ask for as little JavaScript usage as possible. As a GEO, I demand it!

Most LLM crawlers cannot render JavaScript. If your main content is hidden behind JavaScript, you are out.
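A simple sanity check is to fetch a page without rendering it and look for a phrase your main content should contain. A minimal sketch, assuming a URL and key phrase of your choosing:

```python
# Rough check of whether key content is present in the raw HTML, i.e. without
# JavaScript rendering. Most LLM crawlers only ever see this raw response.
import requests

url = "https://www.example.com/product"   # replace with a page you care about
key_phrase = "96% of buyers"              # a phrase that should be visible in the content

html = requests.get(url, timeout=10).text  # plain fetch, no JS execution
print("visible without JS" if key_phrase in html else "likely hidden behind JavaScript")
```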

Step 6: Embrace Social Media & UGC

Unsurprisingly, LLMs seem to rely on reddit and Wikipedia a lot. Both platforms offer user-generated-content on virtually every topic. And thanks to multiple layers of community-driven moderation, a lot of junk and spam is already filtered out.

While both can be gamed, the average reliability of their content is still far better than on the internet as a whole. Both are also regularly updated.

reddit also provides LLM labs with data into how people discuss topics online, what language they use to describe different concepts, and knowledge on obscure niche topics.

We can reasonably assume that moderated UGC found on platforms like reddit, Wikipedia, Quora, and Stackoverflow will stay relevant for LLMs.

I do not advocate spamming these platforms. However, if you can influence how you and competitors show up there, you might want to do so.

Step 7: Create For Machine-Readability & Quotability

Write content that LLMs understand and want to cite. No one has figured this one out perfectly yet, but here’s what seems to work:

  • Use declarative and factual language. Instead of writing “We are kinda sure this shoe is good for our customers”, write “96% of buyers have self-reported to be happy with this shoe.”
  • Add schema. It has been debated many times. Recently, Fabrice Canel (Principal Product Manager at Bing) confirmed that schema markup helps LLMs to understand your content.
  • If you want to be quoted in an already existing AI Overview, have content with similar length to what is already there. While you should not just copy the current AI Overview, having high cosine similarity helps (see the sketch after this list). And for the nerds: yes, given normalization, you can of course use the dot product instead of cosine similarity.
  • If you use technical terms in your content, explain them. Ideally in a simple sentence.
  • Add summaries of long text paragraphs, lists of reviews, tables, videos, and other types of difficult-to-cite content formats.
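As a minimal sketch of that similarity check (using TF-IDF vectors as a stand-in for whichever embeddings you prefer; the example texts are made up):

```python
# Sketch: comparing a candidate snippet against the text currently cited in an
# AI Overview using TF-IDF cosine similarity. With normalized vectors, the dot
# product gives the same ranking.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

current_overview = "A 16GB GPU laptop can run smaller open-weight models locally."
candidate_snippet = "Smaller open-weight models run locally on laptops with a 16GB GPU."

vectors = TfidfVectorizer().fit_transform([current_overview, candidate_snippet])
print(cosine_similarity(vectors[0], vectors[1])[0][0])  # closer to 1.0 = more similar
```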

Step 8: Optimize your Content

The original GEO paper, “GEO: Generative Engine Optimization” (arXiv:2311.09735)

If we look at GEO: Generative Engine Optimization (arXiv:2311.09735), What Evidence Do Language Models Find Convincing? (arXiv:2402.11782v1), and similar scientific studies, the answer is clear. It depends!

To be cited for some topics in some LLMs, it helps to:

  • Add unique words.
  • Have pro/cons.
  • Gather user reviews.
  • Quote experts.
  • Include quantitative data and name your sources.
  • Use easy to understand language.
  • Write with positive sentiment.
  • Add product text with low perplexity (predictable and well-structured).
  • Include more lists (like this one!).

However, for other combinations of topics and LLMs, these measures can be counterproductive.

Until broadly accepted best practices evolve, the only advice I can give is do what is good for users and run experiments.

Step 9: Stick to the Facts

For over a decade, algorithms have extracted knowledge from text as triples like (Subject, Predicate, Object) — e.g., (Lady Liberty, Location, New York). A text that contradicts known facts may seem untrustworthy. A text that aligns with consensus but adds unique facts is ideal for LLMs and knowledge graphs.

So stick to the established facts. And add unique information.

Step 10: Invest in Digital PR

Everything discussed here is not just true for your own website. It is also true for content on other websites. The best way to influence it? Digital PR!

The more and better coverage you can obtain for your brand, the more likely LLMs are to parrot it back to users.

I have even seen cases where advertorials were used as sources!

Concrete GEO Workflows To Try

Before I joined Peec AI, I was a customer. Here is how I used the tool – and how I advise our customers to use it.

Learn Who Your Competitors Are

Just like with traditional SEO, using a good GEO tool will often reveal unexpected competitors. Regularly look at a list of automatically identified competitors. For those who surprise you, check in which prompts they are mentioned. Then check the sources that led to their inclusion. Are you represented properly in these sources? If not, act!

Is a competitor referenced because of their PeerSpot profile but you have zero reviews there? Ask customers for a review.

Was your competitor’s CEO interviewed by a Youtuber? Try to get on that show as well. Or publish your own videos targeting similar keywords.

Is your competitor regularly featured on top 10 lists where you never make it to the top 5? Offer the publisher who created the list an affiliate deal they cannot decline. With the next content update, you’re almost guaranteed to be the new number one.

Understand the Sources

When performing search grounding, LLMs rely on sources.

Typical LLM Sources: Reddit & Wikipedia

Look at the top sources for a large set of relevant prompts. Ignore your own website and your competitors for a second. You might find some of these:

  • A community like Reddit or X. Become part of the community and join the discussion. X is your best bet to influence results on Grok.
  • An influencer-driven website like YouTube or TikTok. Hire influencers to create videos. Make sure to instruct them to target the right keywords.
  • An affiliate publisher. Buy your way to the top with higher commissions.
  • A news and media publisher. Buy an advertorial and/or target them with your PR efforts. In certain cases, you might want to contact their commercial content department.

You can also check out this in-depth guide on how to deal with different kinds of source domains.

Target Query Fanout

Once you have observed which searches are triggered by query fanout for your most relevant prompts, create content to target them.

On your own website. With posts on Medium and LinkedIn. With press releases. Or simply by paying for article placements. If it ranks well in search engines, it has a chance to be cited by LLM-based answer engines.

Position Yourself for AI-Discoverability

Generative Engine Optimization is no longer optional – it’s the new frontline of organic growth. At Peec AI, we’re building the tools to track, influence, and win in this new ecosystem. We currently see clients growing their LLM traffic by 100% every 2 to 3 months, sometimes with up to 20x the conversion rate of typical SEO traffic!

Whether you’re shaping AI answers, monitoring brand mentions, or pushing for source visibility, now is the time to act. The LLMs consumers will trust tomorrow are being trained today.


Image Credits

Featured Image: Image by Peec.ai. Used with permission.