Hidden HTTP Page Can Cause Site Name Problems In Google via @sejournal, @MattGSouthern

Google’s John Mueller shared a case where a leftover HTTP homepage was causing unexpected site-name and favicon problems in search results.

The issue, which Mueller described on Bluesky, is easy to miss because Chrome automatically upgrades HTTP requests to HTTPS, so the HTTP version never appears in normal browsing.

What Happened

Mueller described the case as “a weird one.” The site used HTTPS, but a server-default HTTP homepage was still accessible at the HTTP version of the domain.

Mueller wrote:

“A hidden homepage causing site-name & favicon problems in Search. This was a weird one. The site used HTTPS, however there was a server-default HTTP homepage remaining.”

The tricky part is that Chrome can upgrade HTTP navigations to HTTPS, which makes the HTTP version easy to miss in normal browsing. Googlebot doesn’t follow Chrome’s upgrade behavior.

Mueller explained:

“Chrome automatically upgrades HTTP to HTTPS so you don’t see the HTTP page. However, Googlebot sees and uses it to influence the sitename & favicon selection.”

Google’s site name system pulls the name and favicon from the homepage to determine what to display in search results. The system reads structured data from the website, title tags, heading elements, og:site_name, and other signals on the homepage. If Googlebot is reading a server-default HTTP page instead of the actual HTTPS homepage, it’s working with the wrong signals.
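As an illustration of reading those homepage signals, here is a minimal stdlib-only sketch that collects the title tag, og:site_name, and any JSON-LD blocks from raw HTML. How Google actually weighs these signals is not public, so treat this purely as a diagnostic aid for seeing what a crawler would find on a given page.

```python
from html.parser import HTMLParser

class SiteNameSignals(HTMLParser):
    """Collect a few of the homepage signals mentioned above:
    <title>, og:site_name, and any JSON-LD script blocks."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.og_site_name = None
        self.json_ld = []
        self._in_title = False
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:site_name":
            self.og_site_name = attrs.get("content")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data
        elif self._in_json_ld:
            self.json_ld.append(data)

# Toy homepage HTML; in practice this would be the body of the
# HTTP response you fetched.
html = ('<html><head><title>Example Site</title>'
        '<meta property="og:site_name" content="Example Site">'
        '</head></html>')
parser = SiteNameSignals()
parser.feed(html)
print(parser.title, parser.og_site_name)
```

Running the same extraction against both the HTTP and HTTPS responses makes a mismatch like the one Mueller described immediately visible.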

How To Check For This

Mueller suggested two ways to see what Googlebot sees.

First, he joked that you could use AI. Then he corrected himself.

Mueller wrote:

“No wait, curl on the command line. Or a tool like the structured data test in Search Console.”

Running curl http://yourdomain.com from the command line would show the raw HTTP response without Chrome’s auto-upgrade. If the response returns a server-default page instead of your actual homepage, that’s the problem.
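The curl check needs no code beyond the command Mueller gave. Once you have the HTTP response body, though, a simple heuristic can flag a stock server page. The marker strings below are illustrative examples drawn from common server defaults, not an exhaustive list:

```python
# Heuristic check for a server-default page, e.g. the stock page
# Apache or nginx serves when no site is configured for a host.
# These markers are illustrative assumptions, not a complete list.
DEFAULT_PAGE_MARKERS = [
    "apache2 default page",
    "welcome to nginx",
    "iis windows server",
    "it works!",
]

def looks_like_server_default(html: str) -> bool:
    """True if the HTML contains a known server-default marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in DEFAULT_PAGE_MARKERS)

# A response like this on http:// (but not https://) is the
# symptom Mueller described:
print(looks_like_server_default(
    "<html><body><h1>Welcome to nginx!</h1></body></html>"))  # True
```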

If you want to see what Google retrieved and rendered, use the URL Inspection tool in Search Console and run a Live Test. Google’s site name documentation also notes that site names aren’t supported in the Rich Results Test.

Why This Matters

The display of site names and favicons in search results is something we’ve been documenting since Google first replaced title tags with site names in 2022. Since then, the system has gone through multiple growing pains. Google expanded site name support to subdomains in 2023, then spent nearly a year fixing a bug where site names on internal pages didn’t match the homepage.

This case introduces a new complication. The problem wasn’t in the structured data or the HTTPS homepage itself. It was a ghost page in the HTTP version, which you’d have no reason to check because your browser never showed it.

Google’s site name documentation explicitly mentions duplicate homepages, including HTTP and HTTPS versions, and recommends using the same structured data for both. Mueller’s case shows what can go wrong when an HTTP version contains content different from the HTTPS homepage you intended to serve.

The takeaway for troubleshooting site-name or favicon problems in search results is to check the HTTP version of your homepage directly. Don’t rely on what Chrome shows you.

Looking Ahead

Google’s site name documentation specifies that WebSite structured data must be on “the homepage of the site,” defined as the domain-level root URI. For sites running HTTPS, that means the HTTPS homepage is the intended source.
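For reference, the WebSite structured data that Google's documentation describes can be built and serialized with the standard library alone. The name and URL below are placeholder values; the snippet would go in the head of the HTTPS homepage:

```python
import json

# WebSite structured data for the homepage, per Google's site name
# documentation. "Example Site" and the URL are placeholders.
website = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "Example Site",
    "url": "https://www.example.com/",
}

snippet = (
    '<script type="application/ld+json">'
    + json.dumps(website)
    + "</script>"
)
print(snippet)
```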

If your site name or favicon looks wrong in search results and your HTTPS homepage has the correct structured data, check whether an HTTP version of the homepage still exists. Use curl or the URL Inspection tool’s Live Test to view it directly. If a server-default page is sitting there, removing it or redirecting HTTP to HTTPS at the server level should resolve the issue.
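At the server level, the fix is a blanket 301 from HTTP to HTTPS. A minimal nginx sketch, with placeholder server names (an equivalent Apache rewrite or host-level setting works just as well):

```nginx
server {
    listen 80;
    server_name example.com www.example.com;
    # Send every HTTP request to its HTTPS equivalent, so no
    # server-default page is ever served on port 80.
    return 301 https://$host$request_uri;
}
```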

Google Can Now Monitor Search For Your Government IDs via @sejournal, @MattGSouthern
  • Google’s “Results about you” tool now lets you find and request removal of search results containing government-issued IDs.
  • This includes IDs like passports, driver’s licenses, and Social Security numbers.
  • The expansion is rolling out in the U.S. over the coming days, with additional regions planned.

Google’s Results about you tool now monitors Search results for government-issued IDs like passports, driver’s licenses, and Social Security numbers.

New Data Shows Googlebot’s 2 MB Crawl Limit Is Enough via @sejournal, @martinibuster

New data based on real-world web pages demonstrates that Googlebot’s crawl limit of two megabytes is more than adequate. New SEO tools provide an easy way to check how much the HTML of a web page weighs.

Data Shows 2 Megabytes Is Plenty

Raw HTML is basically just a text file, and a text file would need over two million characters to reach two megabytes.
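A quick sanity check on that figure, assuming plain ASCII where one character is one byte (multibyte UTF-8 characters reach the byte limit with fewer characters):

```python
# 2 MB expressed in bytes; for ASCII text, bytes == characters.
limit_bytes = 2 * 1024 * 1024
print(limit_bytes)  # 2097152, i.e. over two million characters

# Against the 33 KB median HTML size from the HTTPArchive report,
# that is roughly a 60x margin.
print(limit_bytes / (33 * 1024))
```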

The HTTPArchive explains what’s in the HTML weight measurement:

“HTML bytes refers to the pure textual weight of all the markup on the page. Typically it will include the document definition and commonly used on-page tags. However it also contains inline elements such as the contents of script tags or styling added to other tags. This can rapidly lead to bloating of the HTML doc.”

That is the same thing that Googlebot is downloading as HTML, just the on-page markup, not the links to JavaScript or CSS.

According to the HTTPArchive’s latest report, the real-world median size of raw HTML is 33 kilobytes. The page weight at the 90th percentile is 155 kilobytes, meaning the HTML of 90% of sites is at or below roughly 155 kilobytes. Only at the 100th percentile does HTML size explode to far beyond two megabytes, which means pages weighing two megabytes or more are extreme outliers.
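To make the percentile language concrete, here is a small nearest-rank percentile sketch over an illustrative, made-up list of HTML sizes in kilobytes (the values only loosely echo the report’s figures):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at
    least p percent of the data is less than or equal to it."""
    s = sorted(values)
    k = math.ceil(p * len(s) / 100)
    return s[max(k - 1, 0)]

# Illustrative sizes only, not real measurements.
sizes_kb = [12, 25, 33, 33, 40, 58, 90, 120, 155, 600]

print(percentile(sizes_kb, 50))   # 40  -> half the pages are this size or smaller
print(percentile(sizes_kb, 90))   # 155 -> 90% of pages are this size or smaller
print(percentile(sizes_kb, 100))  # 600 -> the single largest outlier
```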

The HTTPArchive report explains:

“HTML size remained uniform between device types for the 10th and 25th percentiles. Starting at the 50th percentile, desktop HTML was slightly larger.

Not until the 100th percentile is a meaningful difference when desktop reached 401.6 MB and mobile came in at 389.2 MB.”

The data separates the home page measurements from the inner page measurements and surprisingly shows little difference between the two. The report explains:

“There is little disparity between inner pages and the home page for HTML size, only really becoming apparent at the 75th and above percentile.

At the 100th percentile, the disparity is significant. Inner page HTML reached an astounding 624.4 MB—375% larger than home page HTML at 166.5 MB.”

Mobile And Desktop HTML Sizes Are Similar

Interestingly, the page sizes between mobile and desktop versions were remarkably similar, regardless of whether HTTPArchive was measuring the home page or one of the inner pages.

HTTPArchive explains:

“The size difference between mobile and desktop is extremely minor, this implies that most websites are serving the same page to both mobile and desktop users.

This approach dramatically reduces the amount of maintenance for developers but does mean that overall page weight is likely to be higher as effectively two versions of the site are deployed into one page.”

Though the overall page weight might be higher since the mobile and desktop HTML exists simultaneously in the code, as noted earlier, the actual weight is still far below the two-megabyte threshold all the way up until the 100th percentile.

Given that it takes about two million characters to push the website HTML to two megabytes and that the HTTPArchive data based on actual websites shows that the vast majority of sites are well under Googlebot’s 2 MB limit, it’s safe to say it’s okay to scratch off HTML size from the list of SEO things to worry about.

Tame The Bots

Dave Smart of Tame The Bots recently posted that they updated their tool to stop crawling at the two-megabyte limit, showing sites that are extreme outliers the point at which Googlebot would stop crawling a page.

Smart posted:

“At the risk of overselling how much of a real world issue this is (it really isn’t for 99.99% of sites I’d imagine), I added functionality to tamethebots.com/tools/fetch-… to cap text based files to 2 MB to simulate this.”

Screenshot Of Tame The Bots Interface

The tool will show what the page will look like to Google if the crawl is limited to two megabytes of HTML. But it doesn’t show whether the tested page exceeds two megabytes, nor does it show how much the web page weighs. For that, there are other tools.

Tools That Check Web Page Size

There are a few tool sites that show HTML size; here are two that focus on web page size alone. I tested the same page on each, and both showed roughly the same page weight, give or take a few kilobytes.
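If you would rather measure locally, the weight is just the byte length of the markup. A minimal sketch (fetching is omitted to keep it self-contained; in practice the HTML would come from curl or your crawler):

```python
def html_weight_kb(html: str) -> float:
    """Weight of the raw markup in kilobytes, measured as UTF-8 bytes."""
    return len(html.encode("utf-8")) / 1024

# Toy page roughly in the neighborhood of the 33 KB median.
page = ("<html><head><title>t</title></head><body>"
        + "x" * 33_000
        + "</body></html>")
print(round(html_weight_kb(page), 1))
```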

Toolsaday Web Page Size Checker

The interestingly named Toolsaday web page size checker enables users to test one URL at a time. The tool does just that one thing, making it easy to get a quick reading of how much a web page weighs in kilobytes (or more if the page is in the 100th percentile).

Screenshot Of Toolsaday Test Results

Small SEO Tools Website Page Size Checker

The Small SEO Tools Website Page Size Checker differs from the Toolsaday tool in that Small SEO Tools enables users to test ten URLs at a time.

Not Something To Worry About

The bottom line about the two-megabyte Googlebot crawl limit is that it’s not something the average SEO needs to worry about. It affects only a very small percentage of outliers. But if it makes you feel better, give one of the above SEO tools a try to reassure yourself or your clients.

Featured Image by Shutterstock/Fathur Kiwon

Bing Webmaster Tools Adds AI Citation Performance Data via @sejournal, @MattGSouthern

Microsoft introduced an AI Performance dashboard in Bing Webmaster Tools, giving visibility into how content gets cited across Copilot and AI-generated answers in Bing.

The feature, now in public preview, shows citation counts, page-level activity, and trends over time. It covers AI experiences across Copilot, AI summaries in Bing, and select partner integrations.

Microsoft announced the feature on the Bing Webmaster Blog.

What’s New

The AI Performance dashboard provides four core metrics.

Total citations tracks how often your content appears as a source in AI-generated answers during a selected time period. Average cited pages shows the daily average of unique URLs from your site referenced across AI answers.

Page-level citation activity breaks down which specific URLs get cited most often. This lets you see which pages AI systems reference and how that activity changes over time.

The dashboard also introduces “grounding queries,” which Microsoft describes as the key phrases AI used when retrieving content for answers. The company notes this data represents a sample rather than complete citation activity.

A timeline view shows how citation patterns change over time across supported AI experiences.

Why This Matters

This is the first time Bing Webmaster Tools has shown how often content is cited in generative answers, including which URLs are referenced and how citation activity changes over time.

Google includes AI Overviews and AI Mode in Search Console’s overall Performance reporting, but it doesn’t offer a dedicated AI Overviews/AI Mode report or citation-style URL counts. AI Overviews also occupy a single position, with all links assigned that same position.

Bing’s dashboard goes further. It tracks which pages get cited, how often, and what phrases triggered the citation. That gives you data to work with instead of guesses.

Looking Ahead

AI Performance is available now in Bing Webmaster Tools as a public preview. Microsoft said it will continue refining metrics as more data is processed.

Bing has been building toward this for a while. The platform consolidated web search and chat metrics into a single dashboard and has added comparison features and content control tools since then.


Featured Image: Mijansk786/Shutterstock

OpenAI Begins Testing Ads In ChatGPT For Free And Go Users via @sejournal, @MattGSouthern

OpenAI is testing ads inside ChatGPT, bringing sponsored content to the product for the first time.

The test is live for logged-in adult users in the U.S. on the free and Go subscription tiers. Subscribers on the Plus, Pro, Business, Enterprise, and Education tiers won’t see ads.

OpenAI announced the launch with a brief blog post confirming that the principles it outlined in January are now in effect.

OpenAI’s post also adds Education to the list of ad-free tiers, which wasn’t included in the company’s initial plans.

How The Ads Work

Ads appear at the bottom of ChatGPT responses, visually separated from the answer and labeled as sponsored.

OpenAI says it selects ads by matching advertiser submissions with the topic of your conversation, your past chats, and past interactions with ads. If someone asks about recipes, they might see an ad for a meal kit or grocery delivery service.

Advertisers don’t see users’ conversations or personal details. They receive only aggregate performance data like views and clicks.

Users can dismiss ads, see why a specific ad appeared, turn off personalization, or clear all ad-related data. OpenAI also confirmed it won’t show ads in conversations about health, mental health, or politics, and won’t serve them to accounts identified as under 18.

Free users who don’t want ads have another option. OpenAI says you can opt out of ads in the Free tier in exchange for fewer daily free messages. Go users can avoid ads by upgrading to Plus or Pro.

The Path To Today

OpenAI first announced plans to test ads on January 16, alongside the U.S. launch of ChatGPT Go at $8 per month. The company laid out five principles. They cover mission alignment, answer independence, conversation privacy, choice and control, and long-term value.

The January post was careful to frame ads as supporting access rather than driving revenue. Altman wrote on X at the time:

“It is clear to us that a lot of people want to use a lot of AI and don’t want to pay, so we are hopeful a business model like this can work.”

That framing sits alongside OpenAI’s financial reality. Altman said in November that the company is considering infrastructure commitments totaling about $1.4 trillion over eight years. He also said OpenAI expects to end 2025 with an annualized revenue run rate above $20 billion. A source told CNBC that OpenAI expects ads to account for less than half of its revenue long term.

OpenAI has confirmed a $200,000 minimum commitment for early ChatGPT ads, Adweek reported. Digiday reported media buyers were quoted about $60 per 1,000 views for sponsored placements during the initial U.S. test.

Altman’s Evolving Position

The launch represents a notable turn from Altman’s earlier public statements on advertising.

In an October 2024 fireside chat at Harvard, Altman said he “hates” ads and called the idea of combining ads with AI “uniquely unsettling,” as CNN reported. He contrasted ChatGPT’s user-aligned model with Google’s ad-driven search, saying Google’s results depended on “doing badly for the user.”

By November 2025, Altman’s position had softened. He told an interviewer he wasn’t “totally against” ads but said they would “take a lot of care to get right.” He drew a line between pay-to-rank advertising, which he said would be “catastrophic,” and transaction fees or contextual placement that doesn’t alter recommendations.

The test rolling out today follows the contextual model Altman described. Ads sit below responses and don’t affect what ChatGPT recommends. Whether that distinction holds as ad revenue grows will be the longer-term question.

Where Competitors Stand

The timing puts OpenAI’s decision in sharp contrast with its two closest rivals.

Anthropic ran a Super Bowl campaign last week centered on the tagline “Ads are coming to AI. But not to Claude.” The spots showed fictional chatbots interrupting personal conversations with sponsored pitches.

Altman called the campaign “clearly dishonest,” writing on X that OpenAI “would obviously never run ads in the way Anthropic depicts them.”

Google has also kept distance from chatbot ads. DeepMind CEO Demis Hassabis said at Davos in January that Google has no current plans for ads in Gemini, calling himself “a little bit surprised” that OpenAI moved so early. He drew a distinction between assistants, where trust is personal, and search, where Google already shows ads in AI Overviews.

That was the second time in two months that Google leadership publicly denied plans for Gemini advertising. In December, Google Ads VP Dan Taylor disputed an Adweek report claiming advertisers were told to expect Gemini ads in 2026.

The three companies are now on distinctly different paths. OpenAI is testing conversational ads at scale. Anthropic is marketing its refusal to run them. Google is running ads in AI Overviews but holding off on its standalone assistant.

Why This Matters

OpenAI says ChatGPT is used by hundreds of millions of people. CNBC reported that Altman told employees ChatGPT has about 800 million weekly users. That creates pressure to find revenue beyond subscriptions, and advertising is the proven model for monetizing free users across consumer tech.

For practitioners, today’s launch opens a new ad channel for AI platform monetization. The targeting mechanism uses conversation context rather than search keywords, which creates a different kind of intent signal. Someone asking ChatGPT for help planning a trip is further along in the decision process than someone typing a search query.

The restrictions are also worth watching. No ads near health, politics, or mental health topics means the inventory is narrower than traditional search. Combined with reported $60 CPMs and a $200K minimum, this starts as a premium play for a limited set of advertisers rather than a self-serve marketplace.
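A back-of-envelope calculation using the reported figures shows why this starts as a premium play: the minimum commitment buys only a few million impressions at that CPM. Both numbers come from press reports, not official OpenAI rate cards.

```python
# Reported figures: $200,000 minimum commitment, ~$60 CPM
# (cost per 1,000 views).
minimum_spend_usd = 200_000
cpm_usd = 60

impressions = minimum_spend_usd * 1000 // cpm_usd
print(impressions)  # 3333333 -> roughly 3.3 million views
```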

Looking Ahead

OpenAI described today’s rollout as a test to “learn, listen, and make sure we get the experience right.” No timeline was given for expanding beyond the U.S. or beyond free and Go tiers.

Separately, CNBC reported that Altman told employees in an internal Slack message that ChatGPT is “back to exceeding 10% monthly growth” and that an “updated Chat model” is expected this week.

How users respond to ads in their ChatGPT conversations will determine whether this test scales or gets pulled back. It will also test whether the distinction Altman drew in November between trust-destroying ads and acceptable contextual ones holds up in practice.

7 Insights From Washington Post’s Strategy To Win Back Traffic via @sejournal, @martinibuster

The Washington Post’s recent announcement of staffing cuts is a story with heroes, villains, and victims, but buried beneath the headlines is the reality of a big-brand publisher confronting the same changes in Google Search that SEOs, publishers, and ecommerce stores are struggling with. What follows are insights into the Post’s strategy to claw back traffic and income, insights that could be useful for anyone seeking to stabilize traffic and grow.

Disclaimer

The Washington Post is proposing the following strategies in response to steep drops in search traffic, the rise of multi-modal content consumption, and many other factors that are fragmenting online audiences. The strategies have yet to be proven.

The value lies in analyzing what they are doing and understanding if there are any useful ideas for others.

Problem That Is Being Solved

The reasons given for the announced changes are similar to what SEOs, online stores, and publishers are going through right now because of the decline of search and the hyper-diversification of sources of information.

The memo explains:

“Platforms like Search that shaped the previous era of digital news, and which once helped The Post thrive, are in serious decline. Our organic search has fallen by nearly half in the last three years.

And we are still in the early days of AI-generated content, which is drastically reshaping user experiences and expectations.”

Those problems are the exact same ones affecting virtually all online businesses. This makes The Washington Post’s solution of interest to everyone beyond just news sites.

Problems Specific To The Washington Post

Recent reporting on The Washington Post has tended to frame the cuts narrowly in the context of politics, concerns about the concentration of wealth, and the impact on coverage of sports, international news, and the performing arts, in addition to the hundreds of staff and reporters who lost their jobs.

The job cuts in particular are a highly specific solution applied by The Washington Post and are highly controversial. One could argue that cutting some of the lower-performing topics removes the very things that differentiate the website. As you will see next, Executive Editor Matt Murray justifies the cuts as listening to readers’ signals.

Challenges Affecting Everyone

If you zoom out, there is a larger pattern of how many organizations are struggling to understand where the audience has gone and how best to bring them back.

Shared Industry Challenges

  • Changes in content consumption habits
  • Decline of search
  • Rise of the creator economy
  • Growth of podcasts and video shows
  • Social media competing for audience attention
  • Rise of AI search and chat

A recent podcast interview (link to Spotify) with the executive editor of The Washington Post, Matt Murray, revealed a years-long struggle to restructure the organization’s workflow into one that:

  • Was responsive to audience signals
  • Could react in real time instead of the rigid print-based news schedule
  • Explored emerging content formats so as to evolve alongside readers
  • Produced content that is perceived as indispensable

The issues affecting The Washington Post are similar to those affecting everyone else, from recipe bloggers to big-brand review sites. A key point Murray made was that the changes were driven by audience signals.

Matt Murray said the following about reader signals:

“Readers in today’s world tell you what they want and what they don’t want. They have more power. …And we weren’t picking up enough of the reader signals.”

Then a little later on he again emphasized the importance of understanding reader signals:

“…we are living in a different kind of a world that is a data reader centric world. Readers send us signals on what they want. We have to meet them more where they are. That is going to drive a lot of our success.”

Whether listening to audience signals justifies cutting staff or ends up removing the things that differentiate The Washington Post remains to be seen.

For example, I used to subscribe to the print edition of The New Yorker for the articles, not for the restaurant or theater reviews, yet the reviews were still of interest to me because I liked to keep track of trends in live theater and dining. The New Yorker cartoons rarely had anything to do with the article topics, and yet they were a value add. Would something like that show up in audience signals?

Build A Base Then Adapt

The memo paints what they’re doing as a foundation for building a strategy that is still evolving, not as a proven strategy. In my opinion that reflects the uncertainty introduced by the rapid decline of classic search and the knowledge that there are no proven strategies.

That uncertainty makes it more interesting to examine what a big brand organization like The Washington Post is doing to create a base strategy to start from and adapt it based on outcomes. That, in itself, is a strategy for coping with a lack of proven tactics.

Three concrete goals they are focusing on are:

  1. Attract readers
  2. Create content that leads to subscriptions
  3. Increase engagement

They write:

“From this foundation, we aim to build on what is working, and grow with discipline and intent, to experiment, to measure and deepen what resonates with customers.”

In the podcast interview, Murray also described the stability of a foundation as a way to nurture growth, explaining that it creates the conditions for talent to do its best work. He explains that building the foundation gives the staff the space to focus on things that work.

He explained:

“One of the reasons I wanted to get to stability, as I want room for that talent to thrive and flourish.

I also want us to develop it in a more modern multi-modal way with those that we’ve been able to do.”

A Path To Becoming Indispensable

The Washington Post memo offered insights into the strategy, stating the goal that the brand must become indispensable to readers and naming three criteria that articles must be validated against.

According to the memo:

“We can’t be everything to everyone. But we must be indispensable where we compete. That means continually asking why a story matters, who it serves and how it gives people a clearer understanding of the world and an advantage in navigating it.”

Three Criteria For Content

  1. Content must matter to site visitors.
  2. Content must have an identifiable audience.
  3. Content must provide understanding and also be applicable (useful).

Content Must Matter
Regardless of whether the content is about a product, a service, or purely informational, The Washington Post’s strategy states that content must strongly fulfill a specific need. For SEOs, creators, ecommerce stores, and informational publishers, “mattering” is one of the pillars that make a business indispensable to a site visitor and give it an advantage.

Identifiable Audience
Information doesn’t exist in a vacuum, but traditional SEO has strongly focused on keyword volume and keyword relevance, essentially treating information as existing in a space devoid of human relevance. Keyword relevance is not the same as human relevance. Keyword relevance is relevance to a keyword phrase, not relevance to a human.

This point matters because AI chat and AI search destroy the concept of keywords: people are no longer typing keyword phrases but are instead engaging in goal-oriented discussions.

When SEOs talk about keyword relevance, they are talking about relevance to an algorithm. Put another way, they are essentially defining the audience as an algorithm.

So, point two is really about stepping back and asking, “Why does a person need this information?”

Provide Understanding And Be Applicable
Point three states that it’s not enough for content to provide an understanding of what happened (the facts). The information must also make the world around the reader navigable (the application of those facts).

This is perhaps the most interesting pillar of the strategy because it acknowledges that information vomit is not enough. It must be information that is utilitarian. Utilitarian in this context means that content must have some practical use.

In my opinion, an example of this principle in the context of an ecommerce site is product data. The other day I was on a fishing lure site, and the site assumed that the consumer understood how each lure is supposed to be used. It just had the name of the lure and a photo. In every case, the name of the lure was abstract and gave no indication of how the lure was to be used, under what circumstances, and what tactic it was for.

Another example is a clothing site where clothing is described as small, medium, large, and extra large, which are subjective measurements because every retailer defines small and large differently. One brand I shop at consistently labels objectively small-sized jackets as medium. Fortunately, that same retailer also provides chest, shoulder, and length measurements, which enable a user to understand exactly whether that clothing fits.

I think that’s part of what the Washington Post memo means when it says that the information should provide understanding but also be applicable. It’s that last part that makes the understanding part useful.

Three Pillars To Thriving In A Post-Search Information Economy

All three criteria are pillars that support the mandate to be indispensable and provide an advantage. Satisfying those goals helps differentiate content from information vomit and AI slop. The strategy supports becoming a navigational entity, a destination that users specifically seek out, and it helps publishers, ecommerce stores, and SEOs build an audience in order to claw back what classic search no longer provides.

Featured Image by Shutterstock/Roman Samborskyi

Google Revises Discover Guidelines Alongside Core Update via @sejournal, @MattGSouthern

Google revised its “Get on Discover” documentation following the lauch of the February Discover core update.

On its documentation updates page, Google said it added more information on how sites can increase the likelihood of content appearing in Discover. Here’s what was added.

What Changed

Comparing the archived version with the current page shows Google rewrote its list of recommendations for Discover visibility.

The previous version combined title and clickbait guidance into a single bullet, saying to “Use page titles that capture the essence of the content, but in a non-clickbait fashion.”

Google split that into two items. The first now says “Use page titles and headlines that capture the essence of the content.” The second says “Avoid clickbait and similar tactics to artificially inflate engagement.”

That word “clickbait” is new. The previous version said “Avoid tactics to artificially inflate engagement” without naming the tactic.

The sensationalism guidance changed too. The old version said “Avoid tactics that manipulate appeal by catering to morbid curiosity, titillation, or outrage.” The revision names the tactic, saying “Avoid sensationalism tactics that manipulate appeal.”

The new addition is a recommendation to “Provide an overall great page experience,” with a link to Google’s page experience documentation. That recommendation isn’t in the archived version.

Image requirements, traffic fluctuation guidance, and performance monitoring sections remain unchanged.

Why This Matters

These documentation changes map to what Google said the core update targets. The blog post announcing the update said it would show more locally relevant content, reduce sensational content and clickbait, and surface more original content from sites with expertise.

Discover documentation has changed before alongside algorithm updates. Previously, Google added Discover to its Helpful Content System documentation and later expanded its explanation of why Discover traffic fluctuates. Both of those updates aligned with broader changes to how Discover evaluated content.

Page experience has been part of Google’s Search guidance since 2020 but wasn’t in the Discover-specific recommendations before this revision.

Looking Ahead

The February Discover core update is rolling out to English-language users in the United States over the next two weeks. Google said it plans to expand to all countries and languages in the months ahead.

Publishers monitoring Discover traffic in Search Console should check the Get on Discover page for the current recommendations. Google’s standard core update guidance applies as well.


Featured Image: ZikG/Shutterstock

Google Shows How To Check Passage Indexing via @sejournal, @martinibuster

Google’s John Mueller was asked how many megabytes of HTML Googlebot crawls per page. The question was whether Googlebot indexes two megabytes (MB) or fifteen megabytes of data. Mueller’s answer minimized the technical aspect of the question and went straight to the heart of the issue, which is really about how much content is indexed.

GoogleBot And Other Bots

In the middle of an ongoing discussion on Bluesky, someone revived the question of whether Googlebot crawls and indexes 2 or 15 megabytes of data.

They posted:

“Hope you got whatever made you run 🙂

It would be super useful to have more precisions, and real-life examples like “My page is X Mb long, it gets cut after X Mb, it also loads resource A: 15Kb, resource B: 3Mb, resource B is not fully loaded, but resource A is because 15Kb < 2Mb”.”

Panic About 2 Megabyte Limit Is Overblown

Mueller said it isn’t necessary to weigh bytes, implying that what ultimately matters isn’t how many bytes a page contains but whether its important passages are indexed.

Furthermore, Mueller said it’s rare for a site to exceed two megabytes of HTML, dismissing the idea that a site’s content might go unindexed because a page is too big.

He also noted that Googlebot isn’t the only bot that crawls a web page, which may explain why 2 megabytes and 15 megabytes aren’t hard limiting factors. Google publishes a list of all the crawlers it uses for various purposes.
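For readers who want to know where their own pages fall relative to the figures being discussed, a rough way to measure a page’s raw HTML size is with curl on the command line. This is a sketch, not an official Google check; `example.com` is a placeholder for your own URL.

```shell
# Sketch: measure a page's raw HTML size in bytes.
# example.com is a placeholder; substitute your own URL.
bytes=$(curl -s https://example.com/ | wc -c | tr -d ' ')
echo "Raw HTML: ${bytes} bytes"

# 2 MB = 2,097,152 bytes; most real-world pages are far below this.
if [ "${bytes}" -gt 2097152 ]; then
  echo "HTML exceeds 2 MB"
fi
```

Note this measures only the raw HTML response, not images, scripts, or other resources loaded by the page.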

How To Check If Content Passages Are Indexed

Lastly, Mueller’s response confirmed a simple way to check whether or not important passages are indexed.

Mueller answered:

“Google has a lot of crawlers, which is why we split it. It’s extremely rare that sites run into issues in this regard, 2MB of HTML (for those focusing on Googlebot) is quite a bit. The way I usually check is to search for an important quote further down on a page – usually no need to weigh bytes.”
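Mueller’s check can be scripted. The sketch below builds the exact-match, site-scoped query he describes; the passage text and domain are placeholders for illustration.

```shell
# Sketch: build the exact-match Google query Mueller describes.
# The passage and domain below are placeholders.
passage="a distinctive sentence from near the bottom of the page"
domain="example.com"

# Quoting the passage forces an exact-phrase match; the site: operator
# scopes the search to your own domain, so a result means the passage
# is indexed.
query="\"${passage}\" site:${domain}"
echo "Search Google for: ${query}"
```

If the page appears for the quoted phrase, that passage made it into the index; if not, the deeper portion of the page may not be indexed.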

Passages For Ranking

People have short attention spans except when they’re reading about a topic that they are passionate about. That’s when a comprehensive article may come in handy for those readers who really want to take a deep dive to learn more.

From an SEO perspective, I can understand why some may feel a comprehensive article isn’t ideal for ranking when a single document covers multiple topics in depth, any one of which could stand alone as its own article.

A publisher or an SEO needs to step back and assess whether users are satisfied with an overview of a topic or whether they need a deeper treatment of it. There are also different levels of comprehensiveness: one with granular details, and another with overview-level coverage that links out to deeper coverage.

In other words, sometimes users require a view of the forest and sometimes they require a view of the trees.

Google has long been able to rank document passages with its passage-ranking systems. Ultimately, in my opinion, it comes down to what is useful to users and likely to result in a higher level of user satisfaction.

If comprehensive topic coverage excites people and makes them passionate enough about it to share it with others, then that is a win.

If comprehensive coverage isn’t useful for a specific topic, it may be better to split the content into shorter pieces that better align with why people come to that page to read about the topic.

Takeaways

While most of these takeaways go beyond Mueller’s response, in my opinion they reflect good SEO practices.

  • Questions about HTML size limits usually reflect deeper concerns about content length and indexing visibility
  • Megabyte thresholds are rarely a practical constraint for real-world pages
  • Counting bytes is less useful than verifying whether content actually appears in search
  • Searching for distinctive passages is a practical way to confirm indexing
  • Comprehensiveness should be driven by user intent, not crawl assumptions
  • Content usefulness and clarity matter more than document size
  • User satisfaction remains the deciding factor in content performance

Concern over whether Googlebot has a hard megabyte crawl limit reflects uncertainty about whether important content in a long document is being indexed and available to rank in search. Focusing on megabytes shifts attention away from the real issue SEOs should be focusing on: whether the depth of topic coverage best serves a user’s needs.

Mueller’s response reinforces the point that web pages too big to be indexed are uncommon, and that fixed byte limits are not a constraint SEOs should be concerned about.

In my opinion, SEOs and publishers will probably get better search coverage by shifting their focus away from assumed crawl limits and toward how much content users actually want to consume.

But if a publisher or SEO is concerned about whether a passage near the end of a document is indexed, there is an easy way to check: search for an exact match of that passage.

Comprehensive topic coverage is not automatically a ranking problem, and it is not always the best (or worst) approach. HTML size is not really a concern unless it starts impacting page speed. What matters is whether content is clear, relevant, and useful to the intended audience at the level of granularity that serves the user’s purpose.

Featured Image by Shutterstock/Krakenimages.com

Google Releases Discover-Focused Core Update via @sejournal, @MattGSouthern
  • Google has launched a core update specifically for Discover, rather than Search more broadly.
  • The February Discover core update began Feb. 5 for English-language users in the U.S., with plans to expand to other countries and languages.
  • Google says the rollout may take up to two weeks.

Google has started a Discover core update. The rollout may take up to two weeks, with expansion to more countries and languages later.

Google Search Hits $63B, Details AI Mode Ad Tests via @sejournal, @MattGSouthern

Alphabet reported Q4 2025 revenue of $113.8 billion, beating Wall Street estimates and marking the company’s first year above $400 billion in annual revenue. Google Search grew 17% to $63.07 billion.

On the earnings call, the company revealed how it plans to monetize AI Mode and shared new data on how AI is changing search behavior.

What’s Happening

Google Search and other advertising revenue hit $63.07 billion, up 17% from $54.03 billion in Q4 2024. Search growth accelerated through 2025, rising from 10% in Q1 to 12% in Q2 to 15% in Q3 and 17% in Q4.

CEO Sundar Pichai said Search had more usage in Q4 than ever before. He attributed the growth to AI features changing how people search.

Pichai said on the call:

“Once people start using these new experiences, they use them more. In the US, we saw daily AI Mode queries per user double since launch.”

Queries in AI Mode are three times longer than traditional searches, and a “significant portion” lead to follow-up questions.

AI Mode Monetization Tests

Chief Business Officer Philipp Schindler said Google is “in the early stages of experimenting with AI Mode monetization, like testing ads below the AI response, with more underway.”

On Direct Offers, a new pilot program, Schindler said:

“We announced Direct Offers, a new Google Ads pilot, which will allow advertisers to show exclusive offers for shoppers who are ready to buy, directly in AI Mode.”

Google also plans to launch checkout directly within AI Mode from select merchants.

Schindler said the longer AI Mode queries are creating new ad inventory. Gemini’s understanding of intent “has increased our ability to deliver ads on longer, more complex searches that were previously challenging to monetize.”

YouTube Miss Explained

YouTube ad revenue reached $11.38 billion, up 9% but below the $11.84 billion analysts expected.

Schindler attributed the miss to election ad lapping from Q4 2024:

“On the brand side, as an ad share, the largest factor negatively impacting the year-over-year growth rate was lapping the strong spend on U.S. elections.”

He also noted that subscription growth can reduce ad revenue. When users switch to YouTube Premium, it hurts ad revenue but helps the overall business.

What Else Happened

Google Cloud revenue jumped 48% to $17.66 billion. Alphabet plans to spend $175 billion to $185 billion on capital expenditures in 2026, nearly double its 2025 spending. That suggests more AI features coming to Search and other products.

Why This Matters

A year ago, in Q4 2024, Search grew 12%. By Q1 2025, AI Overviews reached 1.5 billion monthly users, and Search was growing 10%. Now Search growth has accelerated to 17%.

The metrics Google celebrated on this call describe users staying on Google longer. Schindler described the new ad inventory as additive, reaching queries that were “previously challenging to monetize.”

That’s a monetization win for Google. The tradeoff to watch is referral traffic.

When asked about cannibalization, Pichai said Google hasn’t seen evidence of it:

“The combination of all of that I think creates an expansionary moment. I think it’s expanding the type of queries people do with Google overall.”

That may be true for queries. Whether it holds for referral traffic is something you’ll need to track in your own analytics.

Looking Ahead

Google maintains the position that AI features expand search activity rather than cannibalize it. The Q4 revenue numbers back it up.

The open question is what expanding AI Mode features means for referral traffic, and your own analytics will tell that story.


Featured Image: Rokas Tenys/Shutterstock