7 Insights From Washington Post’s Strategy To Win Back Traffic via @sejournal, @martinibuster

The Washington Post’s recent announcement of staffing cuts is a story with heroes, villains, and victims, but buried beneath the headlines is the reality of a big brand publisher confronting the same changes in Google Search that SEOs, publishers, and ecommerce stores are struggling with. The following are insights into their strategy to claw back traffic and income, which could be useful for everyone seeking to stabilize traffic and grow.

Disclaimer

The Washington Post is proposing the following strategies in response to steep drops in search traffic, the rise of multi-modal content consumption, and many other factors that are fragmenting online audiences. The strategies have yet to be proven.

The value lies in analyzing what they are doing and understanding if there are any useful ideas for others.

Problem That Is Being Solved

The reasons given for the announced changes are similar to what SEOs, online stores, and publishers are going through right now because of the decline of search and the hyper-diversification of sources of information.

The memo explains:

“Platforms like Search that shaped the previous era of digital news, and which once helped The Post thrive, are in serious decline. Our organic search has fallen by nearly half in the last three years.

And we are still in the early days of AI-generated content, which is drastically reshaping user experiences and expectations.”

Those problems are the exact same ones affecting virtually all online businesses. This makes The Washington Post’s solution of interest to everyone beyond just news sites.

Problems Specific To The Washington Post

Recent reporting on The Washington Post has tended to frame the story narrowly in the context of politics, concerns about the concentration of wealth, and how the cuts affect coverage of sports, international news, and the performing arts, in addition to the hundreds of staff members and reporters who lost their jobs.

The job cuts in particular are a highly specific solution applied by The Washington Post and are highly controversial. An argument can be made that cutting some of the lower-performing topics removes the very things that differentiate the website. As you will see next, Executive Editor Matt Murray justifies the cuts as listening to readers’ signals.

Challenges Affecting Everyone

If you zoom out, there is a larger pattern of how many organizations are struggling to understand where the audience has gone and how best to bring them back.

Shared Industry Challenges

  • Changes in content consumption habits
  • Decline of search
  • Rise of the creator economy
  • Growth of podcasts and video shows
  • Social media competing for audience attention
  • Rise of AI search and chat

A recent podcast interview (link to Spotify) with the executive editor of The Washington Post, Matt Murray, revealed a years-long struggle to restructure the organization’s workflow into one that:

  • Was responsive to audience signals
  • Could react in real time instead of the rigid print-based news schedule
  • Explored emerging content formats so as to evolve alongside readers
  • Produced content that is perceived as indispensable

The issues affecting The Washington Post are similar to issues affecting everyone else, from recipe bloggers to big brand review sites. A key point Murray made was that the changes were driven by audience signals.

Matt Murray said the following about reader signals:

“Readers in today’s world tell you what they want and what they don’t want. They have more power. …And we weren’t picking up enough of the reader signals.”

Then a little later on he again emphasized the importance of understanding reader signals:

“…we are living in a different kind of a world that is a data reader centric world. Readers send us signals on what they want. We have to meet them more where they are. That is going to drive a lot of our success.”

Whether listening to audience signals justifies cutting staff or ends up removing the things that differentiate The Washington Post remains to be seen.

For example, I used to subscribe to the print edition of The New Yorker for the articles, not for the restaurant or theater reviews, yet the reviews were still of interest to me because I liked to keep track of trends in live theater and dining. The New Yorker cartoons rarely had anything to do with the article topics, and yet they were a value add. Would something like that show up in audience signals?

Build A Base Then Adapt

The memo paints what they’re doing as a foundation for building a strategy that is still evolving, not as a proven strategy. In my opinion that reflects the uncertainty introduced by the rapid decline of classic search and the knowledge that there are no proven strategies.

That uncertainty makes it more interesting to examine what a big brand organization like The Washington Post is doing to create a base strategy to start from and adapt it based on outcomes. That, in itself, is a strategy for coping with a lack of proven tactics.

Three concrete goals they are focusing on are:

  1. Attract readers
  2. Create content that leads to subscriptions
  3. Increase engagement

They write:

“From this foundation, we aim to build on what is working, and grow with discipline and intent, to experiment, to measure and deepen what resonates with customers.”

In the podcast interview, Murray also described the stability of a foundation as a way to nurture growth, explaining that it creates the conditions for talent to do its best work and gives the staff the space to focus on what works.

He explained:

“One of the reasons I wanted to get to stability, as I want room for that talent to thrive and flourish.

I also want us to develop it in a more modern multi-modal way with those that we’ve been able to do.”

A Path To Becoming Indispensable

The Washington Post memo offered insights into the company’s strategy, stating the goal that the brand must become indispensable to readers and naming three criteria that articles must be validated against.

According to the memo:

“We can’t be everything to everyone. But we must be indispensable where we compete. That means continually asking why a story matters, who it serves and how it gives people a clearer understanding of the world and an advantage in navigating it.”

Three Criteria For Content

  1. Content must matter to site visitors.
  2. Content must have an identifiable audience.
  3. Content must provide understanding and also be applicable (useful).

Content Must Matter
Regardless of whether the content is about a product or a service, or is purely informational, The Washington Post’s strategy states that content must strongly fulfill a specific need. For SEOs, creators, ecommerce stores, and informational content publishers, “mattering” is one of the pillars that make a business indispensable to a site visitor and give it an advantage.

Identifiable Audience
Information doesn’t exist in a vacuum, but traditional SEO has strongly focused on keyword volume and keyword relevance, essentially treating information as existing in a space devoid of human relevance. Keyword relevance is not the same as human relevance. Keyword relevance is relevance to a keyword phrase, not relevance to a human.

This point matters because AI chat and search destroy the concept of keywords: people are no longer typing in keyword phrases but are instead engaging in goal-oriented discussions.

When SEOs talk about keyword relevance, they are talking about relevance to an algorithm. Put another way, they are essentially defining the audience as an algorithm.

So, point two is really about stepping back and asking, “Why does a person need this information?”

Provide Understanding And Be Applicable
Point three states that it’s not enough for content to provide an understanding of what happened (facts). It requires that the information must make the world around the reader navigable (application of the facts).

This is perhaps the most interesting pillar of the strategy because it acknowledges that information vomit is not enough. It must be information that is utilitarian. Utilitarian in this context means that content must have some practical use.

In my opinion, an example of this principle in the context of an ecommerce site is product data. The other day I was on a fishing lure site, and the site assumed that the consumer understood how each lure is supposed to be used. It just had the name of the lure and a photo. In every case, the name of the lure was abstract and gave no indication of how the lure was to be used, under what circumstances, and what tactic it was for.

Another example is a clothing site where clothing is described as small, medium, large, and extra large, which are subjective measurements because every retailer defines small and large differently. One brand I shop at consistently labels objectively small-sized jackets as medium. Fortunately, that same retailer also provides chest, shoulder, and length measurements, which enable a user to determine exactly whether the clothing will fit.

I think that’s part of what the Washington Post memo means when it says that the information should provide understanding but also be applicable. It’s that last part that makes the understanding part useful.

Three Pillars To Thriving In A Post-Search Information Economy

All three criteria are pillars that support the mandate to be indispensable and provide an advantage. Satisfying those goals helps differentiate content from information vomit and AI slop. The strategy supports becoming a navigational entity, a destination that users specifically seek out, and it helps publishers, ecommerce stores, and SEOs build an audience in order to claw back what classic search no longer provides.

Featured Image by Shutterstock/Roman Samborskyi

Google Revises Discover Guidelines Alongside Core Update via @sejournal, @MattGSouthern

Google revised its “Get on Discover” documentation following the launch of the February Discover core update.

On its documentation updates page, Google said it added more information on how sites can increase the likelihood of content appearing in Discover. Here’s what was added.

What Changed

Comparing the archived version with the current page shows Google rewrote its list of recommendations for Discover visibility.

The previous version combined title and clickbait guidance into a single bullet, saying to “Use page titles that capture the essence of the content, but in a non-clickbait fashion.”

Google split that into two items. The first now says “Use page titles and headlines that capture the essence of the content.” The second says “Avoid clickbait and similar tactics to artificially inflate engagement.”

That word “clickbait” is new. The previous version said “Avoid tactics to artificially inflate engagement” without naming the tactic.

The sensationalism guidance changed too. The old version said “Avoid tactics that manipulate appeal by catering to morbid curiosity, titillation, or outrage.” The revision names the tactic, saying “Avoid sensationalism tactics that manipulate appeal.”

The new addition is a recommendation to “Provide an overall great page experience,” with a link to Google’s page experience documentation. That recommendation isn’t in the archived version.

Image requirements, traffic fluctuation guidance, and performance monitoring sections remain unchanged.

Why This Matters

These documentation changes map to what Google said the core update targets. The blog post announcing the update said it would show more locally relevant content, reduce sensational content and clickbait, and surface more original content from sites with expertise.

Discover documentation has changed before alongside algorithm updates. Previously, Google added Discover to its Helpful Content System documentation and later expanded its explanation of why Discover traffic fluctuates. Both of those updates aligned with broader changes to how Discover evaluated content.

Page experience has been part of Google’s Search guidance since 2020 but wasn’t in the Discover-specific recommendations before this revision.

Looking Ahead

The February Discover core update is rolling out to English-language users in the United States over the next two weeks. Google said it plans to expand to all countries and languages in the months ahead.

Publishers monitoring Discover traffic in Search Console should check the Get on Discover page for the current recommendations. Google’s standard core update guidance applies as well.


Featured Image: ZikG/Shutterstock

Google Shows How To Check Passage Indexing via @sejournal, @martinibuster

Google’s John Mueller was asked how many megabytes of HTML Googlebot crawls per page. The question was whether Googlebot indexes two megabytes (MB) or fifteen megabytes of data. Mueller’s answer minimized the technical aspect of the question and went straight to the heart of the issue, which is really about how much content is indexed.

GoogleBot And Other Bots

In the middle of an ongoing discussion on Bluesky, someone revived the question of whether Googlebot crawls and indexes 2 or 15 megabytes of data.

They posted:

“Hope you got whatever made you run 🙂

It would be super useful to have more precisions, and real-life examples like “My page is X Mb long, it gets cut after X Mb, it also loads resource A: 15Kb, resource B: 3Mb, resource B is not fully loaded, but resource A is because 15Kb < 2Mb”.”

Panic About 2 Megabyte Limit Is Overblown

Mueller said that it’s not necessary to weigh bytes and implied that what’s ultimately important isn’t about constraining how many bytes are on a page but rather whether or not important passages are indexed.

Furthermore, Mueller said that it is rare for a site to exceed two megabytes of HTML, dismissing the idea that a website’s content might not get indexed because the page is too big.

He also said that Googlebot isn’t the only bot that crawls a web page, apparently to explain why 2 megabytes and 15 megabytes aren’t limiting factors. Google publishes a list of all the crawlers they use for various purposes.
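
For publishers who still want to sanity-check their own pages against those figures, measuring the raw HTML payload is straightforward to script. Here is a minimal sketch, not a Google tool: it fetches a URL with Node.js 18+ (which has a built-in fetch) and reports the HTML size in bytes and megabytes. The URL and user-agent string are placeholders.

```typescript
// Minimal sketch: report the raw HTML size of a page.
// Requires Node.js 18+ for the built-in fetch; the URL below is a placeholder.
async function measureHtmlSize(url: string): Promise<void> {
  const response = await fetch(url, {
    headers: { "user-agent": "html-size-check/1.0" },
  });
  const html = await response.text();
  const bytes = Buffer.byteLength(html, "utf8");
  const megabytes = bytes / (1024 * 1024);

  console.log(`${url}: ${bytes.toLocaleString()} bytes (${megabytes.toFixed(2)} MB of HTML)`);

  // 2 MB is the HTML figure discussed here; most real-world pages are far below it.
  if (megabytes > 2) {
    console.log("This page exceeds 2 MB of HTML.");
  }
}

measureHtmlSize("https://example.com/very-long-article").catch(console.error);
```

Note that this measures only the HTML document itself; resources referenced in the HTML, such as CSS and JavaScript, are fetched separately.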

How To Check If Content Passages Are Indexed

Lastly, Mueller’s response confirmed a simple way to check whether or not important passages are indexed.

Mueller answered:

“Google has a lot of crawlers, which is why we split it. It’s extremely rare that sites run into issues in this regard, 2MB of HTML (for those focusing on Googlebot) is quite a bit. The way I usually check is to search for an important quote further down on a page – usually no need to weigh bytes.”
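
In practice, that check is simply a quoted phrase typed into a regular Google search, optionally narrowed to your own site with the site: operator. For example, with a placeholder phrase and domain:

```
"a distinctive sentence from near the end of the page" site:example.com
```

If the page shows up for the quoted phrase, that passage made it into the index.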

Passages For Ranking

People have short attention spans except when they’re reading about a topic that they are passionate about. That’s when a comprehensive article may come in handy for those readers who really want to take a deep dive to learn more.

From an SEO perspective, I can understand why some may feel that a comprehensive article might not be ideal for ranking if a document provides deep coverage of multiple topics, any one of which could be a standalone article.

A publisher or an SEO needs to step back and assess whether users are satisfied with the current depth of coverage of a topic or whether they need a deeper treatment of it. There are also different levels of comprehensiveness: one with granular details, and another with overview-level coverage that links out to deeper treatments.

In other words, sometimes users require a view of the forest and sometimes they require a view of the trees.

Google has long been able to rank document passages with their passage ranking algorithms. Ultimately, in my opinion, it really comes down to what is useful to users and is likely to result in a higher level of user satisfaction.

If comprehensive topic coverage excites people and makes them passionate enough about it to share it with other people, then that is a win.

If comprehensive coverage isn’t useful for that specific topic then it may be better to split the content into shorter coverage that better aligns with the reasons why people are coming to that page to read about that topic.

Takeaways

While most of these takeaways aren’t represented in Mueller’s response, they do in my opinion represent good practices for SEO.

  • Questions about HTML size limits reflect deeper concerns about content length and indexing visibility
  • Megabyte thresholds are rarely a practical constraint for real-world pages
  • Counting bytes is less useful than verifying whether content actually appears in search
  • Searching for distinctive passages is a practical way to confirm indexing
  • Comprehensiveness should be driven by user intent, not crawl assumptions
  • Content usefulness and clarity matter more than document size
  • User satisfaction remains the deciding factor in content performance

Concern over how many megabytes constitute a hard crawl limit for Googlebot reflects uncertainty about whether important content in a long document is being indexed and is available to rank in search. Focusing on megabytes shifts attention away from the real issue SEOs should be focusing on: whether the depth of topic coverage best serves a user’s needs.

Mueller’s response reinforces the point that web pages that are too big to be indexed are uncommon, and fixed byte limits are not a constraint that SEOs should be concerned about.

In my opinion, SEOs and publishers will probably have better search coverage by shifting their focus away from optimizing for assumed crawl limits and focusing instead on how much content users will actually consume.

But if a publisher or SEO is concerned about whether a passage near the end of a document is indexed, there is an easy way to check the status by simply doing a search for an exact match for that passage.

Comprehensive topic coverage is not automatically a ranking problem, and it is not always the best (or the worst) approach. HTML size is not really a concern unless it starts impacting page speed. What matters is whether content is clear, relevant, and useful to the intended audience at the precise level of granularity that serves the user’s purposes.

Featured Image by Shutterstock/Krakenimages.com

Google Releases Discover-Focused Core Update via @sejournal, @MattGSouthern
  • Google has launched a core update specifically for Discover, rather than Search more broadly.
  • The February Discover core update began Feb. 5 for English-language users in the U.S., with plans to expand to other countries and languages.
  • Google says the rollout may take up to two weeks.

Google has started a Discover core update. The rollout may take up to two weeks, with expansion to more countries and languages later.

Google Search Hits $63B, Details AI Mode Ad Tests via @sejournal, @MattGSouthern

Alphabet reported Q4 2025 revenue of $113.8 billion, beating Wall Street estimates and marking the company’s first year above $400 billion in annual revenue. Google Search grew 17% to $63.07 billion.

On the earnings call, the company revealed how it plans to monetize AI Mode and shared new data on how AI is changing search behavior.

What’s Happening

Google Search and other advertising revenue hit $63.07 billion, up 17% from $54.03 billion in Q4 2024. Search growth accelerated through 2025, rising from 10% in Q1 to 12% in Q2 to 15% in Q3 and 17% in Q4.

CEO Sundar Pichai said Search had more usage in Q4 than ever before. He attributed the growth to AI features changing how people search.

Pichai said on the call:

“Once people start using these new experiences, they use them more. In the US, we saw daily AI Mode queries per user double since launch.”

Queries in AI Mode are three times longer than traditional searches, and a “significant portion” lead to follow-up questions.

AI Mode Monetization Tests

Chief Business Officer Philipp Schindler said Google is “in the early stages of experimenting with AI Mode monetization, like testing ads below the AI response, with more underway.”

On Direct Offers, a new pilot program, Schindler said:

“We announced Direct Offers, a new Google Ads pilot, which will allow advertisers to show exclusive offers for shoppers who are ready to buy, directly in AI Mode.”

Google also plans to launch checkout directly within AI Mode from select merchants.

Schindler said the longer AI Mode queries are creating new ad inventory. Gemini’s understanding of intent “has increased our ability to deliver ads on longer, more complex searches that were previously challenging to monetize.”

YouTube Miss Explained

YouTube ad revenue reached $11.38 billion, up 9% but below the $11.84 billion analysts expected.

Schindler attributed the miss to election ad lapping from Q4 2024:

“On the brand side, as an ad share, the largest factor negatively impacting the year-over-year growth rate was lapping the strong spend on U.S. elections.”

He also noted that subscription growth can reduce ad revenue. When users switch to YouTube Premium, it hurts ad revenue but helps the overall business.

What Else Happened

Google Cloud revenue jumped 48% to $17.66 billion. Alphabet plans to spend $175 billion to $185 billion on capital expenditures in 2026, nearly double its 2025 spending. That suggests more AI features coming to Search and other products.

Why This Matters

Looking back a year ago at Q4 2024 results, Search grew 12%. By Q1 2025, AI Overviews reached 1.5 billion monthly users, and Search was growing 10%. Now Search growth has accelerated to 17%.

The metrics Google celebrated on this call describe users staying on Google longer. Schindler described the new ad inventory as additive, reaching queries that were “previously challenging to monetize.”

That’s a monetization win for Google. The tradeoff to watch is referral traffic.

When asked about cannibalization, Pichai said Google hasn’t seen evidence of it:

“The combination of all of that I think creates an expansionary moment. I think it’s expanding the type of queries people do with Google overall.”

That may be true for queries. Whether it holds for referral traffic is something you’ll need to track in your own analytics.

Looking Ahead

Google maintains the position that AI features expand search activity rather than cannibalize it. The Q4 revenue numbers back it up.

The open question is what expanding AI Mode features means for referral traffic, and your own analytics will tell that story.


Featured Image: Rokas Tenys/Shutterstock

Google’s Mueller Calls Markdown-For-Bots Idea ‘A Stupid Idea’ via @sejournal, @MattGSouthern

Some developers have been experimenting with bot-specific Markdown delivery as a way to reduce token usage for AI crawlers.

Google Search Advocate John Mueller pushed back on the idea of serving raw Markdown files to LLM crawlers, raising technical concerns on Reddit and calling the concept “a stupid idea” on Bluesky.

What’s Happening

A developer posted on r/TechSEO, describing plans to use Next.js middleware to detect AI user agents such as GPTBot and ClaudeBot. When those bots hit a page, the middleware intercepts the request and serves a raw Markdown file instead of the full React/HTML payload.

The developer claimed early benchmarks showed a 95% reduction in token usage per page, which they argued should increase the site’s ingestion capacity for retrieval-augmented generation (RAG) bots.
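
For context, the setup described would look roughly like the sketch below: a middleware.ts file that inspects the User-Agent header and rewrites matching requests to a pre-generated Markdown route. This is an illustration of the approach under discussion, not a recommendation; the bot list, the /md/ path, and the matcher are hypothetical.

```typescript
// middleware.ts - illustrative sketch only; the bot list and /md/ route are hypothetical.
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

// User-agent substrings for the AI crawlers named in the thread.
const AI_BOTS = ["GPTBot", "ClaudeBot"];

export function middleware(request: NextRequest) {
  const userAgent = request.headers.get("user-agent") ?? "";
  const isAiBot = AI_BOTS.some((bot) => userAgent.includes(bot));

  if (isAiBot) {
    // Rewrite to a pre-generated Markdown version of the page,
    // e.g. /blog/my-post -> /md/blog/my-post, served as plain text.
    const markdownUrl = new URL(`/md${request.nextUrl.pathname}`, request.url);
    return NextResponse.rewrite(markdownUrl);
  }

  // Everyone else gets the normal React/HTML page.
  return NextResponse.next();
}

export const config = {
  matcher: ["/blog/:path*"],
};
```

Mueller’s questions below are aimed squarely at this mechanism: what a crawler actually does when it requests an HTML page and receives a plain-text Markdown file instead.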

Mueller responded with a series of questions.

“Are you sure they can even recognize MD on a website as anything other than a text file? Can they parse & follow the links? What will happen to your site’s internal linking, header, footer, sidebar, navigation? It’s one thing to give it a MD file manually, it seems very different to serve it a text file when they’re looking for a HTML page.”

On Bluesky, Mueller was more direct. Responding to technical SEO consultant Jono Alderson, who argued that flattening pages into Markdown strips out meaning and structure, Mueller wrote:

“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”

Alderson argued that collapsing a page into Markdown removes important context and structure, and framed Markdown-fetching as a convenience play rather than a lasting strategy.

Other voices in the Reddit thread echoed the concerns. One commenter questioned whether the effort could limit crawling rather than enhance it. They noted that there’s no evidence that LLMs are trained to favor documents that are less resource-intensive to parse.

The original poster defended the theory, arguing LLMs are better at parsing Markdown than HTML because they’re heavily trained on code repositories. That claim is untested.

Why This Matters

Mueller has been consistent on this. In a previous exchange, he responded to a question from Lily Ray about creating separate Markdown or JSON pages for LLMs. His position then was the same: focus on clean HTML and structured data rather than building bot-only content copies.

That response followed SE Ranking’s analysis of 300,000 domains, which found no connection between having an llms.txt file and how often a domain gets cited in LLM answers. Additionally, Mueller has compared llms.txt to the keywords meta tag, a format major platforms haven’t documented as something they use for ranking or citations.

So far, public platform documentation hasn’t shown that bot-only formats, such as Markdown versions of pages, improve ranking or citations. Mueller raised the same objections across multiple discussions, and SE Ranking’s data found nothing to suggest otherwise.

Looking Ahead

Until an AI platform publishes a spec requesting Markdown versions of web pages, the best practice remains as it is. Keep HTML clean, reduce unnecessary JavaScript that blocks content parsing, and use structured data where platforms have documented schemas.

Google’s Crawl Team Filed Bugs Against WordPress Plugins via @sejournal, @MattGSouthern

Google’s crawl team has been filing bugs directly against WordPress plugins that waste crawl budget at scale.

Gary Illyes, Analyst at Google, shared the details on the latest Search Off the Record podcast. His team filed an issue against WooCommerce after identifying its add-to-cart URL parameters as a top source of crawl waste. WooCommerce picked up the bug and fixed it quickly.

Not every plugin developer has been as responsive. An issue filed against a separate action-parameter plugin is still sitting unclaimed. And Google says its outreach to the developer of a commercial calendar plugin that generates infinite URL paths fell on deaf ears.

What Google Found

The details come from Google’s internal year-end crawl issue report, which Illyes reviewed during the podcast with fellow Google Search Relations team member Martin Splitt.

Action parameters accounted for roughly 25% of all crawl issues reported in 2025. Only faceted navigation ranked higher, at 50%. Together, those two categories represent about three-quarters of every crawl issue Google flagged last year.

The problem with action parameters is that each one creates what appears to be a new URL by adding text like ?add_to_cart=true. Parameters can stack, doubling or tripling the crawlable URL space on a site.

Illyes said these parameters are often injected by CMS plugins rather than built intentionally by site owners.

The WooCommerce Fix

Google’s crawl team filed a bug report against the plugin, flagging the add-to-cart parameter behavior as a source of crawl waste affecting sites at scale.

Illyes describes how they identified the issue:

“So we would try to dig into like where are these coming from and then sometimes you can identify that perhaps these action parameters are coming from WordPress plug-ins because WordPress is quite a popular CMS content management system. And then you would find that yes, these plugins are the ones that add to cart and add to wish list.

“And then what you would do if you were a Gary is to try to see if they are open source in the sense that they have a repository where you can report bugs and issues and in both of these cases the answer was yes. So we would file issues against these uh plugins.”

WooCommerce responded and shipped a fix. Illyes noted the turnaround was fast, but other plugin developers with similar issues haven’t responded. Illyes didn’t name the other plugins.

He added:

“What I really, really loved is that the good folks at WooCommerce almost immediately picked up the issue and they solved it.”

Why This Matters

This is the same URL parameter problem Illyes warned about before and continued flagging. Google then formalized its faceted navigation guidelines into official documentation and revised its URL parameter best practices.

The data shows those warnings and documentation updates didn’t solve the problem because the same issues still dominate crawl reports.

The crawl waste is often baked into the plugin layer. That creates a real bind for websites with ecommerce plugins. Your crawl problems may not be your fault, but they’re still your responsibility to manage.

Illyes said Googlebot can’t determine whether a URL space is useful “unless it crawled a large chunk of that URL space.” By the time you notice the server strain, the damage is already happening.

Google consistently recommends robots.txt, as blocking parameter URLs proactively is more effective than waiting for symptoms.
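
For sites that want to block these URLs proactively, a robots.txt rule like the hedged example below is one way to do it. The parameter names here mirror the add-to-cart and wishlist parameters discussed above, but they are illustrative; confirm the exact parameters your own plugins generate before deploying anything.

```
# Illustrative example: keep crawlers out of action-parameter URLs.
# Verify the real parameter names your plugins produce before using this.
User-agent: *
Disallow: /*?*add_to_cart=
Disallow: /*?*add-to-cart=
Disallow: /*?*add_to_wishlist=
```

Blocking in robots.txt stops the crawl waste at the source, although it does not remove parameter URLs that are already indexed.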

Looking Ahead

Google filing bugs against open-source plugins could help reduce crawl waste at the source. The full podcast episode with Illyes and Splitt is available with a transcript.

Google Updates Googlebot File Size Limit Docs via @sejournal, @MattGSouthern

Google updated its Googlebot documentation to clarify information about file size limits.

The change involves moving information about default file size limits from the Googlebot page to Google’s broader crawler documentation. Google also updated the Googlebot page to be more specific about Googlebot’s own limits.

What’s New

Google’s documentation changelog describes the update as a two-part clarification.

The default file size limits that previously lived on the Googlebot page now appear in the crawler documentation. Google said the original location wasn’t the most logical place because the limits apply to all of Google’s crawlers and fetchers, not just Googlebot.

With the defaults now housed in the crawler documentation, Google updated the Googlebot page to describe Googlebot’s specific file size limits more precisely.

The crawling infrastructure docs list a 15 MB default for Google’s crawlers and fetchers, while the Googlebot page now lists 2 MB for supported file types and 64 MB for PDFs when crawling for Google Search.

The crawler overview describes a default limit across Google’s crawling infrastructure, while the Googlebot page describes Google Search–specific limits for Googlebot. Each resource referenced in the HTML, such as CSS and JavaScript, is fetched separately.

Why This Matters

This fits a pattern Google has been running since late 2025. In November, Google migrated its core crawling documentation to a standalone site, separating it from Search Central. The reasoning was that Google’s crawling infrastructure serves products beyond Search, including Shopping, News, Gemini, and AdSense.

In December, more documentation followed, including faceted navigation guidance and crawl budget optimization.

The latest update continues that reorganization. The 15 MB file size limit was first documented in 2022, when Google added it to the Googlebot help page. Mueller confirmed at the time that the limit wasn’t new. It had been in effect for years. Google was just putting it on the record.

For anyone managing crawl budgets or troubleshooting indexing on content-heavy pages, Google’s docs now describe the limits differently depending on where you look.

The crawling infrastructure overview lists 15 MB as the default for all crawlers and fetchers. The Googlebot page lists 2 MB for HTML and supported text-based files, and 64 MB for PDFs. Google’s changelog does not explain how these figures relate to one another.

Default limits now live in the crawler overview documentation, while Googlebot-specific limits are on the Googlebot page.

Looking Ahead

Google’s documentation reorganization suggests there will likely be more updates to the crawling infrastructure site in the coming months. By separating crawler-wide defaults from product-specific documentation, Google can more easily document new crawlers and fetchers as they are introduced.

WordPress Publishes AI Guidelines To Combat AI Slop via @sejournal, @martinibuster

WordPress published guidelines for using AI for coding plugins, themes, documentation, and media assets. The purpose of the guidelines, guided by five principles, is to keep WordPress contributions transparent, GPL-compatible, and human-accountable, while maintaining high quality standards for AI-assisted work.

The new guidelines list the following five principles:

  1. “You are responsible for your contributions (AI can assist, but it isn’t a contributor).
  2. Disclose meaningful AI assistance in your PR description and/or Trac ticket comment.
  3. License compatibility matters: contributions must remain compatible with GPLv2-or-later, including AI-assisted output.
  4. Non-code assets count too (docs, screenshots, images, educational materials).
  5. Quality over volume: avoid low-signal, unverified “AI slop”; reviewers may close or reject work that doesn’t meet the bar.”

Transparency

The purpose of the transparency guidelines is to encourage contributors to disclose that AI was used and how it was used so that reviewers can be aware when evaluating the work.

License Compatibility And Tool Choice

Licensing is a big deal with WordPress because it’s designed to be a fully open source publishing platform under the GPLv2 licensing framework. Everything that’s made for WordPress, including plugins and themes, must also be open source. It’s an essential element of everything created with WordPress.

The guidelines specify that AI cannot be used if the output is not licensable under GPLv2.

It also states:

“Do not use tools whose terms forbid using their output in GPL-licensed projects or impose additional restrictions on redistribution.

Do not rely on tools to “launder” incompatible licenses. If an AI output reproduces non-free or incompatible code, it cannot be included.”

AI Slop

Of course, the guidelines address the issue of AI slop. In this case, AI slop is defined as hallucinated references (such as links or APIs that do not exist), overly complicated code where simpler solutions exist, and GitHub PRs that are generic or do not reflect actual testing or experience.

The AI slop guidelines include recommendations for what is expected from contributors:

“Use AI to draft, then review yourself.

Submit PRs (or patches) that are small, concise and with atomic and well defined commit messages to make reviewing easier.

Run and document real tests.

Link to real Trac tickets, GitHub issues, or documentation that you have verified.”

The guidelines are clear that the WordPress contributors who are responsible for overseeing, reviewing, and deciding whether changes are accepted into a specific part of the project may close or reject contributions that they determine to be AI slop “with little added human insight.”

Takeaways

The new WordPress AI guidelines appear to be about preserving trust in the contribution process as AI becomes more common across development, documentation, and media creation. They in no way discourage the use of AI but rather encourage its use in a responsible manner.

By requiring disclosure, enforcing GPL compatibility, and giving maintainers the authority to reject low-quality submissions, the guidelines set boundaries that protect both the legal integrity of the WordPress project and the time of its reviewers.

Featured Image by Shutterstock/Ivan Moreno sl

LinkedIn Shares What Works For AI Search Visibility via @sejournal, @MattGSouthern

LinkedIn published findings from its internal testing on what drives visibility in AI-generated search results.

The company, reportedly among the most-cited sources in AI responses, shared what worked for improving its presence in LLMs and AI Overviews. For practitioners adjusting to AI search, this is a rare look at what a heavily-cited source tested and measured.

In a blog post, Inna Meklin, Director of Digital Marketing at LinkedIn, and Cassie Dell, Group Manager, Organic Growth at LinkedIn, detailed the tactics that got results.

Content Structure And Markup

LinkedIn found that how you organize content affects whether LLMs can extract and surface it. The authors wrote that headings and information hierarchy matter because “the more structured and logical your content is, the easier it is for LLMs to understand and surface.”

Semantic HTML markup also played a role, with clear structure helping LLMs interpret what each section is for. The authors called this “AI readability.”

The takeaway is that content structure isn’t just a UX consideration anymore. Proper heading hierarchy and clean markup may affect whether your content gets cited.

Expert Authorship And Timestamps

LinkedIn’s testing also pointed to credibility signals. The authors wrote:

“LLMs favor content that signals credibility and relevance, authored by real experts, clearly time-stamped, and written in a conversational, insight-driven style.”

Named authors with visible credentials and clear publication dates appeared to perform better in LinkedIn’s testing than anonymous or undated content.
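
As a rough illustration of the structure and credibility signals described above, here is a hedged sketch of a semantically marked-up article. The headline, author, date, and URLs are invented for the example, and this is not markup LinkedIn published.

```html
<!-- Illustrative sketch: semantic structure plus visible authorship and a timestamp. -->
<article>
  <header>
    <h1>How AI Search Changes Content Strategy</h1>
    <p>
      By <a href="/authors/jane-doe" rel="author">Jane Doe</a>, Director of Content
      &middot; <time datetime="2025-02-10">February 10, 2025</time>
    </p>
  </header>

  <section>
    <h2>Why structure matters</h2>
    <p>Each section answers one question under a heading that says what the section covers.</p>
  </section>

  <section>
    <h2>Keep passages self-contained</h2>
    <p>Short, self-contained paragraphs make it easier for LLMs to extract individual passages.</p>
  </section>
</article>
```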

The Measurement Change

LinkedIn added new KPIs alongside traffic for awareness-stage content, tracking citation share, visibility rate, and LLM mentions using AI visibility software. The company also said it’s creating a new traffic source in its internal analytics specifically for LLM-driven visits, and monitoring LLM bot behavior in CMS logs.

The authors acknowledged the measurement challenge:

“We simply couldn’t quantify how visibility within LLM responses impacts the bottom line.”

For teams still reporting traffic as the primary SEO metric, there’s a gap here. If non-brand informational content is increasingly consumed inside AI answers rather than on your site, traffic may undercount your actual reach.

Why This Matters

What caught my attention is how much this overlaps with what AI platforms themselves are saying.

SEJ’s Roger Montti recently interviewed Jesse Dwyer from Perplexity about what drives AI search visibility. Dwyer explained that Perplexity retrieves content at the sub-document level, pulling granular fragments rather than reasoning over full pages. That means how you structure content affects whether it gets extracted at all.

LinkedIn’s findings point in the same direction from the publisher side. Structure and markup matter because LLMs parse content in fragments. The credibility signals LinkedIn identified, like expert authorship and timestamps, appear to affect which fragments get surfaced.

When a heavily-cited source and an AI search platform land on the same conclusions independently, you have something to work with beyond speculation.

Looking Ahead

The authors are adopting a different mindset that practitioners can learn from:

“We are moving away from ‘search, click, website’ thinking toward a new model: Be seen, be mentioned, be considered, be chosen.”

LinkedIn indicated Part 3 of the series will include a guide on optimizing owned content for AI search, covering answer blocks and explicit definitions.