Merging SEO And Content Using Your Knowledge Graph to AI-Proof Content via @sejournal, @marthavanberkel

New AI platforms, powered by generative technologies like Google’s Gemini, Microsoft’s Copilot, Grok, and countless specialized chatbots, are rapidly becoming the front door for digital discovery.

We’ve entered an era of machine-led discovery, where AI systems aggregate, summarize, and contextualize content across multiple platforms.

Users today no longer follow a linear journey from keyword to website. Instead, they engage in conversations and move fluidly between channels and experiences.

These shifts are being driven by new types of digital engagement, including:

  • AI-generated overviews, such as AI Overviews in Google, that pull data from many sources.
  • Conversational search, such as ChatGPT and Gemini, where follow-up questions replace traditional browsing.
  • Social engagement, with platforms like TikTok equipped with their own generative search features, engaging entire generations in interactive journeys of discovery.

The result is a new definition of discoverability and a need to rethink how you manage your brand across these experiences.

It’s not enough to optimize your brand’s website for search engines. You must ensure your website content is machine-consumable and semantically connected to appear in AI-generated results.

This is why forward-thinking organizations are turning to schema markup (structured data) and building content knowledge graphs to manage the data layer that powers both traditional search and emerging AI platforms.

Semantic structured data transforms your content into a machine-readable network of information, enabling your brand to be recognized, connected, and potentially included in AI-driven experiences across channels.

In this article, we’ll explore how SEO and content teams can partner to build a content knowledge graph that fuels discoverability in the age of AI, and why this approach is critical for enterprise brands aiming to future-proof their digital presence.

Why Schema Markup Is Your Strategic Data Layer

You may be asking, “Isn’t schema markup just for achieving rich results (visual enhancements in the SERP)?”

Schema markup is no longer just a technical SEO tactic for achieving rich results; it can also be used to define the content on your website and its relationship to other entities within your brand.

When you apply markup in a connected way, AI and search engines can draw more accurate inferences, resulting in better matching to user queries and prompts.

In May 2025, Google and Microsoft both reiterated that using structured data makes your content machine-readable and eligible for certain search features. [Editor’s note: Gary Illyes recently cautioned against excessive use, however, and noted that schema markup is not a ranking factor.]

Schema markup can be a strategic foundation for creating a data layer that feeds AI systems. While schema markup is a technical SEO approach, it all starts with content.

When You Implement Schema Markup, You’re:

Defining Entities

Schema markup clarifies the “things” your content is about, such as products, services, people, locations, and more.

It provides precise tags that help machines recognize and categorize your content accurately.
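
To make this concrete, here is a minimal sketch of what defining a single entity with schema markup can look like. The organization, product name, and URLs below are illustrative placeholders, not examples from the article; the dict is built in Python and serialized to the JSON-LD that would sit inside a `<script type="application/ld+json">` tag on the page.

```python
import json

# Hypothetical example: defining one Product entity using Schema.org types.
# All names and URLs here are illustrative placeholders.
product_markup = {
    "@context": "https://schema.org",
    "@type": "Product",
    # A stable @id gives the entity a canonical address machines can refer to.
    "@id": "https://www.example.com/products/widget#product",
    "name": "Acme Widget",
    "description": "A precision widget for industrial use.",
    "brand": {"@type": "Brand", "name": "Acme"},
}

# Serialize to JSON-LD for embedding in the page's HTML head.
json_ld = json.dumps(product_markup, indent=2)
print(json_ld)
```

The `@type` tells machines what kind of “thing” the content describes, and the nested `brand` object shows how even a single entity definition starts encoding relationships.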

Establishing Relationships

Beyond defining individual entities (a.k.a. topics), schema markup describes how those entities connect to each other and to broader topics across the web.

This creates a web of meaning that mirrors how humans understand context and relationships.

Providing Machine-Readable Context

Schema markup helps make your content machine-readable.

It enables search engines and AI tools to confidently identify, interpret, and surface your content in relevant contexts, which can help your brand appear where it is most relevant.

Enterprise SEO and content teams can work together to implement schema markup to create a content knowledge graph, a structured representation of your brand’s expertise, offerings, and topic authority.

When you do this, the data you put into search and AI platforms is ready for large language models (LLMs) to make accurate inferences, which can improve your brand’s visibility to consumers.

What Is A Content Knowledge Graph?

A content knowledge graph organizes your website’s data into a network of interconnected entities and topics, all defined by implementing schema markup based on the Schema.org vocabulary. This graph serves as a digital map of your brand’s expertise and topical authority.

Imagine your website as a library. Without a knowledge graph, AI systems trying to read your site have to sift through thousands of pages, hoping to piece together meaning from scattered words and phrases.

With a content knowledge graph:

  • Entities are defined. Machines know precisely who, what, and where you’re talking about.
  • Topics are connected. Machines can better understand and infer how subjects relate. For example, machines can infer that “cardiology” encompasses entities like heart disease, cholesterol, or specific medical procedures.
  • Content becomes query-ready. Your content becomes structured data that AI can reference, cite, and include in responses.
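
Conceptually, a knowledge graph is just entities connected by typed relationships. The sketch below (illustrative, using the cardiology example above) represents the graph as a list of subject-relation-object triples and shows the kind of lookup a machine can perform over it:

```python
# A minimal sketch of a content knowledge graph as typed edges (triples).
# Entity names follow the cardiology example above; the structure is illustrative.
edges = [
    ("Cardiology", "encompasses", "Heart disease"),
    ("Cardiology", "encompasses", "Cholesterol"),
    ("Cardiology", "encompasses", "Angioplasty"),
    ("Heart disease", "relatedTo", "Cholesterol"),
]

def related_entities(graph, subject, relation):
    """Return all entities connected to `subject` by `relation`."""
    return [obj for s, rel, obj in graph if s == subject and rel == relation]

print(related_entities(edges, "Cardiology", "encompasses"))
# → ['Heart disease', 'Cholesterol', 'Angioplasty']
```

A machine traversing this graph can infer that content about cholesterol supports the brand’s broader authority on cardiology, which is exactly the kind of inference the article describes.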

When your content is organized into a knowledge graph, you’re effectively supplying AI platforms with information about your products, services, and expertise.

This becomes a powerful control point for how your brand is represented in AI search experiences.

Rather than leaving how AI systems interpret your web content to chance, you can proactively help shape the narrative and give machines the right signals to potentially include your brand in conversations, summaries, and recommendations.

Your organization’s leaders should be aware this is now a strategic issue, not just a technical one.

A content knowledge graph gives you some influence over how your organization’s expertise and authority are recognized and distributed by AI systems, which can impact discoverability, reputation, and competitive advantage in a rapidly evolving digital landscape.

This structure can improve your chances of appearing in AI-generated answers and equips your content and SEO teams with data-driven insights to guide your content strategy and optimization efforts.

How Enterprise SEO And Content Teams Can Build A Content Knowledge Graph

Here’s how enterprise teams can operationalize a content knowledge graph to future-proof discoverability and unify SEO and content strategies:

1. Define What You Want To Be Known For

Enterprise brands should start by identifying their core topical authority areas. Ask:

  • Which topics matter most to our audience and brand?
  • Where do we want to be the recognized authority?
  • What new topics are emerging in our industry that we should own?

These strategic priorities shape the pillars of your content knowledge graph.

2. Use Schema Markup To Define Key Entities

Next, use schema markup to:

  • Identify key entities tied to your priority topics, such as products, services, people, places, or concepts.
  • Connect those entities to each other through Schema.org properties, such as “about,” “mentions,” or “sameAs.”
  • Ensure consistent entity definitions across your entire site so that AI systems can reliably identify and understand entities and their relationships.

This is how your content becomes machine-readable and more likely to be accurately included in AI-driven results and recommendations.
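
As a sketch of the connection step, the JSON-LD below links a pillar page to its priority entity with `about`, disambiguates that entity against an external authority with `sameAs`, and flags secondary entities with `mentions`. The page, names, and URLs are illustrative placeholders, not from the article:

```python
import json

# Hypothetical example: a pillar page connected to entities via Schema.org
# properties. All names and URLs are illustrative placeholders.
page_markup = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": "https://www.example.com/cardiology#webpage",
    "name": "Cardiology Services",
    # "about" declares the page's primary entity.
    "about": {
        "@type": "MedicalSpecialty",
        "name": "Cardiology",
        # "sameAs" disambiguates the entity against an external authority.
        "sameAs": "https://en.wikipedia.org/wiki/Cardiology",
    },
    # "mentions" lists secondary entities the page discusses.
    "mentions": [
        {"@type": "MedicalCondition", "name": "Heart disease"},
    ],
}

print(json.dumps(page_markup, indent=2))
```

Reusing the same `@id` and `sameAs` values wherever an entity appears across the site is what lets AI systems reliably recognize it as one and the same thing.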

3. Audit Your Existing Content Against Your Content Knowledge Graph

Instead of just tracking keywords, enterprises should audit their content based on entity coverage:

  • Are all priority entities represented on your site?
  • Do you have “entity homes” (pillar pages) that serve as authoritative hubs for those priority entities?
  • Where are there gaps in entity coverage that could limit your presence in search and AI responses?
  • What content opportunities exist to improve coverage of priority entities where these gaps have been identified?

A thorough audit provides a clear roadmap for aligning your content strategy with how machines interpret and surface information, helping your brand stay discoverable in evolving AI-driven search experiences.

4. Create Pillar Pages And Fill Content Gaps

Based on your findings from Step 3, create dedicated pillar pages for high-priority entities where needed. These become the authoritative source that:

  • Defines the entity.
  • Links to supporting content, including case studies, blog posts, or service pages.
  • Signals to search engines and AI systems where to find reliable information about that entity.

Supporting content can then be created to expand on subtopics and related entities that link back to these pillar pages, ensuring comprehensive coverage of topics.

5. Measure Performance By Entity And Topic

Finally, enterprises should track how well their content performs at the entity and topic levels:

  • Which entities drive impressions and clicks in AI-powered search results?
  • Are there emerging entities gaining traction in your industry that you should cover?
  • How does your topical authority compare to competitors?

This data-driven approach enables continuous optimization, helping you to stay visible as AI search evolves.

Why SEO And Content Teams Are The Heroes Of The AI Search Evolution

In this new landscape, where AI generates answers before users ever reach your website, schema markup and content knowledge graphs provide a critical control point.

They enable your brand to signal its authority to machines, support the possibility of accurate inclusion in AI results and overviews, and inform SEO and content investment based on data, not guesswork.

For enterprise organizations, this isn’t just an SEO tactic; it’s a strategic imperative that could protect visibility and brand presence in the new digital ecosystem.

So, the question remains: What does your brand want to be known for?

Your content knowledge graph is the infrastructure that ensures AI systems, and by extension, your future customers, know the answer.

Featured Image: Urbanscape/Shutterstock

2025 Core Web Vitals Challenge: WordPress Versus Everyone via @sejournal, @martinibuster

The Core Web Vitals Technology Report shows the top-ranked content management systems by Core Web Vitals (CWV) for the month of June (July’s statistics aren’t out yet). The breakout star this year is an e-commerce platform, which is notable because shopping sites generally have poor performance due to the heavy JavaScript and image loads necessary to provide shopping features.

This comparison also looks at the Interaction to Next Paint (INP) scores because they don’t mirror the CWV scores. INP measures how quickly a website responds visually after a user interacts with it. The phrase “next paint” refers to the moment the browser visually updates the page in response to a user’s interaction.

A poor INP score can mean that users will be frustrated with the site because it’s perceived as unresponsive. A good INP score correlates with a better user experience because the site responds quickly to user interactions.
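
Google publishes thresholds for interpreting INP measurements: good at or below 200 ms, needs improvement up to 500 ms, and poor above that. A simple classifier makes the bands explicit:

```python
# Classify an INP measurement (milliseconds) against Google's published
# thresholds: good <= 200 ms, needs improvement <= 500 ms, poor above that.
def classify_inp(inp_ms: float) -> str:
    if inp_ms <= 200:
        return "good"
    if inp_ms <= 500:
        return "needs improvement"
    return "poor"

print(classify_inp(180))  # → good
print(classify_inp(650))  # → poor
```

The per-CMS percentages in this article report the share of each platform’s sites whose real-user INP falls in the “good” band.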

Core Web Vitals Technology Report

The HTTP Archive Technology Report combines two public datasets:

  1. Chrome UX Report (CrUX)
  2. HTTP Archive

1. Chrome UX Report (CrUX)
CrUX obtains its data from Chrome users who opt into providing usage statistics reporting as they browse over 8 million websites. This data includes performance on Core Web Vitals metrics and is aggregated into monthly datasets.

2. HTTP Archive
HTTP Archive obtains its data from lab tests by tools like WebPageTest and Lighthouse that analyze how pages are built and whether they follow performance best practices. Together, these datasets show how websites perform and what technologies they use.

The CWV Technology Report combines data from HTTP Archive (which tracks websites through lab-based crawling and testing) and CrUX (which collects real-user performance data from Chrome users), and that’s where the Core Web Vitals performance data of content management systems comes from.

#1 Ranked Core Web Vitals (CWV) Performer

The top-performing content management system is Duda. A remarkable 83.63% of websites on the Duda platform received a good CWV score. Duda has consistently ranked #1, and this month continues that trend.

For Interaction to Next Paint scores, Duda ranks in the second position.

#2 Ranked CWV CMS: Shopify

The next position is occupied by Shopify. 75.22% of Shopify websites received a good CWV score.

This is extraordinary because shopping sites are typically burdened with excessive JavaScript to power features like product filters, sliders, image effects, and other tools that shoppers rely on to make their choices. Shopify, however, appears to have largely solved those issues and is outperforming other platforms, like Wix and WordPress.

In terms of INP, Shopify is ranked #3, at the upper end of the rankings.

#3 Ranked CMS For CWV: Wix

Wix comes in third place, just behind Shopify. 70.76% of Wix websites received a good CWV score. In terms of INP scores, 86.82% of Wix sites received a good INP score. That puts them in fourth place for INP.

#4 Ranked CMS: Squarespace

67.66% of Squarespace sites had a good CWV score, putting them in fourth place for CWV, just a few percentage points behind the No. 3 ranked Wix.

That said, Squarespace ranks No. 1 for INP, with a total of 95.85% of Squarespace sites achieving a good INP score. That’s a big deal because INP is a strong indicator of a good user experience.

#5 Ranked CMS: Drupal

59.07% of sites on the Drupal platform had a good CWV score. That’s more than half of sites, considerably lower than Duda’s 83.63% score but higher than WordPress’s score.

But when it comes to the INP score, Drupal ranks last, with only 85.5% of sites scoring a good INP score.

#6 Ranked CMS: WordPress

Only 43.44% of WordPress sites had a good CWV score. That’s over fifteen percentage points lower than fifth-ranked Drupal. So WordPress isn’t just last in terms of CWV performance; it’s last by a wide margin.

WordPress performance hasn’t been getting better this year either. It started 2025 at 42.58%, then went up a few points in April to 44.93%, then fell back to 43.44%, finishing June at less than one percentage point higher than where it started the year.

WordPress is in fifth place for INP scores, with 85.89% of WordPress sites achieving a good INP score, just 0.39 points above Drupal, which is in last place.

But that’s not the whole story about the WordPress INP scores. WordPress started the year with a score of 86.05% and ended June with a slightly lower score.

INP Rankings By CMS

Here are the rankings for INP, with the percentage of sites exhibiting a good INP score next to the CMS name:

  1. Squarespace 95.85%
  2. Duda 93.35%
  3. Shopify 89.07%
  4. Wix 86.82%
  5. WordPress 85.89%
  6. Drupal 85.5%

As you can see, positions 3–6 are all bunched together in the eighty percent range, with only a 3.57 percentage point difference between the last-placed Drupal and the third-ranked Shopify. So, clearly, all the content management systems deserve a trophy for INP scores. Those are decent scores, especially for Shopify, which earned a second-place ranking for CWV and third place for INP.

Takeaways

  • Duda Is #1
    Duda leads in Core Web Vitals (CWV) performance, with 83.63% of sites scoring well, maintaining its top position.
  • Shopify Is A Strong Performer
    Shopify ranks #2 for CWV, a surprising performance given the complexity of e-commerce platforms, and scores well for INP.
  • Squarespace #1 For User Experience
    Squarespace ranks #1 for INP, with 95.85% of its sites showing good responsiveness, indicating an excellent user experience.
  • WordPress Performance Scores Are Stagnant
    WordPress lags far behind, with only 43.44% of sites passing CWV and no signs of positive momentum.
  • Drupal Also Lags
    Drupal ranks last in INP and fifth in CWV, with over half its sites passing but still underperforming against most competitors.
  • INP Scores Are Generally High Across All CMSs
    Overall INP scores are close among the bottom four platforms, suggesting that INP scores are relatively high across all content management systems.

Find the Looker Studio rankings here (you must be logged into a Google account to view).

Featured Image by Shutterstock/Krakenimages.com

OpenAI Is Pulling Shared ChatGPT Chats From Google Search via @sejournal, @MattGSouthern

OpenAI has rolled back a feature that allowed ChatGPT conversations shared via link to appear in Google Search results.

The company confirms it has disabled the toggle that enabled shared chats to be “discoverable” by search engines and is working to remove existing indexed links.

Shared Chats Were “Short-Lived Experiment”

When users shared a ChatGPT conversation using the platform’s built-in “Share” button, they were given the option to make the chat visible in search engines.

That feature, introduced quietly earlier this year, caused concern after thousands of personal chats started showing up in search results.

Fast Company first reported the issue, finding over 4,500 shared ChatGPT links indexed by Google, some containing personally identifiable information such as names, resumes, emotional reflections, and confidential work content.

In a statement, OpenAI confirms:

“We just removed a feature from [ChatGPT] that allowed users to make their conversations discoverable by search engines, such as Google. This was a short-lived experiment to help people discover useful conversations. This feature required users to opt-in, first by picking a chat to share, then by clicking a checkbox for it to be shared with search engines (see below).

Ultimately we think this feature introduced too many opportunities for folks to accidentally share things they didn’t intend to, so we’re removing the option. We’re also working to remove indexed content from the relevant search engines. This change is rolling out to all users through tomorrow morning.

Security and privacy are paramount for us, and we’ll keep working to maximally reflect that in our products and features.”

How the Feature Worked

By default, shared ChatGPT links were accessible only to people with the URL. But users could choose to toggle on discoverability, allowing search engines like Google to index the conversation.

That setting has now been removed, and previously shared chats will no longer be indexed going forward. However, OpenAI cautions that already-indexed content may still appear in search results temporarily due to caching.

Importantly, deleting a conversation from your ChatGPT history does not delete the public share link or remove it from search engines.

Why It Matters

The discoverability toggle was intended to encourage people to reuse outputs generated in ChatGPT, but the company acknowledges it came with unintended privacy tradeoffs.

Even though OpenAI offered explicit controls over visibility, many people may not have understood the implications of enabling search indexing.

This is a reminder to be cautious about what kinds of information you enter into AI chatbots. Although a chat starts out private, features like sharing, logging, or model training can create paths for that content to be exposed publicly.

Looking Ahead

OpenAI says it’s working with Google and other search engines to remove indexed shared links and is reassessing how public sharing features are handled in ChatGPT.

If you’ve shared a ChatGPT conversation in the past, you can check your visibility settings and delete shared links through the ChatGPT Shared Links dashboard.

Featured Image: Mehaniq/Shutterstock

The two people shaping the future of OpenAI’s research

For the past couple of years, OpenAI has felt like a one-man brand. With his showbiz style and fundraising glitz, CEO Sam Altman overshadows all other big names on the firm’s roster. Even his bungled ouster ended with him back on top—and more famous than ever. But look past the charismatic frontman and you get a clearer sense of where this company is going. After all, Altman is not the one building the technology on which its reputation rests. 

That responsibility falls to OpenAI’s twin heads of research—chief research officer Mark Chen and chief scientist Jakub Pachocki. Between them, they share the role of making sure OpenAI stays one step ahead of powerhouse rivals like Google.

I sat down with Chen and Pachocki for an exclusive conversation during a recent trip the pair made to London, where OpenAI set up its first international office in 2023. We talked about how they manage the inherent tension between research and product. We also talked about why they think coding and math are the keys to more capable all-purpose models; what they really mean when they talk about AGI; and what happened to OpenAI’s superalignment team, set up by the firm’s cofounder and former chief scientist Ilya Sutskever to prevent a hypothetical superintelligence from going rogue, which disbanded soon after he quit. 

In particular, I wanted to get a sense of where their heads are at in the run-up to OpenAI’s biggest product release in months: GPT-5.

Reports are out that the firm’s next-generation model will be launched in August. OpenAI’s official line—well, Altman’s—is that it will release GPT-5 “soon.” Anticipation is high. The leaps OpenAI made with GPT-3 and then GPT-4 raised the bar of what was thought possible with this technology. And yet delays to the launch of GPT-5 have fueled rumors that OpenAI has struggled to build a model that meets its own—not to mention everyone else’s—expectations.

But expectation management is part of the job for a company that for the last several years has set the agenda for the industry. And Chen and Pachocki set the agenda inside OpenAI.

Twin peaks 

The firm’s main London office is in St James’s Park, a few hundred meters east of Buckingham Palace. But I met Chen and Pachocki in a conference room in a coworking space near King’s Cross, which OpenAI keeps as a kind of pied-à-terre in the heart of London’s tech neighborhood (Google DeepMind and Meta are just around the corner). OpenAI’s head of research communications, Laurance Fauconnet, sat with an open laptop at the end of the table. 

Chen, who was wearing a maroon polo shirt, is clean-cut, almost preppy. He’s media trained and comfortable talking to a reporter. (That’s him flirting with a chatbot in the “Introducing GPT-4o” video.) Pachocki, in a black elephant-logo tee, has more of a TV-movie hacker look. He stares at his hands a lot when he speaks.

But the pair are a tighter double act than they first appear. Pachocki summed up their roles. Chen shapes and manages the research teams, he said. “I am responsible for setting the research roadmap and establishing our long-term technical vision.”

“But there’s fluidity in the roles,” Chen said. “We’re both researchers, we pull on technical threads. Whatever we see that we can pull on and fix, that’s what we do.”

Chen joined the company in 2018 after working as a quantitative trader at the Wall Street firm Jane Street Capital, where he developed machine-learning models for futures trading. At OpenAI he spearheaded the creation of DALL-E, the firm’s breakthrough generative image model. He then worked on adding image recognition to GPT‑4 and led the development of Codex, the generative coding model that powers GitHub Copilot.

Pachocki left an academic career in theoretical computer science to join OpenAI in 2017 and replaced Sutskever as chief scientist in 2024. He is the key architect of OpenAI’s so-called reasoning models—especially o1 and o3—which are designed to tackle complex tasks in science, math, and coding. 

When we met they were buzzing, fresh off the high of two new back-to-back wins for their company’s technology.

On July 16, one of OpenAI’s large language models came in second in the AtCoder World Tour Finals, one of the world’s most hardcore programming competitions. On July 19, OpenAI announced that one of its models had achieved gold-medal-level results on the 2025 International Math Olympiad, one of the world’s most prestigious math contests.

The math result made headlines, not only because of OpenAI’s remarkable achievement, but because rival Google DeepMind revealed two days later that one of its models had achieved the same score in the same competition. Google DeepMind had played by the competition’s rules and waited for its results to be checked by the organizers before making an announcement; OpenAI had in effect marked its own answers.

For Chen and Pachocki, the result speaks for itself. Anyway, it’s the programming win they’re most excited about. “I think that’s quite underrated,” Chen told me. A gold medal result in the International Math Olympiad puts you somewhere in the top 20 to 50 competitors, he said. But in the AtCoder contest OpenAI’s model placed in the top two: “To break into a really different tier of human performance—that’s unprecedented.”

Ship, ship, ship!

People at OpenAI still like to say they work at a research lab. But the company is very different from the one it was before the release of ChatGPT three years ago. The firm is now in a race with the biggest and richest technology companies in the world and valued at $300 billion. Envelope-pushing research and eye-catching demos no longer cut it. It needs to ship products and get them into people’s hands—and boy, it does. 

OpenAI has kept up a run of new releases—putting out major updates to its GPT-4 series, launching a string of generative image and video models, and introducing the ability to talk to ChatGPT with your voice. Six months ago it kicked off a new wave of so-called reasoning models with its o1 release, soon followed by o3. And last week it released its browser-using agent Operator to the public. It now claims that more than 400 million people use its products every week and submit 2.5 billion prompts a day. 

OpenAI’s incoming CEO of applications, Fidji Simo, plans to keep up the momentum. In a memo to the company, she told employees she is looking forward to “helping get OpenAI’s technologies into the hands of more people around the world,” where they will “unlock more opportunities for more people than any other technology in history.” Expect the products to keep coming.

I asked how OpenAI juggles open-ended research and product development. “This is something we have been thinking about for a very long time, long before ChatGPT,” Pachocki said. “If we are actually serious about trying to build artificial general intelligence, clearly there will be so much that you can do with this technology along the way, so many tangents you can go down that will be big products.” In other words, keep shaking the tree and harvest what you can.

A talking point that comes up with OpenAI folks is that putting experimental models out into the world was a necessary part of research. The goal was to make people aware of how good this technology had become. “We want to educate people about what’s coming so that we can participate in what will be a very hard societal conversation,” Altman told me back in 2022. The makers of this strange new technology were also curious what it might be for: OpenAI was keen to get it into people’s hands to see what they would do with it.

Is that still the case? They answered at the same time. “Yeah!” Chen said. “To some extent,” Pachocki said. Chen laughed: “No, go ahead.” 

“I wouldn’t say research iterates on product,” said Pachocki. “But now that models are at the edge of the capabilities that can be measured by classical benchmarks and a lot of the long-standing challenges that we’ve been thinking about are starting to fall, we’re at the point where it really is about what the models can do in the real world.”

Like taking on humans in coding competitions. The person who beat OpenAI’s model at this year’s AtCoder contest, held in Japan, was a programmer named Przemysław Dębiak, also known as Psyho. The contest was a puzzle-solving marathon in which competitors had 10 hours to find the most efficient way to solve a complex coding problem. After his win, Psyho posted on X: “I’m completely exhausted … I’m barely alive.”  

Chen and Pachocki have strong ties to the world of competitive coding. Both have competed in international coding contests in the past and Chen coaches the USA Computing Olympiad team. I asked whether that personal enthusiasm for competitive coding colors their sense of how big a deal it is for a model to perform well at such a challenge.

They both laughed. “Definitely,” said Pachocki. “So: Psyho is kind of a legend. He’s been the number one competitor for many years. He’s also actually a friend of mine—we used to compete together in these contests.” Dębiak also used to work with Pachocki at OpenAI.

When Pachocki competed in coding contests he favored those that focused on shorter problems with concrete solutions. But Dębiak liked longer, open-ended problems without an obvious correct answer.

“He used to poke fun at me, saying that the kind of contest I was into will be automated long before the ones he liked,” Pachocki recalled. “So I was seriously invested in the performance of this model in this latest competition.”

Pachocki told me he was glued to the late-night livestream from Tokyo, watching his model come in second: “Psyho resists for now.” 

“We’ve tracked the performance of LLMs on coding contests for a while,” said Chen. “We’ve watched them become better than me, better than Jakub. It feels something like Lee Sedol playing Go.”

Lee is the master Go player who lost a series of matches to DeepMind’s game-playing model AlphaGo in 2016. The results stunned the international Go community and led Lee to give up professional play. Last year he told the New York Times: “Losing to AI, in a sense, meant my entire world was collapsing … I could no longer enjoy the game.” And yet, unlike Lee, Chen and Pachocki are thrilled to be surpassed.   

But why should the rest of us care about these niche wins? It’s clear that this technology—designed to mimic and, ultimately, stand in for human intelligence—is being built by people whose idea of peak intelligence is acing a math contest or holding your own against a legendary coder. Is it a problem that this view of intelligence is skewed toward the mathematical, analytical end of the scale?

“I mean, I think you are right that—you know, selfishly, we do want to create models which accelerate ourselves,” Chen told me. “We see that as a very fast factor to progress.”  

The argument researchers like Chen and Pachocki make is that math and coding are the bedrock for a far more general form of intelligence, one that can solve a wide range of problems in ways we might not have thought of ourselves. “We’re talking about programming and math here,” said Pachocki. “But it’s really about creativity, coming up with novel ideas, connecting ideas from different places.”

Look at the two recent competitions: “In both cases, there were problems which required very hard, out-of-the-box thinking. Psyho spent half the programming competition thinking and then came up with a solution that was really novel and quite different from anything that our model looked at.”

“This is really what we’re after,” Pachocki continued. “How do we get models to discover this sort of novel insight? To actually advance our knowledge? I think they are already capable of that in some limited ways. But I think this technology has the potential to really accelerate scientific progress.” 

I returned to the question about whether the focus on math and programming was a problem, conceding that maybe it’s fine if what we’re building are tools to help us do science. We don’t necessarily want large language models to replace politicians and have people skills, I suggested.

Chen pulled a face and looked up at the ceiling: “Why not?”

What’s missing

OpenAI was founded with a level of hubris that stood out even by Silicon Valley standards, boasting about its goal of building AGI back when talk of AGI still sounded kooky. OpenAI remains as gung-ho about AGI as ever, and it has done more than most to make AGI a mainstream multibillion-dollar concern. It’s not there yet, though. I asked Chen and Pachocki what they think is missing.

“I think the way to envision the future is to really, deeply study the technology that we see today,” Pachocki said. “From the beginning, OpenAI has looked at deep learning as this very mysterious and clearly very powerful technology with a lot of potential. We’ve been trying to understand its bottlenecks. What can it do? What can it not do?”  

At the current cutting edge, Chen said, are reasoning models, which break down problems into smaller, more manageable steps, but even they have limits: “You know, you have these models which know a lot of things but can’t chain that knowledge together. Why is that? Why can’t it do that in a way that humans can?”

OpenAI is throwing everything at answering that question.

“We are probably still, like, at the very beginning of this reasoning paradigm,” Pachocki told me. “Really, we are thinking about how to get these models to learn and explore over the long term and actually deliver very new ideas.”

Chen pushed the point home: “I really don’t consider reasoning done. We’ve definitely not solved it. You have to read so much text to get a kind of approximation of what humans know.”

OpenAI won’t say what data it uses to train its models or give details about their size and shape—only that it is working hard to make all stages of the development process more efficient.

Those efforts make them confident that so-called scaling laws—which suggest that models will continue to get better the more compute you throw at them—show no sign of breaking down.

“I don’t think there’s evidence that scaling laws are dead in any sense,” Chen insisted. “There have always been bottlenecks, right? Sometimes they’re to do with the way models are built. Sometimes they’re to do with data. But fundamentally it’s just about finding the research that breaks you through the current bottleneck.” 

The faith in progress is unshakeable. I brought up something Pachocki had said about AGI in an interview with Nature in May: “When I joined OpenAI in 2017, I was still among the biggest skeptics at the company.” He looked doubtful. 

“I’m not sure I was skeptical about the concept,” he said. “But I think I was—” He paused, looking at his hands on the table in front of him. “When I joined OpenAI, I expected the timelines to be longer to get to the point that we are now.”

“There’s a lot of consequences of AI,” he said. “But the one I think the most about is automated research. When we look at human history, a lot of it is about technological progress, about humans building new technologies. The point when computers can develop new technologies themselves seems like a very important, um, inflection point.

“We already see these models assist scientists. But when they are able to work on longer horizons—when they’re able to establish research programs for themselves—the world will feel meaningfully different.”

For Chen, that ability for models to work by themselves for longer is key. “I mean, I do think everyone has their own definitions of AGI,” he said. “But this concept of autonomous time—just the amount of time that the model can spend making productive progress on a difficult problem without hitting a dead end—that’s one of the big things that we’re after.”

It’s a bold vision—and far beyond the capabilities of today’s models. But I was nevertheless struck by how Chen and Pachocki made AGI sound almost mundane. Compare this with how Sutskever responded when I spoke to him 18 months ago. “It’s going to be monumental, earth-shattering,” he told me. “There will be a before and an after.” Faced with the immensity of what he was building, Sutskever switched the focus of his career from designing better and better models to figuring out how to control a technology that he believed would soon be smarter than himself.

Two years ago Sutskever set up what he called a superalignment team that he would co-lead with another OpenAI safety researcher, Jan Leike. The claim was that this team would funnel a full fifth of OpenAI’s resources into figuring out how to control a hypothetical superintelligence. Today, most of the people on the superalignment team, including Sutskever and Leike, have left the company and the team no longer exists.   

When Leike quit, he said it was because the team had not been given the support he felt it deserved. He posted this on X: “Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. But over the past years, safety culture and processes have taken a backseat to shiny products.” Other departing researchers shared similar statements.

I asked Chen and Pachocki what they make of such concerns. “A lot of these things are highly personal decisions,” Chen said. “You know, a researcher can kind of, you know—”

He started again. “They might have a belief that the field is going to evolve in a certain way and that their research is going to pan out and is going to bear fruit. And, you know, maybe the company doesn’t reshape in the way that you want it to. It’s a very dynamic field.”

“A lot of these things are personal decisions,” he repeated. “Sometimes the field is just evolving in a way that is less consistent with the way that you’re doing research.”

But alignment, both of them insist, is now part of the core business rather than the concern of one specific team. According to Pachocki, these models don’t work at all unless they work as you expect them to. There’s also little desire to focus on aligning a hypothetical superintelligence with your objectives when doing so with existing models is already enough of a challenge.

“Two years ago the risks that we were imagining were mostly theoretical risks,” Pachocki said. “The world today looks very different, and I think a lot of alignment problems are now very practically motivated.”

Still, experimental technology is being spun into mass-market products faster than ever before. Does that really never lead to disagreements between the two of them?

“I am often afforded the luxury of really kind of thinking about the long term, where the technology is headed,” Pachocki said. “Contending with the reality of the process—both in terms of people and also, like, the broader company needs—falls on Mark. It’s not really a disagreement, but there is a natural tension between these different objectives and the different challenges that the company is facing that materializes between us.”

Chen jumped in: “I think it’s just a very delicate balance.”  

Correction: we have removed a line referring to an Altman message on X about GPT-5.

The Download: OpenAI’s future research, and US climate regulation is under threat

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The two people shaping the future of OpenAI’s research

—Will Douglas Heaven

For the past couple of years, OpenAI has felt like a one-man brand. With his showbiz style and fundraising glitz, CEO Sam Altman overshadows all other big names on the firm’s roster.

But Altman is not the one building the technology on which its reputation rests. That responsibility falls to OpenAI’s twin heads of research—chief research officer Mark Chen and chief scientist Jakub Pachocki. Between them, they share the role of making sure OpenAI stays one step ahead of powerhouse rivals like Google.

I recently sat down with Chen and Pachocki for an exclusive conversation which covered everything from how they manage the inherent tension between research and product, to what they really mean when they talk about AGI, to what happened to OpenAI’s superalignment team. 

I also wanted to get a sense of where their heads are at in the run-up to OpenAI’s biggest product release in months: GPT-5. Read the full story.

An EPA rule change threatens to gut US climate regulations

The mechanism that allows the US federal government to regulate climate change is on the chopping block.

On Tuesday, US Environmental Protection Agency administrator Lee Zeldin announced that the agency is taking aim at the endangerment finding, a 2009 rule that’s essentially the tentpole supporting federal greenhouse-gas regulations.

This might sound like an obscure legal situation, but it’s a really big deal for climate policy in the US. So let’s look at what this rule says now, what the proposed change looks like, and what it all means. Read the full story.

—Casey Crownhart

This story is part of MIT Technology Review’s “America Undone” series, examining how the foundations of US success in science and innovation are currently under threat. You can read the rest here.

It appeared first in The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

The AI Hype Index: The White House’s war on “woke AI”

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry. Take a look at this month’s edition of the index here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Trump has announced a new US health care records system 
Experts warn the initiative could leave patients’ medical records open to abuse. (NYT $)
+ Big Tech has pledged to work with providers and health systems. (The Hill)

2 China says it’s worried Nvidia’s chips have serious security issues
Just as the company sought to resume sales in the country. (Reuters)
+ Experts reportedly found the chips featured location tracking tech. (FT $)

3 Mark Zuckerberg believes superintelligence “is now in sight”
Although he didn’t illuminate what it even means. (The Guardian)
+ Zuckerberg has taken a leaf out of the Altman playbook. (NY Mag $)
+ Don’t expect Meta to open source any of those superintelligent models. (TechCrunch)
+ Tech billionaires are making a risky bet with humanity’s future. (MIT Technology Review)

4 NASA is in turmoil
Without a permanent leader, workers are leaving in their thousands. (WP $)

5 Google removed negative articles about a tech CEO from search results
After someone made fraudulent requests using its Refresh Outdated Content Tool. (404 Media)
+ They exploited a bug in the tool to get pages removed. (Ars Technica)

6 How AI has transformed data center design
They need to accommodate a lot more heat and power than they used to. (FT $)
+ A proposed Wyoming data center would use more electricity than its homes. (Ars Technica)
+ Apple manufacturer Foxconn wants to get involved in building data centers. (CNBC)
+ Should we be moving data centers to space? (MIT Technology Review)

7 AI agents can probe websites for security weaknesses
Especially shoddily-constructed vibe-coded ones. (Wired $)
+ Cyberattacks by AI agents are coming. (MIT Technology Review)

8 New forms of life have been filmed at the ocean’s deepest points
The abundance of life was amazing, the Chinese-led research team says. (BBC)
+ Meet the divers trying to figure out how deep humans can go. (MIT Technology Review)

9 TikTok is adding Footnotes to its clips
As AI-generated videos become even harder to spot. (The Verge)
+ This fake viral clip of rabbits on a trampoline is a great example. (404 Media)

10 What it’s like to attend an Elon Musk fan fest
X Takeover promised to unite Tesla and SpaceX-heads alike. (Insider $)
+ Some people who definitely aren’t fans: neighbors of Tesla’s diner. (404 Media)

Quote of the day

“Patients across America should be very worried that their medical records are going to be used in ways that harm them and their families.”

—Lawrence Gostin, a Georgetown University law professor specializing in public health, warns of the potential repercussions of the Trump administration’s new health data tracking system, the Associated Press reports.

One more thing

The cost of building the perfect wave

For nearly as long as surfing has existed, surfers have been obsessed with the search for the perfect wave.

While this hunt has taken surfers from tropical coastlines to icebergs, these days that search may take place closer to home. That is, at least, the vision presented by developers and boosters in the growing industry of surf pools, spurred by advances in wave-­generating technology that have finally created artificial waves surfers actually want to ride.

But there’s a problem: some of these pools are in drought-ridden areas, and face fierce local opposition. At the core of these fights is a question that’s also at the heart of the sport: What is the cost of finding, or now creating, the perfect wave—and who will have to bear it? Read the full story.

—Eileen Guo

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Maybe airplane food isn’t so bad after all.
+ An unwitting metal detectorist uncovered some ancient armor in the Czech Republic that may have been worn during the Trojan war.
+ Talking of the siege of Troy, tickets for Christopher Nolan’s retelling of The Odyssey are already selling out a year before it’s released.
+ This fun website refreshes every few seconds with a new picture of someone pointing at your mouse pointer.

Charts: U.S. Manufacturing Trends Q3 2025

U.S. manufacturing activity showed modest improvement in June 2025, according to data released by the Institute for Supply Management (PDF).

The Institute’s Manufacturing Purchasing Managers Index (PMI) derives from monthly survey responses collected from purchasing and supply executives at more than 400 industrial firms. The overall PMI is a weighted composite of five seasonally adjusted indicators: new orders (30%), production (25%), employment (20%), supplier deliveries (15%), and inventories (10%).

The June PMI rose to 49.0, up from 48.5 in May, surpassing forecasts.
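The weighting scheme described above can be sketched as a simple weighted average. The subindex readings below are hypothetical, chosen only to illustrate how the composite is assembled (they happen to land on June's 49.0 headline figure; they are not ISM's actual subindex values):

```python
# Sketch of the weighted composite PMI as the article describes it.
# Subindex readings here are hypothetical, for illustration only.

WEIGHTS = {
    "new_orders": 0.30,
    "production": 0.25,
    "employment": 0.20,
    "supplier_deliveries": 0.15,
    "inventories": 0.10,
}

def composite_pmi(subindexes: dict) -> float:
    """Weighted average of the five seasonally adjusted subindexes."""
    return round(sum(WEIGHTS[name] * value for name, value in subindexes.items()), 1)

readings = {
    "new_orders": 48.0,
    "production": 50.0,
    "employment": 47.0,
    "supplier_deliveries": 52.0,
    "inventories": 49.0,
}
print(composite_pmi(readings))  # 49.0
```

A reading above 50 signals expansion and below 50 contraction, which is why a composite of 49.0 still counts as a contraction despite the month-over-month improvement.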

Since 1968 the Federal Reserve Bank of Philadelphia has conducted a monthly survey targeting approximately 250 manufacturers in its district of Delaware, southern New Jersey, and eastern and central Pennsylvania. Respondents report on the business conditions and various aspects of activity at their facilities, such as employment, work hours, new and backlogged orders, shipments, inventory levels, and delivery times.

The July 2025 survey, conducted from the 7th to the 14th, solicited executives’ expectations for changes in operational and labor costs for the current year. Most anticipate rising costs in all expense categories throughout 2025.

The National Association of Manufacturers (NAM) is the largest such organization in the United States, representing small and large manufacturers in every sector and in all 50 states.

Trade uncertainties and rising costs are the leading concerns for manufacturers, according to NAM’s Q1 2025 “Manufacturers’ Outlook Survey” (PDF) of approximately 250 firms, released in March.

Moreover, manufacturers are increasingly focusing on digital transformation. NAM introduced a question in March to measure the importance manufacturers are giving to digitally transforming their operations. Over one-third of respondents (36.8%) plan to moderately prioritize digital transformation in the coming year.

5 Ecommerce Platforms for Startups 2025

Ecommerce entrepreneurs getting started in 2025 should look for easy-to-use platforms that won’t bust the budget.

The ecommerce industry is maturing. The first online shop was likely the Boston Computer Exchange, launched in 1982. Amazon and eBay turn 30 this year.

For the most part, the technology underpinning online selling, such as displaying products or accepting payments, is standard and reliable. The difference in ecommerce platforms is not core functionality.


Bootstrapped and focused: launching an ecommerce business from the living room.

Key Elements

Entrepreneurs with limited capital and no technical help require ecommerce tools that work out of the box, grow with the business, and stay out of the way.

Thus new, bootstrapped ecommerce merchants should seek six platform characteristics, as follows:

  • Easy to set up. It should be possible to launch in hours without hiring a designer or developer.
  • Easy to maintain. Changing prices, updating products, and adding pages should be small tasks.
  • Inexpensive or free. New entrepreneurs should invest in inventory and advertising. An ecommerce platform should be a marginal cost at worst.
  • Comprehensive. The platform should perform all ecommerce functions with the stability described above.
  • Able to scale. As it grows, a shop will need more features and throughput; the platform should be viable through a few million in annual sales.
  • Easy to market. New stores depend on advertising. This means having good landing pages, email capture forms, and discount features such as coupon codes. Organic traffic from search engines and generative AI platforms is beyond reach, so startups require features that turn paid clicks into customers.

5 Platforms

No list of ecommerce platforms makes everyone happy. It’s especially true for the makers of the many capable software solutions not mentioned below. But here are my five recommendations for new ecommerce stores.

Shopify

Shopify, the leading industry platform, is built for growth. The Basic plan is $39.99 per month or $359 annually, plus transaction fees. That investment delivers a fully fledged online store with unlimited products, loads of payment integrations, and access to the platform’s massive app ecosystem.

A Shopify store can reasonably be up and running in less than an hour, and the platform scales easily to the enterprise level.

Some parts of the platform may be confusing for a novice, but Shopify works for most ecommerce businesses.

Square Online

Ecommerce businesses sometimes originate from in-person experiences with point-of-sale software. This might be a business that sells handmade jewelry at local art shows, has an Etsy shop, and wants its own ecommerce site. Why not pick a platform from a supplier you already work with?

The Square Online platform is low-cost to start (just transaction fees) and integrates directly with Square’s POS system. The service is built on the Weebly platform, which Square acquired in 2018, and is simple to use and understand.

Ecwid by Lightspeed

The term “ecommerce platform” evolved with the software it described. Years ago, the industry called these tools “shopping carts,” and many could bolt on to just about any website someone had built.

Ecwid is both a freestanding platform and a bolt-on to an existing site, shopping cart style. So, the gardening blog turned organic seed seller doesn’t require a new website; it needs only to add Ecwid. The same is true for the social influencer selling directly from a profile page.

The Ecwid add-on is free for five products and remains reasonably priced as a store scales.

Wix and Squarespace

Drag-and-drop editors make it easy for some of the least technical ecommerce operators to produce functional and attractive online stores.

My final two recommended ecommerce platforms — Wix and Squarespace — are head-to-head competitors known for remarkable ease of use and clean website designs.

These platforms appeal to startup founders who want to prioritize branding, design, and speed to market without hiring developers. Ecommerce functionality is built in, and templates come optimized for mobile and desktop, although neither platform is ideal for scaling.

Both cost less than $30 per month to start.

Success

Every new, bootstrapped ecommerce entrepreneur would love to win the Google lottery and have hundreds of eager shoppers flood in, but organic search is not a viable way to drive site traffic to a new store.

A new shop, with relatively thin content, cannot compete for transactional intent keyword phrases against long-established ecommerce sellers.

Regardless of the ecommerce platform, success will come from paid or at least active customer acquisition.

Google URL Removal Bug Enabled Attackers To Deindex URLs via @sejournal, @martinibuster

Google recently fixed a bug that enabled anyone to anonymously use an official Google tool to remove any URL from Google Search and get away with it. The tool had the potential to devastate competitor rankings by removing their URLs completely from Google’s index. Google had known about the bug since 2023 but until now hadn’t taken action to fix it.

Tool Exploited For Reputation Management

A report by the Freedom of the Press Foundation recounted the case of a tech CEO who had employed numerous tactics to “censor” negative reporting by a journalist, ranging from legal action to identify the reporter’s sources to an “intimidation campaign” via the San Francisco city attorney and a DMCA takedown request.

Through it all, the reporter and the Freedom of the Press Foundation prevailed in court, and the article at the center of the actions remained online until it began getting removed through abuse of Google’s Remove Outdated Content tool. Restoring the web page with Google Search Console was easy, but the abuse continued. This led the publisher to open a discussion on the Google Search Console Help Community.

The person posted a description of what was happening and asked if there was a way to block abuse of the tool. The post alleged that the attacker was choosing a word that was no longer in the original article and using that as the basis for claiming an article is outdated and should be removed from Google’s search index.

This is what the report on Google’s Help Community explained:

“We have a dozen articles that got removed this way. We can measure it by searching Google for the article, using the headline in quotes and with the site name. It shows no results returned.

Then, we go to GSC and find it has been “APPROVED” under outdated content removal. We cancel that request. Moments later, the SAME search brings up an indexed article. This is the 5th time we’ve seen this happen.”

Four Hundred Articles Deindexed

What was happening was an aggressive attack against a website, and Google apparently was unable to do anything to stop the abuse, leaving the user in a very bad position.

In a follow-up post, they explained the devastating effect of the sustained negative SEO attack:

“Every week, dozens of pages are being deindexed and we have to check the GSC every day to see if anything else got removed, and then restore that.

We’ve had over 400 articles deindexed, and all of the articles were still live and on our sites. Someone went in and submitted them through the public removal tool, and they got deindexed.”

Google Promised To Look Into It

They asked if there was a way to block the attacks, and Google’s Danny Sullivan responded:

“Thank you — and again, the pages where you see the removal happening, there’s no blocking mechanism on them.”

Danny responded to a follow-up post, saying that they would look into it:

“The tool is designed to remove links that are no longer live or snippets that are no longer reflecting live content. We’ll look into this further.”

How Google’s Tool Was Exploited

The initial report said that the negative SEO attack was leveraging changed words within the content to file a successful outdated content removal. But it appears that they later discovered that another attack method was being used.

Google’s Outdated Content Removal tool is case-sensitive, which means that if you submit a URL containing an uppercase letter, the crawler will go out to specifically check for the uppercase version, and if the server returns a 404 Not Found error response, Google will remove all versions of the URL.

The Freedom of the Press Foundation describes the tool as case insensitive, but that’s not quite accurate: the tool’s check is case sensitive, which is exactly why an altered-case URL returned a 404.

By the way, the victim of the attack could have created a workaround by rewriting all requests for uppercase URLs to lowercase and enforcing lowercase URLs across the entire website.
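The lowercase-URL workaround described above can be sketched in a few lines. The helper name here is hypothetical, not part of any Google or CMS API:

```python
# Sketch of the workaround: redirect any request whose path contains
# uppercase letters to the all-lowercase form, so a case-twiddled
# variant of a real URL can never return a 404.

def canonicalize(path: str):
    """Return (needs_redirect, canonical_path) for a request path."""
    lower = path.lower()
    return (lower != path, lower)

# A case-altered slug like the ones used in the attack:
needs_redirect, target = canonicalize("/2023/05/Tech-CEO-Investigation")
print(needs_redirect, target)  # True /2023/05/tech-ceo-investigation
```

In a real deployment the equivalent check would live in server or CDN rewrite rules issuing a 301 redirect; the helper just shows the comparison being made.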

That is the flaw the attacker exploited: the tool’s check was case sensitive, but somewhere downstream Google’s removal system treated URLs as case agnostic, so the removal was applied to the correctly cased URL as well.

Here’s how the Freedom of the Press Foundation described it:

“Our article… was vanished from Google search using a novel maneuver that apparently hasn’t been publicly well documented before: a sustained and coordinated abuse of Google’s “Refresh Outdated Content” tool.

This tool is supposed to allow those who are not a site’s owner to request the removal from search results of web pages that are no longer live (returning a “404 error”), or to request an update in search of web pages that display outdated or obsolete information in returned results.

However, a malicious actor could, until recently, disappear a legitimate article by submitting a removal request for a URL that resembled the target article but led to a “404 error.” By altering the capitalization of a URL slug, a malicious actor apparently could take advantage of a case-insensitivity bug in Google’s automated system of content removal.”

Other Sites Affected By This Exploit

Google responded to the Freedom of the Press Foundation and admitted that this exploit did, in fact, affect other sites.

They are quoted as saying the issue only impacted a “tiny fraction of websites” and that the wrongly impacted sites were reinstated.

Google also confirmed by email that the bug has been fixed.

Reddit Prioritizes Search, Sees 5X Growth in AI-Powered Answers via @sejournal, @MattGSouthern

Reddit is investing heavily in search, with CEO Steve Huffman announcing plans to position the platform as a destination for people seeking answers online.

In its Q2 shareholder letter, Reddit revealed that more than 70 million people now use its on-platform search each week.

Its AI-powered Reddit Answers feature is also gaining traction, reaching 6 million weekly users, up five times from the previous quarter.

Search Becomes a Strategic Priority

Reddit is now focusing on three key areas: improving the core product, growing its search presence, and expanding internationally.

As part of this shift, the company is scaling back work on its user economy initiatives.

Huffman stated:

“Reddit is one of the few platforms positioned to become a true search destination. We offer something special: a breadth of conversations and knowledge you can’t find anywhere else.”

The company plans to integrate Reddit Answers more deeply into its search experience, expand the feature to more markets, and launch marketing efforts to grow adoption globally.

Reddit Answers Gains Momentum

Reddit Answers, introduced earlier this year, uses the platform’s archive of human discussions to generate relevant responses to search queries.

It now has 6 million weekly active users and is available in the U.S., U.K., Canada, Australia, and India.

Integration with Reddit’s primary search experience is also being tested to make discovery more seamless.

Why This Matters

Reddit’s focus on search may offer new visibility opportunities. Its posts already rank well in Google results, and now its internal search tools are being enhanced to surface answers directly.

Reddit also emphasizes its commercial value. The company says 40% of posts demonstrate purchase intent, making it a destination for people researching products and services.

Looking Ahead

As AI-generated content becomes more widespread, Reddit is betting that human perspectives will remain valuable.

The company expects Q3 revenue between $535 million and $545 million, with deeper integration of Reddit Answers planned as it continues to build out its search capabilities.


Featured Image: PJ McDonnell/Shutterstock

Bing Recommends lastmod Tags For AI Search Indexing via @sejournal, @MattGSouthern

Bing has updated its sitemap guidance with a renewed focus on the lastmod tag, highlighting its role in AI-powered search to determine which pages need to be recrawled.

While real-time tools like IndexNow offer faster updates, Bing says accurate lastmod values help keep content discoverable, especially on frequently updated or large-scale sites.

Bing Prioritizes lastmod For Recrawling

Bing says the lastmod field in your sitemap is a top signal for AI-driven indexing. It helps determine whether a page needs to be recrawled or can be skipped.

To make it work effectively, use ISO 8601 format with both date and time (e.g. 2004-10-01T18:23:17+00:00). That level of precision helps Bing prioritize crawl activity based on actual content changes.

Avoid setting lastmod to the time your sitemap was generated, unless the page was truly updated.
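A lastmod value in the format Bing describes can be generated with Python's standard library. This is a minimal sketch; the helper name is an assumption, not part of any sitemap tool:

```python
# Sketch: emit a sitemap <url> entry whose lastmod reflects the page's
# actual modification time (in full ISO 8601 date-time form), not the
# time the sitemap file was generated.
from datetime import datetime, timezone
from xml.sax.saxutils import escape

def url_entry(loc: str, modified: datetime) -> str:
    """Render one <url> element with a full date-time lastmod."""
    lastmod = modified.astimezone(timezone.utc).isoformat()
    return (
        "<url>"
        f"<loc>{escape(loc)}</loc>"
        f"<lastmod>{lastmod}</lastmod>"
        "</url>"
    )

stamp = datetime(2004, 10, 1, 18, 23, 17, tzinfo=timezone.utc)
print(url_entry("https://example.com/page", stamp))
```

Passing a timezone-aware datetime is what produces the `+00:00` offset in Bing's example; a naive datetime would emit a timestamp with no offset at all.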

Bing also confirmed that changefreq and priority tags are ignored and no longer affect crawling or ranking.

Submission & Verification Tips

Bing recommends submitting your sitemap in one of two ways:

  • Reference it in your robots.txt file
  • Submit it via Bing Webmaster Tools

Once submitted, Bing fetches the sitemap immediately and rechecks it daily.

You can verify whether it’s working by checking the submission status, last read date, and any processing errors in Bing Webmaster Tools.

Combine With IndexNow For Better Coverage

To increase the chances of timely indexing, Bing suggests combining sitemaps with IndexNow.

While sitemaps give Bing a full picture of your site, IndexNow allows real-time URL-level updates—useful when content changes frequently.

The Bing team states:

“By combining sitemaps for comprehensive site coverage with IndexNow for fast, URL-level submission, you provide the strongest foundation for keeping your content fresh, discoverable, and visible.”
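As a sketch, a single-URL IndexNow submission is just a GET request against the public IndexNow endpoint; the key and URL below are placeholders, and the request is built but not sent:

```python
# Sketch of a single-URL IndexNow ping. The key and page URL are
# placeholders; a real key must be verifiable via a key file hosted
# on the submitting site.
from urllib.parse import urlencode

def indexnow_request(url: str, key: str) -> str:
    """Build the GET endpoint for a single-URL IndexNow submission."""
    query = urlencode({"url": url, "key": key})
    return f"https://api.indexnow.org/indexnow?{query}"

req = indexnow_request("https://example.com/updated-page", "abc123")
print(req)
# Send with urllib.request.urlopen(req) when ready; participating
# engines (including Bing) share submissions with one another.
```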

Sitemaps at Massive Scale

If you manage a large website, Bing’s sitemap capacity limits are worth your attention:

  • Up to 50,000 URLs per sitemap
  • 50,000 sitemaps per index file
  • 2.5 billion URLs per index
  • Multiple index files support indexing up to 2.5 trillion URLs

That makes the standard sitemap protocol scalable enough even for enterprise-level ecommerce or publishing platforms.
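The limits multiply out consistently, as a quick back-of-the-envelope check shows (the 1,000-index-file figure is inferred from the 2.5 trillion ceiling, not stated by Bing):

```python
# Quick arithmetic on Bing's stated sitemap limits.
URLS_PER_SITEMAP = 50_000
SITEMAPS_PER_INDEX = 50_000

urls_per_index = URLS_PER_SITEMAP * SITEMAPS_PER_INDEX
print(urls_per_index)  # 2.5 billion, matching Bing's per-index figure

# Roughly 1,000 index files would reach the 2.5 trillion ceiling:
print(urls_per_index * 1_000)
```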

Fabrice Canel and Krishna Madhavan of Microsoft Bing noted that using these limits to their full extent helps ensure content remains discoverable in AI search.

Why This Matters

As search becomes more AI-driven, accurate crawl signals matter more.

Bing’s reliance on sitemaps, especially the lastmod field, shows that basic technical SEO practices still matter, even as AI reshapes how content is surfaced.

For large sites, Bing’s support for trillions of URLs offers scalability. For everyone else, the message is simpler: keep your sitemaps clean, accurate, and updated in real time. This gives your content the best shot at visibility in AI search.

