Building An In-House PPC Team: Why A Hybrid Model May Protect Your Ad Spend via @sejournal, @LisaRocksSEM

AI and automation in ad platforms are well established. Google Ads and Microsoft Advertising are heavily invested in automated features, and the technical barrier to entry has never been lower. However, that accessibility comes with a tradeoff.

Two common challenges surface when bringing a PPC team in-house:

  1. Campaigns are easier to launch than they are to explain and analyze.
  2. Machine-driven decisions risk going unquestioned without an outside perspective.

Those challenges point to something CMOs probably already know: Automation doesn’t eliminate the need for human judgment. It raises the requirements for it. Even with strong AI tools in place, experienced PPC practitioners are still writing strategy, creating ad copy, and manually updating targeting.

This article covers two structural paths for managing that reality.

  1. All in-house means your internal team manages PPC end-to-end, with no agency or external consultant involved.
  2. Hybrid means your internal team handles day-to-day execution and internal oversight while an external specialist or consultant provides strategy, auditing, and a second set of eyes.

Both models can work. The goal is to match machine automation with human accountability and independent performance checks. Without that structure, an in-house team can end up in a bubble where the ad platform’s suggestions dictate all of the optimization decisions.

Is Your Organization Ready? What To Assess Before You Hire

Before you post a job description, determine whether your company is ready to manage the technical work that comes with modern PPC search ads. Hiring an internal team is a long-term commitment.

The Shift In Daily Tasks

The role of the search marketer is shifting from manual campaign creation to evaluating and guiding automated systems. The human role is increasingly about checking what the AI creates and stepping in to do the work the ad platform can’t do well on its own.

That last part matters far more than most job descriptions reflect. In my experience, AI-generated ad copy is often not platform-ready, and strategy still requires a human who understands the brand, the profit model, and the customer. If your candidates only talk about managing manual bids and features, they may not be ready for the current landscape. You need people who can navigate automated systems and know when to override them.

Input And Data Quality

Because AI success depends on signal strength, an in-house PPC team’s value is directly tied to their ability to connect and maintain clean data. Ad platforms rely on:

  • Conversion tracking.
  • CRM integration.
  • Audience modeling.
  • Bidding inputs.

Tools such as Google Ads Data Manager (connecting external products inside Google Ads) and offline conversion uploads mean managing data should be a core responsibility of in-house PPC specialists.

Poorly configured conversion tracking or incomplete data signals can lead automated bidding to optimize toward low-value actions. You can’t expect a machine to give you good results if you’re feeding it bad information.
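To make “managing data” concrete: before any offline conversion upload, a team can run basic integrity checks on the export. The sketch below is illustrative only; the file name and column names are invented, and real upload templates vary by platform.

```python
import csv
from datetime import datetime

# Hypothetical column names for illustration; real upload templates
# vary by ad platform and are documented by each vendor.
REQUIRED = ["click_id", "conversion_name", "conversion_time",
            "conversion_value", "currency"]

def validate_offline_conversions(path):
    """Return human-readable problems found in an offline-conversion CSV."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            return ["missing required columns: " + ", ".join(missing)]
        for line, row in enumerate(reader, start=2):  # line 1 is the header
            if not row["click_id"].strip():
                problems.append(f"line {line}: empty click_id, row cannot be matched to a click")
            try:
                datetime.strptime(row["conversion_time"], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                problems.append(f"line {line}: unparseable conversion_time {row['conversion_time']!r}")
            try:
                if float(row["conversion_value"]) <= 0:
                    problems.append(f"line {line}: non-positive conversion_value")
            except ValueError:
                problems.append(f"line {line}: non-numeric conversion_value")
    return problems

if __name__ == "__main__":
    for problem in validate_offline_conversions("offline_conversions.csv"):
        print(problem)
```

A check like this won’t fix a broken tracking setup, but it catches the bad rows before an automated bidding system learns from them.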

If You Are Hiring, Look For These Skills

If you’ve decided to build fully in-house, hiring criteria should shift toward business data management and the ability to work alongside AI without taking every single suggestion.

1. Understanding Business Margins

Most PPC managers haven’t had to think in depth about COGS (Cost of Goods Sold) or return rates, but that’s changing.

The bar is rising for in-house hires. A team that can connect ad spend to net profit, not just revenue, is far better positioned to make smart decisions as automation takes over the mechanical work.
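As a worked example of the difference, here is a minimal sketch, with made-up figures, of profit-based return on ad spend versus the revenue-based ROAS most dashboards report:

```python
def roas(revenue, ad_spend):
    """Revenue-based return on ad spend: the headline view."""
    return revenue / ad_spend

def poas(revenue, cogs, returns, ad_spend):
    """Profit-based return on ad spend: gross profit after returns per ad dollar."""
    return (revenue - cogs - returns) / ad_spend

# Hypothetical campaign: $50,000 revenue on $10,000 spend looks like a 5.0 ROAS,
# but with 60% COGS and $5,000 in returns, the profit multiple is much thinner.
revenue, spend = 50_000.0, 10_000.0
print(f"ROAS: {roas(revenue, spend):.1f}")                     # 5.0
print(f"POAS: {poas(revenue, 30_000.0, 5_000.0, spend):.1f}")  # 1.5
```

A hire who instinctively asks for the second number is the kind of judgment the section above describes.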

2. Owning The Post-Click Experience

The PPC team must care about what happens after the user lands on the site. Creative quality and landing page performance are directly tied to conversions and what the algorithm learns over time.

AI-driven traffic efficiency can be thrown off by a poor landing page experience. Your internal hires should have a working knowledge of landing page testing and website user experience.

3. Ad Copy And Strategic Judgment

AI can generate ad copy, but it can create variations that are missing marketing strategy or brand-ready messaging. Your team needs to evaluate, rewrite, and at times reject what the ad platform produces.

The same applies to strategy. Automated systems optimize toward the goals you set, but setting the right goals and interpreting performance still require a skilled human. Hire for that judgment, not just ad platform knowledge.

4. Technical Data Strategy

Your team needs to know how to build and maintain first-party data connections, such as CRM data and customer match uploads.

Your team’s job is to ensure the right signals are flowing to the right campaigns at the right time. Technical data competency should be a core requirement for the job.

Why A Hybrid Model May Work Better

Even when hiring and data processes are going well, blind spots can happen inside fully internal teams. Three issues can show up:

  • Brand blindness from working primarily inside a single account.
  • Lack of independent auditing on spend and profit.
  • Difficulty pushing back on ad platform pressure.

An external perspective adds accountability that internal teams can have trouble providing for themselves. In an environment where so many features are automated, that accountability matters more because teams rarely take a deep dive into the automations.

1. The Problem With Brand Blindness

Internal teams are focused on one brand. That focus builds deep expertise, but it can limit perspective. For example, when performance changes, it’s difficult to determine whether the change reflects a platform-wide trend, an industry shift, or a campaign-specific issue.

Working across many industries gives specialist consultants a reference point that internal teams may not have. They can tell you if a performance drop is happening to everyone in the industry or just to you.

2. The Need For Independent Auditing

An external partner acts as an independent auditor for your search spend. They can help confirm that internal goals line up with actual business profit rather than ad platform metrics.

It’s easy for internal teams to grow comfortable and focus on vanity metrics like ROAS (Return on Ad Spend). An objective third party can help show you exactly how much actual profit your search spend is generating.

3. Managing Ad Platform Pressure

Internal teams are the primary target for PPC ad platform representatives. These reps frequently push recommendations, such as auto-applied suggestions and display network serving, that eat up budgets and prioritize the platform’s revenue over your business.

Independent experts are less likely to follow these suggestions without questioning them. They provide the pushback needed to ensure spend is justified by performance, not the platform’s optimization score.

Structuring The Partnership For Success

Consider a division of labor that draws on internal brand knowledge and external expertise. This hybrid approach offers the most protection for your ad spend.

What The In-House Team Should Own

  • Data Ownership: Managing the privacy and quality of your customer signals.
  • Creative Guidance: Ensuring brand voice stays consistent across AI-generated ads.
  • Ad Copy and Strategy: Writing, evaluating, and refining what the ad platform produces.
  • Sales Coordination: Connecting PPC spend with internal inventory levels and sales cycles.

What The External Specialist Should Own

  • Strategic Roadmap: Providing a long-term view of where the search industry is heading.
  • Advanced Analysis: Proving the true value of your spend through profit-based measurement.
  • Objective Auditing: Serving as an independent check against ad platform recommendations.

Successful PPC teams in an AI-first search environment won’t be worried about who automated the fastest. They’ll be more thoughtful and strategic about defining what the machine does and what a human approves.

Matching Structure To Accountability

The decision to go fully in-house or hybrid isn’t permanent. What matters is that your structure matches the level of accountability your ad spend requires.

If your team has clean data, strong hiring, and the ability to question what the ad platform suggests, a fully in-house model can work. But if no one is challenging the machine’s recommendations, you have a gap that’s hard to fix from the inside.

A hybrid model doesn’t mean your internal team isn’t capable. It means you’re building in a check that protects your budget from blind spots.

Whatever you choose, the people managing your PPC need to understand your business at the profit level, not just the platform level. Automation handles the mechanics. Your team handles the judgment.

Featured Image: ImageFlow/Shutterstock

Who Owns SEO In The Enterprise? The Accountability Gap That Kills Performance via @sejournal, @billhunt

Enterprise SEO doesn’t fail because teams don’t care, lack expertise, or miss tactics. It fails because ownership is fractured.

In most large organizations, everyone controls a piece of SEO, yet no single group owns the outcome. Visibility, traffic, and discoverability depend on dozens of upstream decisions made across engineering, content, product, UX, legal, and local markets. SEO is measured on the result, but it does not control the system that produces it.

In smaller organizations, this problem is manageable. SEO teams can directly influence content, technical decisions, and site structure. In the enterprise, that control dissolves. Incentives diverge. Workflows fragment. Coordination becomes optional.

SEO success requires alignment, but enterprise structures reward isolation. That mismatch creates what I call the accountability gap – the silent failure mode behind most large-scale SEO underperformance.

SEO Is Measured By The Team That Doesn’t Control It

SEO is the only business function I am aware of that is judged on performance it cannot deliver independently. This is especially true in the enterprise, where SEO performance is evaluated using familiar metrics: visibility, traffic, engagement, and, increasingly, AI-driven exposure. The irony is that the SEO function rarely controls the systems that generate those outcomes.

SEO depends on systems that other functions control:

  • Development: controls templates, rendering, and performance; SEO depends on it for crawlability, indexability, and structured data.
  • Content Teams: control messaging, depth, and updates; SEO depends on them for relevance, coverage, and AI eligibility.
  • Product Teams: control taxonomy, categorization, and naming; SEO depends on them for entity clarity and internal structure.
  • UX & Design: control navigation, layout, and hierarchy; SEO depends on them for discoverability and user engagement.
  • Legal & Compliance: control claims and restrictions; SEO depends on them for content completeness and trust signals.
  • Local Markets: control localization and regional content; SEO depends on them for cross-market consistency and intent alignment.

SEO depends on all of these departments doing their jobs in an SEO-friendly manner for it to have a remote chance of success. This makes SEO unusual among business functions. It is judged by performance, yet it cannot deliver that performance independently. And because SEO typically sits downstream in the organization, it must request changes rather than direct them.

That structural imbalance is not a process issue. It is an ownership problem.

The Accountability Gap Explained

The accountability gap appears whenever a business-critical outcome depends on multiple teams, but no single team is accountable for the result.

SEO is a textbook example: fundamental search success requires development to implement correctly, content to align with demand, product teams to structure information coherently, markets to maintain consistency, and legal to permit eligibility-supporting claims. Failure occurs when even one link breaks.

Inside the enterprise, each of those teams is measured on its own key performance indicators. Development is rewarded for shipping. Content is rewarded for brand alignment. Product is rewarded for features. Legal is rewarded for risk avoidance. Markets are rewarded for local revenue. SEO lives in the cracks between them.

No one is incentivized to fix a problem that primarily benefits another department’s metrics. So issues persist, not because they are invisible, but because resolving them offers no local reward.

KPI Structures Encourage Metric Shielding

This is where enterprise SEO collides head-on with organizational design.

In practice, resistance to SEO rarely looks like resistance. No one says, “We don’t care about search.” Instead, objections arrive wrapped in perfectly reasonable justifications, each grounded in a different team’s success metrics.

Engineering teams explain that template changes would disrupt sprint commitments. Localization teams point to budgets that were never allocated for rewriting content. Product teams note that naming decisions are locked for brand consistency. Legal teams flag risk exposure in expanded explanations. And once something has launched, the implicit assumption is that SEO can address any fallout afterward.

Each of these responses makes sense on its own. None are malicious. But together, they form a pattern where protecting local KPIs takes precedence over shared outcomes.

This is what I refer to as metric shielding: the quiet use of internal performance measures to avoid cross-functional work. It’s not a refusal to help; it’s a rational response to how teams are evaluated. Fixing an SEO issue rarely improves the metric a given department is rewarded for, even if it materially improves enterprise visibility.

Over time, this behavior compounds. Problems persist not because they are unsolvable, but because solving them benefits someone else’s scorecard. SEO becomes the connective tissue between teams, yet no one is incentivized to strengthen it.

This dynamic is part of a broader organizational failure mode I call the KPI trap, where teams optimize for local success while undermining shared results. In enterprise SEO, the consequences surface quickly and visibly. In other parts of the organization, the damage often stays hidden until performance breaks somewhere far downstream.

The Myth: “SEO Is Marketing’s Job”

To simplify ownership, enterprises often default to a convenient fiction: SEO belongs to marketing.

On the surface, that assumption feels logical. SEO is commonly associated with organic traffic, and organic traffic is typically tracked as a marketing KPI. When visibility is measured in visits, conversions, or demand generation, it’s easy to conclude that SEO is simply another marketing lever.

In practice, that logic collapses almost immediately. Marketing may influence messaging and campaigns, but it does not control the systems that determine discoverability. It does not own templates, rendering logic, taxonomy, structured data pipelines, localization standards, release timing, or engineering priorities. Those decisions live elsewhere, often far upstream from where SEO performance is measured.

As a result, marketing ends up owning SEO on the organizational chart, while other teams own SEO in reality. This creates a familiar enterprise paradox. One group is held accountable for outcomes, while other groups control the inputs that shape those outcomes. Accountability without authority is not ownership. It is a guaranteed failure pattern.

The Core Reality

At its core, enterprise SEO failures are rarely tactical. They are structural, driven by accountability without authority across systems SEO does not control.

Search performance is created upstream through platform decisions, information architecture, content governance, and release processes. Yet SEO is almost always measured downstream, after those decisions are already locked. That separation creates the accountability gap.

SEO becomes responsible for outcomes shaped by systems it doesn’t control, priorities it can’t override, and tradeoffs it isn’t empowered to resolve. When success requires multiple departments to change, and no one owns the outcome, performance stalls by design.

Why This Breaks Faster In AI Search

In traditional SEO, the accountability gap usually expressed itself as volatility. Rankings moved. Traffic dipped. Teams debated causes, made adjustments, and over time, many issues could be corrected. Search engines recalculated signals, pages were reindexed, and recovery, while frustrating, was often possible. AI-driven search behaves differently because the evaluation model has changed.

AI systems are not simply ranking pages against each other. They are deciding which sources are eligible to be retrieved, synthesized, and represented at all. That decision depends on whether the system can form a coherent, trustworthy understanding of a brand across structure, entities, relationships, and coverage. Those signals must align across platforms, templates, content, and governance.

This is where the accountability gap becomes fatal. When even one department blocks or weakens those elements – by fragmenting entities, constraining content, breaking templates, or enforcing inconsistent standards – the system doesn’t partially reward the brand. It fails to form a stable representation. And when representation fails, exclusion follows. Visibility doesn’t gradually decline. It disappears.

AI systems default to sources that are structurally coherent and consistently reinforced. Competitors with cleaner governance and clearer ownership become the reference point, even if their content is not objectively better. Once those narratives are established, they persist. AI systems are far less forgiving than traditional rankings, and far slower to revise once an interpretation hardens.

This is why the accountability gap now manifests as a visibility gap. What used to be recoverable through iteration is now lost through omission. And the longer ownership remains fragmented, the harder that loss is to reverse.

A Note On GEO, AIO, And The Labeling Distraction

Much of the current conversation reframes these challenges under new labels: GEO, AIO, AI SEO, generative optimization. The terminology isn’t wrong. It’s just incomplete.

These labels describe where visibility appears, not why it succeeds or fails. Whether the surface is a ranking, an AI Overview, or a synthesized answer, the underlying requirements remain unchanged: structural clarity, entity consistency, governed content, trustworthy signals, and cross-functional execution.

Renaming the outcome does not change the operating model required to achieve it.

Organizations don’t fail in AI search because they picked the wrong acronym. They fail because the same accountability gap persists, with faster and less forgiving consequences.

The Enterprise SEO Ownership Paradox

At its core, enterprise SEO operates under a paradox that most organizations never explicitly confront.

SEO is inherently cross-functional. Its performance depends on systems, processes, platforms, and decisions that span development, content, product, legal, localization, and governance. It behaves like infrastructure, not a channel. And yet, it is still managed as if it were a marketing function, a reporting line, or a service desk that reacts to requests.

That mismatch explains why even well-funded SEO teams struggle. They are held responsible for outcomes created by systems they do not control, processes they cannot enforce, and decisions they are rarely empowered to shape.

This paradox stays abstract until it’s reduced to a single, uncomfortable question:

Who is accountable when SEO success requires coordinated changes across three departments?

In most enterprises, the honest answer is simple. No one.

And when no one owns cross-functional success, initiatives stall by design. SEO becomes everyone’s dependency and no one’s priority. Work continues, meetings multiply, and reports are produced – but the underlying system never changes.

That is not a failure of execution. It is a failure of ownership.

What Real Ownership Looks Like

Organizations that win redefine SEO ownership as an operational capability, not a departmental role.

They establish executive sponsorship for search visibility, shared accountability across development, content, and product, and mandatory requirements embedded into platforms and workflows. Governance replaces persuasion. Standards are enforced before launch, not debated afterward.

SEO shifts from requesting fixes to defining requirements teams must follow. Ownership becomes structural, not symbolic.

The Final Reality

This perspective isn’t theoretical. It’s grounded in my nearly 30 years of direct experience designing, repairing, and operating enterprise website search programs across large organizations, regulated industries, complex platforms, and multi-market deployments.

I’ve sat in escalation meetings where launches were declared successful internally, only for visibility to quietly erode once systems and signals reached the outside world. I’ve watched SEO teams inherit outcomes created months earlier by decisions they were never part of. And more recently, I’ve worked with leadership teams who didn’t realize they had a search problem until AI-driven systems stopped citing them altogether. These are not edge cases. They are repeatable organizational failure modes.

What ultimately separated failure from recovery was never better tactics, better tools, or better acronyms. It was ownership. Specifically, whether the organization recognized search as a shared system-level responsibility and structured itself accordingly.

Enterprise SEO doesn’t break because teams aren’t trying hard enough. It breaks when accountability is assigned without authority, and when no one owns the outcomes that require coordination across the organization.

That is the problem modern search exposes. And ownership is the only durable fix.

Coming Next

The Modern SEO Center Of Excellence: Governance, Not Guidelines

We’ll close the loop by showing how enterprises institutionalize ownership through a Center of Excellence that governs standards, enforcement, entity governance, and cross-market consistency, the missing layer that prevents the accountability gap from recurring.

Featured Image: ImageFlow/Shutterstock

Google Answers Why Core Updates Can Roll Out In Stages via @sejournal, @martinibuster

Google’s John Mueller responded to a question about whether core updates roll out in stages or follow a fixed sequence. His answer offers some clarity about how core updates are rolled out and also about what some core updates actually are.

Question About Core Update Timing And Volatility

An SEO asked on Bluesky whether core updates behave like a single rollout that is then refined over time or if the different parts being updated are rolled out at different stages.

The question reflects a common observation that rankings tend to shift in waves during a rollout period, often lasting several weeks. This has led to speculation that updates may be deployed incrementally rather than all at once.

They asked:

“Given the timing, I want to ask a core update related question. Usually, we see waves of volatility throughout the 2-3 weeks of a rollout. Broadly, are different parts of core updated at different times? Or is it all reset at the beginning then iterated depending on the results?”

Core Updates Can Require Step-By-Step Deployment

Mueller explained that Google does not formally define or announce stages for core updates. He noted that these updates involve broad changes across multiple systems, which can require a step-by-step rollout rather than a single deployment.

He responded:

“We generally don’t announce “stages” of core updates.. Since these are significant, broad changes to our search algorithms and systems, sometimes they have to work step-by-step, rather than all at one time. (It’s also why they can take a while to be fully live.)”

Updates Depend On Systems And Teams Involved

Mueller next added that there is no single mechanism that governs how all core updates are released. Instead, updates reflect the work of different teams and systems, which can vary from one update to another.

He explained:

“I guess in short there’s not a single “core update machine” that’s clicked on (every update has the same flow), but rather we make the changes based on what the teams have been working on, and those systems & components can change from time to time.”

Core Updates May Roll Out Incrementally Rather Than All At Once

Mueller’s explanation suggests that the waves of volatility observed during core updates may correspond to incremental changes across different systems rather than a single reset followed by adjustments. Because updates are tied to multiple components, the rollout may progress in parts as those systems are updated and brought fully live.

This reflects a process where some changes are complex and require a more nuanced step-by-step rollout, rather than being released all at once, which may explain why ranking shifts can appear uneven during the rollout period.

Connection To Google’s Spam Update?

I don’t think it was a coincidence that the March core update followed closely after the March 2026 spam update. It’s logical for spam fighting to be part of the bundle of changes made in a core algorithm update. That’s why Googlers sometimes say that a core update should surface more relevant content and less low-quality content.

So when Google announces a Spam Update, that stands out because either Google is making a major change to the infrastructure that Google’s core algorithm runs on or the spam update is meant to weed out specific forms of spam prior to rolling out a core algorithm update, to clear the table, so to speak. And that is what appears to have happened with the recent spam and core algorithm updates.

Comparison With Early Google Updates

Way back in the early days, around 25 years ago, Google ran an update every month, offering a chance to see whether new pages were indexed and ranked, and how existing pages were doing. The first days of each update saw widescale fluctuations, which we (the members of the WebmasterWorld forum) called the Google Dance.

Back then, it felt like updates were just Google adding more pages and re-ranking them. Then, around the 2003 Florida update, it became apparent that the actual ranking systems were being changed, and the fluctuations could go on for months. That was probably the first time the SEO community noticed a different kind of update, one that was probably closer to a core algorithm update.

In my opinion, one way to think of it is that Google’s indexing and ranking algorithms are like software. And then, there’s also hardware and software that are a part of the infrastructure that the indexing and ranking algorithms run on (like the operating system and hardware of your desktop or laptop).

That’s an oversimplification but it’s useful to me for visualizing what a core algorithm update might be. Most, if not all of it, is related to the indexing and ranking part. But I think sometimes there’s infrastructure-type changes going on that improve the indexing and ranking part.

Featured Image by Shutterstock/A9 STUDIO

AI benchmarks are broken. Here’s what we need instead.

For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from coding to essay writing, the performance of AI models and applications is tested against that of individual humans completing tasks. 

This framing is seductive: An AI vs. human comparison on isolated problems with clear right or wrong answers is easy to standardize, compare, and optimize. It generates rankings and headlines. 

But there’s a problem: AI is almost never used in the way it is benchmarked. Although researchers and industry have started to improve benchmarking by moving beyond static tests to more dynamic evaluation methods, these innovations resolve only part of the issue. That’s because they still evaluate AI’s performance outside the human teams and organizational workflows where its real-world performance ultimately unfolds.

While AI is evaluated at the task level in a vacuum, it is used in messy, complex environments where it usually interacts with more than one person. Its performance (or lack thereof) emerges only over extended periods of use. This misalignment leaves us misunderstanding AI’s capabilities, overlooking systemic risks, and misjudging its economic and social consequences.

To mitigate this, it’s time to shift from narrow methods to benchmarks that assess how AI systems perform over longer time horizons within human teams, workflows, and organizations. I have studied real-world AI deployment since 2022 in small businesses and health, humanitarian, nonprofit, and higher-education organizations in the UK, the United States, and Asia, as well as within leading AI design ecosystems in London and Silicon Valley. I propose a different approach, which I call HAIC benchmarks: Human–AI, Context-Specific Evaluation.

What happens when AI fails 

For governments and businesses, AI benchmark scores appear more objective than vendor claims. They’re a critical part of determining whether an AI model or application is “good enough” for real-world deployment. Imagine an AI model that achieves impressive technical scores on the most cutting-edge benchmarks—98% accuracy, groundbreaking speed, compelling outputs. On the strength of these results, organizations may decide to adopt the model, committing sizable financial and technical resources to purchasing and integrating it. 

But then, once it’s adopted, the gap between benchmark and real-world performance quickly becomes visible. For example, take the swathe of FDA-approved AI models that can read medical scans faster and more accurately than an expert radiologist. In the radiology units of hospitals from the heart of California to the outskirts of London, I witnessed staff using highly ranked radiology AI applications. Repeatedly, it took them extra time to interpret AI’s outputs alongside hospital-specific reporting standards and nation-specific regulatory requirements. What appeared as a productivity-enhancing AI tool when tested in a vacuum introduced delays in practice. 

It soon became clear that the benchmark tests on which medical AI models are assessed do not capture how medical decisions are actually made. Hospitals rely on multidisciplinary teams—radiologists, oncologists, physicists, nurses—who jointly review patients. Treatment planning rarely hinges on a static decision; it evolves as new information emerges over days or weeks. Decisions often arise through constructive debate and trade-offs between professional standards, patient preferences, and the shared goal of long-term patient well-being. No wonder even highly scored AI models struggle to deliver the promised performance once they encounter the complex, collaborative processes of real clinical care.

The same pattern emerges in my research across other sectors: When embedded within real-world work environments, even AI models that perform brilliantly on standardized tests don’t perform as promised. 

When high benchmark scores fail to translate into real-world performance, even the most highly scored AI is soon abandoned to what I call the “AI graveyard.” The costs are significant: Time, effort and money end up being wasted. And over time, repeated experiences like this erode organizational confidence in AI and—in critical settings such as health—may erode broader public trust in the technology as well. 

When current benchmarks provide only a partial and potentially misleading signal of an AI model’s readiness for real-world use, this creates regulatory blind spots: Oversight is shaped by metrics that do not reflect reality. It also leaves organizations and governments to shoulder the risks of testing AI in sensitive real-world settings, often with limited resources and support. 

How to build better tests 

To close the gap between benchmark and real-world performance, we must pay attention to the actual conditions in which AI models will be used. The critical questions: Can AI function as a productive participant within human teams? And can it generate sustained, collective value? 

Through my research on AI deployment across multiple sectors, I have seen a number of organizations already moving—deliberately and experimentally—toward the HAIC benchmarks I favor. 

HAIC benchmarks reframe current benchmarking in four ways: 

1. From individual and single-task performance to team and workflow performance (shifting the unit of analysis).

2. From one-off testing with right/wrong answers to long-term impacts (expanding the time horizon).

3. From correctness and speed to organizational outcomes, coordination quality, and error detectability (expanding outcome measures).

4. From isolated outputs to upstream and downstream consequences (system effects).

Across the organizations where this approach has emerged and started to be applied, the first step is shifting the unit of analysis. 

For example, in one UK hospital system in the period 2021–2024, the question expanded from whether a medical AI application improves diagnostic accuracy to how the presence of AI within the hospital’s multidisciplinary teams affects not only accuracy but also coordination and deliberation. The hospital specifically assessed coordination and deliberation in human teams using and not using AI. Multiple stakeholders (within and outside the hospital) decided on metrics like how AI influences collective reasoning, whether it surfaces overlooked considerations, whether it strengthens or weakens coordination, and whether it changes established risk and compliance practices. 

This shift is fundamental. It matters a lot in high-stakes contexts where system-level effects matter more than task-level accuracy. It also matters for the economy. It may help recalibrate inflated expectations of sweeping productivity gains that are so far predicated largely on the promise of improving individual task performance. 

Once that foundation is set, HAIC benchmarking can begin to take on the element of time. 

Today’s benchmarks resemble school exams—one-off, standardized tests of accuracy. But real professional competence is assessed differently. Junior doctors and lawyers are evaluated continuously inside real workflows, under supervision, with feedback loops and accountability structures. Performance is judged over time and in a specific context, because competence is relational. If AI systems are meant to operate alongside professionals, their impact should be judged longitudinally, reflecting how performance unfolds over repeated interactions. 

I saw this aspect of HAIC applied in one of my humanitarian-sector case studies. Over 18 months, an AI system was evaluated within real workflows, with particular attention to how detectable its errors were—that is, how easily human teams could identify and correct them. This long-term “record of error detectability” meant the organizations involved could design and test context-specific guardrails to promote trust in the system, despite the inevitability of occasional AI mistakes.
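The case study doesn’t publish its schema, so here is a hedged sketch of what a longitudinal error-detectability log could look like; every field and class name below is invented for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AIDecisionRecord:
    """One AI output reviewed inside a real workflow (illustrative schema)."""
    produced_at: datetime
    was_error: bool                       # did the output turn out to be wrong?
    detected_by_team: bool                # did humans catch it before it caused harm?
    hours_to_detection: float | None = None

@dataclass
class DetectabilityLog:
    """Accumulates reviews over months, the time horizon HAIC emphasizes."""
    records: list[AIDecisionRecord] = field(default_factory=list)

    def error_detectability(self) -> float:
        """Share of AI errors the team caught, tracked over time."""
        errors = [r for r in self.records if r.was_error]
        if not errors:
            return 1.0
        return sum(r.detected_by_team for r in errors) / len(errors)
```

The point of such a record is not the code itself but the shift it encodes: the unit of evaluation is the team-plus-AI system over months, not a single model output.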

A longer time horizon also makes visible the system-level consequences that short-term benchmarks miss. An AI application may outperform a single doctor on a narrow diagnostic task yet fail to improve multidisciplinary decision-making. Worse, it may introduce systemic distortions: anchoring teams too early in plausible but incomplete answers, adding to people’s cognitive workloads, or generating downstream inefficiencies that offset any speed or efficiency gains at the point of the AI’s use. These knock-on effects—often invisible to current benchmarks—are central to understanding real impact.

The HAIC approach admittedly makes benchmarking more complex, resource-intensive, and harder to standardize. But continuing to evaluate AI in sanitized conditions detached from the world of work will leave us misunderstanding what it truly can and cannot do for us. To deploy AI responsibly in real-world settings, we must measure what actually matters: not just what a model can do alone, but what it enables—or undermines—when humans and teams in the real world work with it.

 Angela Aristidou is a professor at University College London and a faculty fellow at the Stanford Digital Economy Lab and the Stanford Human-Centered AI Institute. She speaks, writes, and advises about the real-life deployment of artificial-intelligence tools for public good.

The Download: AI health tools and the Pentagon’s Anthropic culture war

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

There are more AI health tools than ever—but how well do they work? 

In the last few months alone, Microsoft, Amazon, and OpenAI have all launched medical chatbots. 

There’s a clear demand for these tools, given how hard it is for many people to access advice through the existing medical system—and they could make safe and useful recommendations. But concerns have surfaced about how little external evaluation they undergo before being released to the public.  

Read the full story to understand what’s at stake.

—Grace Huckins 

The Pentagon’s culture war tactic against Anthropic has backfired 

A judge has temporarily blocked the Pentagon from labeling Anthropic a supply chain risk and ordering government agencies to stop using its AI. Her intervention suggests that the feud never needed to reach such a frenzy. 

It did so because the government disregarded the existing process for such disputes—and fueled the fire on social media. Find out how it happened and what comes next

—James O’Donnell 

This story is from The Algorithm, our weekly newsletter giving you the inside track on all things AI. Sign up to receive it in your inbox every Monday. 

The must-reads 

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology. 

1 California has defied Trump to impose new AI regulations
Governor Newsom signed off on the new standards yesterday. (Guardian)
+ Firms seeking state contracts will need extra safeguards. (Reuters $)
+ States are installing guardrails despite Trump’s order to stop. (NYT $)
+ An AI regulation war is brewing in the US. (MIT Technology Review)

2 Experiments have verified quantum simulations for the first time
It’s a breakthrough for quantum computing applications. (Nature)
+ Which could one day help solve healthcare problems. (MIT Technology Review)

3 The new White House app is a security and privacy nightmare
It extensively tracks users and relies on external code. (Gizmodo)
+ The new app promises “unparalleled access” to Trump. (CNET)
+ It also invites users to report people to ICE. (The Verge)

4 Big Tech’s $635 billion AI spending faces an energy shock test
The Middle East crisis is clouding prospects for growth. (Reuters $)
+ Here are three big unknowns about AI’s energy burden. (MIT Technology Review)

5 Meta and Google have been accused of breaking child safety rules
Australia suspects they flouted a social media ban. (Bloomberg $)
+ Indonesia is also investigating non-compliance. (Reuters $)

6 Nebius is building a $10 billion AI data center in Finland
The company is rapidly expanding Europe’s AI infrastructure. (CNBC)

7 South Korea’s chipmakers’ helium stocks will last until June
Beyond that? Who knows. (Reuters $)
+ Shortages caused by the Iran war threaten the chip industry. (NYT $)

8 Another Starlink satellite has inexplicably exploded
SpaceX suffered a similar episode in December. (The Verge)
+ We went inside Ukraine’s largest Starlink repair shop. (MIT Technology Review)

9 Bluesky’s new AI tool is already its most blocked account—after JD Vance
About 83 times as many users have blocked it as have followed it. (TechCrunch)

10 An AI agent banned from Wikipedia has lashed out in angry blogs
The bot accused its human editors of “uncivil behavior.” (404 Media)

Quote of the day 

“Is any of this illegal? Probably not. Is it what you’d expect from an official government app? Probably not either.” 

—Security researcher Thereallo reviews the White House’s new app.

One More Thing 

Image: Chantal Jahchan

Inside Amsterdam’s high-stakes experiment to create fair welfare AI 

When Hans de Zwart, a digital rights advocate, saw Amsterdam’s plan to have an algorithm evaluate every welfare applicant for potential fraud, he nearly fell out of his chair. He believed the system had “unfixable problems.”  

Meanwhile, Paul de Koning, a consultant to the city, was excited. He saw immense potential to improve efficiencies and remove biases. 

These opposing viewpoints epitomize a global debate about whether algorithms can ever make fair decisions that shape people’s lives. Read the full story.  

—Eileen Guo, Gabriel Geiger, and Justin-Casimir Braun 

We can still have nice things 

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.) 

+ A newly authenticated Rembrandt had been hiding in plain sight for years. 
+ This debunking of guitar legends is musical enlightenment for strummers. 
+ Smoking into bubbles looks oddly satisfying.
+ The man who made the front page twice exposes the thin line between heroes and villains. 

Shifting to AI model customization is an architectural imperative

In the early days of large language models (LLMs), we grew accustomed to massive 10x jumps in reasoning and coding capability with every new model iteration. Today, those jumps have flattened into incremental gains. The exception is domain-specialized intelligence, where true step-function improvements are still the norm.

When a model is fused with an organization’s proprietary data and internal logic, it encodes the company’s history into its future workflows. This alignment creates a compounding advantage: a competitive moat built on a model that understands the business intimately. This is more than fine-tuning; it is the institutionalization of expertise into an AI system. This is the power of customization.

Intelligence tuned to context

Every sector operates within its own specific lexicon. In automotive engineering, the “language” of the firm revolves around tolerance stacks, validation cycles, and revision control. In capital markets, reasoning is dictated by risk-weighted assets and liquidity buffers. In security operations, patterns are extracted from the noise of telemetry signals and identity anomalies.

Custom-adapted models internalize the nuances of the field. They recognize which variables dictate a “go/no-go” decision, and they think in the language of the industry.

Domain expertise in action

The transition from general-purpose to tailored AI centers on one goal: encoding an organization’s unique logic directly into a model’s weights.

Mistral AI partners with organizations to incorporate domain expertise into their training ecosystems. A few use cases illustrate customized implementations in practice:

Software engineering and assisting at scale: A network hardware company with proprietary languages and specialized codebases found that out-of-the-box models could not grasp their internal stack. By training a custom model on their own development patterns, they achieved a step function in fluency. Integrated into Mistral’s software development scaffolding, this customized model now supports the entire lifecycle—from maintaining legacy systems to autonomous code modernization via reinforcement learning. This turns once-opaque, niche code into a space where AI reliably assists at scale.

Automotive and the engineering copilot: A leading automotive company uses customization to revolutionize crash test simulations. Previously, specialists spent entire days manually comparing digital simulations with physical results to find divergences. By training a model on proprietary simulation data and internal analyses, they automated this visual inspection, flagging deformations in real time. Moving beyond detection, the model now acts as a copilot, proposing design adjustments to bring simulations closer to real-world behavior and radically accelerating the R&D loop.

Public sector and sovereign AI: In Southeast Asia, a government agency is building a sovereign AI layer to move beyond Western-centric models. By commissioning a foundation model tailored to regional languages, local idioms, and cultural contexts, they created a strategic infrastructure asset. This ensures sensitive data remains under local governance while powering inclusive citizen services and regulatory assistants. Here, customization is the key to deploying AI that is both technically effective and genuinely sovereign.

The blueprint for strategic customization

Moving from a general-purpose AI strategy to a domain-specific advantage requires a structural rethinking of the model’s role within the enterprise. Success is defined by three shifts in organizational logic.

1. Treat AI as infrastructure, not an experiment. Historically, enterprises have treated model customization as an ad hoc experiment—a single fine-tuning run for a niche use case or a localized pilot. While these bespoke silos often yield promising results, they are rarely built to scale. They produce brittle pipelines, improvised governance, and limited portability. When the underlying base models evolve, the adaptation work must often be discarded and rebuilt from scratch.

In contrast, a durable strategy treats customization as foundational infrastructure. In this model, adaptation workflows are reproducible, version-controlled, and engineered for production. Success is measured against deterministic business outcomes. By decoupling the customization logic from the underlying model, firms ensure that their “digital nervous system” remains resilient, even as the frontier of base models shifts.

2. Retain control of your own data and models. As AI migrates from the periphery to core operations, the question of control becomes existential. Reliance on a single cloud provider or vendor for model alignment creates a dangerous asymmetry of power regarding data residency, pricing, and architectural updates.

Enterprises that retain control of their training pipelines and deployment environments preserve their strategic agency. By adapting models within controlled environments, organizations can enforce their own data residency requirements and dictate their own update cycles. This approach transforms AI from a service consumed into an asset governed, reducing structural dependency and allowing for cost and energy optimizations aligned with internal priorities rather than vendor roadmaps.

3. Design for continuous adaptation. The enterprise environment is never static: regulations shift, taxonomies evolve, and market conditions fluctuate. A common failure is treating a customized model as a finished artifact. In reality, a domain-aligned model is a living asset subject to model decay if left unmanaged.

Designing for continuous adaptation requires a disciplined approach to ModelOps. This includes automated drift detection, event-driven retraining, and incremental updates. By building the capacity for constant recalibration, the organization ensures that its AI does not just reflect its history, but evolves in lockstep with its future. This is the stage where the competitive moat begins to compound: the model’s utility grows as it internalizes the organization’s ongoing response to change.
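As one hedged illustration of what automated drift detection can mean in practice, the population stability index (PSI) is a common way to compare live traffic against the data a model was adapted on. Nothing here is Mistral’s actual tooling; the thresholds are conventional rules of thumb, and numpy is assumed:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a baseline sample and live traffic."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) on empty bins; fine for an illustrative sketch.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

baseline = np.random.normal(0.0, 1.0, 10_000)  # feature distribution at adaptation time
live = np.random.normal(0.4, 1.2, 10_000)      # drifted live distribution
score = psi(baseline, live)
# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 retrain.
print(f"PSI = {score:.3f} -> {'retrain' if score > 0.25 else 'monitor'}")
```

In an event-driven retraining setup, a score crossing the upper threshold would queue a recalibration job rather than wait for a scheduled refresh.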

Control is the new leverage

We have entered an era where generic intelligence is a commodity, but contextual intelligence is a scarcity. While raw model power is now a baseline requirement, the true differentiator is alignment—AI calibrated to an organization’s unique data, mandates, and decision logic.

In the next decade, the most valuable AI won’t be the one that knows everything about the world; it will be the one that knows everything about you. The firms that own the model weights of that intelligence will own the market.

This content was produced by Mistral AI. It was not written by MIT Technology Review’s editorial staff.

Smarter Ecommerce Delivery in Africa

Delivery is the primary barrier to completing an ecommerce sale in Africa. I’ve described the challenges posed by landmark-based addresses and the consumer preference to pay on receipt.

The result is high return rates across Nigeria, Kenya, and other large markets, where a refused package means the merchant pays for logistics twice for zero revenue.

However, the industry is moving from managing these failures to adopting a tech stack to avoid them.

Data-based Delivery

Rather than relying on a driver’s local knowledge, a growing cohort of venture-backed fulfillment companies, such as Gig Logistics, Loop, and Faramove, leverage data to predict the likelihood of a successful delivery before an order is dispatched.

Address verification. To solve the “landmark address” problem, logistics firms are integrating tools such as OkHi’s AI-powered verification, which allows customers to verify their location at checkout via GPS. Merchants can flag non-verified addresses as high risk.


OkHi’s API enables customers to verify their location at checkout. Image: OkHi

Risk scoring. Logistics engines now plug into APIs such as VerifyMe’s QoreID. The tool provides a confidence score based on location data and past delivery behavior. A phone number with a history of refusing orders is high risk.

Automated WhatsApp flows. Flagged orders trigger automated communication to customers via WhatsApp from APIs on platforms such as Termii (Nigeria) or Talksasa (Kenya). The system automatically redirects orders to a local pickup point for customers who do not respond.
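Each vendor exposes its own API, so the sketch below is deliberately vendor-neutral: a pre-dispatch routing rule of the kind these systems implement, with invented thresholds and field names:

```python
from dataclasses import dataclass

@dataclass
class Order:
    address_verified: bool    # e.g., GPS-confirmed at checkout
    risk_score: float         # 0.0 (safe) to 1.0 (likely refusal), from a scoring API
    customer_responded: bool  # replied to the automated WhatsApp confirmation

def route_delivery(order: Order) -> str:
    """Decide doorstep vs. pickup point before dispatch (illustrative thresholds)."""
    if order.address_verified and order.risk_score < 0.3:
        return "doorstep"
    if order.customer_responded:
        return "doorstep"        # flagged, but the customer confirmed intent
    return "pudo_pickup_point"   # unverified or unresponsive: avoid a failed attempt

print(route_delivery(Order(address_verified=False, risk_score=0.8,
                           customer_responded=False)))  # -> pudo_pickup_point
```

The economics are simple: every order the rule diverts to a pickup point is one avoided round trip that the merchant would otherwise pay for twice.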

The success of these tactics is evident in Jumia’s financial statements. According to its February 2026 report, the dominant African marketplace lowered its 2025 fulfillment expense per order by 12% year-over-year to $1.97, largely by shifting much of its delivery volume to PUDO (Pick Up, Drop Off) locations. This strategy is anchored by its JForce network of over 40,000 local consultants who act as trusted pickup points, bypassing high-risk doorstep delivery in congested capitals.

Investments in Logistics

In February 2026, funding for logistics and transport startups in Africa ($119.6 million) surpassed fintech ($54.1 million). Logistics infrastructure is becoming a competitive differentiator.

Warehouses in East Africa

On March 11, 2026, Africa Logistics Properties listed the region’s first real estate investment trust on the Nairobi Securities Exchange. The U.K. government committed $24 million to the listing through its MOBILIST program for sustainable development in emerging markets.

According to the Exchange’s CEO Frank Mwiti during the bell-ringing ceremony, “The debut of the dollar-denominated Industrial I-REIT is a historic milestone for our market. We are providing investors with a seamless gateway to Africa’s industrial logistics sector, combining hard currency stability with regional growth potential.”

Automation in North Africa

In January 2026, Egypt-based carrier Bosta launched a large automated sorting center in Cairo, the largest such facility in the Middle East. Capable of processing 11,000 parcels per hour, the center aims to reduce manual errors as Bosta prepares to handle 80 million parcels this year.

“This sorting machine alone required an investment of $5 million,” said Bosta CEO Mohamed Ezzat. “It directly contributes to improving delivery speed and operational accuracy.”

Lockers in Southern Africa

In South Africa, carriers are transitioning the last mile to 24/7 automated lockers. Ship-and-collect provider Pargo leads with over 4,000 points, followed by The Courier Guy’s network of 1,100 lockers as of March 2026. Merchants are seeing a significant reduction in “theft-related” losses and failed doorstep attempts.

Digital Trade

The African Continental Free Trade Area (AfCFTA) is a 2018 agreement among 55 member countries establishing the world’s largest free trade zone. AfCFTA’s Digital Trade Protocol provides rules for data protection and cross-border digital payments. It mandates that African governments recognize electronic trade documents as legally equivalent to paper, enabling merchants to insure and track goods across borders with legal certainty.

A major catalyst for this system is the integration of Kenya’s Pesalink, an instant payment network, with the Pan-African Payment and Settlement System (PAPSS). More than 80 Kenyan financial institutions now sync with 160-plus banks across Africa.

This integration, for example, allows a merchant in Nigeria to settle logistics fees in Naira for a delivery in Kenya instantly, removing a primary barrier to intra-African trade.

Foreign Merchants

For merchants looking to sell goods in Africa:

  • Connect to (i) a logistics-as-a-service provider (Gig Logistics, Loop, Faramove) that offers real-time delivery probability and (ii) an address verification provider (OkHi, QoreID) to flag risk early.
  • Incentivize PUDO options at checkout, a successful tactic for Jumia.
  • Use PAPSS-integrated channels for cross-border settlement to preserve margins.
Google Explains Googlebot Byte Limits And Crawling Architecture via @sejournal, @MattGSouthern

Google’s Gary Illyes published a blog post explaining how Googlebot’s crawling systems work. The post covers byte limits, partial fetching behavior, and how Google’s crawling infrastructure is organized.

The post references episode 105 of the Search Off the Record podcast, where Illyes and Martin Splitt discussed the same topics. Illyes adds more details about crawling architecture and byte-level behavior.

What’s New

Googlebot Is One Client Of A Shared Platform

Illyes describes Googlebot as “just a user of something that resembles a centralized crawling platform.”

Google Shopping, AdSense, and other products all send their crawl requests through the same system under different crawler names. Each client sets its own configuration, including user agent string, robots.txt tokens, and byte limits.

When Googlebot appears in server logs, that’s Google Search. Other clients appear under their own crawler names, which Google lists on its crawler documentation site.

How The 2 MB Limit Works In Practice

Googlebot fetches up to 2 MB for any URL, excluding PDFs. PDFs get a 64 MB limit. Crawlers that don’t specify a limit default to 15 MB.

Illyes adds several details about what happens at the byte level.

He says HTTP request headers count toward the 2 MB limit. When a page exceeds 2 MB, Googlebot doesn’t reject it. The crawler stops at the cutoff and sends the truncated content to Google’s indexing systems and the Web Rendering Service (WRS).

Those systems treat the truncated file as if it were complete. Anything past 2 MB is never fetched, rendered, or indexed.

Every external resource referenced in the HTML, such as CSS and JavaScript files, gets fetched with its own separate byte counter. Those files don’t count toward the parent page’s 2 MB. Media files, fonts, and what Google calls “a few exotic files” are not fetched by WRS.

Rendering After The Fetch

The WRS processes JavaScript and executes client-side code to understand a page’s content and structure. It pulls in JavaScript, CSS, and XHR requests but doesn’t request images or videos.

Illyes also notes that the WRS operates statelessly, clearing local storage and session data between requests. Google’s JavaScript troubleshooting documentation covers implications for JavaScript-dependent sites.

Best Practices For Staying Under The Limit

Google recommends moving heavy CSS and JavaScript to external files, since those get their own byte limits. Meta tags, title tags, link elements, canonicals, and structured data should appear higher in the HTML. On large pages, content placed lower in the document risks falling below the cutoff.
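A rough way to sanity-check a page against the cutoff is to truncate its response at 2 MB and confirm the critical tags survive. This sketch uses the requests library; the 2 MB figure comes from Google’s post, everything else is illustrative. Note that Google says headers also count toward the cap, which this body-only check ignores:

```python
import requests

LIMIT = 2 * 1024 * 1024  # Googlebot's documented HTML fetch cap

def first_bytes(url, limit=LIMIT):
    """Stream at most `limit` bytes of a response body, like a truncating crawler."""
    buf = b""
    with requests.get(url, stream=True, timeout=30) as resp:
        for chunk in resp.iter_content(chunk_size=65536):
            buf += chunk
            if len(buf) >= limit:
                break
    return buf[:limit]

html = first_bytes("https://example.com/")
# Critical tags should appear inside the truncated window.
for tag in (b"</head>", b'rel="canonical"', b"application/ld+json"):
    print(tag.decode(), "found within first 2 MB:", tag in html)
```

If a canonical or structured data block falls outside the window, moving inline CSS, JavaScript, or base64 images out of the document head is the usual fix.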

    Illyes flags inline base64 images, large blocks of inline CSS or JavaScript, and oversized menus as examples of what could push pages past 2 MB.

    The 2 MB limit “is not set in stone and may change over time as the web evolves and HTML pages grow in size.”
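
For pages that might be close, a short script can approximate where you stand. This is a rough sketch, not Google’s accounting: the exact header bytes Googlebot counts aren’t public, and the URL below is a placeholder.

```python
import requests

GOOGLEBOT_LIMIT = 2 * 1024 * 1024  # the 2 MB figure from Illyes' post

def check_page(url: str) -> None:
    resp = requests.get(url, timeout=30)
    body = resp.content
    # Rough approximation of HTTP header overhead; Googlebot's exact
    # byte accounting is not public.
    header_bytes = sum(len(k) + len(v) + 4 for k, v in resp.headers.items())
    total = header_bytes + len(body)
    print(f"{url}: ~{total:,} bytes (headers ~{header_bytes:,}, body {len(body):,})")
    if total > GOOGLEBOT_LIMIT:
        visible = body[: GOOGLEBOT_LIMIT - header_bytes]
        for tag in (b"</title>", b'rel="canonical"', b"application/ld+json"):
            where = "inside" if tag in visible else "PAST"
            print(f"  {tag.decode()}: {where} the cutoff")

check_page("https://www.example.com/")  # hypothetical URL
```

Pages that report a critical tag past the cutoff are the ones where moving metadata higher in the HTML pays off.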

    Why This Matters

    The 2 MB limit and the 64 MB PDF limit were first documented as Googlebot-specific figures in February. HTTP Archive data showed most pages fall well below the threshold. This blog post adds the technical context behind those numbers.

    The platform description explains why different Google crawlers behave differently in server logs and why the 15 MB default differs from Googlebot’s 2 MB limit. These are separate settings for different clients.

    HTTP header details matter for pages near the limit. Google states headers consume part of the 2 MB limit alongside HTML data. Most sites won’t be affected, but pages with large headers and bloated markup might hit the limit sooner.

    Looking Ahead

    Google has now covered Googlebot’s crawl limits in documentation updates, a podcast episode, and a dedicated blog post within a two-month span. Illyes’ note that the limit may change over time suggests these figures aren’t permanent.

    For sites with standard HTML pages, the 2 MB limit isn’t a concern. Pages with heavy inline content, embedded data, or oversized navigation should verify that their critical content is within the first 2 MB of the response.


    Featured Image: Sergei Elagin/Shutterstock

    The Science Of What AI Actually Rewards via @sejournal, @Kevin_Indig


    In “The Science Of How AI Pays Attention,” I analyzed 1.2 million ChatGPT responses to understand exactly how AI reads a page. In “The Science Of How AI Picks Its Sources,” I analyzed 98,000 citation rows to understand which pages make it into the reading pool at all.

    This is Part 3.

    Where Part 1 told you where on a page AI looks, and Part 2 told you which pages AI routinely considers, this one tells you what AI actually rewards inside the content it reads.

    The data clarifies:

    • Most AI SEO writing advice doesn’t hold at scale. There is no universal “write like this to get cited” formula – the signals that lift one industry’s citation rates can actively hurt another.
    • The entity types that predict citation are not the ones being targeted. DATE and NUMBER are universal positives. PRICE suppresses citation in five of six verticals, and KG-verified entities are a negative signal.
    • The one writing signal that holds across all seven verticals: Declarative language in your intro, +14% aggregate lift.
    • Heading structure is binary. Commit to the right number for your vertical or use none. Three to four headings are worse than zero in every vertical.
    • Corporate content dominates. Reddit doesn’t. AI citation behavior does not mirror what happened to organic search in 2023-2024.

1. Some Writing Signals Lift Citation, While Others Harm It

    While “The Science Of How AI Pays Attention” covers parts of the page and types of writing that influence ChatGPT visibility, I wanted to understand which writing-level signals – word count, structure, language style – predict higher AI citation rates across verticals.

    Approach

1. I compared high-cited pages (more than three unique prompt citations) vs. low-cited pages across seven writing metrics: word count, definitive language, hedging, list items, named entity density, and two intro-specific signals.
    2. I analyzed the first 1,000 words for list item count, named entity density, intro definitive language token density, and intro number count.
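
For readers who want to reproduce the comparison on their own data, here is a stripped-down sketch of the lift arithmetic, with hypothetical rows standing in for the real dataset:

```python
from statistics import mean

HIGH_CITED = 3  # "high-cited" = more than three unique prompt citations

# Hypothetical rows; the real dataset has thousands of URLs per vertical.
pages = [
    {"citations": 5, "word_count": 2400, "intro_definitive_tokens": 3},
    {"citations": 6, "word_count": 1800, "intro_definitive_tokens": 2},
    {"citations": 1, "word_count": 900,  "intro_definitive_tokens": 1},
    {"citations": 0, "word_count": 1100, "intro_definitive_tokens": 0},
]

def lift(metric: str) -> float:
    high = [p[metric] for p in pages if p["citations"] > HIGH_CITED]
    low  = [p[metric] for p in pages if p["citations"] <= HIGH_CITED]
    return mean(high) / mean(low)  # >1.0x means the metric skews high-cited

for metric in ("word_count", "intro_definitive_tokens"):
    print(f"{metric}: {lift(metric):.2f}x")
```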

    Results: Across all verticals, definitive phrasing and including relevant entities matter. But most signals are flat.

    Image Credit: Kevin Indig

    What The Industry Patterns Showed

    When splitting the data up by vertical, we suddenly see preferences:

    • Total word count was strongest in CRM/SaaS (1.59x).
    • Finance was an anomaly with word count: Shorter pages win (0.86x word count).
    • Definitive phrases in the first 1,000 characters were positive for most verticals.
    • Education is a signal void. Writing style explains almost nothing about citation likelihood there.

Image Credit: Kevin Indig

    Top Takeaways

    1. There is no universal “write like this to get cited” formula. For example, the signals that lift CRM/SaaS citation rates actively hurt Finance. Instead, match content format to vertical norms.

    2. The one universal rule: open with a direct declarative statement. Not a question, not context-setting, not preamble. The form is “[X] is [Y]” or “[X] does [Z].” This is the only writing instruction that holds regardless of vertical, content type, or length.

    3. LLMs “penalize” hedging in your intro. “This may help teams understand” performs worse than “Teams that do X see Y.” Remove qualifiers from your opening paragraph before any other optimization.
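
One way to act on that last point is a quick hedge-word audit of your intros. A minimal sketch, with an illustrative qualifier list rather than the study’s:

```python
import re

# Illustrative qualifier list, not the study's; extend for your domain.
HEDGES = re.compile(
    r"\b(may|might|could|can help|perhaps|possibly|likely|tends? to|"
    r"in some cases|arguably)\b",
    re.IGNORECASE,
)

def audit_intro(intro: str) -> list[str]:
    """Return the hedge phrases found in an opening paragraph."""
    return HEDGES.findall(intro)

print(audit_intro("This may help teams understand onboarding."))     # ['may']
print(audit_intro("Teams that automate onboarding cut churn 12%."))  # []
```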

    2. The Entity Types That Predict Citation Are Not The Ones Being Targeted

    Most AEO advice focuses on named entities as a category: Pack in more known brand names, tool names, numbers. The cross-vertical entity type analysis below tells a more specific (and more useful) story.

    Approach

    1. Ran Google’s Natural Language API on the first 1,000 characters (about 200-250 words) of each unique URL.
    2. Computed lift per entity type: % of high-cited pages with that type / % of low-cited pages.
    3. Analyzed 5,000 pages across seven verticals.

    * A quick note on terminology: Google NLP classifies software products, apps, and SaaS tools as CONSUMER_GOOD, a legacy label from when the API was built for physical retail. Throughout this analysis, CONSUMER_GOOD means software/product entities.
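
As a toy illustration of the lift arithmetic in step 2 (the pages below are made up, not the study’s rows):

```python
from collections import Counter

# Toy data: each page is the set of entity types Google NLP returned for
# its first 1,000 characters (hypothetical, not the study's rows).
high_pages = [{"DATE", "NUMBER"}, {"DATE", "ORGANIZATION"}]
low_pages  = [{"PRICE"}, {"PRICE", "NUMBER"}]

def entity_type_lift(high_pages, low_pages):
    """Lift = share of high-cited pages containing the type divided by
    the share of low-cited pages containing it."""
    high = Counter(t for page in high_pages for t in page)
    low  = Counter(t for page in low_pages  for t in page)
    lifts = {}
    for t in set(high) | set(low):
        high_share = high[t] / len(high_pages)
        low_share  = low[t]  / len(low_pages)
        # Guard against types absent from the low-cited group.
        lifts[t] = high_share / low_share if low_share else float("inf")
    return lifts

for t, lift in sorted(entity_type_lift(high_pages, low_pages).items()):
    print(f"{t}: {lift:.2f}x")
```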

    Results: DATE and NUMBER are the most universal positive signals. Interestingly, PRICE is the strongest universal negative.

Image Credit: Kevin Indig

    What The Industry Patterns Showed

    • DATE is the most universal positive signal, with the exception of Finance (0.65x).
    • NUMBER is the second most universal. Specific counts, metrics, and statistics in the intro consistently predict higher citation rates. Finance (0.98x) and Product Analytics (1.10x) mark the floor and ceiling of that range.
    • PRICE is the strongest universal negative. Pages that open with pricing signal commercial intent. Finance is the sole exception at 1.16x, likely because price here means fee percentages and rate comparisons, which are the actual reference data financial queries are looking for.
• CONSUMER_GOOD (software/product entities) is mixed across verticals. Where it is positive, the reasons differ: in Healthcare, product entities signal established brands and tools; in Crypto, naming specific protocols and products is core to answering technical queries.
    • PHONE_NUMBER is a positive signal in Healthcare (1.41x) and Education (1.40x). In both cases, it is almost certainly a proxy for established brands/institutions/providers with real physical presence, not a literal signal to add phone numbers to your pages.

    The Knowledge Graph inversion deserves its own note here:

    • The data showed that high-cited pages average 1.42 KG-verified entities vs. 1.75 for low-cited pages (lift: 0.81x).
    • Pages built around well-known, KG-verified entities (major brands, institutions, famous people) tend toward generic coverage, which isn’t preferred by ChatGPT.
    • High-cited pages are dense with specific, niche entities: a particular methodology, a precise statistic, a named comparison. Many of those niche entities have no KG entries at all. That specificity is what AI reaches for.

    Top Takeaways

    1. Add the publish date to your pages and aim to use at least one specific number in your content. That combination is the closest thing to a universal AI citation signal this dataset produced. But Finance gets there through price data and location specificity instead.

    2. Avoid opening with pricing in non-finance verticals. Price-dominant intros correlate with lower citation rates.

    3. KG presence and brand authority do not translate to an AI citation advantage. Chasing Wikipedia entries, brand panels, or KG verification is the wrong lever. Specific, niche entities (even ones without KG entries) outperform famous ones.

    3. Heading Structure: Commit To One Or Don’t Bother

    We know headings matter for citations from the previous two analyses. Next, I wanted to understand whether heading count predicts citation rates and whether the optimal structure varies by vertical.

    Approach

    1. Counted total headings per page (H1+H2+H3) across all cited URLs.
    2. Grouped pages into 7 heading-count buckets: 0, 1-2, 3-4, 5-9, 10-19, 20-49, 50+.
    3. Computed high-cited rate (% of URLs that are high-cited) per bucket per vertical.
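
A minimal sketch of that bucketing, assuming headings are counted with a simple regex over raw HTML; the study’s exact parser isn’t specified:

```python
import re
from collections import defaultdict

def count_headings(html: str) -> int:
    # H1 + H2 + H3, per the study's definition; a simple regex stands in
    # for whatever parser was actually used.
    return len(re.findall(r"<h[123]\b", html, re.IGNORECASE))

def bucket(n: int) -> str:
    for label, lo, hi in (("0", 0, 0), ("1-2", 1, 2), ("3-4", 3, 4),
                          ("5-9", 5, 9), ("10-19", 10, 19), ("20-49", 20, 49)):
        if lo <= n <= hi:
            return label
    return "50+"

def high_cited_rate(pages):
    """pages: iterable of (html, is_high_cited) -> {bucket: rate}."""
    totals, highs = defaultdict(int), defaultdict(int)
    for html, is_high in pages:
        b = bucket(count_headings(html))
        totals[b] += 1
        highs[b] += int(is_high)
    return {b: highs[b] / totals[b] for b in totals}

demo = [("<h1>A</h1>" + "<h2>s</h2>" * 12, True),
        ("<h1>B</h1><h2>x</h2><h3>y</h3>", False)]
print(high_cited_rate(demo))  # {'10-19': 1.0, '3-4': 0.0}
```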

    Results: Including more headings in your content is not universally better. The sweet spot depends on vertical and content type. One finding holds everywhere: Strangely, 3-4 headings are worse than zero.

    Image Credit: Kevin Indig

    What The Industry Patterns Showed

    • CRM/SaaS is the only vertical where the 20+ heading lift is confirmed: 12.7% high-cited rate at 20-49 headings vs. a 5.9% baseline. The 50+ bucket reaches 18.2%. Long structured reference pages and comparison guides with one section per tool outperform everything else here.
    • Healthcare inverts most sharply. The high-cited rate drops from 15.1% at zero headings to 2.5% at 20-49 headings. A page with 30 H2s on telehealth topics signals optimization intent, not clinical authority.
    • Finance peaks at 10-19 headings (29.4% high-cited rate). Structured but not exhaustive: think rate tables, regulatory breakdowns, and advisor comparison pages with moderate heading depth.
    • Crypto peaks at five to nine headings (34.7% high-cited rate). Technical documentation in this vertical tends toward dense prose with moderate navigation structure. Over-structuring breaks up the technical depth.
    • Education is flat across all heading counts, which is consistent with the writing signals finding. Heading structure explains almost nothing about citation likelihood in education content.
    • The three to four heading dead zone holds across every vertical without exception. Partial structure confuses AI navigation without providing the full benefit of a committed hierarchy.

    Top Takeaways

    1. The 20+ heading finding from Part 1 is a CRM/SaaS finding, not a universal one. Applying it to healthcare, education, or finance could actively suppress citation rates in those verticals.

2. The principle that holds everywhere: Commit to structure or don’t use it. The middle ground costs you in every vertical: a fully structured page with the right heading depth outperforms a half-structured one.

    3. Use the optimal heading range for your vertical. Crypto: 5-9. Finance and Education: 10-19. CRM/SaaS: 20+ (with H3s). Healthcare: 0 or 5-9 at most. Long CRM reference pages with 50+ sections are the one case where maximum heading depth pays off.

    4. UGC Doesn’t Dominate

    The “Reddit effect” reshaped organic search between 2024 and 2025. I wanted to understand whether ChatGPT cites user-generated content (Reddit, forums, reviews) at meaningful rates or whether corporate/editorial content dominates.

    The common industry assumption – that AI also preferentially cites community voices – is not what we found in the data.

    Approach

1. Classified each cited URL by domain as either (1) UGC (Reddit, Quora, Stack Overflow, forum subdomains, Medium, Substack, Product Hunt, Tumblr, and community/forum prefixes) or (2) corporate/editorial.
    2. Computed citation share per category per vertical.
    3. Dataset: 98,217 citations across 7 verticals.
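
A simplified version of that classifier might look like this. The subdomain prefixes are an assumption about how “forum subdomains” were detected; the study’s actual ruleset may be fuller:

```python
from urllib.parse import urlparse

# UGC hosts named in the methodology; prefixes are an assumption about
# how "forum subdomains" were detected.
UGC_HOSTS = {"reddit.com", "quora.com", "stackoverflow.com", "medium.com",
             "substack.com", "producthunt.com", "tumblr.com"}
UGC_PREFIXES = ("forum.", "community.", "discuss.")

def classify(url: str) -> str:
    host = (urlparse(url).hostname or "").removeprefix("www.")
    root = ".".join(host.split(".")[-2:])
    if root in UGC_HOSTS or host.startswith(UGC_PREFIXES):
        return "ugc"
    return "corporate"

print(classify("https://www.reddit.com/r/SaaS/thread"))  # ugc
print(classify("https://community.example.com/post"))    # ugc
print(classify("https://vendor.com/blog/post"))          # corporate
```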

    Results: Corporate content accounts for 94.7% of all citations. UGC is nearly invisible.

    Image Credit: Kevin Indig

    What The Industry Patterns Showed

    • Finance is the most corporate-locked vertical at 0.5% UGC. YMYL (Your Money, Your Life) content appears to systematically suppress citations to community opinion.
    • Healthcare sits at 1.8% UGC for the same structural reason. Clinical, telehealth, and HIPAA content draws almost exclusively from institutional sources.
    • Crypto has the highest UGC penetration in the dataset at 9.2%. Community-generated content (Reddit technical threads, Medium tutorials, developer forum posts) answers a meaningful proportion of analyzed queries. In a fast-moving technical niche where official documentation consistently lags, community posts fill the gap.
    • Product Analytics and HR Tech sit at 6.9% and 5.8% UGC. Both are verticals where Reddit comparison threads and product review communities provide genuine signal alongside corporate content.

    Top Takeaways

    1. The “Reddit effect” in SEO has not translated proportionally to AI citations. In most verticals, reddit.com captures 2-5% of total citations. This finding is in line with other industry research, including this report from Profound.

    2. For finance and healthcare: UGC has near-zero AI citation value. Invest in structured, authoritative corporate content with clear sourcing. Community engagement may matter for other reasons, but it does not contribute meaningfully to AI citation share in these verticals.

    3. For crypto, product analytics, and HR tech: Community presence has measurable citation value. Detailed Reddit comparison threads, technical Medium posts, and structured developer forum answers can supplement corporate content reach.

    What This Means For How You Strategize For LLM Visibility

    Across all three parts of this study, the consistent finding is that AI citation is not primarily a writing quality problem.

    Part 2 showed it is a content architecture problem: Thin single-intent pages are structurally locked out regardless of how well they’re written. This piece shows the same logic applies inside the content itself.

    The aggregate writing signals table is the most important chart in this analysis. Not because it shows you what to do, but because it shows how much of what the AI SEO/GEO/AEO industry is telling you doesn’t survive cross-vertical scrutiny. Word count, list density, named entity counts … all flat or negative at the aggregate. The signals that work are vertical-specific and smaller than our industry’s consensus implies.

    The meta-lesson from this analysis is that findings are vertical (and probably topic) specific, which is no different in SEO.

This part concludes the Science of AI series for now, because the AI ecosystem is constantly changing.

    Methodology

    We analyzed ~98,000 ChatGPT citation rows pulled from approximately 1.2 million ChatGPT responses from Gauge.

    Because AI behaves differently depending on the topic, we isolated the data across seven distinct, verified verticals to ensure the findings weren’t skewed by one specific industry.

    Analyzed verticals:

    • B2B SaaS
    • Finance
    • Healthcare
    • Education
    • Crypto
    • HR Tech
    • Product Analytics

    Featured Image: CoreDESIGN/Shutterstock; Paulo Bobita/Search Engine Journal

    So Your Traffic Tanked: What Smart CMOs Do Next

    We’ve all seen it. Brands with healthy websites and excellent content have been watching their organic traffic from Google’s SERP erode for years. In a recent webinar hosted by Search Engine Journal, guest speaker Nikhil Lai, principal analyst of Performance Marketing for Forrester Research, estimated his clients are losing between 10 and 40% of organic and direct traffic year-over-year.

There is, however, a stunning bright spot: Lai said referral traffic from answer engines is growing 40% month over month. Visitors arriving from those engines convert at two to four times the rate of traditional search visitors, spend three times as long on site, and arrive with queries averaging 23 words, compared to the three or four words that defined the last decade of search.

    Lai asserted that the channel driving this shift deserves a seat at the CMO’s table. Answer engines influence brand perception before purchase intent forms, which makes answer engine optimization (AEO) a brand investment, and puts budget and measurement decisions at the CMO level.

    Here is the strategic roadmap Lai laid out at SEJ Live. He highlighted the decisions, org structures, and measurement frameworks that will move AEO from a search team initiative to a C-suite priority.

    Answer Engines Build Demand Before Buyers Know What They Want

    Classic search captures intent that already exists. A user types “running shoes,” clicks a result, and evaluates options. Answer engines operate earlier and differently: users hold extended conversations with large datasets, rarely click through, and leave those sessions with specific brand associations formed across multiple follow-up questions.

A user who once searched “running shoes” now asks ChatGPT, “What’s the best shoe for overpronation with wide feet in cold weather on pavement?” They exit that conversation with a brand name in mind and search for it directly. Your brand appeared in an AI conversation before the user ever reached your site. Demand is being generated every day inside users’ research sessions.

    The Forrester data Lai presented reinforces the quality of that exposure: Sessions on answer engines average 23 minutes, with users asking five to eight follow-up questions per session. Each turn is another brand impression. The click-through rate stays low; the conversion rate on the traffic that does arrive runs two to four times higher than search-sourced traffic, with stronger average order value and lifetime value.

    Brand familiarity is built in answer engines before purchase intent crystallizes in the user’s mind.

    SEO Is The Foundation Of AEO

    The brands pulling back on SEO investment in response to AEO are making a costly mistake. Lai put it directly: 85 to 90% of current SEO best practices remain fully valid for answer engine visibility.

    Google’s E-E-A-T framework (experience, expertise, authoritativeness, trustworthiness) still governs how quality is evaluated across every index. Site architecture, mobile load speed, structured data, and indexation hygiene all strengthen performance across every engine. Every alternative index (Bing’s, Brave’s) is benchmarked against Google’s for completeness. Every bot (GPTBot, Claudebot, Perplexitybot) is benchmarked against Googlebot for sophistication.

SEO is the infrastructure on which AEO runs. The shift is an expansion of scope and emphasis, not a replacement for SEO fundamentals.

    What changes is where additional effort goes: natural-language FAQ optimization, off-site authority building, pre-rendering for less sophisticated bots, and a measurement framework built around share of voice rather than click volume.

    Bing Is Now Your Distribution Network For Every Non-Google Engine

    Most answer engines outside Google draw primarily from Bing’s index.

    Bing evaluates credibility by weighting what others say about your brand more heavily than what your own site claims. This explains why Reddit threads, Quora answers, Wikipedia entries, G2 reviews, YouTube videos, and Trustpilot pages dominate AI-generated answers. The off-site web has become the primary source of record for how AI describes your brand.

    The immediate tactical implication: Push every sitemap update directly to Bing via the IndexNow protocol. This triggers Bingbot to crawl fresh content and feeds that content into Perplexity, ChatGPT, and the broader answer engine ecosystem faster than waiting for organic discovery.
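
For teams that want to script this rather than rely on a CMS plugin, a minimal IndexNow submission looks roughly like the following. Host, key, and URLs are placeholders; the payload shape follows the public IndexNow spec:

```python
import requests

# Minimal IndexNow submission. Host, key, and URLs are placeholders;
# the key file must be published at the keyLocation URL beforehand.
payload = {
    "host": "example.com",
    "key": "YOUR-INDEXNOW-KEY",
    "keyLocation": "https://example.com/YOUR-INDEXNOW-KEY.txt",
    "urlList": [
        "https://example.com/updated-page",
        "https://example.com/new-faq",
    ],
}

resp = requests.post(
    "https://api.indexnow.org/indexnow",
    json=payload,
    headers={"Content-Type": "application/json; charset=utf-8"},
    timeout=30,
)
print(resp.status_code)  # 200 or 202 means the submission was accepted
```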

    Bing’s index remains the fastest route to non-Google answer engine visibility. Perplexity is building its own index (Sonar), and OpenAI has signaled plans to build or acquire one, but Bing is the distribution network that matters today.

    AEO Requires Cross-Functional Ownership

AEO arguably spans more functions than SEO. The two disciplines share three: content, web development, and paid search. AEO also interfaces more strongly with PR, brand marketing, and social media.

    PR earns a seat because off-site authority outweighs on-site signals in AEO. Brand mentions in publications, influencer mentions, and third-party reviews all directly shape how answer engines describe your brand.

    Social belongs in the room because Reddit threads and Facebook group discussions show up in AI-generated answers. Community management and reputation management, previously handled separately from SEO, are now integral to AEO. When your social listening data reaches content teams before they draft, the content responds to the questions buyers are actually asking. When it doesn’t, you’re optimizing for questions nobody asked.

    Lai proposed two organizational models that work to capture the opportunities inherent in AEO:

    1. Center of Excellence: A senior SEO specialist evolves into an AEO evangelist, runs a COE, and publishes cross-functional standards: clear rules like “every piece of content must answer these five questions” or “every page must include author schema.”
    2. AI Orchestrator: A dedicated hire who builds agents to handle repeatable AEO tasks (schema implementation, JavaScript reduction, FAQ content creation) and governs the cross-functional workflow with published guidelines for all stakeholders.

    The CMO’s decision is which model fits the organization’s scale, and whether to build it internally or partner with an agency that has already built the infrastructure.

    The Content Strategy That Wins In AI Responses

    Long-form skyscraper content is an ancient relic. Answer engines reward precise, specific answers to real questions, delivered succinctly and across multiple formats. Lai framed this as Forrester’s question-to-content framework: Every piece of content maps directly to a FAQ being asked on answer engines, including the follow-up questions that emerge within a single session.

    Five content moves that produce results:

    1. Build surround-sound FAQ coverage. Create glossaries, FAQ pages, videos, and blog posts that address the same topic cluster from different angles. When Claudebot crawls 38,000 pages for every referred page visit (per Cloudflare data), each page it indexes is an opportunity to signal topical authority. Volume and variety matter.
    2. Publish direct competitor comparisons. Users ask answer engines to compare brands. Brands that create honest, data-backed comparison guides are gaining prominent visibility, because they directly answer the queries being asked that pit a brand against its competitors. This was once a taboo content format; it has become a competitive requirement.
    3. Treat off-site syndication as the new backlinking. Hosting AMAs on Reddit, answering questions on Quora, and contributing to industry publications that rank in AI responses all earn the off-site authority that answer engines weigh most heavily. Give third-party voices data and perspective they couldn’t generate themselves, and they will produce mentions that shape how AI describes your brand.
4. Pre-render pages for bot access. The bots crawling your site lack the compute budget to render JavaScript-heavy pages. Claudebot’s 38,000:1 crawl-to-referral ratio compared to Googlebot’s 5:1 ratio reflects this sophistication gap. Pre-rendering a JavaScript-free version for bots while serving the full experience to human visitors ensures your content gets indexed across every engine (see the sketch after this list). Over time, limit the amount of JavaScript on the site and put content directly in the HTML so bots can parse it and index it more often. The more you’re crawled and indexed, the more visible you become.
    5. Create unique content. Lai said, “Being distinctive, differentiated, and unique will help your brand stand out in a sea of sameness. Implicit in all this is that you need a lot more content, greater content velocity and diversity, which means you can use AI to create content. Google won’t automatically penalize AI-created content unless it lacks the watermarks of human authorship. The syntax and diction have to be natural. Use AI to create content, but don’t make it seem AI-generated. Get down into the details. It’s not enough to say your product is great. Explain why in different temperatures, conditions, the thickness, and so on, to satisfy long-tail intent.”
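
To make move 4 concrete, here is a minimal sketch of user-agent-based pre-render serving in Python/Flask. The bot list and in-memory cache are illustrative stand-ins, and any real implementation must serve bots the same content humans see:

```python
from flask import Flask, request

app = Flask(__name__)

# Illustrative bot list and cache; real deployments usually rely on a CDN
# rule or a dedicated pre-rendering service instead of app code.
BOT_UA = ("GPTBot", "ClaudeBot", "PerplexityBot", "bingbot")
PRERENDERED = {}  # path -> flat HTML produced by an offline rendering job

@app.before_request
def serve_prerendered():
    ua = request.headers.get("User-Agent", "")
    if any(bot.lower() in ua.lower() for bot in BOT_UA):
        html = PRERENDERED.get(request.path)
        if html:
            # Bots get the pre-rendered HTML; the content must match what
            # human visitors see, or you risk being treated as cloaking.
            return html, 200, {"Content-Type": "text/html; charset=utf-8"}

@app.route("/")
def home():
    return "<html><!-- full JavaScript-heavy app shell --></html>"
```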

    Replace Legacy KPIs With Metrics That Predict Market Share

Lai described the internal conversation he hears most from Forrester clients: “The hardest part of this transition from SEO to AEO has been trying to convince management to not focus as much on CTR and traffic. Those were indicators of organic authority. They are no longer reliable indicators.

    “The new KPIs to focus on are visibility and share of voice. Share of voice can be measured in many ways. The most common are citation share: how often is my brand cited, how often is my content linked, of the opportunities I have to be cited; and mention share: how often is my brand mentioned of the opportunities I have to be mentioned. I’m also seeing more clients look into citation attempts: how often is ChatGPT trying to cite my content, and are there things I can do on the back end of my site to make that citation attempt score go up? Those are the new indicators of authority,” said Lai.
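
The arithmetic behind those KPIs is simple enough to prototype before dedicated tooling exists; the tallies below are hypothetical:

```python
def share(occurrences: int, opportunities: int) -> float:
    """Citation or mention share as a simple ratio."""
    return occurrences / opportunities if opportunities else 0.0

# Hypothetical tallies from a month of tracked prompts in one category.
prompts_tracked = 1_200        # opportunities to be cited or mentioned
responses_citing_us = 180      # our URL appears as a citation
responses_mentioning_us = 310  # our brand is named in the answer text

print(f"citation share: {share(responses_citing_us, prompts_tracked):.1%}")
print(f"mention share:  {share(responses_mentioning_us, prompts_tracked):.1%}")
```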

    These metrics connect directly to branded search volume, which Lai called “the single strongest leading indicator of market share growth.” The chain of logic to present to the board: higher citation and mention share drives more branded searches, which converts at higher rates, which compounds into measurable market share gains against competitors.

Lai said he expects Google to add citation metrics to Search Console once AI Max adoption reaches critical mass, and he expects an OpenAI analytics product before year-end.

For now, Lai suggested, the best course of action is to establish a baseline with your current SEO platform and track the directional trend. Concerns about how accurately today’s popular SEO tools count answer engine mentions are fair, he contended, but even imperfect measurement reveals which content clusters are earning citations and which need rebuilding.

    The Agentic Phase Starts The Clock On B2B Urgency

    Answer engines are moving from conversation to action. The current phase, characterized by extended back-and-forth with large datasets, is the warm-up. The agentic phase is defined by engines’ booking, filing, researching, and purchasing on users’ behalf. This will mean fewer clicks, longer sessions, and richer intent signals available to advertisers.

    For B2B CMOs, the urgency is immediate. Forrester research shows GenAI has already become the number one source of information for business buyers evaluating purchases of $1 million or more, coming in ahead of customer references, vendor websites, and social media. Your largest deals are being influenced by AI conversations before your sales team enters the picture.

    AEO visibility in B2B is a current-pipeline variable that requires immediate attention.

    The brands building complete search strategies now, covering answer engines, on-site conversational search, and structured data across every indexed channel, will own discovery and have greater control over brand perception in the next phase of buying behavior.

The window to gain an early-mover competitive advantage is shrinking; soon, AEO visibility will be just another standard expectation everyone has to meet.

    Key Takeaways For CMOs

    • Reframe the traffic story. Lower overall traffic volume paired with two-to-four-times higher conversion rates is a net performance gain. Build that case proactively before your CEO draws the wrong conclusion from a falling traffic chart.
    • Fund AEO as an upper-funnel brand channel. That means applying the same budget logic, measurement frameworks, and executive ownership you would bring to any major brand awareness investment, where success is measured in visibility, perception, and long-term share of voice rather than clicks and conversions.
    • Move to share-of-voice KPIs. Citation share and mention share drive branded search volume, which drives market share. Make that causal chain visible to your leadership team.
    • Assign cross-functional ownership with clear governance. Choose between a center of excellence or an AI orchestrator model and make that structural decision this quarter.
    • Prioritize off-site authority as a content strategy responsibility. Reddit, Quora, third-party publications, and YouTube shape AI’s perception of your brand. PR and social teams own the channels that matter most for AEO.
    • Push every sitemap update to Bing via IndexNow. Bing’s index feeds most non-Google answer engines. This is a 15-minute technical change with compounding distribution benefits.
    • Use AI to help with content, but always apply human editing for authority. Content that reads as machine-generated loses trust across every engine, including Google.

    What Does A Smart CMO Do Next?

    Start with a 90-day experiment using some or all of these strategies.

    Audit your current citation and mention share in one category using your existing SEO platform. Identify three high-intent FAQ clusters where your brand should be visible and build surround-sound content for each: a dedicated FAQ page, a comparison guide, and one off-site piece in a publication that appears in AI responses. Push fresh sitemaps to Bing. Track citation share and branded search volume at 30, 60, and 90 days.

The data may make the investment case for broader rollout. If not, tweak your approach. The brands moving first will capture the highest-quality traffic at the lowest incremental cost, and set a citation lead that becomes progressively harder for competitors to close.

    The full webinar is available on demand.

    Featured Image: Dmitry Demidovich/Shutterstock