The Truth About LLM Hallucinations With Barry Adams via @sejournal, @theshelleywalsh

The launch of ChatGPT blew apart the search industry, and the last few years have seen more and more AI integration into search engine results pages.

In an attempt to keep up with the LLMs, Google launched AI Overviews and just announced AI Mode tabs.

The expectation is that SERPs will become blended with a Large Language Model (LLM) interface, and the nature of how users search will adapt to conversations and journeys.

However, there is a persistent issue of AI hallucinations and misinformation in LLM-generated and Google AI Overview results, and it seems to be largely ignored, not just by Google but also by the news publishers it affects.

More worrying is that users are either unaware or prepared to accept the cost of misinformation for the sake of convenience.

Barry Adams is the authority on editorial SEO and works with the leading news publisher titles worldwide via Polemic Digital. Barry also founded the News & Editorial SEO Summit along with John Shehata.

I read a LinkedIn post from Barry where he said:

“LLMs are incredibly dumb. There is nothing intelligent about LLMs. They’re advanced word predictors, and using them for any purpose that requires a basis in verifiable facts – like search queries – is fundamentally wrong.

But people don’t seem to care. Google doesn’t seem to care. And the tech industry sure as hell doesn’t care, they’re wilfully blinded by dollar signs.

I don’t feel the wider media are sufficiently reporting on the inherent inaccuracies of LLMs. Publishers are keen to say that generative AI could be an existential threat to publishing on the web, yet they fail to consistently point out GenAI’s biggest weakness.”

The post prompted me to speak to him in more detail about LLM hallucinations, their impact on publishing, and what the industry needs to understand about AI’s limitations.

You can watch the full interview with Barry on IMHO below, or continue reading the article summary.

Why Are LLMs So Bad At Citing Sources?

I asked Barry to explain why LLMs struggle with accurate source attribution and factual reliability.

Barry responded, “It’s because they don’t know anything. There’s no intelligence. I think calling them AIs is the wrong label. They’re not intelligent in any way. They’re probability machines. They don’t have any reasoning faculties as we understand it.”

He explained that LLMs operate by regurgitating answers based on training data, then attempting to rationalize their responses through grounding efforts and link citations.

Even with careful prompting to use only verified sources, these systems maintain a high probability of hallucinating references.

“They are just predictive text from your phone, on steroids, and they will just make stuff up and very confidently present it to you because that’s just what they do. That’s the entire nature of the technology,” Barry emphasized.

This confident presentation of potentially false information represents a fundamental problem with how these systems are being deployed in scenarios they’re not suited for.

Are We Creating An AI Spiral Of Misinformation?

I shared with Barry my concerns about an AI misinformation spiral where AI content increasingly references other AI content, potentially losing the source of facts and truth entirely.

Barry’s outlook was pessimistic, “I don’t think people care as much about truth as maybe we believe they should. I think people will accept information presented to them if it’s useful and if it conforms with their pre-existing beliefs.”

“People don’t really care about truth. They care about convenience.”

He argued that the last 15 years of social media have proven that people prioritize confirmation of their beliefs over factual accuracy.

LLMs facilitate this process even more than social media by providing convenient answers without requiring critical thinking or verification.

“The real threat is how AI is replacing truth with convenience,” Barry observed, noting that Google’s embrace of AI represents a clear step away from surfacing factual information toward providing what users want to hear.

Barry warned we’re entering a spiral where “entire societies will live in parallel realities and we’ll deride the other side as being fake news and just not real.”

Why Isn’t Mainstream Media Calling Out AI’s Limitations?

I asked Barry why mainstream media isn’t more vocal about AI’s weaknesses, especially given that publishers could help save themselves by influencing public perception of GenAI’s limitations.

Barry identified several factors: “Google is such a powerful force in driving traffic and revenue to publishers that a lot of publishers are afraid to write too critically about Google because they feel there might be repercussions.”

He also noted that many journalists don’t genuinely understand how AI systems work. Technology journalists who understand the issues sometimes raise questions, but general reporters for major newspapers often lack the knowledge to scrutinize AI claims properly.

Barry pointed to Google’s promise that AI Overviews would send more traffic to publishers as an example: “It turns out, no, that’s the exact opposite of what’s happening, which everybody with two brain cells saw coming a mile away.”

How Do We Explain The Traffic Reduction To News Publishers?

I noted research that shows users do click on sources to verify AI outputs, and that Google doesn’t show AI Overviews on top news stories. Yet, traffic to news publishers continues to decline overall.

Barry explained this involves multiple factors:

“People do click on sources. People do double-check the citations, but not to the same extent as before. ChatGPT and Gemini will give you an answer. People will click two or three links to verify.

Previously, users conducting their own research would click 30 to 40 links and read them in detail. Now they might verify AI responses with just a few clicks.

Additionally, while news publishers are less affected by AI Overviews, they’ve lost traffic on explainer content, background stories, and analysis pieces that AI now handles directly with minimal click-through to sources.”

Barry emphasized that Google has been diminishing publisher traffic for years through algorithm updates and efforts to keep users within Google’s ecosystem longer.

“Google is the monopoly informational gateway on the web. So you can say, ‘Oh, don’t be dependent on Google,’ but you have to be where your users are and you cannot have a viable publishing business without heavily relying on Google traffic.”

What Should Publishers Do To Survive?

I asked Barry for his recommendations on optimizing for LLM inclusion and how to survive the introduction of AI-generated search results.

Barry advised publishers to accept that search traffic will diminish while focusing on building a stronger brand identity.

“I think publishers need to be more confident about what they are and specifically what they’re not.”

He highlighted the Financial Times as an exemplary model because “nobody has any doubt about what the Financial Times is and what kind of reporting they’re signing up for.”

This clarity enables strong subscription conversion because readers understand the specific value they’re receiving.

Barry emphasized the importance of developing brand power that makes users specifically seek out particular publications, “I think too many publishers try to be everything to everybody and therefore are nothing to nobody. You need to have a strong brand voice.”

He pointed to the Daily Mail as a publication that succeeds through consistent brand identity, with users specifically searching for the brand name alongside topics, such as “Meghan Markle Daily Mail” or “Prince Harry Daily Mail.”

The goal is to build direct relationships that bypass intermediaries through apps, newsletters, and direct website visits.

The Brand Identity Imperative

Barry stressed that publishers covering similar topics with interchangeable content face existential threats.

He works with publishers where “they’re all reporting the same stuff with the same screenshots and the same set photos and pretty much the same content.”

Such publications become vulnerable because readers lose nothing by substituting one source for another. Success requires developing unique value propositions that make audiences specifically seek out particular publications.

“You need to have a very strong brand identity as a publisher. And if you don’t have it, you probably won’t exist in the next five to ten years,” Barry concluded.

Barry advised news publishers to focus on brand development, subscription models, and building content ecosystems that don’t rely entirely on Google. That may mean fewer clicks, but more meaningful, higher-quality engagement.

Moving Forward

Barry’s assessment, and the reality of the changes AI is forcing, are hard truths to accept.

The industry requires honest acknowledgment of AI limitations, strategic brand building, and acceptance that easy search traffic won’t return.

Publishers have two options: continue chasing diminishing search traffic with the same content everyone else is producing, or invest in direct audience relationships that provide a sustainable foundation for quality journalism.

Thank you to Barry Adams for offering his insights and being my guest on IMHO.


Featured Image: Shelley Walsh/Search Engine Journal 

Your Next Time Saver: How To Use AI To Save Time On Hosting Maintenance, Agency Edition via @sejournal, @Hanrahan7

This post was sponsored by Cloudways. The opinions expressed in this article are the sponsor’s own.

Have you ever woken up to a 3 AM client website panic?

Did your client’s ecommerce site crash during a flash sale?

Has another client asked why their site is slow, “even though we’re paying for premium hosting”?

This isn’t just an occasional nuisance.

If you’re managing multiple client sites, hosting maintenance becomes a full-on job in itself. The worst part? None of this time is billable, and every minute spent troubleshooting is a minute you’re not spending on business growth.

Here’s the truth: The way you handle hosting maintenance may be broken. And it’s costing you far more than you realize, in time, money, and missed opportunities.

In this article, we’ll explore where that time goes, and how to get it back.

Ways You’re Accidentally Draining Agency Revenue

You and your agency may lose countless hours to hosting maintenance without realizing the true cost.

Behind every “quick fix” lies a hidden drain on productivity and profits.

Are You Doing This?

It starts with a frantic client message or a monitoring alert, often hours after the problem started. Then:

  • Developers scramble to check logs and test configurations.
  • The team disables plugins one by one as a diagnostic method.
  • Someone finally contacts hosting support after internal efforts fail.
  • The issue gets resolved (often) after hours of back-and-forth.

The financial impact is staggering when you do the math.

Consider an agency managing just 30 websites.

If each site experiences only 2 hosting incidents per month requiring 3 hours to resolve, that’s 180 hours every month.

That’s more than an entire month of one developer’s working time, lost month after month.

  • Average resolution time: 3.5 hours per incident.
  • For an agency with 50 client sites, 4,200 hours/year lost.
  • At a $150/hour billable rate → $630,000 potential revenue wasted.
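A quick back-of-the-envelope model makes this arithmetic concrete. The function below is purely illustrative, using the article’s example figures as inputs, not measured data:

```python
# Back-of-the-envelope model of hosting-maintenance cost for an agency.
# Inputs are the article's example figures, not measured data.

def annual_maintenance_cost(sites, incidents_per_site_per_month,
                            hours_per_incident, billable_rate):
    """Return (hours lost per year, billable revenue at risk per year)."""
    incidents_per_year = sites * incidents_per_site_per_month * 12
    hours_lost = incidents_per_year * hours_per_incident
    return hours_lost, hours_lost * billable_rate

# The 50-site example: 2 incidents/site/month, 3.5 hours each, $150/hour.
hours, revenue = annual_maintenance_cost(50, 2, 3.5, 150)
print(hours, revenue)  # 4200.0 630000.0
```

The same function with the 30-site example (2 incidents per site per month at 3 hours each) gives 2,160 hours a year, or 180 hours every month.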

Beyond direct costs, this broken system creates three major problems:

  1. Team burnout – Constant firefighting demoralizes developers
  2. Client distrust – Repeated issues make your agency look incompetent
  3. Growth stagnation – Leadership spends time troubleshooting instead of scaling

Each downtime incident plants seeds of doubt about your agency’s technical competence. After just a few occurrences, clients start questioning why they’re paying premium rates for what feels like unreliable service. This erosion of confidence makes contract renewals harder and opens the door for competitors.

How To Solve Client Website Hosting Issues

Most agencies cycle through the same ineffective solutions, each with significant drawbacks:

Don’t: Only Take The Staffing Approach

The most common solution is hiring dedicated infrastructure staff. Many agencies believe bringing a systems admin or DevOps engineer on board will solve their hosting woes. While this provides more control, it creates new problems. You’re now responsible for recruiting, managing, and covering the cost of specialized technical talent.

  • $85k+ annual salary for each infrastructure specialist.
  • Ongoing management overhead for technical staff.
  • Limited availability for after-hours emergencies.
  • Still requires hosting provider support for complex issues.

Don’t: Just Take The Managed Hosting Solution

Many agencies turn to managed hosting providers to alleviate their maintenance burden.

Technically adept teams can absolutely handle straightforward server-level maintenance, security patches, and core updates; however, most still require some additional support when faced with:

  • Application-specific troubleshooting (plugin conflicts, theme issues).
  • Custom performance optimization.
  • Specialized configurations.

The key difference lies in how managed hosting providers address these residual needs. Traditional hosting providers might still leave you waiting in support queues, while next-gen platforms automatically begin repairs.

Don’t: Simply Use Website Uptime Monitoring Tools

You may think about attempting to solve the problem through monitoring tools.

Website monitoring tools layer on services like New Relic, Datadog, and UptimeRobot, hoping the better visibility will reduce firefighting.

While these tools provide valuable data, they primarily generate more alerts for your team to interpret and take action on. You’ve essentially traded one problem for another – instead of lacking information, you’re now drowning in it.

  • Alert overload from multiple systems.
  • False positives that waste investigation time.
  • No actionable insights – just more data to interpret.
  • Still requires manual diagnosis and resolution.

Do: Incorporate AI-Powered Hosting Maintenance

Imagine that, instead of this chaotic process, you could:

  1. Know about issues before clients do.
  2. Understand exactly what went wrong, in plain English.
  3. Get step-by-step instructions to fix it immediately.

Copilots that can do these tasks are your first step toward a self-learning, auto-healing hosting platform.

They can use intelligent monitoring to detect and help resolve the most common and critical server issues.

Hosting Maintenance: Before & After AI Integration

The Old Way:

  • Client reports site is down (30+ minutes after it actually went down).
  • You spend an hour checking logs and plugins.
  • You contact support and wait 2 hours for a response.
  • Support suggests a fix that may or may not work.
  • Total downtime: 4+ hours.

With Cloudways Copilot:

  • Copilot detects the issue immediately (often before users notice).
  • You receive an alert with exact cause and fix.
  • You implement the solution in minutes.
  • Total downtime: dramatically reduced compared to the traditional troubleshooting cycle.

How To Get Automatic Hosting & Site Alerts, Repairs & Updates

You can configure Cloudways Copilot to manage many facets of web hosting.

Host Health

Triggers when your entire server goes down.

Webstack Health

  • Alerts when core services fail (Apache, Nginx, MySQL, PHP-FPM).
  • Catches crashes before they take sites offline.
  • Identifies resource exhaustion issues.

Disk & Inode Health

Warns before you hit critical limits:

  • Disk space (95%+ utilization).
  • Inode usage (separate from storage space).

Result: Instant problem detection!

Copilot continuously monitors your servers and applications for:

  • Performance bottlenecks.
  • Security threats.
  • Resource constraints.
  • Configuration errors.

Unlike traditional monitoring tools that just tell you “something’s wrong,” Copilot identifies the specific issue.

What AI Reports Look Like For Website Maintenance

For each problem detected, Copilot provides:

  1. What happened: The specific error or issue
  2. Where it occurred: Which site, which server
  3. Why it happened: The root cause analysis
  4. How to fix it: Step-by-step resolution instructions
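That four-part structure maps naturally onto a simple record. Here is a hypothetical sketch of how an agency might represent such reports internally; the field names and class are ours, not Cloudways’:

```python
from dataclasses import dataclass

@dataclass
class MaintenanceReport:
    what: str        # the specific error or issue
    where: str       # which site, which server
    why: str         # root-cause analysis
    how_to_fix: str  # step-by-step resolution instructions

    def summary(self) -> str:
        return f"{self.what} on {self.where}: {self.why}"

report = MaintenanceReport(
    what="High CPU usage",
    where="store.example.com (Server X)",
    why="unoptimized WooCommerce query in a third-party plugin",
    how_to_fix="disable the plugin or contact its developer for an update",
)
print(report.summary())
```

Storing alerts in a structured form like this, rather than as free-text log lines, is what makes it possible to route, deduplicate, and report on them per client.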

Real-World Example

Instead of just a “High CPU Usage” alert, Copilot tells you:

“Your WordPress site on Server X is experiencing high CPU due to a poorly optimized WooCommerce query in Plugin Y. Disable the plugin or contact the developer for an update.”

To be more specific, the example below shows Copilot detecting a DDoS attack, listing malicious IPs, and identifying the file under attack. It then provides remediation steps to mitigate the DDoS attack.

Investigation Summary

The investigation shows that much of the server load is driven by multiple concurrent Magento cron tasks and frequent POST requests to the xmlrpc.php endpoint. These factors have caused high CPU usage and intermittent HTTP 500 errors. It appears that the repeated bot visits to xmlrpc.php and resource-intensive Magento cron jobs are the likely cause of the problem.

Remediation Steps

To mitigate the issues related to the potential DDoS attack and improve server performance, please follow these steps:

  1. Block Offending IPs: Use the `.htaccess` file to block the specific IP addresses that are contributing to the abuse. Add the following lines to your .htaccess file:

     deny from 66.249.75.34
     deny from 114.119.136.249
     deny from 182.43.65.168
  2. Implement Cloudflare: If you haven’t already, consider setting up Cloudflare for your application. This will enhance your website’s security and help mitigate DDoS attacks by filtering malicious traffic before it reaches your server. Follow the integration steps provided by Cloudways to set up CNAME records.
  3. Enable Under Attack Mode: Once Cloudflare is set up, enable Under Attack Mode, which performs an extra security check for every visitor and helps protect your site from ongoing DDoS attacks, especially targeting specific URLs like /xmlrpc.php.

By taking these actions, you can significantly reduce the strain on your server and improve its performance.

Support links:

https://support.cloudways.com/en/articles/6009152-how-to-integrate-cloudflare-with-your-application

https://support.cloudways.com/en/articles/5120765-how-to-monitor-system-processes-using-htop-command

This is how Cloudways Copilot uses AI to identify hosting and server issues: it compares them to similar cases across the fleet and quickly suggests the most effective remediation, with step-by-step instructions. That saves you time by eliminating manual detection, troubleshooting, and back-and-forth support tickets, and it spares your clients the disappointment of prolonged downtime.

Image created by Cloudways, April 2025

At the end of the day, hosting headaches shouldn’t waste your agency’s most valuable resource: time. Every minute spent troubleshooting is a minute taken away from client work, business growth, or simply having a life outside of server emergencies.

Cloudways Copilot tackles this problem at its root by:

  • Detecting issues before clients notice.
  • Pinpointing exactly what broke and why.
  • Showing where problems occurred (specific apps/servers).
  • Providing step-by-step fixes in plain language.
  • Cutting resolution time from hours to minutes.

What’s coming next makes Cloudways Copilot even better:

  • One-click fixes – Resolve common errors automatically with a single click
  • Automated resolutions – Let Copilot handle routine tasks like server-wide cache purges and backup management
  • Developer workflows – Automate performance monitoring and testing to free up your team

Best of all? During our early access period, Cloudways Copilot is completely free. We’re currently onboarding users through our limited-access program – visit the Cloudways Copilot page and submit your details to secure your spot.


Image Credits

Featured Image: Image by Cloudways. Used with permission.

In-Post Image: Images by Cloudways. Used with permission.

Inside Amsterdam’s high-stakes experiment to create fair welfare AI

This story is a partnership between MIT Technology Review, Lighthouse Reports, and Trouw, and was supported by the Pulitzer Center. 

Two futures

Hans de Zwart, a gym teacher turned digital rights advocate, says that when he saw Amsterdam’s plan to have an algorithm evaluate every welfare applicant in the city for potential fraud, he nearly fell out of his chair. 

It was February 2023, and de Zwart, who had served as the executive director of Bits of Freedom, the Netherlands’ leading digital rights NGO, had been working as an informal advisor to Amsterdam’s city government for nearly two years, reviewing and providing feedback on the AI systems it was developing. 

According to the city’s documentation, this specific AI model—referred to as “Smart Check”—would consider submissions from potential welfare recipients and determine who might have submitted an incorrect application. More than any other project that had come across his desk, this one stood out immediately, he told us—and not in a good way. “There’s some very fundamental [and] unfixable problems,” he says, in using this algorithm “on real people.”

From his vantage point behind the sweeping arc of glass windows at Amsterdam’s city hall, Paul de Koning, a consultant to the city whose résumé includes stops at various agencies in the Dutch welfare state, had viewed the same system with pride. De Koning, who managed Smart Check’s pilot phase, was excited about what he saw as the project’s potential to improve efficiency and remove bias from Amsterdam’s social benefits system. 

A team of fraud investigators and data scientists had spent years working on Smart Check, and de Koning believed that promising early results had vindicated their approach. The city had consulted experts, run bias tests, implemented technical safeguards, and solicited feedback from the people who’d be affected by the program—more or less following every recommendation in the ethical-AI playbook. “I got a good feeling,” he told us. 

These opposing viewpoints epitomize a global debate about whether algorithms can ever be fair when tasked with making decisions that shape people’s lives. Over the past several years of efforts to use artificial intelligence in this way, examples of collateral damage have mounted: nonwhite job applicants weeded out of applicant pools in the US, families wrongly flagged for child abuse investigations in Japan, and low-income residents denied food subsidies in India.

Proponents of these assessment systems argue that they can create more efficient public services by doing more with less and, in the case of welfare systems specifically, reclaim money that is allegedly being lost from the public purse. In practice, many were poorly designed from the start. They sometimes factor in personal characteristics in a way that leads to discrimination, and sometimes they have been deployed without testing for bias or effectiveness. In general, they offer few options for people to challenge—or even understand—the automated actions directly affecting how they live. 

The result has been more than a decade of scandals. In response, lawmakers, bureaucrats, and the private sector, from Amsterdam to New York, Seoul to Mexico City, have been trying to atone by creating algorithmic systems that integrate the principles of “responsible AI”—an approach that aims to guide AI development to benefit society while minimizing negative consequences. 

CHANTAL JAHCHAN

Developing and deploying ethical AI is a top priority for the European Union, and the same was true for the US under former president Joe Biden, who released a blueprint for an AI Bill of Rights. That plan was rescinded by the Trump administration, which has removed considerations of equity and fairness, including in technology, at the national level. Nevertheless, systems influenced by these principles are still being tested by leaders in countries, states, provinces, and cities—in and out of the US—that have immense power to make decisions like whom to hire, when to investigate cases of potential child abuse, and which residents should receive services first. 

Amsterdam indeed thought it was on the right track. City officials in the welfare department believed they could build technology that would prevent fraud while protecting citizens’ rights. They followed these emerging best practices and invested a vast amount of time and money in a project that eventually processed live welfare applications. But in their pilot, they found that the system they’d developed was still not fair and effective. Why? 

Lighthouse Reports, MIT Technology Review, and the Dutch newspaper Trouw have gained unprecedented access to the system to try to find out. In response to a public records request, the city disclosed multiple versions of the Smart Check algorithm and data on how it evaluated real-world welfare applicants, offering us unique insight into whether, under the best possible conditions, algorithmic systems can deliver on their ambitious promises.  

The answer to that question is far from simple. For de Koning, Smart Check represented technological progress toward a fairer and more transparent welfare system. For de Zwart, it represented a substantial risk to welfare recipients’ rights that no amount of technical tweaking could fix. As this algorithmic experiment unfolded over several years, it called into question the project’s central premise: that responsible AI can be more than a thought experiment or corporate selling point—and actually make algorithmic systems fair in the real world.

A chance at redemption

Understanding how Amsterdam found itself conducting a high-stakes endeavor with AI-driven fraud prevention requires going back four decades, to a national scandal around welfare investigations gone too far. 

In 1984, Albine Grumböck, a divorced single mother of three, had been receiving welfare for several years when she learned that one of her neighbors, an employee at the social service’s local office, had been secretly surveilling her life. He documented visits from a male friend, who in theory could have been contributing unreported income to the family. On the basis of his observations, the welfare office cut Grumböck’s benefits. She fought the decision in court and won.

Albine Grumböck, whose benefits had been cut off, learns of the judgment for interim relief.
ROB BOGAERTS / NATIONAAL ARCHIEF

Despite her personal vindication, Dutch welfare policy has continued to empower welfare fraud investigators, sometimes referred to as “toothbrush counters,” to turn over people’s lives. This has helped create an atmosphere of suspicion that leads to problems for both sides, says Marc van Hoof, a lawyer who has helped Dutch welfare recipients navigate the system for decades: “The government doesn’t trust its people, and the people don’t trust the government.”

Harry Bodaar, a career civil servant, has observed the Netherlands’ welfare policy up close throughout much of this time—first as a social worker, then as a fraud investigator, and now as a welfare policy advisor for the city. The past 30 years have shown him that “the system is held together by rubber bands and staples,” he says. “And if you’re at the bottom of that system, you’re the first to fall through the cracks.”

Making the system work better for beneficiaries, he adds, was a large motivating factor when the city began designing Smart Check in 2019. “We wanted to do a fair check only on the people we [really] thought needed to be checked,” Bodaar says—in contrast to previous department policy, which until 2007 was to conduct home visits for every applicant. 

But he also knew that the Netherlands had become something of a ground zero for problematic welfare AI deployments. The Dutch government’s attempts to modernize fraud detection through AI had backfired on a few notorious occasions.

In 2019, it was revealed that the national government had been using an algorithm to create risk profiles that it hoped would help spot fraud in the child care benefits system. The resulting scandal saw nearly 35,000 parents, most of whom were migrants or the children of migrants, wrongly accused of defrauding the assistance system over six years. It put families in debt, pushed some into poverty, and ultimately led the entire government to resign in 2021.  

front page of Trouw from January 16, 2021

COURTESY OF TROUW

In Rotterdam, a 2023 investigation by Lighthouse Reports into a system for detecting welfare fraud found it to be biased against women, parents, non-native Dutch speakers, and other vulnerable groups, eventually forcing the city to suspend use of the system. Other cities, like Amsterdam and Leiden, used a system called the Fraud Scorecard, which was first deployed more than 20 years ago and included education, neighborhood, parenthood, and gender as crude risk factors to assess welfare applicants; that program was also discontinued.

The Netherlands is not alone. In the United States, there have been at least 11 cases in which state governments used algorithms to help disburse public benefits, according to the nonprofit Benefits Tech Advocacy Hub, often with troubling results. Michigan, for instance, falsely accused 40,000 people of committing unemployment fraud. And in France, campaigners are taking the national welfare authority to court over an algorithm they claim discriminates against low-income applicants and people with disabilities.

This string of scandals, as well as a growing awareness of how racial discrimination can be embedded in algorithmic systems, helped fuel the growing emphasis on responsible AI. It’s become “this umbrella term to say that we need to think about not just ethics, but also fairness,” says Jiahao Chen, an ethical-AI consultant who has provided auditing services to both private and local government entities. “I think we are seeing that realization that we need things like transparency and privacy, security and safety, and so on.” 

The approach, based on a set of tools intended to rein in the harms caused by the proliferating technology, has given rise to a rapidly growing field built upon a familiar formula: white papers and frameworks from think tanks and international bodies, and a lucrative consulting industry made up of traditional power players like the Big 5 consultancies, as well as a host of startups and nonprofits. In 2019, for instance, the Organisation for Economic Co-operation and Development, a global economic policy body, published its Principles on Artificial Intelligence as a guide for the development of “trustworthy AI.” Those principles include building explainable systems, consulting public stakeholders, and conducting audits. 

But the legacy left by decades of algorithmic misconduct has proved hard to shake off, and there is little agreement on where to draw the line between what is fair and what is not. While the Netherlands works to institute reforms shaped by responsible AI at the national level, Algorithm Audit, a Dutch NGO that has provided ethical-AI auditing services to government ministries, has concluded that the technology should be used to profile welfare recipients only under strictly defined conditions, and only if systems avoid taking into account protected characteristics like gender. Meanwhile, Amnesty International, digital rights advocates like de Zwart, and some welfare recipients themselves argue that when it comes to making decisions about people’s lives, as in the case of social services, the public sector should not be using AI at all.

Amsterdam hoped it had found the right balance. “We’ve learned from the things that happened before us,” says Bodaar, the policy advisor, of the past scandals. And this time around, the city wanted to build a system that would “show the people in Amsterdam we do good and we do fair.”

Finding a better way

Every time an Amsterdam resident applies for benefits, a caseworker reviews the application for irregularities. If an application looks suspicious, it can be sent to the city’s investigations department—which could lead to a rejection, a request to correct paperwork errors, or a recommendation that the candidate receive less money. Investigations can also happen later, once benefits have been disbursed; the outcome may force recipients to pay back funds, and even push some into debt.

Officials have broad authority over both applicants and existing welfare recipients. They can request bank records, summon beneficiaries to city hall, and in some cases make unannounced visits to a person’s home. As investigations are carried out—or paperwork errors fixed—much-needed payments may be delayed. And often—in more than half of the investigations of applications, according to figures provided by Bodaar—the city finds no evidence of wrongdoing. In those cases, this can mean that the city has “wrongly harassed people,” Bodaar says. 

The Smart Check system was designed to avoid these scenarios by eventually replacing the initial caseworker who flags which cases to send to the investigations department. The algorithm would screen the applications to identify those most likely to involve major errors, based on certain personal characteristics, and redirect those cases for further scrutiny by the enforcement team.

If all went well, the city wrote in its internal documentation, the system would improve on the performance of its human caseworkers, flagging fewer welfare applicants for investigation while identifying a greater proportion of cases with errors. In one document, the city projected that the model would prevent up to 125 individual Amsterdammers from facing debt collection and save €2.4 million annually. 

Smart Check was an exciting prospect for city officials like de Koning, who would manage the project when it was deployed. He was optimistic, since the city was taking a scientific approach, he says; it would “see if it was going to work” instead of taking the attitude that “this must work, and no matter what, we will continue this.”

It was the kind of bold idea that attracted optimistic techies like Loek Berkers, a data scientist who worked on Smart Check in only his second job out of college. Speaking in a cafe tucked behind Amsterdam’s city hall, Berkers remembers being impressed at his first contact with the system: “Especially for a project within the municipality,” he says, it “was very much a sort of innovative project that was trying something new.”

Smart Check made use of an algorithm called an “explainable boosting machine,” which allows people to more easily understand how AI models produce their predictions. Most other machine-learning models are regarded as “black boxes” running abstract mathematical processes that are hard to understand, both for the employees tasked with using them and for the people affected by the results.

The Smart Check model would consider 15 characteristics—including whether applicants had previously applied for or received benefits, the sum of their assets, and the number of addresses they had on file—to assign a risk score to each person. It purposefully avoided demographic factors, such as gender, nationality, or age, that were thought to lead to bias. It also tried to avoid “proxy” factors—like postal codes—that may not look sensitive on the surface but can become so if, for example, a postal code is statistically associated with a particular ethnic group.
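What makes an explainable boosting machine "explainable" is that, at prediction time, it is additive: each feature contributes an independently inspectable amount to the final risk score. The sketch below illustrates that idea only; the feature names, shape functions, and values are invented for illustration and are not Amsterdam's actual model.

```python
# Illustrative sketch of an additive, per-feature-explainable risk score,
# in the spirit of an explainable boosting machine. All contribution
# functions here are hypothetical, not the city's real learned model.

def contribution_prior_applications(n):
    # Hypothetical shape function: more prior applications, higher contribution.
    return min(n, 5) * 0.12

def contribution_assets(total_assets):
    # Hypothetical: very low declared assets nudge the score up.
    return 0.3 if total_assets < 500 else 0.0

def contribution_addresses(n_addresses):
    # Hypothetical: multiple addresses on file raise the score.
    return 0.2 * max(n_addresses - 1, 0)

def risk_score(applicant):
    contributions = {
        "prior_applications": contribution_prior_applications(applicant["prior_applications"]),
        "assets": contribution_assets(applicant["total_assets"]),
        "addresses": contribution_addresses(applicant["n_addresses"]),
    }
    # The score is just the sum of per-feature contributions, so anyone
    # can see exactly why an applicant was rated the way they were.
    return sum(contributions.values()), contributions

score, breakdown = risk_score(
    {"prior_applications": 2, "total_assets": 300, "n_addresses": 2}
)
print(round(score, 2))  # → 0.74
print(breakdown)        # per-feature contributions, each inspectable
```

Because the model is a sum of per-feature terms, auditors can read off each feature's effect directly, which is what distinguishes it from black-box alternatives.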

In an unusual step, the city has disclosed this information and shared multiple versions of the Smart Check model with us, effectively inviting outside scrutiny into the system’s design and function. With this data, we were able to build a hypothetical welfare recipient to get insight into how an individual applicant would be evaluated by Smart Check.  

This model was trained on a data set encompassing 3,400 previous investigations of welfare recipients. The idea was that it would use the outcomes from these investigations, carried out by city employees, to figure out which factors in the initial applications were correlated with potential fraud. 

But using past investigations introduces potential problems from the start, says Sennay Ghebreab, scientific director of the Civic AI Lab (CAIL) at the University of Amsterdam, one of the external groups that the city says it consulted with. The problem of using historical data to build the models, he says, is that “we will end up [with] historic biases.” For example, if caseworkers historically made higher rates of mistakes with a specific ethnic group, the model could wrongly learn to predict that this ethnic group commits fraud at higher rates. 

The city decided it would rigorously audit its system to try to catch such biases against vulnerable groups. But how bias should be defined, and hence what it actually means for an algorithm to be fair, is a matter of fierce debate. Over the past decade, academics have proposed dozens of competing mathematical notions of fairness, some of which are incompatible. This means that a system designed to be “fair” according to one such standard will inevitably violate others.

Amsterdam officials adopted a definition of fairness that focused on equally distributing the burden of wrongful investigations across different demographic groups. 

In other words, they hoped this approach would ensure that welfare applicants of different backgrounds would be wrongly investigated at similar rates. 
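Concretely, the city's chosen fairness criterion amounts to comparing false-positive rates—the share of legitimate applicants wrongly flagged—across demographic groups. A minimal sketch of such a check, with invented records (not the city's data):

```python
# Sketch: "equal burden of wrongful investigation" read as false-positive-rate
# parity across groups. All records below are invented for illustration.

def false_positive_rate(records):
    """Share of legitimate applicants (error=False) who were flagged anyway."""
    legitimate = [r for r in records if not r["error"]]
    if not legitimate:
        return 0.0
    return sum(r["flagged"] for r in legitimate) / len(legitimate)

applications = [
    # group, whether the applicant was flagged, whether a real error existed
    {"group": "A", "flagged": True,  "error": False},
    {"group": "A", "flagged": False, "error": False},
    {"group": "A", "flagged": True,  "error": True},
    {"group": "B", "flagged": True,  "error": False},
    {"group": "B", "flagged": True,  "error": False},
    {"group": "B", "flagged": False, "error": True},
]

by_group = {}
for r in applications:
    by_group.setdefault(r["group"], []).append(r)

rates = {g: false_positive_rate(rs) for g, rs in by_group.items()}
print(rates)  # → {'A': 0.5, 'B': 1.0}: group B carries twice the burden
```

Note that this is only one of the dozens of competing fairness metrics the article mentions; equalizing false-positive rates can conflict with, say, equalizing the overall flag rate per group.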

Mixed feedback

As it built Smart Check, Amsterdam consulted various public bodies about the model, including the city’s internal data protection officer and the Amsterdam Personal Data Commission. It also consulted private organizations, including the consulting firm Deloitte. Each gave the project its approval. 

But one key group was not on board: the Participation Council, a 15-member advisory committee composed of benefits recipients, advocates, and other nongovernmental stakeholders who represent the interests of the people the system was designed to help—and to scrutinize. The committee, like de Zwart, the digital rights advocate, was deeply troubled by what the system could mean for individuals already in precarious positions. 

Anke van der Vliet, now in her 70s, is one longtime member of the council. After she sinks slowly from her walker into a seat at a restaurant in Amsterdam’s Zuid neighborhood, where she lives, she retrieves her reading glasses from their case. “We distrusted it from the start,” she says, pulling out a stack of papers she’s saved on Smart Check. “Everyone was against it.”

For decades, she has been a steadfast advocate for the city’s welfare recipients—a group that, by the end of 2024, numbered around 35,000. In the late 1970s, she helped found Women on Welfare, a group dedicated to exposing the unique challenges faced by women within the welfare system.

City employees first presented their plan to the Participation Council in the fall of 2021. Members like van der Vliet were deeply skeptical. “We wanted to know, is it to my advantage or disadvantage?” she says. 

Two more meetings could not convince them. Their feedback did lead to key changes—including reducing the number of variables the city had initially considered to calculate an applicant’s score and excluding variables that could introduce bias, such as age, from the system. But the Participation Council stopped engaging with the city’s development efforts altogether after six months. “The Council is of the opinion that such an experiment affects the fundamental rights of citizens and should be discontinued,” the group wrote in March 2022. Since only around 3% of welfare benefit applications are fraudulent, the letter continued, using the algorithm was “disproportionate.”

De Koning, the project manager, is skeptical that the system would ever have received the approval of van der Vliet and her colleagues. “I think it was never going to work that the whole Participation Council was going to stand behind the Smart Check idea,” he says. “There was too much emotion in that group about the whole process of the social benefit system.” He adds, “They were very scared there was going to be another scandal.” 

But for advocates working with welfare beneficiaries, and for some of the beneficiaries themselves, the worry wasn’t a scandal but the prospect of real harm. The technology could not only make damaging errors but leave them even more difficult to correct—allowing welfare officers to “hide themselves behind digital walls,” says Henk Kroon, an advocate who assists welfare beneficiaries at the Amsterdam Welfare Association, a union established in the 1970s. Such a system could make work “easy for [officials],” he says. “But for the common citizens, it’s very often the problem.” 

Time to test 

Despite the Participation Council’s ultimate objections, the city decided to push forward and put the working Smart Check model to the test. 

The first results were not what they’d hoped for. When the city’s advanced analytics team ran the initial model in May 2022, they found that the algorithm showed heavy bias against migrants and men, which we were able to independently verify. 

As the city told us and as our analysis confirmed, the initial model was more likely to wrongly flag non-Dutch applicants. And it was nearly twice as likely to wrongly flag an applicant with a non-Western nationality than one with a Western nationality. The model was also 14% more likely to wrongly flag men for investigation. 

In the process of training the model, the city also collected data on who its human caseworkers had flagged for investigation and which groups the wrongly flagged people were more likely to belong to. In essence, they ran a bias test on their own analog system—an important benchmarking step that is rarely taken before deploying such systems. 

What they found in the process led by caseworkers was a strikingly different pattern. Whereas the Smart Check model was more likely to wrongly flag non-Dutch nationals and men, human caseworkers were more likely to wrongly flag Dutch nationals and women. 

The team behind Smart Check knew that if they couldn’t correct for bias, the project would be canceled. So they turned to a technique from academic research, known as training-data reweighting. In practice, that meant applicants with a non-Western nationality who were deemed to have made meaningful errors in their applications were given less weight in the data, while those with a Western nationality were given more.

Eventually, this appeared to solve their problem: As Lighthouse’s analysis confirms, once the model was reweighted, Dutch and non-Dutch nationals were equally likely to be wrongly flagged. 
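Training-data reweighting of this kind can be sketched as giving each (group, outcome) combination a sample weight of its expected share divided by its observed share, so over-represented combinations count less during fitting—in the spirit of the well-known "reweighing" technique from the fairness literature (Kamiran and Calders). The groups, labels, and counts below are invented for illustration, not Amsterdam's data.

```python
from collections import Counter

# Sketch of training-data reweighting: weight = expected_share / observed_share
# for each (group, label) pair. Combinations over-represented in the training
# data get weights below 1. Example data is invented.

samples = [
    ("western", "error"), ("western", "no_error"), ("western", "no_error"),
    ("non_western", "error"), ("non_western", "error"), ("non_western", "no_error"),
]

n = len(samples)
group_counts = Counter(g for g, _ in samples)
label_counts = Counter(l for _, l in samples)
pair_counts = Counter(samples)

def weight(group, label):
    # Expected share if group and label were independent, divided by
    # the share actually observed in the training data.
    expected = (group_counts[group] / n) * (label_counts[label] / n)
    observed = pair_counts[(group, label)] / n
    return expected / observed

for group, label in sorted(pair_counts):
    print(group, label, round(weight(group, label), 2))
```

In this toy data, non-Western applicants with errors are over-represented, so they receive a weight of 0.75, while Western applicants with errors receive 1.5—mirroring the direction of the adjustment the article describes.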

De Koning, who joined the Smart Check team after the data was reweighted, said the results were a positive sign: “Because it was fair … we could continue the process.” 

The model also appeared to be better than caseworkers at identifying applications worthy of extra scrutiny, with internal testing showing a 20% improvement in accuracy.

Buoyed by these results, in the spring of 2023, the city was almost ready to go public. It submitted Smart Check to the Algorithm Register, a government-run transparency initiative meant to keep citizens informed about machine-learning algorithms either in development or already in use by the government.

For de Koning, the city’s extensive assessments and consultations were encouraging, particularly since they also revealed the biases in the analog system. But for de Zwart, those same processes represented a profound misunderstanding: that fairness could be engineered. 

In a letter to city officials, de Zwart criticized the premise of the project and, more specifically, outlined the unintended consequences that could result from reweighting the data. It might reduce bias against people with a migration background overall, but it wouldn’t guarantee fairness across intersecting identities; the model could still discriminate against women with a migration background, for instance. And even if that issue were addressed, he argued, the model might still treat migrant women in certain postal codes unfairly, and so on. And such biases would be hard to detect.

“The city has used all the tools in the responsible-AI tool kit,” de Zwart told us. “They have a bias test, a human rights assessment; [they have] taken into account automation bias—in short, everything that the responsible-AI world recommends. Nevertheless, the municipality has continued with something that is fundamentally a bad idea.”

Ultimately, he told us, it’s a question of whether it’s legitimate to use data on past behavior to judge “future behavior of your citizens that fundamentally you cannot predict.” 

Officials still pressed on—and set March 2023 as the date for the pilot to begin. Members of Amsterdam’s city council were given little warning. In fact, they were only informed the same month—to the disappointment of Elisabeth IJmker, a first-term council member from the Green Party, who balanced her role in municipal government with research on religion and values at Vrije Universiteit Amsterdam. 

“Reading the words ‘algorithm’ and ‘fraud prevention’ in one sentence, I think that’s worth a discussion,” she told us. But by the time that she learned about the project, the city had already been working on it for years. As far as she was concerned, it was clear that the city council was “being informed” rather than being asked to vote on the system. 

The city hoped the pilot could prove skeptics like her wrong.

Upping the stakes

The formal launch of Smart Check started with a limited set of actual welfare applicants, whose paperwork the city would run through the algorithm and assign a risk score to determine whether the application should be flagged for investigation. At the same time, a human would review the same application. 

Smart Check’s performance would be monitored on two key criteria. First, could it consider applicants without bias? And second, was Smart Check actually smart? In other words, could the complex math that made up the algorithm actually detect welfare fraud better and more fairly than human caseworkers? 

It didn’t take long for it to become clear that the model fell short on both fronts. 

While it had been designed to reduce the number of welfare applicants flagged for investigation, it was flagging more. And it proved no better than a human caseworker at identifying those that actually warranted extra scrutiny. 

What’s more, despite the lengths the city had gone to in order to recalibrate the system, bias reemerged in the live pilot. But this time, instead of wrongly flagging non-Dutch people and men as in the initial tests, the model was now more likely to wrongly flag applicants with Dutch nationality and women. 

Lighthouse’s own analysis also revealed other forms of bias unmentioned in the city’s documentation, including a greater likelihood that welfare applicants with children would be wrongly flagged for investigation. (Amsterdam officials did not respond to a request for comment about this finding, nor to follow-up questions about general critiques of the city’s welfare system.)

The city was stuck. Nearly 1,600 welfare applications had been run through the model during the pilot period. But the results meant that members of the team were uncomfortable continuing to test—especially when there could be genuine consequences. In short, de Koning says, the city could not “definitely” say that “this is not discriminating.” 

He, and others working on the project, did not believe this was necessarily a reason to scrap Smart Check. They wanted more time—say, “a period of 12 months,” according to de Koning—to continue testing and refining the model. 

They knew, however, that would be a hard sell. 

In late November 2023, Rutger Groot Wassink—the city official in charge of social affairs—took his seat in the Amsterdam council chamber. He glanced at the tablet in front of him and then addressed the room: “I have decided to stop the pilot.”

The announcement brought an end to the sweeping multiyear experiment. In another council meeting a few months later, he explained why the project was terminated: “I would have found it very difficult to justify, if we were to come up with a pilot … that showed the algorithm contained enormous bias,” he said. “There would have been parties who would have rightly criticized me about that.” 

Viewed in a certain light, the city had tested out an innovative approach to identifying fraud in a way designed to minimize risks, found that it had not lived up to its promise, and scrapped it before the consequences for real people had a chance to multiply. 

But for IJmker and some of her city council colleagues focused on social welfare, there was also the question of opportunity cost. She recalls speaking with a colleague about how else the city could’ve spent that money—like to “hire some more people to do personal contact with the different people that we’re trying to reach.” 

City council members were never told exactly how much the effort cost, but in response to questions from MIT Technology Review, Lighthouse, and Trouw on this topic, the city estimated that it had spent some €500,000, plus €35,000 for the contract with Deloitte—but cautioned that the total amount put into the project was only an estimate, given that Smart Check was developed in house by various existing teams and staff members. 

For her part, van der Vliet, the Participation Council member, was not surprised by the poor result. The possibility of a discriminatory computer system was “precisely one of the reasons” her group hadn’t wanted the pilot, she says. And as for the discrimination in the existing system? “Yes,” she says, bluntly. “But we have always said that [it was discriminatory].” 

She and other advocates wished that the city had focused more on what they saw as the real problems facing welfare recipients: increases in the cost of living that have not, typically, been followed by increases in benefits; the need to document every change that could potentially affect their benefits eligibility; and the distrust with which they feel they are treated by the municipality. 

Can this kind of algorithm ever be done right?

When we spoke to Bodaar in March, a year and a half after the end of the pilot, he was candid in his reflections. “Perhaps it was unfortunate to immediately use one of the most complicated systems,” he said, “and perhaps it is also simply the case that it is not yet … the time to use artificial intelligence for this goal.”

“Niente, zero, nada. We’re not going to do that anymore,” he said about using AI to evaluate welfare applicants. “But we’re still thinking about this: What exactly have we learned?”

That is a question that IJmker thinks about too. In city council meetings she has brought up Smart Check as an example of what not to do. While she was glad that city employees had been thoughtful in their “many protocols,” she worried that the process obscured some of the larger questions of “philosophical” and “political values” that the city had yet to weigh in on as a matter of policy. 

Questions such as “How do we actually look at profiling?” or “What do we think is justified?”—or even “What is bias?” 

These questions are, “where politics comes in, or ethics,” she says, “and that’s something you cannot put into a checkbox.”

But now that the pilot has stopped, she worries that her fellow city officials might be too eager to move on. “I think a lot of people were just like, ‘Okay, well, we did this. We’re done, bye, end of story,’” she says. It feels like “a waste,” she adds, “because people worked on this for years.”


In abandoning the model, the city has returned to an analog process that its own analysis concluded was biased against women and Dutch nationals—a fact not lost on Berkers, the data scientist, who no longer works for the city. By shutting down the pilot, he says, the city sidestepped the uncomfortable truth—that many of the concerns de Zwart raised about the complex, layered biases within the Smart Check model also apply to the caseworker-led process.

“That’s the thing that I find a bit difficult about the decision,” Berkers says. “It’s a bit like no decision. It is a decision to go back to the analog process, which in itself has characteristics like bias.” 

Chen, the ethical-AI consultant, largely agrees. “Why do we hold AI systems to a higher standard than human agents?” he asks. When it comes to the caseworkers, he says, “there was no attempt to correct [the bias] systematically.” Amsterdam has promised to write a report on human biases in the welfare process, but the date has been pushed back several times.

“In reality, what ethics comes down to in practice is: nothing’s perfect,” he says. “There’s a high-level thing of Do not discriminate, which I think we can all agree on, but this example highlights some of the complexities of how you translate that [principle].” Ultimately, Chen believes that finding any solution will require trial and error, which by definition usually involves mistakes: “You have to pay that cost.”

But it may be time to more fundamentally reconsider how fairness should be defined—and by whom. Beyond the mathematical definitions, some researchers argue that the people most affected by the programs in question should have a greater say. “Such systems only work when people buy into them,” explains Elissa Redmiles, an assistant professor of computer science at Georgetown University who has studied algorithmic fairness. 

No matter what the process looks like, these are questions that every government will have to deal with—and urgently—in a future increasingly defined by AI. 

And, as de Zwart argues, if broader questions are not tackled, even well-intentioned officials deploying systems like Smart Check in cities like Amsterdam will be condemned to learn—or ignore—the same lessons over and over. 

“We are being seduced by technological solutions for the wrong problems,” he says. “Should we really want this? Why doesn’t the municipality build an algorithm that searches for people who do not apply for social assistance but are entitled to it?”


Eileen Guo is the senior reporter for features and investigations at MIT Technology Review. Gabriel Geiger is an investigative reporter at Lighthouse Reports. Justin-Casimir Braun is a data reporter at Lighthouse Reports.

Additional reporting by Jeroen van Raalte for Trouw, Melissa Heikkilä for MIT Technology Review, and Tahmeed Shafiq for Lighthouse Reports. Fact checked by Alice Milliken. 

You can read a detailed explanation of our technical methodology here. You can read Trouw’s companion story, in Dutch, here.

Why humanoid robots need their own safety rules

Last year, a humanoid warehouse robot named Digit set to work handling boxes of Spanx. Digit can lift boxes of up to 16 kilograms between trolleys and conveyor belts, taking over some of the heavier work for its human colleagues. It works in a restricted, defined area, separated from human workers by physical panels or laser barriers. That’s because while Digit is usually steady on its robot legs, which have a distinctive backwards knee-bend, it sometimes falls. For example, at a trade show in March, it appeared to be capably shifting boxes until it suddenly collapsed, face-planting on the concrete floor and dropping the container it was carrying.

The risk of that sort of malfunction happening around people is pretty scary. No one wants a 1.8-meter-tall, 65-kilogram machine toppling onto them, or a robot arm accidentally smashing into a sensitive body part. “Your throat is a good example,” says Pras Velagapudi, chief technology officer of Agility Robotics, Digit’s manufacturer. “If a robot were to hit it, even with a fraction of the force that it would need to carry a 50-pound tote, it could seriously injure a person.”

Physical stability—i.e., the ability to avoid tipping over—is the No. 1 safety concern identified by a group exploring new standards for humanoid robots. The IEEE Humanoid Study Group argues that humanoids differ from other robots, like industrial arms or existing mobile robots, in key ways and therefore require a new set of standards in order to protect the safety of operators, end users, and the general public. The group shared its initial findings with MIT Technology Review and plans to publish its full report later this summer. It identifies distinct challenges, including physical and psychosocial risks as well as issues such as privacy and security, that it feels standards organizations need to address before humanoids start being used in more collaborative scenarios.    

While humanoids are just taking their first tentative steps into industrial applications, the ultimate goal is to have them operating in close quarters with humans; one reason for making robots human-shaped in the first place is so they can more easily navigate the environments we’ve designed around ourselves. This means they will need to be able to share space with people, not just stay behind protective barriers. But first, they need to be safe.

One distinguishing feature of humanoids is that they are “dynamically stable,” says Aaron Prather, a director at the standards organization ASTM International and the IEEE group’s chair. This means they need power in order to stay upright; they exert force through their legs (or other limbs) to stay balanced. “In traditional robotics, if something happens, you hit the little red button, it kills the power, it stops,” Prather says. “You can’t really do that with a humanoid.” If you do, the robot will likely fall—potentially posing a bigger risk.

Slower brakes

What might a safety feature look like if it’s not an emergency stop? Agility Robotics is rolling out some new features on the latest version of Digit to try to address the toppling issue. Rather than instantly depowering (and likely falling down), the robot could decelerate more gently when, for instance, a person gets too close. “The robot basically has a fixed amount of time to try to get itself into a safe state,” Velagapudi says. Perhaps it puts down anything it’s carrying and drops to its hands and knees before powering down.
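One way to picture Velagapudi's "fixed amount of time to get itself into a safe state" is as a staged shutdown run against a time budget, rather than an instant power cut. The toy sketch below is purely illustrative—the stages, names, and budget are invented, not Agility's actual controller logic.

```python
import time

# Toy sketch of a staged "safe stop" for a dynamically stable robot:
# instead of cutting power instantly (which would make it fall), the
# controller steps through safe-state stages within a time budget.
# Stages and budget are invented for illustration.

SAFE_STOP_STAGES = [
    "decelerate",        # slow all joint velocities
    "set_down_payload",  # release anything being carried
    "lower_to_ground",   # drop to hands and knees
    "power_down",        # only now is it safe to depower
]

def safe_stop(budget_seconds=2.0):
    """Run each stage in order, depowering early if the budget runs out."""
    deadline = time.monotonic() + budget_seconds
    completed = []
    for stage in SAFE_STOP_STAGES:
        if time.monotonic() > deadline:
            break  # out of time: depower from whatever state was reached
        completed.append(stage)  # stand-in for actually executing the stage
    return completed

print(safe_stop())  # all four stages complete within the budget
```

The design point is the one Vicentini makes about standards: what gets specified is the goal (reach a safe state within a bounded time), not the particular sequence a given manufacturer uses to get there.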

Different robots could tackle the problem in different ways. “We want to standardize the goal, not the way to get to the goal,” says Federico Vicentini, head of product safety at Boston Dynamics. Vicentini is chairing a working group at the International Organization for Standardization (ISO) to develop a new standard dedicated to the safety of industrial robots that need active control to maintain stability (experts at Agility Robotics are also involved). The idea, he says, is to set out clear safety expectations without constraining innovation on the part of robot and component manufacturers: “How to solve the problem is up to the designer.”

Trying to set universal standards while respecting freedom of design can pose challenges, however. First of all, how do you even define a humanoid robot? Does it need to have legs? Arms? A head? 

“One of our recommendations is that maybe we need to actually drop the term ‘humanoid’ altogether,” Prather says. His group advocates a classification system for humanoid robots that would take into account their capabilities, behavior, and intended use cases rather than how they look. The ISO standard Vicentini is working on refers to all industrial mobile robots “with actively controlled stability.” This would apply as much to Boston Dynamics’ dog-like quadruped Spot as to its bipedal humanoid Atlas, and could equally cover robots with wheels or some other kind of mobility.

How to speak robot

Aside from physical safety issues, humanoids pose a communication challenge. If they are to share space with people, they will need to recognize when someone’s about to cross their path and communicate their own intentions in a way everyone can understand, just as cars use brake lights and indicators to show the driver’s intent. Digit already has lights to show its status and the direction it’s traveling in, says Velagapudi, but it will need better indicators if it’s to work cooperatively, and ultimately collaboratively, with humans. 

“If Digit’s going to walk out into an aisle in front of you, you don’t want to be surprised by that,” he says. The robot could use voice commands, but audio alone is not practical for a loud industrial setting. It could be even more confusing if you have multiple robots in the same space—which one is trying to get your attention?

There’s also a psychological effect that differentiates humanoids from other kinds of robots, says Prather. We naturally anthropomorphize robots that look like us, which can lead us to overestimate their abilities and get frustrated if they don’t live up to those expectations. “Sometimes you let your guard down on safety, or your expectations of what that robot can do versus reality go higher,” he says. These issues are especially problematic when robots are intended to perform roles involving emotional labor or support for vulnerable people. The IEEE report recommends that any standards should include emotional safety assessments and policies that “mitigate psychological stress or alienation.”

To inform the report, Greta Hilburn, a user-centered designer at the US Defense Acquisition University, conducted surveys with a wide range of non-engineers to get a sense of their expectations around humanoid robots. People overwhelmingly wanted robots that could form facial expressions, read people’s micro-expressions, and use gestures, voice, and haptics to communicate. “They wanted everything—something that doesn’t exist,” she says.

Escaping the warehouse

Getting human-robot interaction right could be critical if humanoids are to move out of industrial spaces and into other contexts, such as hospitals, elderly care environments, or homes. It’s especially important for robots that may be working with vulnerable populations, says Hilburn. “The damage that can be done within an interaction with a robot if it’s not programmed to speak in a way to make a human feel safe, whether it be a child or an older adult, could certainly have different types of outcomes,” she says.

The IEEE group’s recommendations include enabling a human override, standardizing some visual and auditory cues, and aligning a robot’s appearance with its capabilities so as not to mislead users. If a robot looks human, Prather says, people will expect it to be able to hold a conversation and exhibit some emotional intelligence; if it can actually only do basic mechanical tasks, this could cause confusion, frustration, and a loss of trust. 

“It’s kind of like self-checkout machines,” he says. “No one expects them to chat with you or help with your groceries, because they’re clearly machines. But if they looked like a friendly employee and then just repeated ‘Please scan your next item,’ people would get annoyed.”

Prather and Hilburn both emphasize the need for inclusivity and adaptability when it comes to human-robot interaction. Can a robot communicate with deaf or blind people? Will it be able to adapt to waiting slightly longer for people who may need more time to respond? Can it understand different accents?

There may also need to be some different standards for robots that operate in different environments, says Prather. A robot working in a factory alongside people trained to interact with it is one thing, but a robot designed to help in the home or interact with kids at a theme park is another proposition. With some general ground rules in place, however, the public should ultimately be able to understand what robots are doing wherever they encounter them. It’s not about being prescriptive or holding back innovation, he says, but about setting some basic guidelines so that manufacturers, regulators, and end users all know what to expect: “We’re just saying you’ve got to hit this minimum bar—and we all agree below that is bad.”

The IEEE report is intended as a call to action for standards organizations, like Vicentini’s ISO group, to start the process of defining that bar. It’s still early for humanoid robots, says Vicentini—we haven’t seen the state of the art yet—but it’s better to get some checks and balances in place so the industry can move forward with confidence. Standards help manufacturers build trust in their products and make it easier to sell them in international markets, and regulators often rely on them when coming up with their own rules. Given the diversity of players in the field, it will be difficult to create a standard everyone agrees on, Vicentini says, but “everybody equally unhappy is good enough.”

The Download: Amsterdam’s welfare AI experiment, and making humanoid robots safer

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Inside Amsterdam’s high-stakes experiment to create fair welfare AI

Amsterdam thought it was on the right track. City officials in the welfare department believed they could build technology that would prevent fraud while protecting citizens’ rights. They followed emerging best practices and invested a vast amount of time and money in a project that eventually processed live welfare applications. But in their pilot, they found that the system they’d developed was still not fair and effective. Why?

Lighthouse Reports, MIT Technology Review, and the Dutch newspaper Trouw have gained unprecedented access to the system to try to find out. Read about what we discovered.

—Eileen Guo, Gabriel Geiger & Justin-Casimir Braun

This story is a partnership between MIT Technology Review, Lighthouse Reports, and Trouw, and was supported by the Pulitzer Center. 

+ Can you make AI fairer than a judge? Play our courtroom algorithm game to find out.

Why humanoid robots need their own safety rules

While humanoid robots are taking their first tentative steps into industrial applications, the ultimate goal is to have them operating in close quarters with humans.

One reason for making robots human-shaped in the first place is so they can more easily navigate the environments we’ve designed around ourselves. This means they will need to be able to share space with people, not just stay behind protective barriers. But first, they need to be safe. Read the full story.

—Victoria Turk

MIT Technology Review Narrated: The surprising barrier that keeps us from building the housing we need

Sure, there’s too much red tape, but there is another reason building anything is so expensive: the construction industry’s “awful” productivity.

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Chatbots are getting facts about the LA riots wrong
AI systems can’t be relied upon at the best of times, let alone with fast-moving news. (Wired $)
+ What’s Trump’s goal here, exactly? (NY Mag $)

2 Gavin Newsom is becoming a meme
The California governor’s Trump clapbacks are winning him a legion of online fans. (WP $)
+ He’s accused the President of “pulling a military dragnet” across the city. (The Guardian)
+ Newsom has warned that other states are likely to be next. (Politico)

3 Trump’s Big Beautiful Bill could lead to more than 51,000 deaths a year
Due to the bill’s provisions for public health insurance. (Undark)

4 How Ukraine’s AI-guided drones hit Russia’s airfields
But its opponent is also stepping up its AI capabilities. (FT $)
+ Meet the radio-obsessed civilian shaping Ukraine’s drone defense. (MIT Technology Review)

5 US agencies tracked foreign nationals travelling to Elon Musk
Officials kept an eye on who visited him in 2022 and 2023. (WSJ $)

6 Snap’s new AR smart glasses will go on sale next year
Its sixth generation of Specs will enter an increasingly crowded field. (CNBC)
+ Qualcomm has made a new processor to power similar glasses. (Bloomberg $)
+ What’s next for smart glasses. (MIT Technology Review)

7 Each ChatGPT query uses ‘roughly one fifteenth of a teaspoon’ of water
That’s according to Sam Altman, at least. (The Verge)
+ We did the math on AI’s energy footprint. Here’s the story you haven’t heard. (MIT Technology Review)

8 Death Valley’s air could be a valuable water source
Scientists proved their hydrogel method worked in the real world. (New Scientist $)

9 Gen Z is choosing to skip college entirely
Increasing numbers of young tech workers are opting out and entering the workforce early. (Insider $)

10 How to fight back against a world of AI-generated choices
Good taste is your friend here. (The Atlantic $)

Quote of the day

“We’re probably going to have flying taxis before we have autonomous ones in London.”

—Steve McNamara, the general secretary of the UK’s Licensed Taxi Drivers’ Association, isn’t optimistic about London’s plans to trial autonomous cars, he tells the Guardian.

One more thing

Exosomes are touted as a trendy cure-all. We don’t know if they work.

There’s a trendy new cure-all in town—you might have seen ads pop up on social media or read rave reviews in beauty magazines.

Exosomes are being touted as a miraculous treatment for hair loss, aging skin, acne, eczema, pain conditions, long covid, and even neurological diseases like Parkinson’s and Alzheimer’s. That’s, of course, if you can afford the price tag—which can stretch to thousands of dollars.

But there’s a big problem with these big promises: We don’t fully understand how exosomes work—or what they even really are. Read our story.

—Jessica Hamzelou

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Here’s how to tap into your flow state and get things DONE.
+ Check out these must-see art shows and exhibitions of the year.
+ Everybody’s free (to listen to one of the best hits of the 90s) ☀
+ Turns out 10CC frontman Graham Gouldman doesn’t just like cricket—he’s just watched his first ever game and he really does love it 🏏

How Google Ads Fits into AI Overviews and AI Mode

“Marketing Live” is Google’s annual virtual event showcasing new products, formats, and tips. This year’s program took place last month, when Google announced over 40 updates to Ads, YouTube, and data measurement, most of them AI-driven.

I will focus this post on changes to Google Ads within AI Overviews and AI Mode.

AI Overviews, AI Mode

AI Overviews is Google’s generative AI feature that answers queries directly on the search results page. Google introduced the feature in 2024. AI Overviews summarizes solutions from across the web and occasionally cites those sources for further research. For example, searching “how to clean an oven” could trigger an AI Overview that includes bullet points, videos, and a list of external sites.

To date, AI Overviews have summarized mostly organic listings. At last month’s Marketing Live, Google announced that Overviews would show more ads. For instance, a “how to clean an oven” search could trigger ads for cleaning products.

A search for “how to clean an oven” could trigger Shopping ads in AI Overviews. Image from Google.

AI Mode extends Overviews to anticipate searchers’ intent and likely follow-up questions beyond the initial query. Google refers to these additional responses as “fan-out” results.

Google increasingly generates search results based on users’ intent, not their keywords. AI Overviews and AI Mode follow this trend.

AI for paid search

At Marketing Live, Google execs stated that both Overviews and AI Mode can include ads. Searchers’ intent triggers those ads, not keywords alone. I’ve repeatedly addressed this evolution: Google continues to promote broad match keywords and campaign types that lessen reliance on keywords, treating those words and phrases as broad themes.

For example, Google’s newly created AI Max for Search campaign type (i) requires only broad match keywords, (ii) generates dynamic ad copy, and (iii) selects an advertiser’s landing page likely to yield the best performance.

Google’s AI aims to answer searchers’ needs. Advertisers can help by providing audience signals such as first-party data and custom segments.

Site content

Engaging, quality website content has always driven conversions. It’s now more important than ever for paid search because Google can display that text in the ads themselves.

A Google Ads setting in most campaign types allows Google to automatically create ad assets based on content from (i) landing pages, (ii) the site’s domain, and (iii) existing ads.

The setting is optional, but advertisers should opt in to take full advantage of AI.

Most advertisers already focus on organic search optimization and produce quality content. There’s no need to reinvent the wheel.

The presentations at Marketing Live reinforced our understanding that ads in Google are dynamic. Advertisers still need to write compelling copy, but AI will tailor those ads to each user. The stronger the site content, the better the ads.

Google Removes Robots.txt Guidance For Blocking Auto-Translated Pages via @sejournal, @MattGSouthern

Google has removed its robots.txt guidance for blocking auto-translated pages, aligning its technical documentation with its spam policies.

  • Google removed guidance advising websites to block auto-translated pages via robots.txt.
  • This aligns with Google’s policies that judge content by user value, not creation method.
  • Use meta tags like “noindex” for low-quality translations instead of sitewide exclusions.

Is Google About To Bury Your Website? [Webinar] via @sejournal, @lorenbaker

The new AI Mode is rewriting the rules of search. Are you ready?

Google’s AI-generated answers are starting to dominate the SERPs, pushing traditional results further down the page. If your business relies on organic traffic, you can’t afford to ignore this shift.

Join us on June 25, 2025, for an expert-led webinar sponsored by Conductor. Get actionable strategies from Nick Gallagher, SEO Lead at Conductor, to help you adapt fast and stay ahead of the curve.

What you’ll learn:

  • Spot the queries most likely to trigger AI Overviews.
  • Identify industries seeing the biggest changes in traffic.
  • Audit which brands are being highlighted in AI answers.
  • Update your SEO game plan to stay visible.
  • Track and interpret shifts in traffic and performance metrics.

Why this matters now:

Traditional SEO tactics are no longer enough. Understanding how AI Mode works and knowing how to respond could be the difference between steady growth and a sharp drop in traffic.

Don’t let AI Mode catch you off guard.

Register today to secure your spot. Can’t make it live? Sign up anyway, and we’ll send you the full recording.

Paid Media Reporting For Ecommerce: Navigating Attribution Across Paid

Global advertising expenditure has surpassed the $1 trillion mark for the first time.

Digital advertising continues to dominate this growth, with digital channels (encompassing search and social media) forecast to account for 72.9% of total ad revenue by the end of the year.

From a platform perspective, Google, Meta, Amazon, and Alibaba are expected to capture more than half of global ad revenues this year.

In-house and agency-side paid media teams are working harder than ever to grow ecommerce businesses efficiently, and the amount of data being used day-to-day (even hour-to-hour) is enormous.

With this growth and investment, something is clearly working, and given that brands can map new/returning audiences to their advertising funnel and serve ads across billions of auctions, it’s a lever that millions of businesses pull.

However, with budgets being split across channels (search, social, out-of-home, etc.) and brands using CRM data, analytics platforms, third-party attribution tools, and more to define their “source of truth,” fragmentation begins to appear in reporting. Only 32% of executives feel they fully capitalize on their performance marketing data for this reason.

With data being spread across several sources, ad platforms having different attribution models, and the C-suite likely asking, “Which source of truth is correct?”, reporting paid media performance for ecommerce isn’t the most straightforward task.

This post digs into key performance indicators, platform attribution & modeling, business goals, and how to bring it all together for a holistic view of your advertising efficacy.

Key Performance Indicators (KPIs)

Navigating paid media reporting starts with the KPIs each account optimizes toward, and how these feed into channel performance.

Each of these has a purpose, benefits, limitations, and practical use cases that should be viewed through the lens of each platform’s unique attribution.

Short-Term Performance

Return On Ad Spend (ROAS)

  • Definition: revenue/cost.

This metric measures the revenue generated for every dollar spent on advertising.

If your total ad cost was $1,000 and you drove $18,500 in revenue, your ROAS would be 18.5.

  • Benefits: Direct measure of advertising efficiency and helps provide a snapshot of campaign profitability.
  • Limitations: Does not account for customer acquisition costs (CACs), margin, LTV, returns, shipping, etc.

Cost Per Acquisition (CPA)

  • Definition: cost/sales or leads.

This metric shows the average cost to generate a sale (or a lead, depending on the goal; e.g., an ecommerce brand could use CPA to measure sign-ups for an event).

For example, if your total ad cost was $5,000 and you drove 180 sales, your CPA would be $27.78.

  • Benefits: Easy to monitor over time and helps assess efficiency.
  • Limitations: Neglects revenue, customer acquisition cost, margin, LTV, etc., and treats all sales equally regardless of value.

Cost Of Sale (CoS)

  • Definition: total ad spend/revenue.

This metric measures what % of revenue is spent on advertising.

Say a brand spends $20,000 on Meta Ads and generates $100,000 in revenue; their resulting CoS would be 20%.

  • Benefits: Useful for margin-sensitive businesses and marketplaces where prices and/or Average Order Value (AOV) are volatile.
  • Limitations: Can mask unprofitable sales (in some scenarios) if margin, returns, shipping, etc., are not considered.
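The three short-term formulas above can be sketched as simple Python helpers (the function names are mine; the figures are the article’s own worked examples):

```python
# Hypothetical helpers implementing the short-term KPI formulas above,
# checked against the article's example figures.

def roas(revenue: float, ad_cost: float) -> float:
    """Return on ad spend: revenue generated per dollar of ad spend."""
    return revenue / ad_cost

def cpa(ad_cost: float, conversions: int) -> float:
    """Cost per acquisition: average spend per sale or lead."""
    return ad_cost / conversions

def cos(ad_cost: float, revenue: float) -> float:
    """Cost of sale: share of revenue spent on advertising, as a percentage."""
    return ad_cost / revenue * 100

print(roas(18_500, 1_000))        # 18.5
print(round(cpa(5_000, 180), 2))  # 27.78
print(cos(20_000, 100_000))       # 20.0
```

Note how the same $ figures feed all three ratios; the difference is purely which two you divide, which is why each metric hides what the others reveal.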

Mid-Term Efficiency

Customer Acquisition Cost (CAC)

  • Definition: total marketing costs spent on acquiring new customers / total number of new customers.
  • Detailed definition: (total marketing costs spent on acquiring new customers + wages + software costs + agency/consultancy fees + overheads) / total number of new customers.

This metric may reflect either marketing costs associated with driving new customer acquisition or a holistic view of all costs associated with acquiring new customers.

Let’s say a business has a CAC of $175 and an AOV of $58; each new customer will need to purchase ~3 times to make acquisition profitable.

  • Benefits: Holistic view of acquisition cost, ideal for longer-term profitability analysis for paid media investment.
  • Limitations: Not always the most suitable for channel-specific reporting (think account structuring, audiences, etc.), and can be a lagging metric as it doesn’t reflect short-term changes in performance like ROAS or CPA would.

Marketing Efficiency Ratio (MER)

  • Definition: Sometimes referred to as blended ROAS, MER is calculated by dividing total revenue by total ad spend across all channels.

This metric shows how efficiently your total ad spend is converting into revenue, regardless of the channel.

Where MER is especially useful is when brands are active on multiple ad networks, all of which contribute in some way to the final sale, and where siloed platform attribution is inconsistent.

  • Benefits: Captures topline performance from a transactional perspective and simplifies multi-channel reporting.
  • Limitations: Neglects exactly where the sales and revenue came from and obscures channel efficiency, especially important for search, social, etc.

Long-Term Strategic

Customer Lifetime Value (CLV Or CLTV)

  • Definition: This metric estimates the total net revenue a customer brings over their relationship with a brand.

Used alongside CAC, this metric is essential for understanding the true value of both acquisition and retention, which is important for almost all ecommerce models, and especially important for brands looking to capitalize on repeat purchases and subscription-based models.

  • Benefits: Builds a foundation for tying performance marketing to long-term outcomes while helping give room to CAC targets across valuable customer segments.
  • Limitations: Takes a fair amount of work to get set up and maintain, in addition to requiring a clean cohort and repeat purchase data. Additionally, when brands introduce new products/services, it can be hard to forecast accurate CLV numbers, and it will take time.
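The mid- and long-term formulas can be sketched the same way. This is my own illustration: the function names and the deliberately simplified CLV model (order value × purchase frequency × relationship length) are assumptions, not the article’s method.

```python
# Illustrative sketches of the mid- and long-term formulas above.
# Function names and the simplified CLV model are my own assumptions.

def cac(total_acquisition_costs: float, new_customers: int) -> float:
    """Customer acquisition cost; pass the fully loaded cost figure
    (wages, software, fees, overheads) for the 'detailed' variant."""
    return total_acquisition_costs / new_customers

def mer(total_revenue: float, total_ad_spend: float) -> float:
    """Marketing efficiency ratio ("blended ROAS") across all channels."""
    return total_revenue / total_ad_spend

def simple_clv(aov: float, purchases_per_year: float, years: float) -> float:
    """A deliberately simplified CLV estimate: average order value x
    purchase frequency x expected relationship length in years."""
    return aov * purchases_per_year * years

# The article's break-even example: CAC $175 vs. AOV $58.
print(round(175 / 58, 2))     # 3.02 purchases needed to cover acquisition cost
print(simple_clv(58.0, 4, 2)) # 464.0 -> projected 2-year value at 4 orders/yr
```

A real CLV model would add margin, churn, and discounting; the point here is only that CLV and CAC share units, so the two can be compared directly.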

So, which one should you be reporting on for your ecommerce brand?

Speaking from experience, there isn’t a right or wrong answer, nor is there a blueprint for which KPIs you should be reporting on.

Having a multifaceted approach will enable more informed decision making, combining short-, medium-, and long-term KPIs to form a holistic model for measuring performance that feeds into your reports.

However, even after choosing your KPIs, different attribution models across advertising platforms add another layer of complexity, as does the ever-evolving customer journey involving multiple touchpoints across devices, channels, etc.

The Ad Platforms

Each ad platform handles attribution and tracking differently.

Take Google Ads, for example: the default model is Data-Driven Attribution (DDA), and when using the Google Ads pixel, only paid channels receive credit.

Then, with a GA4 integration to Google Ads, both paid and organic are eligible to receive credit for sales.

Click-through windows, value, count, etc., can all be customised to provide a view of performance that feeds into your Google Ads campaigns.

Using the Google Ads pixel, say a user clicks a shopping ad, then a search ad, and then returns via organic search to make the purchase: 40% of the credit could go to the shopping ad and 60% to the search ad.

With the GA4-imported conversion, shopping could receive 30%, search 40%, and the organic visit 30%, resulting in 70% of the value being attributed back to the campaigns in-platform.
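The contrast between those two views can be shown as a toy calculation (the credit percentages mirror the example above; the code itself is my own sketch, not how either platform computes credit internally):

```python
# Toy illustration of how the same purchase journey earns different credit
# under two attribution views; percentages mirror the example above.

# Google Ads pixel view: only paid touchpoints are eligible for credit.
ads_pixel_credit = {"shopping_ad": 0.40, "search_ad": 0.60}

# GA4-imported conversion: paid and organic touchpoints share the credit.
ga4_credit = {"shopping_ad": 0.30, "search_ad": 0.40, "organic": 0.30}

def paid_share(credit: dict) -> float:
    """Fraction of conversion value attributed back to paid campaigns."""
    return sum(v for k, v in credit.items() if k != "organic")

print(round(paid_share(ads_pixel_credit), 2))  # 1.0 -> 100% lands in-platform
print(round(paid_share(ga4_credit), 2))        # 0.7 -> 70% lands in-platform
```

The same sale, the same journey, two different “truths”: this is the fragmentation the C-suite ends up asking about.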

Now, comparing this to Meta Ads, which uses a seven-day click and one-day view attribution window by default, when a user converts within this time frame, 100% of the credit will be attributed to Meta.

This is why the narrative around conversion tracking on Meta is one of overrepresentation, with brands seeing inflated revenue numbers vs. other channels. This is even more pronounced with loose audience targeting, where campaign types such as ASC can serve assets to audiences who have already interacted with your brand.

Then, when you dig into third-party analytics, the comparisons between Google Ads, Meta Ads, Pinterest Ads, etc., are almost the complete opposite.

So, what should this data be used for, and how does it factor into the bigger picture?

In-platform metrics are best viewed as directional.

They help optimize within the walls of that specific platform to identify high-performing audiences, auctions, creatives, and placements, but they rarely reflect the true incremental value of paid media to your business.

The data in Google, Meta, Pinterest, etc. is a platform-specific lens on performance, and the goal shouldn’t be to pick one or ignore these metrics.

It should be to interpret these for what they are and how they play into the overarching strategy.

The Bigger Picture

KPIs such as ROAS and CPA offer immediate insights but provide a fragmented view of paid media performance.

To gain a comprehensive understanding, brands must combine medium- to long-term KPIs with broader modeling and tests that account for the multifaceted nature of performance marketing, while considering how complex customer journeys are in this day and age.

Marketing Mix Modeling (MMM)

Introduced in the 1950s, MMM is a statistical analysis that evaluates the effectiveness of marketing channels over time.

By analyzing historical data, MMM helps advertisers understand how different marketing activities contribute to sales and can guide budget allocation.

A 2024 Nielsen study found that 30% of global marketers cite MMM as their preferred method of measuring holistic ROI.

The very short version of how to get started with MMM includes:

  1. Collect aggregated data: roughly speaking, at least two years of weekly data across all channels, mapped against every possible variable (e.g., pricing, promotions, weather, social trends).
  2. Define the dependent variable, which for ecommerce will be sales or revenue.
  3. Run regression modeling to isolate the contribution of each variable to sales (adjusting for overlaps, lags, etc.).
  4. Analyze, optimize, and report on the coefficients to understand the relative impact and ROI of your paid media activity as a whole.
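To make the regression step concrete, here is a deliberately tiny MMM-style sketch: one channel, ordinary least squares, pure Python. Real MMM spans many channels with lag/adstock effects and seasonality, and all spend and sales figures below are invented for illustration.

```python
# A minimal one-channel MMM sketch: recover the revenue-per-dollar
# coefficient of a single spend series via simple linear regression.
import random

random.seed(0)
weeks = 104  # ~2 years of weekly data, as suggested above

spend = [random.uniform(5_000, 15_000) for _ in range(weeks)]
# Simulated sales: $50k weekly baseline + $2 revenue per $1 of spend + noise.
sales = [50_000 + 2.0 * s + random.gauss(0, 2_000) for s in spend]

# Closed-form simple linear regression: slope = cov(x, y) / var(x).
mean_x = sum(spend) / weeks
mean_y = sum(sales) / weeks
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales)) \
        / sum((x - mean_x) ** 2 for x in spend)
baseline = mean_y - slope * mean_x  # sales the model expects at zero spend

print(round(slope, 2))  # estimated incremental revenue per $ of spend (~2.0)
```

The slope is the “coefficient” step 4 refers to: it estimates incremental revenue per dollar of spend without any user-level tracking, which is why MMM survives privacy restrictions.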

Unlike platform attribution, this doesn’t rely on user-level tracking, which is especially useful with privacy restrictions now and in the future.

From a tactical standpoint, your chosen KPIs will still lead campaign optimizations for your day-to-day management, but at a macro level, MMM will determine where to invest your budget and why.

Incrementality Testing

Instead of relying on attribution models, incrementality testing uses controlled experiments to isolate the impact of your paid media campaigns on actual business outcomes.

This kind of testing aims to answer the question, “Would these sales have happened without the paid media investment?”

This involves:

  1. Define an objective or outcome metric (e.g., sales, revenue).
  2. Create test and control groups, split by audience or geography – one will be exposed to the campaigns and the other will not.
  3. Run the experiment while keeping all conditions equal across both groups.
  4. Compare the outcomes, analyze performance, and calculate the impact.
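The comparison in step 4 is simple arithmetic once the experiment has run. Here is a toy calculation with invented numbers:

```python
# A toy incrementality calculation (all numbers invented): compare conversion
# rates between a group exposed to the campaigns and a held-out control group.

test_users, test_conversions = 50_000, 1_500        # ads shown
control_users, control_conversions = 50_000, 1_200  # ads withheld

test_rate = test_conversions / test_users           # 3.0% conversion rate
control_rate = control_conversions / control_users  # 2.4% conversion rate

# Conversions that would not have happened without the paid media investment.
incremental = (test_rate - control_rate) * test_users
lift = (test_rate - control_rate) / control_rate

print(round(incremental))    # 300 incremental conversions
print(round(lift * 100, 1))  # 25.0 (% relative lift over control)
```

In practice you would also run a significance test on the two rates before acting on the result, but the lift number above is what feeds budget decisions.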

This isn’t one that’s run every week, but from a strategic point of view, these tests help to validate the actual performance of paid media and direct where and what spend should be allocated across ad platforms.

Operational Factors

These factors are equally important (if not more so) for ecommerce reporting and absolutely need to be considered when setting KPIs and beginning to think about modeling and testing.

  • Product margin.
  • AOV variability.
  • Shipping costs.
  • Returns rates.
  • Repeat rates.
  • Discounting and promotions.
  • Cancelled and/or failed payments.
  • Stock availability.
  • Attribute availability (e.g., size, color, model).
  • Pixels and tracking.

Without considering these factors, brands will use inaccurate data from the get-go.

Think about the impact of buy now, pay later: providers such as Klarna or Clearpay can lead to higher return rates, as bundle buying and impulse purchases become more accessible.

Without considering operational factors, using this example and a basic in-platform ROAS, brands would be optimizing toward inflated checkout data with higher AOVs and no account of returns, restocking, etc.

Ultimately, building a true picture of paid media performance means stepping beyond the platform KPIs and metrics to consider all factors involved and how best to model the data to uncover not just “what” is happening, but “why” it is and how this impacts the wider business.

Bringing It All Together

No single tool or model tells the full story.

You’ll need to compare platform data, internal analytics, and external modeling to build a more reliable view of performance.

The first step is to nail down watertight KPIs that account for every possible operational factor, so you know the platforms are being fed correct data. If you need to modify these KPIs based on platform nuances due to differing attribution models, do it.

Once these are nailed down, find a model that you trust and that will show you the holistic impact of your paid media spend on overall business performance.

You could explore the use of third-party attribution tools that aim to blend data together, but even with these, you’ll still require clear and accurate KPIs and reliable tracking.

Then, when it comes to the visual side of reporting, the world is your oyster.

Looker Studio, Tableau, and Datorama are among the long list of well-known platforms, and with most brands using three to four business intelligence tools and 67% of analysts relying on multiple dashboards, don’t stress if you can’t get everything under one lens.

When all of this is executed and made into a priority over the short-term ebbs and flows of paid media performance, this is the point where connecting media spend to profit begins.


Google AI Mode: First Thoughts & Survival Strategies

The new AI Mode tab in Google’s results, currently only active in the U.S., enables users to get an AI-generated answer to their query.

You can ask a detailed question in AI Mode, and Google will provide a summarized answer.

Google AI Mode answer for the question [what are the best ways to grow your calf muscles], providing a detailed summary of exercises and tips (Image Credit: Barry Adams)

Google explains how it generates these answers in some recently published documentation.

The critical process is what Google calls a “query fan-out” technique, where many related queries are performed in the background.

The results from these related queries are collected, summarized, and integrated into the AI-generated response to provide more detail, accuracy, and usefulness.

Having played with AI Mode since its launch, I have to admit it’s pretty good. I get useful answers, often with detailed explanations that give me the information I am looking for. It also means I have less need to click through to cited source websites.

I have to admit that, in many cases, I find myself reluctant to click on a source webpage, even when I want additional information. It’s simpler to ask AI Mode a follow-up question rather than click to a webpage.

Much of the web has become quite challenging to navigate. Clicking on an unknown website for the first time means having to brave a potential gauntlet of cookie-consent forms, email signup pop-ups, app install overlays, autoplay videos, and a barrage of intrusive ads.

The content you came to the page for is frequently hidden behind several barriers to entry that the average user will only persist with if they really want to read that content.

And then in many cases, the content isn’t actually there, or is incomplete and not quite what the user was looking for.

AI Mode removes that friction. You get most of the content directly in the AI-generated answer.

You can still click to a webpage, but often it’s easier to simply ask the AI a more specific follow-up question. No need to brave unusable website experiences and risk incomplete content after all.

AI Mode & News

Unlike AI Overviews, AI Mode will provide summaries for almost any query, including news-specific queries:

AI Mode answer for the [latest news] query (Image Credit: Barry Adams)

Playing with AI Mode, I’ve seen some answers to news-specific queries that don’t even cite news sources, but link only to Wikipedia.

For contrast, the regular Google SERP for the same query features a rich Top Stories box with seven news stories.

With these types of results in AI Mode, the shelf life of news is reduced even further.

Where in search, you can rely on a Top Stories news box to persist for a few days after a major news event, in AI Mode, news sources can be rapidly replaced by Wikipedia links. This further reduces the traffic potential to news publishers.

A Google SERP for [who won roland garros 2025] with a rich Top Stories box vs. the AI Mode answer linking only to Wikipedia (Image Credit: Barry Adams)

There is some uncertainty about AI Mode’s traffic impact. I’ve seen examples of AI Mode answers that provide direct links to webpages in-line with the response, which could help drive clicks.

Google is certainly not done experimenting with AI Mode. We haven’t seen the final product yet, and because it’s an experimental feature that most users aren’t engaged with (see below), there’s not much data on CTR.

As an educated guess, the click-through rate from AI Mode answers to their cited sources is expected to be at least as low as, and probably lower than, the CTR from AI Overviews.

This means publishers could potentially see their traffic from Google search decline by 50% or more.

AI Mode User Adoption

The good news is that user adoption of AI Mode appears to be low.

The latest data from Similarweb shows that after an initial growth, usage of the AI Mode tab on Google.com in the U.S. has slightly dipped and now sits at just over 1%.

This makes it about half as popular as the News tab, which is not a particularly popular tab within Google’s search results to begin with.

It could be that Google’s users are satisfied with AI Overviews and don’t need expanded answers in AI Mode, or that Google hasn’t given enough visual emphasis to AI Mode to drive a lot of usage.

I suspect that Google may try to make AI Mode more prominent, perhaps by allowing users to click from an AI Overview into AI Mode (the same way you can click from a Top Stories box to the News tab), or by integrating it more prominently into the default SERP.

When user adoption of AI Mode increases, the impact will be keenly felt by publishers. Google’s CEO has reiterated their commitment to sending traffic to the web, but the reality appears to contradict that.

In some of their newest documentation about AI, Google strongly hints at diminished traffic and encourages publishers to “[c]onsider looking at various indicators of conversion on your site, be it sales, signups, a more engaged audience, or information lookups about your business.”

AI Mode Survival Strategies

Broad adoption of AI Mode, whatever form that may take, can have several impactful consequences for web publishers.

Worst case scenario, most Google search traffic to websites will disappear. If AI Mode becomes the new default Google result, expect to see a collapse of clicks from search results to websites.

Focusing heavily on optimizing for visibility in AI answers will not save your traffic, as the CTR for cited sources is likely to be very low.

In my view, publishers have roughly three strategies for survival:

1. Google Discover

Google’s Discover feed may soften the blow somewhat, especially with the rollout onto desktop Chrome browsers.

Expanded presence of Discover on all devices with a Chrome browser gives more opportunities for publishers to be visible and drive traffic.

However, a reliance on Discover as a traffic source can encourage bad habits. Disregarding Discover’s inherent volatility, the unfortunate truth is that clickbait headlines and cheap churnalism do well in the Discover feed.

Reducing reliance on search in favor of Discover is not a strategy that lends itself well to quality journalism.

There’s a real risk that, in order to survive a search apocalypse, publishers will chase after Discover clicks at any cost. I doubt this will result in a victory for content quality.

2. Traffic & Revenue Diversification

Publishers need to grow traffic and income from more channels than just search. Due to Google’s near-monopoly in search, diversified traffic acquisition has been a challenge.

Google is the gatekeeper of most of the web’s traffic, so of course we’ve been focused on maximizing that channel.

With the risk of a greatly diminished traffic potential from Google search, other channels need to pick up the slack.

We already mentioned Discover and its risks, but there are more opportunities for publishing brands to drive readers and growth.

Paywalls seem inevitable for many publishers. While I’m a fan of freemium models, publishers will have to decide for themselves what kind of subscription model they want to implement.

A key consideration is whether your output is objectively worth paying for. This is a question few publishers can honestly answer, so unbiased external opinions will be required to make the right business decision.

Podcasts have become a cornerstone of many publishers’ audience strategies, and for good reason. They’re easy to produce, and you don’t need that many subscribers to make a podcast economically feasible.

Another content format that can drive meaningful growth is video, especially short-form video that has multiplatform potential (YouTube, TikTok, Instagram, Discover).

Email newsletters are a popular channel, and I suspect this will only grow. The way many journalists have managed to grow loyal audiences on Substack is testament to this channel’s potential.

And while social media hasn’t been a key traffic driver for many years, it can still send significant visitor numbers. Don’t sleep on those Facebook open graph headlines (also valuable for Discover).
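For those unfamiliar with the mechanics, Open Graph headlines are set with meta tags in a page’s `<head>`. The snippet below is a minimal illustrative sketch with placeholder URLs and titles; the property names come from the Open Graph protocol, and the key point is that `og:title` can differ from the on-page `<title>`:

```html
<!-- Open Graph tags in the page <head>; all values below are placeholders -->
<meta property="og:title" content="The Headline As It Should Appear When Shared" />
<meta property="og:description" content="A one-sentence summary of the article." />
<meta property="og:image" content="https://www.example.com/images/feature.jpg" />
<meta property="og:url" content="https://www.example.com/article-slug" />
<meta property="og:type" content="article" />
```

Because `og:title` is independent of the regular page title, it gives publishers a way to craft social-facing headlines separately from the headline optimized for search.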

3. Direct Brand Visits

The third strategy, and probably the most important one, is to build a strong publishing brand that is actively sought out by your audience.

No matter what features Google or any other tech intermediary rolls out, when someone wants to visit your website, they will come to you directly. Not even Google’s AI Mode will prevent users from reaching a site they specifically ask for.

A brand search for [daily mail] in Google AI Mode provides a link to the site’s homepage at the top of the response (Image credit: Barry Adams)

Brand strength translates into audience loyalty.

A recognizable publisher will find it easier to convince its readers to install its dedicated app, subscribe to its newsletters, watch its videos, and listen to its podcasts.

A strong brand presence on the web is also, ironically, a cornerstone of AI visibility optimization.

LLMs are, after all, regurgitators of the web’s content, so if your brand is mentioned frequently on the web (i.e., in LLMs’ training data), you are more likely to be cited as a source in LLM-generated answers.

Exactly how to build a strong online publishing brand is the real question. Without going into specifics, I’ll repeat what I’ve said many times before: You need to have something that people are willing to actively seek out.

If you’re just another publisher writing the same news that others are also writing, without anything that makes you unique and worthwhile, you’re going to have a very bad time. The worst thing you can be as a publisher is forgettable.

There is a risk here, too. In an effort to cater to a specific target segment, a publisher could fall victim to “audience capture”: feeding your audience what they want to hear rather than what’s true. We already see many examples of this, to the detriment of factual journalism.

It’s a dangerous pitfall that even the biggest news brands find difficult to navigate.

Optimizing For AI

In my previous article, I wrote a bit about how to optimize for AI Overviews.

I’ll expand on this in future articles with more tips, both technical and editorial, for optimizing for AI visibility.

This post was originally published on SEO For Google News.


Featured Image: BestForBest/Shutterstock