Google CEO Sundar Pichai Discusses Fate Of The Human-Created Web via @sejournal, @martinibuster

Google’s CEO, Sundar Pichai, responded to concerns about the impact of recent changes in Search and was repeatedly asked to clarify his position on the web ecosystem and how it fits into what he calls the next chapter of search. Pichai’s responses were given in the context of a recent interview on the Lex Fridman podcast.

Google CEO’s Commitment To Web Ecosystem Challenged

Lex Fridman challenged Pichai on whether Google will continue sending users to the human-created web. Pichai responded that supporting the web ecosystem is something he feels deeply about.

Fridman said:

“And the idea that AI mode will still take you to the web, to the human-created web?”

Pichai responded:

“Yes, that’s going to be a core design principle for us.”

Fridman followed up by noting that he has been asking more questions of Google’s AI Overviews and AI Mode and exploring, but he still wants to end up on the “human-created web.”

Pichai responded:

“It helps us deliver higher quality referrals, right? You know where people are like they have a much higher likelihood of finding what they’re looking for. They’re exploring. They’re curious. Their intent is getting satisfied more… That’s what all our metrics show.”

The interviewer added:

“It makes the humans that create the web nervous. The journalists are getting they’ve already been nervous.”

Sundar Pichai answered:

“Look, I think news and journalism will play an important role, you know, in the future we’re pretty committed to it, right? And so I think making sure that ecosystem… In fact, I think we’ll be able to differentiate ourselves as a company over time because of our commitment there. So it’s something I think you know I definitely value a lot and as we are designing we’ll continue prioritizing approaches.”

AI Is The Next Chapter Of Search?

Pichai mentioned that user metrics for AI search are “encouraging” and referred to it as the “next chapter of search,” underlining that AI search is not going away.

Search has been in a constant state of change. The strongest effects were visible in the 2003 Florida update, the 2012 Penguin link-spam update, the 2018 Medic update, and the more recent series of helpful content updates, all of which brought massive changes to search rankings. Yet none of those changes were as ambitious or consequential as what the human-created web now faces with Google’s AI Overviews and AI Mode.

Speaking as someone who has been a part of search marketing for over 25 years, I believe Pichai may be understating the situation by calling it the next chapter in search. It may well be that Google AI Search is an entirely new book.

Search Is Evolving To More Context

Lex Fridman remarked on how Google was legendary for its simple layout and the ten blue links, saying that Google is starting to “mess with that” and that surely there must have been battles within Google about that.

Pichai subtly corrected Fridman’s suggestion that Google is only now moving away from the ten blue links, a format that hasn’t been the norm for nearly 15 years. He explained that the shift to mobile is what moved Google away from the ten blue links, as Search evolved with the pace of technological advancement and users’ expectations for answers rather than links.

Pichai emphasized that Google remains the “front page of the Internet” as Fridman put it, because of their commitment to making it easier for users to explore the web, only with more context.

Pichai answered:

“Look… in some ways when mobile came… people wanted answers to more questions, so we’re …constantly evolving it. But you’re right, this moment, …that evolution, because underlying technology is becoming much more capable. You can have AI give a lot of context.

But one of our important design goals though, is when you come to Google search. You’re going to get a lot of context. But you’re going to go and find a lot of things out on the web. So that will be true in AI mode. In AI overviews and so on.

But I think to our earlier conversation, we are still giving you access to links, but think of the AI as a layer which is giving you context summary. Maybe in AI mode you can have a dialogue with it back and forth on your journey.

But through it all, you’re kind of learning what’s out there in the world. So those core principles don’t change, but I think AI mode allows us to push… we have our best models there, models which are using search as a deep tool.

Really, for every query you’re asking, fanning out doing multiple searches, assembling that knowledge in a way so you can go and consume what you want to and that’s how we think about it.”

Advertising In AI Mode

Something that isn’t immediately apparent is that Google treats advertising as a form of content. Advertising is not seen as an intrusion but as information that is relevant to users within the context of their interests.

Fridman next asked him about advertising in AI Mode. Pichai responded that Google is currently focused on getting the “organic experience” right, but he also returned to the concept of context.

Pichai’s response:

“Two things.

Early part of AI mode will obviously focus more on the organic experience to make sure we are getting it right. I think the fundamental value of ads are it enables access to deploy the services to billions of people.

Second is, the reason we’ve always taken ads seriously is we view ads as commercial information, but it’s still information. And so we bring the same quality metrics to it.

I think with AI mode, to our earlier conversation, I think AI itself will help us over time, figure out the best way to do it.

Given we are giving context around everything, I think it will give us more opportunities to also explain, okay, here’s some commercial information. Like today, as a podcaster, you do it at certain spots and you probably figure out what’s best in your podcast.

There are aspects of that, but I think the underlying need of people value commercial information. Businesses are trying to connect to users. All that doesn’t change in an AI moment. But look, we will rethink it.”

Will AI Mode Replace Everything?

Lex Fridman asked if Pichai sees a time where AI Mode will become the interface through which the Internet is filtered, asking if there’s a future where it completely replaces the current combination of AI Overviews and ten blue links.

Pichai answered:

“Our current plan is AI Mode is going to be there as a separate tab for people who really want to experience that, but it’s not yet at the level where our main search pages, but as features work, we’ll keep migrating it to the main page. And so you can view it as a continuum. AI model offer you the bleeding edge experience. But things that work will keep overflowing to AI Overviews in the main experience.”

Takeaways

The questions posed by Lex Fridman echo the fears and negative sentiment felt by many publishers about Google’s evolution to providing answers to queries instead of links to the open web.

Sundar Pichai repeatedly stated that Google intends to keep sending users to the human-created web, explaining that AI provides more context that encourages users to explore topics on the web in greater depth.

Those statements, however, are undermined by Google’s delay in enabling web publishers to accurately track referrals from AI Overviews and AI Mode. This creates the impression that publishers are an afterthought and feeds web publisher skepticism about Google’s commitment to the human-created web. While it’s refreshing to hear Google’s CEO emphatically declare his concern for the web ecosystem, I believe it will take more positive actions from Google to overcome web publishers’ negative outlook on the current state of AI search.

Google Outage Disrupts Lens, Discover, & Voice Search Results via @sejournal, @MattGSouthern

Google has confirmed an ongoing disruption that is preventing some results from appearing in Google Lens, Discover, and Voice Search.

According to the company’s Search Status Dashboard, the incident began on June 12 at 1:00 p.m. Pacific Time. A follow-up entry posted at 1:16 p.m. states:

“There’s an ongoing issue with serving Google Lens, Discover, and Voice Search results that’s affecting some users. We’re working on identifying the root cause. The next update will be within 12 hours.”

At press time, the disruption is still marked as “Incident affecting Serving,” meaning the underlying services remain online but are not consistently delivering results.

Why This Matters

Google Lens, the Discover feed, and Voice Search collectively drive significant traffic to publishers, ecommerce catalogs, and local businesses.

When any of these surfaces go dark or return incomplete results, sites that rely on them can experience abrupt drops in impressions and clicks.

What To Do Next

Check for sudden drops in Discover, image, or voice traffic starting around 1:00 p.m. PT. If you see a temporary decline that matches the time on Google’s dashboard, this is likely due to the outage, not a ranking change.

Share Google’s official dashboard notice with website stakeholders. Mention that Google will post another update within 12 hours and explain that performance should return to normal once the service is back up.

When Will Service Be Restored?

Google hasn’t offered an estimated time of full resolution, committing only to provide another status update within 12 hours of the 1:16 p.m. post.

Historically, incidents affecting a limited number of users have been fixed within hours, although larger issues can take longer to resolve.

Until Google publishes its next update, the safest assumption is that Lens, Discover, and Voice Search services will remain unpredictable.

The core web search experience is currently listed as “Available,” so blue-link ranking checks and traditional query troubleshooting can proceed as usual.


Featured Image: Roman Samborskyi/Shutterstock

Google Search Team Explains The “It Depends” Response via @sejournal, @MattGSouthern

Google’s Search Relations team has explained why their SEO advice often sounds vague or comes with conditions, such as “it depends.”

In a recent Search Off the Record podcast, team members Martin Splitt and Gary Illyes shared the challenges that prevent them from providing clear-cut answers.

The discussion was part of what the team referred to as a “more human episode.”

The Googlers acknowledged they sometimes come across as robotic and used this episode to show a more human side.

The Context Problem

Splitt works as Google’s bridge between developers and SEO professionals. He provided an example of how good advice can be distorted when people overlook the broader context.

At a Tech SEO Summit, he presented a slide with a bold statement about JavaScript performance. To prevent confusion, he added a note stating that the slide lacked context and provided a full explanation during the talk.

But even with that, he said the statement still got pulled out and repeated on its own.

“I had a remark on that slide saying there’s context missing here, and then I gave all that context… The problem with me saying that in general is that people will just take that one sentence and ignore everything else I said before or after.”

He clarified that JavaScript plays an important role in many web experiences, like enabling offline support. But that nuance often gets lost when single lines are quoted in isolation.

Why Google Doesn’t Share Slides

This loss of context is one reason why Google teams don’t typically share their presentation slides.

Illyes confirmed that slides on their own can be misleading:

“Our slides without context, they are useless.”

The team sees what happens when advice meant for one specific situation gets used everywhere. This can hurt websites that have different needs.

For example, advice that works for a small local business might be wrong for a global company with websites in multiple languages.

The “It Depends” Situation

Both Google reps know the SEO community gets frustrated with “it depends” answers.

Splitt even called it his “pet peeve.” But they explained why they can’t give simple yes-or-no answers.

Splitt noted:

“Someone who is serving a very specific niche with highly regulated content in a single country in a single language might have very different requirements than a multilanguage multinational brand that sells everything to everyone.”

They try to give more complete answers by explaining what factors matter. But this makes their advice longer and more complex.

The Google team also worries about how people use their quotes. Splitt said people often pick one statement while ignoring other important information.

Splitt explained:

“It often makes things tricky because people might cherry pick and might pick one thing you said, take that out of context and use it as an example why people should follow their agenda rather than ours.”

While they know public statements can be quoted freely, both reps feel bad when selective quoting gets out of control.

What This Means

The Google team’s openness about their struggles affirms the experience of many SEO professionals.

Google’s guidance often feels cautious because it needs to account for a wide range of use cases.

Instead of seeking simple answers, focus on the factors that influence Google’s recommendations.

Understanding the “why” behind Google’s advice is more useful than chasing one-size-fits-all solutions.



Featured Image: Roman Samborskyi/Shutterstock

Ask A PPC: What’s The Value Of Regular PPC Audits & How To Do Them Well via @sejournal, @navahf

Regular audits are one of the foundational workflows in any paid media strategy.

Whether you’re investigating account anomalies, evaluating growth opportunities, or preparing to transition strategies or vendors, audits are an essential pillar of PPC success.

Here’s the thing: Not every audit strategy fits every account. A one-size-fits-all checklist won’t account for platform quirks, business goals, or campaign maturity.

That’s why in this month’s Ask the PPC, we’re taking a closer look at the value of doing regular audits – and how to do them in a way that actually drives meaningful insights and actions.

We’ll focus on cross-platform audits, with takeaways that apply whether you’re managing paid search or paid social campaigns.

Why Regular Audits Matter

At its core, the biggest benefit of auditing is clarity. If you’ve ever been surprised by an ad invoice and found yourself wondering, “What exactly did I pay for?” – you’re not alone.

Regular audits demystify performance. They help you understand why certain trends are happening and whether your structure is actually supporting your goals.

Beyond performance monitoring, audits unlock three critical value areas:

1. Budget Access For Net-New Entities

Ad platforms generally prefer putting spend behind “known” quantities – ads, keywords, and audiences with conversion data.

While that makes sense from a machine learning standpoint, it can sideline your new campaigns, ads, or targeting experiments unless you’re intentional about how you test.

Auditing helps ensure that newer entities aren’t starved for budget simply because older ones exist in competing campaigns/portfolios.

You can spot opportunities to move testing into separate campaigns or determine whether an older asset already covers the newer idea.

Go Do: When reviewing entity-level spend, ask: Are my new tests getting a fair shot? If not, consider spinning them out into their own campaigns with protected budgets. You’ll be able to tell if they’re being stifled by checking for impressions and budget access.

2. Active Vs. Passive Management Ratios

One of the biggest indicators of an account’s strategic health is the ratio of active to passive management.

  • Active management includes strategic actions like testing new creatives, adding keyword themes, or refining audiences.
  • Passive management is more operational: pausing campaigns, adjusting bids, or relying on automated IP exclusions and pacing scripts.

If your audit reveals a lopsided emphasis on passive tasks, it may mean strategic opportunities are being missed.

While there’s value in letting campaigns run and gather data, relying too much on autopilot can result in performance stagnation.

Note: Passive tasks are important and shouldn’t be discontinued, but they shouldn’t be the only ones completed in an account.

Go Do: Review the change history. Are most changes bid-based or budget-related? If so, build a cadence to test new creative or targeting ideas each month.

3. Testing Your Own Strategic Biases

We’re all susceptible to sticking with what’s worked in the past. That’s human nature. Yet, strategies that delivered last year might not be relevant today.

A solid audit can uncover blind spots, such as missing impression share, rising cost per click, or declining lead quality, and challenge assumptions you’ve made about your best performers.

Go Do: Build a comparison view of top-performing assets this quarter vs. last. Are your “winning” campaigns still winning? Or are they riding on past success?

How To Perform Audits That Actually Drive Value

Now that we’ve explored the why, let’s get into the how.

1. Put Audits On The Calendar

Block off time every quarter for structured audits. One to two hours per quarter per account is a good benchmark – not because the audit takes that long, but because carving out dedicated time ensures it actually gets done.

Pro Tip: Treat it like a client meeting, even if it’s internal. If it’s on your calendar, it’s happening.

2. Audit Against The Right Benchmarks

A good audit doesn’t just ask, “Is my CPA low?” It asks, “Is this CPA real, and does it reflect meaningful conversions?”

If you’re seeing great-looking cost-per-acquisition numbers, dig deeper:

  • Are micro-conversions inflating results?
  • Are conversion actions properly weighted?
  • Are your ads reaching qualified users?

Make sure you differentiate between reported cost per acquisition (in your CRM or Google Analytics 4) and platform CPA (Google, Meta, Microsoft, etc.). If there’s a mismatch, it might be time to clean up your conversion tracking setup.

Go Do: Pull a side-by-side view of your platform-reported CPA vs. your actual revenue-driving conversions. Audit the quality and intent behind each tracked action.
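
To make that side-by-side view concrete, here is a minimal Python/pandas sketch. The file names, column names, and the campaign-level join are hypothetical placeholders; substitute your own platform export and your CRM (or GA4) export.

```python
import pandas as pd

# Hypothetical exports: one row per campaign from the ad platform,
# one row per campaign of revenue-driving outcomes from the CRM.
platform = pd.read_csv("platform_export.csv")  # columns: campaign, cost, conversions
crm = pd.read_csv("crm_export.csv")            # columns: campaign, qualified_leads

merged = platform.merge(crm, on="campaign", how="left").fillna(0)

# Platform-reported CPA vs. CPA based only on revenue-driving conversions.
merged["platform_cpa"] = merged["cost"] / merged["conversions"].replace(0, float("nan"))
merged["true_cpa"] = merged["cost"] / merged["qualified_leads"].replace(0, float("nan"))

# A large gap suggests micro-conversions may be inflating platform results.
merged["cpa_gap_pct"] = (merged["true_cpa"] - merged["platform_cpa"]) / merged["platform_cpa"] * 100

print(merged[["campaign", "platform_cpa", "true_cpa", "cpa_gap_pct"]]
      .sort_values("cpa_gap_pct", ascending=False))
```

Campaigns at the top of that list are the ones where platform CPA looks healthy but few tracked actions are actually driving revenue.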

3. Audit Creatives For Performance And Compliance

Creative audits aren’t just about freshness or click-through rate. They’re also about compliance, especially in regulated industries. Messaging that skirts policy lines (even unintentionally) can tank account performance.

This is where industry-specific knowledge becomes non-negotiable. Your creative might be attention-grabbing, but is it allowed in your vertical?

Go Do: Cross-reference your current ad copy and creative with the platform’s most recent ad policy update. Bonus: Loop in your legal or compliance team before launching new assets.

Final Thoughts: Audits As Strategy Enablers

Audits are more than housekeeping; they’re strategic resets. They help you validate your current direction, challenge stale assumptions, and carve out space to innovate.

Too often, accounts get stuck in maintenance mode. Auditing breaks that cycle.

By incorporating regular, structured audits into your workflow, you create a feedback loop that protects budget, sharpens strategy, and ultimately drives better results.

Have a question you want addressed? Ask here!



Featured Image: Paulo Bobita/Search Engine Journal

The Truth About LLM Hallucinations With Barry Adams via @sejournal, @theshelleywalsh

The launch of ChatGPT blew apart the search industry, and the last few years have seen more and more AI integration into search engine results pages.

In an attempt to keep up with the LLMs, Google launched AI Overviews and just announced AI Mode tabs.

The expectation is that SERPs will become blended with a Large Language Model (LLM) interface, and the nature of how users search will adapt to conversations and journeys.

However, there is an issue with AI hallucinations and misinformation in LLM-generated and Google AI Overview results, and it seems to be largely ignored, not just by Google but also by the news publishers it affects.

More worrying is that users are either unaware or prepared to accept the cost of misinformation for the sake of convenience.

Barry Adams is the authority on editorial SEO and works with the leading news publisher titles worldwide via Polemic Digital. Barry also founded the News & Editorial SEO Summit along with John Shehata.

I read a LinkedIn post from Barry where he said:

“LLMs are incredibly dumb. There is nothing intelligent about LLMs. They’re advanced word predictors, and using them for any purpose that requires a basis in verifiable facts – like search queries – is fundamentally wrong.

But people don’t seem to care. Google doesn’t seem to care. And the tech industry sure as hell doesn’t care, they’re wilfully blinded by dollar signs.

I don’t feel the wider media are sufficiently reporting on the inherent inaccuracies of LLMs. Publishers are keen to say that generative AI could be an existential threat to publishing on the web, yet they fail to consistently point out GenAI’s biggest weakness.”

The post prompted me to speak to him in more detail about LLM hallucinations, their impact on publishing, and what the industry needs to understand about AI’s limitations.

You can watch the full interview with Barry on IMHO below, or continue reading the article summary.

Why Are LLMs So Bad At Citing Sources?

I asked Barry to explain why LLMs struggle with accurate source attribution and factual reliability.

Barry responded, “It’s because they don’t know anything. There’s no intelligence. I think calling them AIs is the wrong label. They’re not intelligent in any way. They’re probability machines. They don’t have any reasoning faculties as we understand it.”

He explained that LLMs operate by regurgitating answers based on training data, then attempting to rationalize their responses through grounding efforts and link citations.

Even with careful prompting to use only verified sources, these systems maintain a high probability of hallucinating references.

“They are just predictive text from your phone, on steroids, and they will just make stuff up and very confidently present it to you because that’s just what they do. That’s the entire nature of the technology,” Barry emphasized.

This confident presentation of potentially false information represents a fundamental problem with how these systems are being deployed in scenarios they’re not suited for.
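
To illustrate that “predictive text on steroids” point, here is a toy sketch (a hand-built table of next-word probabilities, not a real LLM) showing how a system can fluently and confidently continue a sentence without ever checking whether the result is true:

```python
import random

# Toy table of next-word probabilities. A real LLM learns billions of these
# statistical associations from text, with no notion of whether they are true.
next_word_probs = {
    ("the", "study"):   {"found": 0.6, "showed": 0.3, "retracted": 0.1},
    ("study", "found"): {"that": 0.9, "no": 0.1},
    ("found", "that"):  {"coffee": 0.5, "exercise": 0.5},
}

def continue_text(words, steps=3):
    for _ in range(steps):
        probs = next_word_probs.get(tuple(words[-2:]))
        if not probs:
            break
        # The next word is chosen by probability alone; no facts are checked.
        words.append(random.choices(list(probs), weights=list(probs.values()))[0])
    return " ".join(words)

print(continue_text(["the", "study"]))
# e.g. "the study found that coffee" -- fluent and confident, but nothing here
# verifies that any such study exists. Hallucination is the same mechanism at scale.
```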

Are We Creating An AI Spiral Of Misinformation?

I shared with Barry my concerns about an AI misinformation spiral where AI content increasingly references other AI content, potentially losing the source of facts and truth entirely.

Barry’s outlook was pessimistic, “I don’t think people care as much about truth as maybe we believe they should. I think people will accept information presented to them if it’s useful and if it conforms with their pre-existing beliefs.”

“People don’t really care about truth. They care about convenience.”

He argued that the last 15 years of social media have proven that people prioritize confirmation of their beliefs over factual accuracy.

LLMs facilitate this process even more than social media by providing convenient answers without requiring critical thinking or verification.

“The real threat is how AI is replacing truth with convenience,” Barry observed, noting that Google’s embrace of AI represents a clear step away from surfacing factual information toward providing what users want to hear.

Barry warned we’re entering a spiral where “entire societies will live in parallel realities and we’ll deride the other side as being fake news and just not real.”

Why Isn’t Mainstream Media Calling Out AI’s Limitations?

I asked Barry why mainstream media isn’t more vocal about AI’s weaknesses, especially given that publishers could save themselves by influencing public perception of Gen AI limitations.

Barry identified several factors: “Google is such a powerful force in driving traffic and revenue to publishers that a lot of publishers are afraid to write too critically about Google because they feel there might be repercussions.”

He also noted that many journalists don’t genuinely understand how AI systems work. Technology journalists who understand the issues sometimes raise questions, but general reporters for major newspapers often lack the knowledge to scrutinize AI claims properly.

Barry pointed to Google’s promise that AI Overviews would send more traffic to publishers as an example: “It turns out, no, that’s the exact opposite of what’s happening, which everybody with two brain cells saw coming a mile away.”

How Do We Explain The Traffic Reduction To News Publishers?

I noted research that shows users do click on sources to verify AI outputs, and that Google doesn’t show AI Overviews on top news stories. Yet, traffic to news publishers continues to decline overall.

Barry explained this involves multiple factors:

“People do click on sources. People do double-check the citations, but not to the same extent as before. ChatGPT and Gemini will give you an answer. People will click two or three links to verify.

Previously, users conducting their own research would click 30 to 40 links and read them in detail. Now they might verify AI responses with just a few clicks.

Additionally, while news publishers are less affected by AI Overviews, they’ve lost traffic on explainer content, background stories, and analysis pieces that AI now handles directly with minimal click-through to sources.”

Barry emphasized that Google has been diminishing publisher traffic for years through algorithm updates and efforts to keep users within Google’s ecosystem longer.

“Google is the monopoly informational gateway on the web. So you can say, ‘Oh, don’t be dependent on Google,’ but you have to be where your users are and you cannot have a viable publishing business without heavily relying on Google traffic.”

What Should Publishers Do To Survive?

I asked Barry for his recommendations on optimizing for LLM inclusion and how to survive the introduction of AI-generated search results.

Barry advised publishers to accept that search traffic will diminish while focusing on building a stronger brand identity.

“I think publishers need to be more confident about what they are and specifically what they’re not.”

He highlighted the Financial Times as an exemplary model because “nobody has any doubt about what the Financial Times is and what kind of reporting they’re signing up for.”

This clarity enables strong subscription conversion because readers understand the specific value they’re receiving.

Barry emphasized the importance of developing brand power that makes users specifically seek out particular publications, “I think too many publishers try to be everything to everybody and therefore are nothing to nobody. You need to have a strong brand voice.”

He used the example of the Daily Mail, which succeeds through consistent brand identity, with users specifically searching for the brand name alongside topical queries such as “Meghan Markle Daily Mail” or “Prince Harry Daily Mail.”

The goal is to build direct relationships that bypass intermediaries through apps, newsletters, and direct website visits.

The Brand Identity Imperative

Barry stressed that publishers covering similar topics with interchangeable content face existential threats.

He works with publishers where “they’re all reporting the same stuff with the same screenshots and the same set photos and pretty much the same content.”

Such publications become vulnerable because readers lose nothing by substituting one source for another. Success requires developing unique value propositions that make audiences specifically seek out particular publications.

“You need to have a very strong brand identity as a publisher. And if you don’t have it, you probably won’t exist in the next five to ten years,” Barry concluded.

Barry advised news publishers to focus on brand development, subscription models, and building content ecosystems that don’t rely entirely on Google. That may mean fewer clicks, but more meaningful, higher-quality engagement.

Moving Forward

Barry’s opinion and the reality of the changes AI is forcing are hard truths.

The industry requires honest acknowledgment of AI limitations, strategic brand building, and acceptance that easy search traffic won’t return.

Publishers have two options: continue chasing diminishing search traffic with the same content that everyone else is producing, or invest in direct audience relationships that provide a sustainable foundation for quality journalism.

Thank you to Barry Adams for offering his insights and being my guest on IMHO.



Featured Image: Shelley Walsh/Search Engine Journal 

Your Next Time Saver: How To Use AI To Save Time On Hosting Maintenance, Agency Edition via @sejournal, @Hanrahan7

This post was sponsored by Cloudways. The opinions expressed in this article are the sponsor’s own.

Have you ever woken up to a 3 AM client website panic?

Did your client’s ecommerce site crash during a flash sale?

Has another client asked why their site is slow, “even though we’re paying for premium hosting”?

This isn’t just an occasional nuisance.

If you’re managing multiple client sites, hosting maintenance becomes a full-on job in itself. The worst part? None of this time is billable, and every minute spent troubleshooting is a minute you’re not spending on business growth.

Here’s the truth: The way you handle hosting maintenance may be broken. And it’s costing you far more than you realize, in time, money, and missed opportunities.

In this article, we’ll explore where hosting maintenance quietly drains agency revenue, why the usual fixes fall short, and how AI-powered maintenance can win that time back.

Ways You’re Accidentally Draining Agency Revenue

You and your agency may lose countless hours to hosting maintenance without realizing the true cost.

Behind every “quick fix” lies a hidden drain on productivity and profits.

Are You Doing This?

It usually starts with a frantic client message or a monitoring alert, often hours after the problem began. Then:

  • Developers scramble to check logs and test configurations.
  • The team disables plugins one by one as a diagnostic method.
  • Someone finally contacts hosting support after internal efforts fail.
  • The issue gets resolved (often) after hours of back-and-forth.

The financial impact is staggering when you do the math.

Consider an agency managing just 30 websites.

If each site experiences only 2 hosting incidents per month requiring 3 hours to resolve, that’s 180 hours every month.

That is more than a full-time employee’s entire working month, and the numbers scale quickly (see the quick calculation after the list below):

  • Average resolution time: 3.5 hours per incident.
  • For an agency with 50 client sites at that incident rate: roughly 4,200 hours/year lost.
  • At a $150/hour billable rate → $630,000 in potential revenue wasted.
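
As a back-of-the-envelope check on those figures (a sketch; the incident rate, resolution time, and billable rate are the assumptions stated above):

```python
def wasted_hours_per_year(sites, incidents_per_site_per_month, hours_per_incident):
    """Hours lost to hosting incidents over a year."""
    return sites * incidents_per_site_per_month * 12 * hours_per_incident

# 30 sites, 2 incidents per site per month, 3 hours each
print(wasted_hours_per_year(30, 2, 3))        # 2160 hours/year (180 per month)

# 50 sites, 2 incidents per site per month, 3.5 hours each, $150/hour billable
hours = wasted_hours_per_year(50, 2, 3.5)     # 4200 hours/year
print(hours, hours * 150)                     # 4200 hours, $630,000 in billable time
```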

Beyond direct costs, this broken system creates three major problems:

  1. Team burnout – Constant firefighting demoralizes developers
  2. Client distrust – Repeated issues make your agency look incompetent
  3. Growth stagnation – Leadership spends time troubleshooting instead of scaling

Each downtime incident plants seeds of doubt about your agency’s technical competence. After just a few occurrences, clients start questioning why they’re paying premium rates for what feels like unreliable service. This erosion of confidence makes contract renewals harder and opens the door for competitors.

How To Solve Client Website Hosting Issues

Most agencies cycle through the same ineffective solutions, each with significant drawbacks:

Don’t: Only Take The Staffing Approach

The most common solution is hiring dedicated infrastructure staff. Many agencies believe bringing a systems admin or DevOps engineer on board will solve their hosting woes. While this provides more control, it creates new problems. You’re now responsible for recruiting, managing, and covering the cost of specialized technical talent.

  • $85k+ annual salary for each infrastructure specialist.
  • Ongoing management overhead for technical staff.
  • Limited availability for after-hours emergencies.
  • Still requires hosting provider support for complex issues.

Don’t: Just Take The Managed Hosting Solution

Many agencies turn to managed hosting providers to alleviate their maintenance burden.

Technically adept teams can absolutely handle straightforward server-level maintenance, security patches, and core updates; however, most still require some additional support when faced with:

  • Application-specific troubleshooting (plugin conflicts, theme issues).
  • Custom performance optimization.
  • Specialized configurations.

The key difference lies in how managed hosting providers address these residual needs. Traditional hosting providers might still leave you waiting in support queues, while next-gen platforms automatically begin repairs.

Don’t: Simply Use Website Uptime Monitoring Tools

You may be tempted to solve the problem with monitoring tools.

Teams layer on services like New Relic, Datadog, and UptimeRobot, hoping that better visibility will reduce firefighting.

While these tools provide valuable data, they primarily generate more alerts for your team to interpret and act on. You’ve essentially traded one problem for another – instead of lacking information, you’re now drowning in it.

  • Alert overload from multiple systems.
  • False positives that waste investigation time.
  • No actionable insights – just more data to interpret.
  • Still requires manual diagnosis and resolution.

Do: Incorporate AI-Powered Hosting Maintenance

Imagine that, instead of the chaotic process above, you could:

  1. Know about issues before clients do.
  2. Understand exactly what went wrong, in plain English.
  3. Get step-by-step instructions to fix it immediately.

Copilots that can handle these tasks are your first step toward a self-learning, auto-healing hosting platform.

They can use intelligent monitoring to detect and help resolve the most common and critical server issues.

Hosting Maintenance: Before & After AI Integration

The Old Way:

  • Client reports site is down (30+ minutes after it actually went down).
  • You spend an hour checking logs and plugins.
  • You contact support and wait 2 hours for a response.
  • Support suggests a fix that may or may not work.
  • Total downtime: 4+ hours.

With Cloudways Copilot:

  • Copilot detects the issue immediately (often before users notice).
  • You receive an alert with exact cause and fix.
  • You implement the solution in minutes.
  • Total downtime: Dramatically reduced compared to traditional troubleshooting.

How To Get Automatic Hosting & Site Alerts, Repairs & Updates

You can configure Cloudways Copilot to manage many facets of web hosting.

Host Health

Triggers when your entire server goes down.

Webstack Health

  • Alerts when core services fail (Apache, Nginx, MySQL, PHP-FPM).
  • Catches crashes before they take sites offline.
  • Identifies resource exhaustion issues.

Disk & Inode Health

Warns before you hit critical limits (a generic check is sketched after this list):

  • Disk space (95%+ utilization).
  • Inode usage (separate from storage space).
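
For a sense of what a disk-and-inode check involves under the hood, here is a generic, Unix-only Python sketch using the 95% threshold mentioned above. It illustrates the concept; it is not the Copilot implementation.

```python
import os
import shutil

THRESHOLD = 0.95  # alert at 95%+ utilization, matching the figure above

def check_disk_and_inodes(path="/"):
    # Disk space utilization.
    total, used, _free = shutil.disk_usage(path)
    disk_pct = used / total

    # Inode utilization (a separate limit from raw storage space; Unix-only).
    st = os.statvfs(path)
    inode_pct = 1 - (st.f_ffree / st.f_files) if st.f_files else 0.0

    for name, pct in (("disk space", disk_pct), ("inodes", inode_pct)):
        status = "ALERT" if pct >= THRESHOLD else "ok"
        print(f"{name}: {pct:.1%} used [{status}]")

check_disk_and_inodes("/")
```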

Result: Instant problem detection!

Copilot continuously monitors your servers and applications for:

  • Performance bottlenecks.
  • Security threats.
  • Resource constraints.
  • Configuration errors.

Unlike traditional monitoring tools that just tell you “something’s wrong,” Copilot identifies the specific issue.

What AI Reports Look Like For Website Maintenance

For each problem detected, Copilot provides:

  1. What happened: The specific error or issue
  2. Where it occurred: Which site, which server
  3. Why it happened: The root cause analysis
  4. How to fix it: Step-by-step resolution instructions

Real-World Example

Instead of just a “High CPU Usage” alert, Copilot tells you:

“Your WordPress site on Server X is experiencing high CPU due to a poorly optimized WooCommerce query in Plugin Y. Disable the plugin or contact the developer for an update.”

To be more specific, the example below shows Copilot detecting a DDoS attack, listing the malicious IPs, and identifying the file being attacked. It then provides remediation steps to stop the attack.

Investigation Summary

The investigation shows that much of the server load is driven by multiple concurrent Magento cron tasks and frequent POST requests to the xmlrpc.php endpoint. These factors have caused high CPU usage and intermittent HTTP 500 errors. It appears that the repeated bot visits to xmlrpc.php and resource-intensive Magento cron jobs are the likely cause of the problem.

Remediation Steps

To mitigate the issues related to the potential DDoS attack and improve server performance, please follow these steps:

  1. Block Offending IPs: Use the `.htaccess` file to block the specific IP addresses that are contributing to the abuse. Add the following lines to your .htaccess file:

     deny from 66.249.75.34
     deny from 114.119.136.249
     deny from 182.43.65.168

  2. Implement Cloudflare: If you haven’t already, consider setting up Cloudflare for your application. This will enhance your website’s security and help mitigate DDoS attacks by filtering malicious traffic before it reaches your server. Follow the integration steps provided by Cloudways to set up CNAME records.
  3. Enable Under Attack Mode: Once Cloudflare is set up, enable Under Attack Mode, which performs an extra security check for every visitor and helps protect your site from ongoing DDoS attacks, especially targeting specific URLs like /xmlrpc.php.

By taking these actions, you can significantly reduce the strain on your server and improve its performance.

Support links:

https://support.cloudways.com/en/articles/6009152-how-to-integrate-cloudflare-with-your-application

https://support.cloudways.com/en/articles/5120765-how-to-monitor-system-processes-using-htop-command

This is how Cloudways Copilot uses AI to identify hosting and server issues by comparing them to similar cases across the fleet, quickly suggesting the most effective remediation solutions with step-by-step instructions. This saves you time by providing immediate solutions without the need for manual detection, troubleshooting, or back-and-forth support tickets, preventing disappointment for your clients.

Image created by Cloudways, April 2025

At the end of the day, hosting headaches shouldn’t waste your agency’s most valuable resource: time. Every minute spent troubleshooting is a minute taken away from client work, business growth, or simply having a life outside of server emergencies.

Cloudways Copilot tackles this problem at its root by:

  • Detecting issues before clients notice.
  • Pinpointing exactly what broke and why.
  • Showing where problems occurred (specific apps/servers).
  • Providing step-by-step fixes in plain language.
  • Cutting resolution time from hours to minutes.

What’s coming next makes Cloudways Copilot even better:

  • One-click fixes – Resolve common errors automatically with a single click
  • Automated resolutions – Let Copilot handle routine tasks like server-wide cache purges and backup management
  • Developer workflows – Automate performance monitoring and testing to free up your team

Best of all? During our early access period, Cloudways Copilot is completely free. We’re currently onboarding users through our limited-access program – visit the Cloudways Copilot page and submit your details to secure your spot.


Image Credits

Featured Image: Image by Cloudways. Used with permission.

In-Post Image: Images by Cloudways. Used with permission.

Inside Amsterdam’s high-stakes experiment to create fair welfare AI

This story is a partnership between MIT Technology Review, Lighthouse Reports, and Trouw, and was supported by the Pulitzer Center. 

Two futures

Hans de Zwart, a gym teacher turned digital rights advocate, says that when he saw Amsterdam’s plan to have an algorithm evaluate every welfare applicant in the city for potential fraud, he nearly fell out of his chair. 

It was February 2023, and de Zwart, who had served as the executive director of Bits of Freedom, the Netherlands’ leading digital rights NGO, had been working as an informal advisor to Amsterdam’s city government for nearly two years, reviewing and providing feedback on the AI systems it was developing. 

According to the city’s documentation, this specific AI model—referred to as “Smart Check”—would consider submissions from potential welfare recipients and determine who might have submitted an incorrect application. More than any other project that had come across his desk, this one stood out immediately, he told us—and not in a good way. “There’s some very fundamental [and] unfixable problems,” he says, in using this algorithm “on real people.”

From his vantage point behind the sweeping arc of glass windows at Amsterdam’s city hall, Paul de Koning, a consultant to the city whose résumé includes stops at various agencies in the Dutch welfare state, had viewed the same system with pride. De Koning, who managed Smart Check’s pilot phase, was excited about what he saw as the project’s potential to improve efficiency and remove bias from Amsterdam’s social benefits system. 

A team of fraud investigators and data scientists had spent years working on Smart Check, and de Koning believed that promising early results had vindicated their approach. The city had consulted experts, run bias tests, implemented technical safeguards, and solicited feedback from the people who’d be affected by the program—more or less following every recommendation in the ethical-AI playbook. “I got a good feeling,” he told us. 

These opposing viewpoints epitomize a global debate about whether algorithms can ever be fair when tasked with making decisions that shape people’s lives. Over the past several years of efforts to use artificial intelligence in this way, examples of collateral damage have mounted: nonwhite job applicants weeded out of job application pools in the US, families being wrongly flagged for child abuse investigations in Japan, and low-income residents being denied food subsidies in India. 

Proponents of these assessment systems argue that they can create more efficient public services by doing more with less and, in the case of welfare systems specifically, reclaim money that is allegedly being lost from the public purse. In practice, many were poorly designed from the start. They sometimes factor in personal characteristics in a way that leads to discrimination, and sometimes they have been deployed without testing for bias or effectiveness. In general, they offer few options for people to challenge—or even understand—the automated actions directly affecting how they live. 

The result has been more than a decade of scandals. In response, lawmakers, bureaucrats, and the private sector, from Amsterdam to New York, Seoul to Mexico City, have been trying to atone by creating algorithmic systems that integrate the principles of “responsible AI”—an approach that aims to guide AI development to benefit society while minimizing negative consequences. 

CHANTAL JAHCHAN

Developing and deploying ethical AI is a top priority for the European Union, and the same was true for the US under former president Joe Biden, who released a blueprint for an AI Bill of Rights. That plan was rescinded by the Trump administration, which has removed considerations of equity and fairness, including in technology, at the national level. Nevertheless, systems influenced by these principles are still being tested by leaders in countries, states, provinces, and cities—in and out of the US—that have immense power to make decisions like whom to hire, when to investigate cases of potential child abuse, and which residents should receive services first. 

Amsterdam indeed thought it was on the right track. City officials in the welfare department believed they could build technology that would prevent fraud while protecting citizens’ rights. They followed these emerging best practices and invested a vast amount of time and money in a project that eventually processed live welfare applications. But in their pilot, they found that the system they’d developed was still not fair and effective. Why? 

Lighthouse Reports, MIT Technology Review, and the Dutch newspaper Trouw have gained unprecedented access to the system to try to find out. In response to a public records request, the city disclosed multiple versions of the Smart Check algorithm and data on how it evaluated real-world welfare applicants, offering us unique insight into whether, under the best possible conditions, algorithmic systems can deliver on their ambitious promises.  

The answer to that question is far from simple. For de Koning, Smart Check represented technological progress toward a fairer and more transparent welfare system. For de Zwart, it represented a substantial risk to welfare recipients’ rights that no amount of technical tweaking could fix. As this algorithmic experiment unfolded over several years, it called into question the project’s central premise: that responsible AI can be more than a thought experiment or corporate selling point—and actually make algorithmic systems fair in the real world.

A chance at redemption

Understanding how Amsterdam found itself conducting a high-stakes endeavor with AI-driven fraud prevention requires going back four decades, to a national scandal around welfare investigations gone too far. 

In 1984, Albine Grumböck, a divorced single mother of three, had been receiving welfare for several years when she learned that one of her neighbors, an employee at the social service’s local office, had been secretly surveilling her life. He documented visits from a male friend, who in theory could have been contributing unreported income to the family. On the basis of his observations, the welfare office cut Grumböck’s benefits. She fought the decision in court and won.

Albine Grumböck in the courtroom with her lawyer and assembled spectators
Albine Grumböck, whose benefits had been cut off, learns of the judgement for interim relief.
ROB BOGAERTS/ NATIONAAL ARCHIEF

Despite her personal vindication, Dutch welfare policy has continued to empower welfare fraud investigators, sometimes referred to as “toothbrush counters,” to turn over people’s lives. This has helped create an atmosphere of suspicion that leads to problems for both sides, says Marc van Hoof, a lawyer who has helped Dutch welfare recipients navigate the system for decades: “The government doesn’t trust its people, and the people don’t trust the government.”

Harry Bodaar, a career civil servant, has observed the Netherlands’ welfare policy up close throughout much of this time—first as a social worker, then as a fraud investigator, and now as a welfare policy advisor for the city. The past 30 years have shown him that “the system is held together by rubber bands and staples,” he says. “And if you’re at the bottom of that system, you’re the first to fall through the cracks.”

Making the system work better for beneficiaries, he adds, was a large motivating factor when the city began designing Smart Check in 2019. “We wanted to do a fair check only on the people we [really] thought needed to be checked,” Bodaar says—in contrast to previous department policy, which until 2007 was to conduct home visits for every applicant. 

But he also knew that the Netherlands had become something of a ground zero for problematic welfare AI deployments. The Dutch government’s attempts to modernize fraud detection through AI had backfired on a few notorious occasions.

In 2019, it was revealed that the national government had been using an algorithm to create risk profiles that it hoped would help spot fraud in the child care benefits system. The resulting scandal saw nearly 35,000 parents, most of whom were migrants or the children of migrants, wrongly accused of defrauding the assistance system over six years. It put families in debt, pushed some into poverty, and ultimately led the entire government to resign in 2021.  

front page of Trouw from January 16, 2021

COURTESY OF TROUW

In Rotterdam, a 2023 investigation by Lighthouse Reports into a system for detecting welfare fraud found it to be biased against women, parents, non-native Dutch speakers, and other vulnerable groups, eventually forcing the city to suspend use of the system. Other cities, like Amsterdam and Leiden, used a system called the Fraud Scorecard, which was first deployed more than 20 years ago and included education, neighborhood, parenthood, and gender as crude risk factors to assess welfare applicants; that program was also discontinued.

The Netherlands is not alone. In the United States, there have been at least 11 cases in which state governments used algorithms to help disburse public benefits, according to the nonprofit Benefits Tech Advocacy Hub, often with troubling results. Michigan, for instance, falsely accused 40,000 people of committing unemployment fraud. And in France, campaigners are taking the national welfare authority to court over an algorithm they claim discriminates against low-income applicants and people with disabilities.

This string of scandals, as well as a growing awareness of how racial discrimination can be embedded in algorithmic systems, helped fuel the growing emphasis on responsible AI. It’s become “this umbrella term to say that we need to think about not just ethics, but also fairness,” says Jiahao Chen, an ethical-AI consultant who has provided auditing services to both private and local government entities. “I think we are seeing that realization that we need things like transparency and privacy, security and safety, and so on.” 

The approach, based on a set of tools intended to rein in the harms caused by the proliferating technology, has given rise to a rapidly growing field built upon a familiar formula: white papers and frameworks from think tanks and international bodies, and a lucrative consulting industry made up of traditional power players like the Big 5 consultancies, as well as a host of startups and nonprofits. In 2019, for instance, the Organisation for Economic Co-operation and Development, a global economic policy body, published its Principles on Artificial Intelligence as a guide for the development of “trustworthy AI.” Those principles include building explainable systems, consulting public stakeholders, and conducting audits. 

But the legacy left by decades of algorithmic misconduct has proved hard to shake off, and there is little agreement on where to draw the line between what is fair and what is not. While the Netherlands works to institute reforms shaped by responsible AI at the national level, Algorithm Audit, a Dutch NGO that has provided ethical-AI auditing services to government ministries, has concluded that the technology should be used to profile welfare recipients only under strictly defined conditions, and only if systems avoid taking into account protected characteristics like gender. Meanwhile, Amnesty International, digital rights advocates like de Zwart, and some welfare recipients themselves argue that when it comes to making decisions about people’s lives, as in the case of social services, the public sector should not be using AI at all.

Amsterdam hoped it had found the right balance. “We’ve learned from the things that happened before us,” says Bodaar, the policy advisor, of the past scandals. And this time around, the city wanted to build a system that would “show the people in Amsterdam we do good and we do fair.”

Finding a better way

Every time an Amsterdam resident applies for benefits, a caseworker reviews the application for irregularities. If an application looks suspicious, it can be sent to the city’s investigations department—which could lead to a rejection, a request to correct paperwork errors, or a recommendation that the candidate receive less money. Investigations can also happen later, once benefits have been disbursed; the outcome may force recipients to pay back funds, and even push some into debt.

Officials have broad authority over both applicants and existing welfare recipients. They can request bank records, summon beneficiaries to city hall, and in some cases make unannounced visits to a person’s home. As investigations are carried out—or paperwork errors fixed—much-needed payments may be delayed. And often—in more than half of the investigations of applications, according to figures provided by Bodaar—the city finds no evidence of wrongdoing. In those cases, this can mean that the city has “wrongly harassed people,” Bodaar says. 

The Smart Check system was designed to avoid these scenarios by eventually replacing the initial caseworker who flags which cases to send to the investigations department. The algorithm would screen the applications to identify those most likely to involve major errors, based on certain personal characteristics, and redirect those cases for further scrutiny by the enforcement team.

If all went well, the city wrote in its internal documentation, the system would improve on the performance of its human caseworkers, flagging fewer welfare applicants for investigation while identifying a greater proportion of cases with errors. In one document, the city projected that the model would prevent up to 125 individual Amsterdammers from facing debt collection and save €2.4 million annually. 

Smart Check was an exciting prospect for city officials like de Koning, who would manage the project when it was deployed. He was optimistic, since the city was taking a scientific approach, he says; it would “see if it was going to work” instead of taking the attitude that “this must work, and no matter what, we will continue this.”

It was the kind of bold idea that attracted optimistic techies like Loek Berkers, a data scientist who worked on Smart Check in only his second job out of college. Speaking in a cafe tucked behind Amsterdam’s city hall, Berkers remembers being impressed at his first contact with the system: “Especially for a project within the municipality,” he says, it “was very much a sort of innovative project that was trying something new.”

Smart Check made use of an algorithm called an “explainable boosting machine,” which allows people to more easily understand how AI models produce their predictions. Most other machine-learning models are often regarded as “black boxes” running abstract mathematical processes that are hard to understand for both the employees tasked with using them and the people affected by the results. 

The Smart Check model would consider 15 characteristics—including whether applicants had previously applied for or received benefits, the sum of their assets, and the number of addresses they had on file—to assign a risk score to each person. It purposefully avoided demographic factors, such as gender, nationality, or age, that were thought to lead to bias. It also tried to avoid “proxy” factors—like postal codes—that may not look sensitive on the surface but can become so if, for example, a postal code is statistically associated with a particular ethnic group.

In an unusual step, the city has disclosed this information and shared multiple versions of the Smart Check model with us, effectively inviting outside scrutiny into the system’s design and function. With this data, we were able to build a hypothetical welfare recipient to get insight into how an individual applicant would be evaluated by Smart Check.  
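
For readers curious what an explainable boosting machine looks like in practice, here is a minimal sketch using the open-source interpretml library. The synthetic data and feature names are hypothetical stand-ins inspired by the characteristics described above; this is not the city’s actual Smart Check model or training data.

```python
# pip install interpret
import numpy as np
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

# Synthetic stand-in for past investigation outcomes
# (1 = investigation found an error in the application, 0 = no error found).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "previously_received_benefits": rng.integers(0, 2, n),
    "total_assets_eur": rng.integers(0, 20_000, n),
    "addresses_on_file": rng.integers(1, 5, n),
})
y = rng.integers(0, 2, n)

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# Score a hypothetical applicant, as in the exercise described above.
applicant = pd.DataFrame([{
    "previously_received_benefits": 1,
    "total_assets_eur": 700,
    "addresses_on_file": 2,
}])
print("risk score:", ebm.predict_proba(applicant)[0][1])

# Unlike a black-box model, the EBM exposes per-feature contributions,
# both globally (ebm.explain_global()) and for an individual prediction:
local_explanation = ebm.explain_local(applicant)
```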

This model was trained on a data set encompassing 3,400 previous investigations of welfare recipients. The idea was that it would use the outcomes from these investigations, carried out by city employees, to figure out which factors in the initial applications were correlated with potential fraud. 

But using past investigations introduces potential problems from the start, says Sennay Ghebreab, scientific director of the Civic AI Lab (CAIL) at the University of Amsterdam, one of the external groups that the city says it consulted with. The problem of using historical data to build the models, he says, is that “we will end up [with] historic biases.” For example, if caseworkers historically made higher rates of mistakes with a specific ethnic group, the model could wrongly learn to predict that this ethnic group commits fraud at higher rates. 

The city decided it would rigorously audit its system to try to catch such biases against vulnerable groups. But how bias should be defined, and hence what it actually means for an algorithm to be fair, is a matter of fierce debate. Over the past decade, academics have proposed dozens of competing mathematical notions of fairness, some of which are incompatible. This means that a system designed to be “fair” according to one such standard will inevitably violate others.

Amsterdam officials adopted a definition of fairness that focused on equally distributing the burden of wrongful investigations across different demographic groups. 

In other words, they hoped this approach would ensure that welfare applicants of different backgrounds would be wrongly investigated at similar rates. 
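In practical terms, that definition amounts to comparing how often each group is flagged when no error is actually present, sometimes called false-positive-rate parity. The sketch below shows one way such a check can be computed; the column names and rows are hypothetical, not the city's records.

```python
# Sketch of the fairness check implied by the city's chosen definition:
# among applications where no error is actually present, how often is each
# group flagged anyway? Column names and rows are hypothetical.
import pandas as pd

results = pd.DataFrame({
    "group":       ["Dutch", "Dutch", "Dutch", "non-Dutch", "non-Dutch", "non-Dutch"],
    "flagged":     [1, 0, 0, 1, 1, 0],  # did the model flag the application?
    "error_found": [0, 0, 0, 1, 0, 0],  # did an investigation confirm an error?
})

# Wrongly-flagged rate per group: flagged applications among those with no real error
no_error = results[results["error_found"] == 0]
wrongly_flagged_rate = no_error.groupby("group")["flagged"].mean()
print(wrongly_flagged_rate)

# A large gap between groups is the disparity the city's definition was meant to rule out
print("gap:", wrongly_flagged_rate.max() - wrongly_flagged_rate.min())
```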

Mixed feedback

As it built Smart Check, Amsterdam consulted various public bodies about the model, including the city’s internal data protection officer and the Amsterdam Personal Data Commission. It also consulted private organizations, including the consulting firm Deloitte. Each gave the project its approval. 

But one key group was not on board: the Participation Council, a 15-member advisory committee composed of benefits recipients, advocates, and other nongovernmental stakeholders who represent the interests of the people the system was designed to help—and to scrutinize. The committee, like de Zwart, the digital rights advocate, was deeply troubled by what the system could mean for individuals already in precarious positions. 

Anke van der Vliet, now in her 70s, is one longtime member of the council. After she sinks slowly from her walker into a seat at a restaurant in Amsterdam’s Zuid neighborhood, where she lives, she retrieves her reading glasses from their case. “We distrusted it from the start,” she says, pulling out a stack of papers she’s saved on Smart Check. “Everyone was against it.”

For decades, she has been a steadfast advocate for the city’s welfare recipients—a group that, by the end of 2024, numbered around 35,000. In the late 1970s, she helped found Women on Welfare, a group dedicated to exposing the unique challenges faced by women within the welfare system.

City employees first presented their plan to the Participation Council in the fall of 2021. Members like van der Vliet were deeply skeptical. “We wanted to know, is it to my advantage or disadvantage?” she says. 

Two more meetings could not convince them. Their feedback did lead to key changes—including reducing the number of variables the city had initially considered to calculate an applicant’s score and excluding variables that could introduce bias, such as age, from the system. But the Participation Council stopped engaging with the city’s development efforts altogether after six months. “The Council is of the opinion that such an experiment affects the fundamental rights of citizens and should be discontinued,” the group wrote in March 2022. Since only around 3% of welfare benefit applications are fraudulent, the letter continued, using the algorithm was “disproportionate.”

De Koning, the project manager, is skeptical that the system would ever have received the approval of van der Vliet and her colleagues. “I think it was never going to work that the whole Participation Council was going to stand behind the Smart Check idea,” he says. “There was too much emotion in that group about the whole process of the social benefit system.” He adds, “They were very scared there was going to be another scandal.” 

But for advocates working with welfare beneficiaries, and for some of the beneficiaries themselves, the worry wasn’t a scandal but the prospect of real harm. The technology could not only make damaging errors but also leave them even harder to correct—allowing welfare officers to “hide themselves behind digital walls,” says Henk Kroon, an advocate who assists welfare beneficiaries at the Amsterdam Welfare Association, a union established in the 1970s. Such a system could make work “easy for [officials],” he says. “But for the common citizens, it’s very often the problem.” 

Time to test 

Despite the Participation Council’s ultimate objections, the city decided to push forward and put the working Smart Check model to the test. 

The first results were not what they’d hoped for. When the city’s advanced analytics team ran the initial model in May 2022, they found that the algorithm showed heavy bias against migrants and men, which we were able to independently verify. 

As the city told us and as our analysis confirmed, the initial model was more likely to wrongly flag non-Dutch applicants. And it was nearly twice as likely to wrongly flag an applicant with a non-Western nationality than one with a Western nationality. The model was also 14% more likely to wrongly flag men for investigation. 

In the process of training the model, the city also collected data on who its human caseworkers had flagged for investigation and which groups the wrongly flagged people were more likely to belong to. In essence, they ran a bias test on their own analog system—an important benchmark that is rarely established before such systems are deployed. 

What they found in the process led by caseworkers was a strikingly different pattern. Whereas the Smart Check model was more likely to wrongly flag non-Dutch nationals and men, human caseworkers were more likely to wrongly flag Dutch nationals and women. 

The team behind Smart Check knew that if they couldn’t correct for bias, the project would be canceled. So they turned to a technique from academic research, known as training-data reweighting. In practice, that meant applicants with a non-Western nationality who were deemed to have made meaningful errors in their applications were given less weight in the data, while those with a Western nationality were given more.
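One common way to implement this kind of reweighting, in the spirit of Kamiran and Calders' “reweighing” method, is to weight each combination of group and outcome so that, after weighting, every group shows the same rate of meaningful errors. The sketch below is our own illustration of that idea, with invented data; it is not the city's actual procedure.

```python
# Our own illustration of training-data reweighting (in the spirit of
# Kamiran & Calders' "reweighing"): each combination of group and outcome
# gets a weight so that, after weighting, every group shows the same error
# rate. The data and column names are hypothetical, not the city's records.
import pandas as pd

df = pd.DataFrame({
    "group": ["western", "western", "western", "non_western", "non_western", "non_western"],
    "error": [0, 0, 1, 0, 1, 1],  # 1 = investigation found a meaningful error
})

p_error = df["error"].mean()                       # overall error rate
group_error = df.groupby("group")["error"].mean()  # error rate observed in each group

def sample_weight(row):
    # Over-represented (group, outcome) combinations get weights below 1,
    # under-represented ones get weights above 1.
    expected = p_error if row["error"] == 1 else 1 - p_error
    observed = group_error[row["group"]] if row["error"] == 1 else 1 - group_error[row["group"]]
    return expected / observed

df["weight"] = df.apply(sample_weight, axis=1)
print(df)
# Here, non_western applicants with errors are downweighted and western applicants
# with errors are upweighted, mirroring the direction of the change described above.
# The weights would then be supplied to the model during training, for example via
# a sample-weight argument where the library supports one.
```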

Eventually, this appeared to solve their problem: As Lighthouse’s analysis confirms, once the model was reweighted, Dutch and non-Dutch nationals were equally likely to be wrongly flagged. 

De Koning, who joined the Smart Check team after the data was reweighted, said the results were a positive sign: “Because it was fair … we could continue the process.” 

The model also appeared to be better than caseworkers at identifying applications worthy of extra scrutiny, with internal testing showing a 20% improvement in accuracy.

Buoyed by these results, in the spring of 2023, the city was almost ready to go public. It submitted Smart Check to the Algorithm Register, a government-run transparency initiative meant to keep citizens informed about machine-learning algorithms either in development or already in use by the government.

For de Koning, the city’s extensive assessments and consultations were encouraging, particularly since they also revealed the biases in the analog system. But for de Zwart, those same processes represented a profound misunderstanding: that fairness could be engineered. 

In a letter to city officials, de Zwart criticized the premise of the project and, more specifically, outlined the unintended consequences that could result from reweighting the data. It might reduce bias against people with a migration background overall, but it wouldn’t guarantee fairness across intersecting identities; the model could still discriminate against women with a migration background, for instance. And even if that issue were addressed, he argued, the model might still treat migrant women in certain postal codes unfairly, and so on. And such biases would be hard to detect.
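De Zwart's point can be illustrated with a small hypothetical check: a model can look balanced when results are split along a single attribute yet still show a gap once results are broken down by combinations of attributes. The columns and rows below are invented for the example.

```python
# Hypothetical illustration of the intersectional problem: wrongly-flagged
# rates can match across one attribute while diverging across combinations
# of attributes. Columns and rows are invented for the example.
import pandas as pd

results = pd.DataFrame({
    "migration_background": ["yes", "yes", "no", "no"],
    "gender":               ["f",   "m",   "f",  "m"],
    "flagged":              [1,     0,     0,    1],
    "error_found":          [0,     0,     0,    0],  # none of these applications had real errors
})

no_error = results[results["error_found"] == 0]

# Split on migration background alone, the rates look identical...
print(no_error.groupby("migration_background")["flagged"].mean())

# ...but the intersection with gender shows that women with a migration background
# are always wrongly flagged in this toy data, while men with a migration background never are.
print(no_error.groupby(["migration_background", "gender"])["flagged"].mean())
```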

“The city has used all the tools in the responsible-AI tool kit,” de Zwart told us. “They have a bias test, a human rights assessment; [they have] taken into account automation bias—in short, everything that the responsible-AI world recommends. Nevertheless, the municipality has continued with something that is fundamentally a bad idea.”

Ultimately, he told us, it’s a question of whether it’s legitimate to use data on past behavior to judge “future behavior of your citizens that fundamentally you cannot predict.” 

Officials still pressed on—and set March 2023 as the date for the pilot to begin. Members of Amsterdam’s city council were given little warning. In fact, they were only informed the same month—to the disappointment of Elisabeth IJmker, a first-term council member from the Green Party, who balanced her role in municipal government with research on religion and values at the Vrije Universiteit Amsterdam. 

“Reading the words ‘algorithm’ and ‘fraud prevention’ in one sentence, I think that’s worth a discussion,” she told us. But by the time that she learned about the project, the city had already been working on it for years. As far as she was concerned, it was clear that the city council was “being informed” rather than being asked to vote on the system. 

The city hoped the pilot could prove skeptics like her wrong.

Upping the stakes

The formal launch of Smart Check began with a limited set of actual welfare applicants, whose paperwork the city would run through the algorithm to assign a risk score and determine whether the application should be flagged for investigation. At the same time, a human would review the same application. 

Smart Check’s performance would be monitored on two key criteria. First, could it consider applicants without bias? And second, was Smart Check actually smart? In other words, could the complex math that made up the algorithm actually detect welfare fraud better and more fairly than human caseworkers? 

It soon became clear that the model fell short on both fronts. 

While it had been designed to reduce the number of welfare applicants flagged for investigation, it was flagging more. And it proved no better than a human caseworker at identifying those that actually warranted extra scrutiny. 

What’s more, despite the lengths the city had gone to in order to recalibrate the system, bias reemerged in the live pilot. But this time, instead of wrongly flagging non-Dutch people and men as in the initial tests, the model was now more likely to wrongly flag applicants with Dutch nationality and women. 

Lighthouse’s own analysis also revealed other forms of bias unmentioned in the city’s documentation, including a greater likelihood that welfare applicants with children would be wrongly flagged for investigation. (Amsterdam officials did not respond to a request for comment about this finding or to other follow-up questions about general critiques of the city’s welfare system.)

The city was stuck. Nearly 1,600 welfare applications had been run through the model during the pilot period. But the results meant that members of the team were uncomfortable continuing to test—especially when there could be genuine consequences. In short, de Koning says, the city could not “definitely” say that “this is not discriminating.” 

He, and others working on the project, did not believe this was necessarily a reason to scrap Smart Check. They wanted more time—say, “a period of 12 months,” according to de Koning—to continue testing and refining the model. 

They knew, however, that would be a hard sell. 

In late November 2023, Rutger Groot Wassink—the city official in charge of social affairs—took his seat in the Amsterdam council chamber. He glanced at the tablet in front of him and then addressed the room: “I have decided to stop the pilot.”

The announcement brought an end to the sweeping multiyear experiment. In another council meeting a few months later, he explained why the project was terminated: “I would have found it very difficult to justify, if we were to come up with a pilot … that showed the algorithm contained enormous bias,” he said. “There would have been parties who would have rightly criticized me about that.” 

Viewed in a certain light, the city had tested out an innovative approach to identifying fraud in a way designed to minimize risks, found that it had not lived up to its promise, and scrapped it before the consequences for real people had a chance to multiply. 

But for IJmker and some of her city council colleagues focused on social welfare, there was also the question of opportunity cost. She recalls speaking with a colleague about how else the city could’ve spent that money—like to “hire some more people to do personal contact with the different people that we’re trying to reach.” 

City council members were never told exactly how much the effort cost, but in response to questions from MIT Technology Review, Lighthouse, and Trouw on this topic, the city estimated that it had spent some €500,000, plus €35,000 for the contract with Deloitte—but cautioned that the total amount put into the project was only an estimate, given that Smart Check was developed in house by various existing teams and staff members. 

For her part, van der Vliet, the Participation Council member, was not surprised by the poor result. The possibility of a discriminatory computer system was “precisely one of the reasons” her group hadn’t wanted the pilot, she says. And as for the discrimination in the existing system? “Yes,” she says, bluntly. “But we have always said that [it was discriminatory].” 

She and other advocates wished that the city had focused more on what they saw as the real problems facing welfare recipients: increases in the cost of living that have not, typically, been followed by increases in benefits; the need to document every change that could potentially affect their benefits eligibility; and the distrust with which they feel they are treated by the municipality. 

Can this kind of algorithm ever be done right?

When we spoke to Bodaar in March, a year and a half after the end of the pilot, he was candid in his reflections. “Perhaps it was unfortunate to immediately use one of the most complicated systems,” he said, “and perhaps it is also simply the case that it is not yet … the time to use artificial intelligence for this goal.”

“Niente, zero, nada. We’re not going to do that anymore,” he said about using AI to evaluate welfare applicants. “But we’re still thinking about this: What exactly have we learned?”

That is a question that IJmker thinks about too. In city council meetings she has brought up Smart Check as an example of what not to do. While she was glad that city employees had been thoughtful in their “many protocols,” she worried that the process obscured some of the larger questions of “philosophical” and “political values” that the city had yet to weigh in on as a matter of policy. 

Questions such as “How do we actually look at profiling?” or “What do we think is justified?”—or even “What is bias?” 

These questions are, “where politics comes in, or ethics,” she says, “and that’s something you cannot put into a checkbox.”

But now that the pilot has stopped, she worries that her fellow city officials might be too eager to move on. “I think a lot of people were just like, ‘Okay, well, we did this. We’re done, bye, end of story,’” she says. It feels like “a waste,” she adds, “because people worked on this for years.”


In abandoning the model, the city has returned to an analog process that its own analysis concluded was biased against women and Dutch nationals—a fact not lost on Berkers, the data scientist, who no longer works for the city. By shutting down the pilot, he says, the city sidestepped the uncomfortable truth—that many of the concerns de Zwart raised about the complex, layered biases within the Smart Check model also apply to the caseworker-led process.

“That’s the thing that I find a bit difficult about the decision,” Berkers says. “It’s a bit like no decision. It is a decision to go back to the analog process, which in itself has characteristics like bias.” 

Chen, the ethical-AI consultant, largely agrees. “Why do we hold AI systems to a higher standard than human agents?” he asks. When it comes to the caseworkers, he says, “there was no attempt to correct [the bias] systematically.” Amsterdam has promised to write a report on human biases in the welfare process, but the date has been pushed back several times.

“In reality, what ethics comes down to in practice is: nothing’s perfect,” he says. “There’s a high-level thing of Do not discriminate, which I think we can all agree on, but this example highlights some of the complexities of how you translate that [principle].” Ultimately, Chen believes that finding any solution will require trial and error, which by definition usually involves mistakes: “You have to pay that cost.”

But it may be time to more fundamentally reconsider how fairness should be defined—and by whom. Beyond the mathematical definitions, some researchers argue that the people most affected by the programs in question should have a greater say. “Such systems only work when people buy into them,” explains Elissa Redmiles, an assistant professor of computer science at Georgetown University who has studied algorithmic fairness. 

No matter what the process looks like, these are questions that every government will have to deal with—and urgently—in a future increasingly defined by AI. 

And, as de Zwart argues, if broader questions are not tackled, even well-intentioned officials deploying systems like Smart Check in cities like Amsterdam will be condemned to learn—or ignore—the same lessons over and over. 

“We are being seduced by technological solutions for the wrong problems,” he says. “Should we really want this? Why doesn’t the municipality build an algorithm that searches for people who do not apply for social assistance but are entitled to it?”


Eileen Guo is the senior reporter for features and investigations at MIT Technology Review. Gabriel Geiger is an investigative reporter at Lighthouse Reports. Justin-Casimir Braun is a data reporter at Lighthouse Reports.

Additional reporting by Jeroen van Raalte for Trouw, Melissa Heikkilä for MIT Technology Review, and Tahmeed Shafiq for Lighthouse Reports. Fact checked by Alice Milliken. 

You can read a detailed explanation of our technical methodology here. You can read Trouw’s companion story, in Dutch, here.

Why humanoid robots need their own safety rules

Last year, a humanoid warehouse robot named Digit set to work handling boxes of Spanx. Digit can move boxes weighing up to 16 kilograms between trolleys and conveyor belts, taking over some of the heavier work for its human colleagues. It works in a restricted, defined area, separated from human workers by physical panels or laser barriers. That’s because while Digit is usually steady on its robot legs, which have a distinctive backwards knee-bend, it sometimes falls. For example, at a trade show in March, it appeared to be capably shifting boxes until it suddenly collapsed, face-planting on the concrete floor and dropping the container it was carrying.

The risk of that sort of malfunction happening around people is pretty scary. No one wants a 1.8-meter-tall, 65-kilogram machine toppling onto them, or a robot arm accidentally smashing into a sensitive body part. “Your throat is a good example,” says Pras Velagapudi, chief technology officer of Agility Robotics, Digit’s manufacturer. “If a robot were to hit it, even with a fraction of the force that it would need to carry a 50-pound tote, it could seriously injure a person.”

Physical stability—i.e., the ability to avoid tipping over—is the No. 1 safety concern identified by a group exploring new standards for humanoid robots. The IEEE Humanoid Study Group argues that humanoids differ from other robots, like industrial arms or existing mobile robots, in key ways and therefore require a new set of standards in order to protect the safety of operators, end users, and the general public. The group shared its initial findings with MIT Technology Review and plans to publish its full report later this summer. It identifies distinct challenges, including physical and psychosocial risks as well as issues such as privacy and security, that it feels standards organizations need to address before humanoids start being used in more collaborative scenarios.    

While humanoids are just taking their first tentative steps into industrial applications, the ultimate goal is to have them operating in close quarters with humans; one reason for making robots human-shaped in the first place is so they can more easily navigate the environments we’ve designed around ourselves. This means they will need to be able to share space with people, not just stay behind protective barriers. But first, they need to be safe.

One distinguishing feature of humanoids is that they are “dynamically stable,” says Aaron Prather, a director at the standards organization ASTM International and the IEEE group’s chair. This means they need power in order to stay upright; they exert force through their legs (or other limbs) to stay balanced. “In traditional robotics, if something happens, you hit the little red button, it kills the power, it stops,” Prather says. “You can’t really do that with a humanoid.” If you do, the robot will likely fall—potentially posing a bigger risk.

Slower brakes

What might a safety feature look like if it’s not an emergency stop? Agility Robotics is rolling out some new features on the latest version of Digit to try to address the toppling issue. Rather than instantly depowering (and likely falling down), the robot could decelerate more gently when, for instance, a person gets too close. “The robot basically has a fixed amount of time to try to get itself into a safe state,” Velagapudi says. Perhaps it puts down anything it’s carrying and drops to its hands and knees before powering down.
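As a rough illustration of the idea Velagapudi describes, the sketch below models a controlled stop as a fixed time budget in which the robot steps through progressively safer postures before powering down. It is a hypothetical outline, not Agility Robotics' software; the states, timings, and logic are invented.

```python
# Hypothetical sketch of a "controlled stop": instead of cutting power the
# instant something goes wrong, the robot gets a fixed time budget to reach
# a safe state. States, timings, and logic are invented for illustration.
import time
from enum import Enum, auto

class Posture(Enum):
    CARRYING = auto()
    LOAD_SET_DOWN = auto()
    KNEELING = auto()
    POWERED_DOWN = auto()

SAFE_STATE_BUDGET_S = 2.0  # fixed window to reach a safe posture

def controlled_stop(start: Posture) -> Posture:
    """Step through safer postures until the budget expires, then cut power."""
    posture = start
    deadline = time.monotonic() + SAFE_STATE_BUDGET_S
    for next_posture, duration_s in [(Posture.LOAD_SET_DOWN, 0.8), (Posture.KNEELING, 0.8)]:
        if time.monotonic() + duration_s > deadline:
            break  # not enough time left for this motion; stop where we are
        time.sleep(duration_s)  # stand-in for commanding the actuators
        posture = next_posture
    return Posture.POWERED_DOWN  # power is cut only after the safest reachable posture

print(controlled_stop(Posture.CARRYING))
```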

Different robots could tackle the problem in different ways. “We want to standardize the goal, not the way to get to the goal,” says Federico Vicentini, head of product safety at Boston Dynamics. Vicentini is chairing a working group at the International Organization for Standardization (ISO) to develop a new standard dedicated to the safety of industrial robots that need active control to maintain stability (experts at Agility Robotics are also involved). The idea, he says, is to set out clear safety expectations without constraining innovation on the part of robot and component manufacturers: “How to solve the problem is up to the designer.”

Trying to set universal standards while respecting freedom of design can pose challenges, however. First of all, how do you even define a humanoid robot? Does it need to have legs? Arms? A head? 

“One of our recommendations is that maybe we need to actually drop the term ‘humanoid’ altogether,” Prather says. His group advocates a classification system for humanoid robots that would take into account their capabilities, behavior, and intended use cases rather than how they look. The ISO standard Vicentini is working on refers to all industrial mobile robots “with actively controlled stability.” This would apply as much to Boston Dynamics’ dog-like quadruped Spot as to its bipedal humanoid Atlas, and could equally cover robots with wheels or some other kind of mobility.

How to speak robot

Aside from physical safety issues, humanoids pose a communication challenge. If they are to share space with people, they will need to recognize when someone’s about to cross their path and communicate their own intentions in a way everyone can understand, just as cars use brake lights and indicators to show the driver’s intent. Digit already has lights to show its status and the direction it’s traveling in, says Velagapudi, but it will need better indicators if it’s to work cooperatively, and ultimately collaboratively, with humans. 

“If Digit’s going to walk out into an aisle in front of you, you don’t want to be surprised by that,” he says. The robot could signal its intentions with voice prompts, but audio alone is not practical in a loud industrial setting. It could be even more confusing if you have multiple robots in the same space—which one is trying to get your attention?

There’s also a psychological effect that differentiates humanoids from other kinds of robots, says Prather. We naturally anthropomorphize robots that look like us, which can lead us to overestimate their abilities and get frustrated if they don’t live up to those expectations. “Sometimes you let your guard down on safety, or your expectations of what that robot can do versus reality go higher,” he says. These issues are especially problematic when robots are intended to perform roles involving emotional labor or support for vulnerable people. The IEEE report recommends that any standards should include emotional safety assessments and policies that “mitigate psychological stress or alienation.”

To inform the report, Greta Hilburn, a user-centered designer at the US Defense Acquisition University, conducted surveys with a wide range of non-engineers to get a sense of their expectations around humanoid robots. People overwhelmingly wanted robots that could form facial expressions, read people’s micro-expressions, and use gestures, voice, and haptics to communicate. “They wanted everything—something that doesn’t exist,” she says.

Escaping the warehouse

Getting human-robot interaction right could be critical if humanoids are to move out of industrial spaces and into other contexts, such as hospitals, elderly care environments, or homes. It’s especially important for robots that may be working with vulnerable populations, says Hilburn. “The damage that can be done within an interaction with a robot if it’s not programmed to speak in a way to make a human feel safe, whether it be a child or an older adult, could certainly have different types of outcomes,” she says.

The IEEE group’s recommendations include enabling a human override, standardizing some visual and auditory cues, and aligning a robot’s appearance with its capabilities so as not to mislead users. If a robot looks human, Prather says, people will expect it to be able to hold a conversation and exhibit some emotional intelligence; if it can actually only do basic mechanical tasks, this could cause confusion, frustration, and a loss of trust. 

“It’s kind of like self-checkout machines,” he says. “No one expects them to chat with you or help with your groceries, because they’re clearly machines. But if they looked like a friendly employee and then just repeated ‘Please scan your next item,’ people would get annoyed.”

Prather and Hilburn both emphasize the need for inclusivity and adaptability when it comes to human-robot interaction. Can a robot communicate with deaf or blind people? Will it be able to adapt to waiting slightly longer for people who may need more time to respond? Can it understand different accents?

There may also need to be some different standards for robots that operate in different environments, says Prather. A robot working in a factory alongside people trained to interact with it is one thing, but a robot designed to help in the home or interact with kids at a theme park is another proposition. With some general ground rules in place, however, the public should ultimately be able to understand what robots are doing wherever they encounter them. It’s not about being prescriptive or holding back innovation, he says, but about setting some basic guidelines so that manufacturers, regulators, and end users all know what to expect: “We’re just saying you’ve got to hit this minimum bar—and we all agree below that is bad.”

The IEEE report is intended as a call to action for standards organizations, like Vicentini’s ISO group, to start the process of defining that bar. It’s still early for humanoid robots, says Vicentini—we haven’t seen the state of the art yet—but it’s better to get some checks and balances in place so the industry can move forward with confidence. Standards help manufacturers build trust in their products and make it easier to sell them in international markets, and regulators often rely on them when coming up with their own rules. Given the diversity of players in the field, it will be difficult to create a standard everyone agrees on, Vicentini says, but “everybody equally unhappy is good enough.”

The Download: Amsterdam’s welfare AI experiment, and making humanoid robots safer

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Inside Amsterdam’s high-stakes experiment to create fair welfare AI

Amsterdam thought it was on the right track. City officials in the welfare department believed they could build technology that would prevent fraud while protecting citizens’ rights. They followed emerging best practices and invested a vast amount of time and money in a project that eventually processed live welfare applications. But in their pilot, they found that the system they’d developed was still not fair and effective. Why?

Lighthouse Reports, MIT Technology Review, and the Dutch newspaper Trouw have gained unprecedented access to the system to try to find out. Read about what we discovered.

—Eileen Guo, Gabriel Geiger & Justin-Casimir Braun

This story is a partnership between MIT Technology Review, Lighthouse Reports, and Trouw, and was supported by the Pulitzer Center. 

+ Can you make AI fairer than a judge? Play our courtroom algorithm game to find out.

Why humanoid robots need their own safety rules

While humanoid robots are taking their first tentative steps into industrial applications, the ultimate goal is to have them operating in close quarters with humans.

One reason for making robots human-shaped in the first place is so they can more easily navigate the environments we’ve designed around ourselves. This means they will need to be able to share space with people, not just stay behind protective barriers. But first, they need to be safe. Read the full story.

—Victoria Turk

MIT Technology Review Narrated: The surprising barrier that keeps us from building the housing we need

Sure, there’s too much red tape, but there is another reason building anything is so expensive: the construction industry’s “awful” productivity.

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Chatbots are getting facts about the LA riots wrong
AI systems can’t be relied upon at the best of times, let alone with fast-moving news. (Wired $)
+ What’s Trump’s goal here, exactly? (NY Mag $)

2 Gavin Newsom is becoming a meme
The California governor’s Trump clapbacks are winning him a legion of online fans. (WP $)
+ He’s accused the President of “pulling a military dragnet” across the city. (The Guardian)
+ Newsom has warned that other states are likely to be next. (Politico)

3 Trump’s Big Beautiful Bill could lead to more than 51,000 deaths a year
Largely because of the bill’s changes to public health insurance. (Undark)

4 How Ukraine’s AI-guided drones hit Russia’s airfields
But its opponent is also stepping up its AI capabilities. (FT $)
+ Meet the radio-obsessed civilian shaping Ukraine’s drone defense. (MIT Technology Review)

5 US agencies tracked foreign nationals who traveled to meet Elon Musk
Officials kept an eye on who visited him in 2022 and 2023. (WSJ $)

6 Snap’s new AR smart glasses will go on sale next year
Its sixth generation of Specs will enter an increasingly crowded field. (CNBC)
+ Qualcomm has made a new processor to power similar glasses. (Bloomberg $)
+ What’s next for smart glasses. (MIT Technology Review)

7 Each ChatGPT query uses ‘roughly one fifteenth of a teaspoon’ of water
That’s according to Sam Altman, at least. (The Verge)
+ We did the math on AI’s energy footprint. Here’s the story you haven’t heard. (MIT Technology Review)

8 Death Valley’s air could be a valuable water source
Scientists proved their hydrogel method worked in the real world. (New Scientist $)

9 Gen Z is choosing to skip college entirely
Increasing numbers of young tech workers are opting out and entering the workforce early. (Insider $)

10 How to fight back against a world of AI-generated choices
Good taste is your friend here. (The Atlantic $)

Quote of the day

“We’re probably going to have flying taxis before we have autonomous ones in London.”

—Steve McNamara, the general secretary of the UK’s Licensed Taxi Drivers’ Association, isn’t optimistic about London’s plans to trial autonomous cars, he tells the Guardian.

One more thing

Exosomes are touted as a trendy cure-all. We don’t know if they work.

There’s a trendy new cure-all in town—you might have seen ads pop up on social media or read rave reviews in beauty magazines.

Exosomes are being touted as a miraculous treatment for hair loss, aging skin, acne, eczema, pain conditions, long covid, and even neurological diseases like Parkinson’s and Alzheimer’s. That’s, of course, if you can afford the price tag—which can stretch to thousands of dollars.

But there’s a big problem with these big promises: We don’t fully understand how exosomes work—or what they even really are. Read our story.

—Jessica Hamzelou

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Here’s how to tap into your flow state and get things DONE.
+ Check out these must-see art shows and exhibitions of the year.
+ Everybody’s free (to listen to one of the best hits of the 90s) ☀
+ Turns out 10CC frontman Graham Gouldman doesn’t just like cricket—he’s just watched his first ever game and he really does love it 🏏