What it’s like to be banned from the US for fighting online hate

It was early evening in Berlin, just a day before Christmas Eve, when Josephine Ballon got an unexpected email from US Customs and Border Protection. The status of her ability to travel to the United States had changed—she’d no longer be able to enter the country. 

At first, she couldn’t find any information online as to why, though she had her suspicions. She was one of the directors of HateAid, a small German nonprofit founded to support the victims of online harassment and violence. As the organization has become a strong advocate of EU tech regulations, it has increasingly found itself attacked in campaigns from right-wing politicians and provocateurs who claim that it engages in censorship. 

It was only later that she saw what US Secretary of State Marco Rubio had posted on X:

Rubio was promoting a conspiracy theory about what he has called the “censorship-industrial complex,” which alleges widespread collusion between the US government, tech companies, and civil society organizations to silence conservative voices—the very conspiracy theory HateAid has recently been caught up in. 

Then Undersecretary of State Sarah B. Rogers posted on X the names of the people targeted by travel bans. The list included Ballon, as well as her HateAid co-director, Anna Lena von Hodenberg. Also named were three others doing similar or related work: former EU commissioner Thierry Breton, who had helped author Europe’s Digital Services Act (DSA); Imran Ahmed of the Center for Countering Digital Hate, which documents hate speech on social media platforms; and Clare Melford of the Global Disinformation Index, which provides risk ratings warning advertisers about placing ads on websites promoting hate speech and disinformation. 

It was an escalation in the Trump administration’s war on digital rights—fought in the name of free speech. But EU officials, freedom of speech experts, and the five people targeted all flatly reject the accusations of censorship. Ballon, von Hodenberg, and some of their clients tell me that their work is fundamentally about making people feel safer online. And their experiences over the past few weeks show just how politicized and besieged their work in online safety has become. They almost certainly won’t be the last people targeted in this way. 

Ballon was the one to tell von Hodenberg that both their names were on the list. “We kind of felt a chill in our bones,” von Hodenberg told me when I caught up with the pair in early January. 

But she added that they also quickly realized, “Okay, it’s the old playbook to silence us.” So they got to work—starting with challenging the narrative the US government was pushing about them.

Within a few hours, Ballon and von Hodenberg had issued a strongly worded statement refuting the allegations: “We will not be intimidated by a government that uses accusations of censorship to silence those who stand up for human rights and freedom of expression,” they wrote. “We demand a clear signal from the German government and the European Commission that this is unacceptable. Otherwise, no civil society organisation, no politician, no researcher, and certainly no individual will dare to denounce abuses by US tech companies in the future.” 

Those signals came swiftly. On X, Johann Wadephul, the German foreign minister, called the entry bans “not acceptable,” adding that “the DSA was democratically adopted by the EU, for the EU—it does not have extraterritorial effect.” Also on X, French president Emmanuel Macron wrote that “these measures amount to intimidation and coercion aimed at undermining European digital sovereignty.” The European Commission issued a statement that it “strongly condemns” the Trump administration’s actions and reaffirmed its “sovereign right to regulate economic activity in line with our democratic values.” 

Ahmed, Melford, Breton, and their respective organizations also made their own statements denouncing the entry bans. Ahmed, the only one of the five based in the United States, also successfully filed suit to preempt any attempts to detain him, which the State Department had indicated it would consider doing.  

But alongside the statements of solidarity, Ballon and von Hodenberg said, they also received more practical advice: Assume the travel ban was just the start and that more consequences could be coming. Service providers might preemptively revoke access to their online accounts; banks might restrict their access to money or the global payment system; they might see malicious attempts to get hold of their personal data or that of their clients. Perhaps, allies told them, they should even consider moving their money into friends’ accounts or keeping cash on hand so that they could pay their team’s salaries—and buy their families’ groceries. 

These warnings felt particularly urgent given that just days before, the Trump administration had sanctioned two International Criminal Court judges for “illegitimate targeting of Israel.” As a result, they had lost access to many American tech platforms, including Microsoft, Amazon, and Gmail. 

“If Microsoft does that to someone who is a lot more important than we are,” Ballon told me, “they will not even blink to shut down the email accounts from some random human rights organization in Germany.”   

“We have now this dark cloud over us that any minute, something can happen,” von Hodenberg added. “We’re running against time to take the appropriate measures.”

Helping navigate “a lawless place”

Founded in 2018 to support people experiencing digital violence, HateAid has since evolved to defend digital rights more broadly. It provides ways for people to report illegal online content and offers victims advice, digital security, emotional support, and help with evidence preservation. It also educates German police, prosecutors, and politicians about how to handle online hate crimes. 

Once the group is contacted for help, and if its lawyers determine that the type of harassment has likely violated the law, the organization connects victims with legal counsel who can help them file civil and criminal lawsuits against perpetrators, and if necessary, helps finance the cases. (HateAid itself does not file cases against individuals.) Ballon and von Hodenberg estimate that HateAid has worked with around 7,500 victims and helped them file 700 criminal cases and 300 civil cases, mostly against individual offenders.

For 23-year-old German law student and outspoken political activist Theresia Crone, HateAid’s support has meant that she has been able to regain some sense of agency in her life, both on and offline. She had reached out after she discovered entire online forums dedicated to making deepfakes of her. Without HateAid, she told me, “I would have had to either put my faith into the police and the public prosecutor to prosecute this properly, or I would have had to foot the bill of an attorney myself”—a huge financial burden for “a student with basically no fixed income.” 

In addition, working alone would have been retraumatizing: “I would have had to document everything by myself,” she said—meaning “I would have had to see all of these pictures again and again.” 

“The internet is a lawless place,” Ballon told me when we first spoke, back in mid-December, a few weeks before the travel ban was announced. In a conference room at the HateAid office in Berlin, she said there are many cases that “cannot even be prosecuted, because no perpetrator is identified.” That’s why the nonprofit also advocates for better laws and regulations governing technology companies in Germany and across the European Union. 

On occasion, they have also engaged in strategic litigation against the platforms themselves. In 2023, for example, HateAid and the European Union of Jewish Students sued X for failing to enforce its terms of service against posts that were antisemitic or that denied the Holocaust, which is illegal in Germany. 

This almost certainly put the organization in the crosshairs of X owner Elon Musk; it also made HateAid a frequent target of Germany’s far right party, the Alternative für Deutschland, which Musk has called “the only hope for Germany.” (X did not respond to a request to comment on this lawsuit.)

HateAid gets caught in Trump World’s dragnet

For better and worse, HateAid’s profile grew further when it took on another critical job in online safety. In June 2024, it was named as a trusted flagger organization under the Digital Services Act, a 2022 EU law that requires social media companies to remove certain content (including hate speech and violence) that violates national laws, and to provide more transparency to the public, in part by allowing more appeals on platforms’ moderation decisions. 

Trusted flaggers are entities designated by individual EU countries to point out illegal content, and they are a key part of DSA enforcement. While anyone can report such content, trusted flaggers’ reports are prioritized and legally require a response from the platforms. 

The Trump administration has loudly argued that the trusted flagger program and the DSA more broadly are examples of censorship that disproportionately affect voices on the right and American technology companies, like X. 

When we first spoke in December, Ballon said these claims of censorship simply don’t hold water: “We don’t delete content, and we also don’t, like, flag content publicly for everyone to see and to shame people. The only thing that we do: We use the same notification channels that everyone can use, and the only thing that is in the Digital Services Act is that platforms should prioritize our reporting.” Then it is on the platforms to decide what to do. 

Nevertheless, the idea that HateAid and like-minded organizations are censoring the right has become a powerful conspiracy theory with real-world consequences. (Last year, MIT Technology Review covered the closure of a small State Department office following allegations that it had conducted “censorship,” as well as an unusual attempt by State leadership to access internal records related to supposed censorship, including information about two of the people who have now been banned, Melford and Ahmed, and both of their organizations.) 

HateAid saw a fresh wave of harassment starting last February, when 60 Minutes aired a documentary on hate speech laws in Germany; it featured a quote from Ballon that “free speech needs boundaries,” which, she added, “are part of our constitution.” The interview happened to air just days before Vice President JD Vance attended the Munich Security Conference; there he warned that “across Europe, free speech … is in retreat.” This, Ballon told me, led to heightened hostility toward her and her organization. 

Fast-forward to July, when a report by Republicans in the US House of Representatives claimed that the DSA “compels censorship and infringes on American free speech.” HateAid was explicitly named in the report. 

All of this has made its work “more dangerous,” Ballon told me in December. Before the 60 Minutes interview, “maybe one and a half years ago, as an organization, there were attacks against us, but mostly against our clients, because they were the activists, the journalists, the politicians at the forefront. But now … we see them becoming more personal.” 

As a result, over the last year, HateAid has taken more steps to protect its reputation and get ahead of the damaging narratives. Ballon has reported the hate speech targeted at her (“More [complaints] than in all the years I did this job before,” she said) and has brought defamation lawsuits on behalf of HateAid. 

All these tensions finally came to a head in December. At the start of the month, the European Commission fined X $140 million for DSA violations. This set off yet another round of recriminations about supposed censorship of the right, with Trump calling the fine “a nasty one” and warning: “Europe has to be very careful.”

Just a few weeks later, the day before Christmas Eve, retaliation against individuals finally arrived. 

Who gets to define—and experience—free speech

Digital rights groups are pushing back against the Trump administration’s narrow view of what constitutes free speech and censorship.

“What we see from this administration is a conception of freedom of expression that is not a human-rights-based conception where this is an inalienable, indelible right that’s held by every person,” says David Greene, the civil liberties director of the Electronic Frontier Foundation, a US-based digital rights group. Rather, he sees an “expectation that… [if] anybody else’s speech is challenged, there’s a good reason for it, but it should never happen to them.” 

Since Trump won his second term, social media platforms have walked back their commitments to trust and safety. Meta, for example, ended fact-checking on Facebook and adopted much of the administration’s censorship language, with CEO Mark Zuckerberg telling the podcaster Joe Rogan that it would “work with President Trump to push back on governments around the world” if they are seen as “going after American companies and pushing to censor more.”


And as the recent fines on X show, Musk’s platform has gone even further in flouting European law—and, ultimately, ignoring the user rights that the DSA was written to protect. In perhaps one of the most egregious examples yet, in recent weeks X allowed people to use Grok, its AI generator, to create nonconsensual nude images of women and children, with few limits—and, so far at least, few consequences. (Last week, X released a statement that it would start limiting users’ ability to create explicit images with Grok; in response to a number of questions, X representative Rosemarie Esposito pointed me to that statement.) 

For Ballon, it makes perfect sense: “You can better make money if you don’t have to implement safety measures and don’t have to invest money in making your platform the safest place,” she told me.

“It goes both ways,” von Hodenberg added. “It’s not only the platforms who profit from the US administration undermining European laws … but also, obviously, the US administration also has a huge interest in not regulating the platforms … because who is amplified right now? It’s the extreme right.”

She believes this explains why HateAid—and Ahmed’s Center for Countering Digital Hate and Melford’s Global Disinformation Index, as well as Breton and the DSA—have been targeted: They are working to disrupt this “unholy deal where the platforms profit economically and the US administration is profiting in dividing the European Union,” she said. 

The travel restrictions intentionally send a strong message to all groups that work to hold tech companies accountable. “It’s purely vindictive,” Greene says. “It’s designed to punish people from pursuing further work on disinformation or anti-hate work.” (The State Department did not respond to a request for comment.)

And ultimately, this has a broad effect on who feels safe enough to participate online. 

Ballon pointed to research that shows the “silencing effect” of harassment and hate speech, not only for “those who have been attacked,” but also for those who witness such attacks. This is particularly true for women, who tend to face more online hate that is also more sexualized and violent. It’ll only be worse if groups like HateAid get deplatformed or lose funding. 

Von Hodenberg put it more bluntly: “They reclaim freedom of speech for themselves when they want to say whatever they want, but they silence and censor the ones that criticize them.”

Still, the HateAid directors insist they’re not backing down. They say they’re taking “all advice” they have received seriously, especially with regard to “becoming more independent from service providers,” Ballon told me.

“Part of the reason that they don’t like us is because we are strengthening our clients and empowering them,” said von Hodenberg. “We are making sure that they are not succeeding, and not withdrawing from the public debate.” 

“So when they think they can silence us by attacking us? That is just a very wrong perception.”

Martin Sona contributed reporting.

Correction: This article originally misstated the name of Germany’s far right party.

Going beyond pilots with composable and sovereign AI

Today marks an inflection point for enterprise AI adoption. Despite billions invested in generative AI, only 5% of integrated pilots deliver measurable business value and nearly one in two companies abandons AI initiatives before reaching production.

The bottleneck is not the models themselves. What’s holding enterprises back is the surrounding infrastructure: Limited data accessibility, rigid integration, and fragile deployment pathways prevent AI initiatives from scaling beyond early LLM and RAG experiments. In response, enterprises are moving toward composable and sovereign AI architectures that lower costs, preserve data ownership, and adapt to the rapid, unpredictable evolution of AI—a shift IDC expects 75% of global businesses to make by 2027.

The concept to production reality

AI pilots almost always work, and that’s the problem. Proofs of concept (PoCs) are meant to validate feasibility, surface use cases, and build confidence for larger investments. But they thrive in conditions that rarely resemble the realities of production.

Source: Compiled by MIT Technology Review Insights with data from Informatica, CDO Insights 2025 report, 2026

“PoCs live inside a safe bubble,” observes Cristopher Kuehl, chief data officer at Continent 8 Technologies. Data is carefully curated, integrations are few, and the work is often handled by the most senior and motivated teams.

The result, according to Gerry Murray, research director at IDC, is not so much pilot failure as structural mis-design: Many AI initiatives are effectively “set up for failure from the start.”


The Download: the US digital rights crackdown, and AI companionship

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

What it’s like to be banned from the US for fighting online hate  

Just before Christmas, the Trump administration dramatically escalated its war on digital rights by banning five people from entering the US. One of them, Josephine Ballon, is a director of HateAid, a small German nonprofit founded to support the victims of online harassment and violence. The organization is a strong advocate of EU tech regulations, and so finds itself attacked in campaigns from right-wing politicians and provocateurs who claim that it engages in censorship. 

EU officials, freedom of speech experts, and the five people targeted all flatly reject these accusations. Ballon told us that their work is fundamentally about making people feel safer online. But their experiences over the past few weeks show just how politicized and besieged their work in online safety has become. Read the full story.

—Eileen Guo

TR10: AI companions

Chatbots are skilled at crafting sophisticated dialogue and mimicking empathetic behavior. They never get tired of chatting. It’s no wonder, then, that so many people now use them for companionship—forging friendships or even romantic relationships. 

72% of US teenagers have used AI for companionship, according to a study from the nonprofit Common Sense Media. But while chatbots can provide much-needed emotional support and guidance for some people, they can exacerbate underlying problems in others—especially vulnerable people or those with mental health issues. 

Although some early attempts to regulate this space are underway, AI companionship is going nowhere. Read why we made it one of our 10 Breakthrough Technologies this year, and check out the rest of the list.

And, if you want to learn more about what we predict for AI this year, sign up to join me for our free LinkedIn Live event tomorrow at 12.30pm ET.

Why inventing new emotions feels so good  

Have you ever felt “velvetmist”?  

It’s a “complex and subtle emotion that elicits feelings of comfort, serenity, and a gentle sense of floating.” It’s peaceful, but more ephemeral and intangible than contentment. It might be evoked by the sight of a sunset or a moody, low-key album.  

If you haven’t ever felt this sensation—or even heard of it—that’s not surprising. A Reddit user generated it with ChatGPT, along with advice on how to evoke the feeling. Don’t scoff: Researchers say more and more terms for these “neo-­emotions” are showing up online, describing new dimensions and aspects of feeling. Read our story to learn more about why

—Anya Kamenetz

This story is from the latest print issue of MIT Technology Review. If you haven’t already, subscribe now to receive the next edition as soon as it lands (and benefit from some hefty seasonal discounts too!)

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Ads are coming to ChatGPT 
For American users initially, with plans to expand soon. (CNN)
+ Here’s how they’ll work. (Wired $)

2 What will we be able to salvage after the AI bubble bursts? 
It will be ugly, but there are plenty of good uses for AI that we’ll want to keep. (The Guardian)
+ What even is the AI bubble? (MIT Technology Review)

3 It’s almost impossible to mine Greenland’s natural resources 
It has vast supplies of rare earth elements, but its harsh climate and environment make them very hard to access. (The Week)

4 Iran is now 10 days into its internet shutdown
It’s one of the longest and most extreme we’ve ever witnessed. (BBC)
+ Starlink isn’t proving as helpful as hoped as the regime finds ways to jam it. (Reuters $)
+ Battles are raging online about what’s really going on inside Iran. (NYT $)

5 America is heading for a Polymarket disaster 
Prediction markets are getting out of control, and some people are losing a lot of money. (The Atlantic $)
+ They were first embraced by political junkies, but now they’re everywhere. (NYT $)

6 How to fireproof a city 
Californians are starting to fight fires before they can even start. (The Verge $)
+ How AI can help spot wildfires. (MIT Technology Review)

7 Stoking ‘deep state’ conspiracy theories can be dangerous 
Especially if you’re then given the task of helping run one of those state institutions, as Dan Bongino is now learning. (WP $)
+ Why everything is a conspiracy now. (MIT Technology Review)

8 Why we’re suddenly all having a ‘Very Chinese Time’ 🇨🇳
It’s a fun, flippant trend—but it also shows how China’s soft power is growing around the globe. (Wired $) 

9 Why there’s no one best way to store information
Each one involves trade-offs between space and time. (Quanta $)

10 Meat may play a surprising role in helping people reach 100
Perhaps because it can assist with building stronger muscles and bones. (New Scientist $)

Quote of the day

“That’s the level of anxiety now – people watching the skies and the seas themselves because they don’t know what else to do.”

—A Greenlander tells The Guardian just how seriously she and her fellow compatriots are taking Trump’s threat to invade their country. 

One more thing

Illustration: three silhouetted people in a boat crossing the water in the dark toward a beam of light (Image Credit: Katherine Lam)

Inside a romance scam compound—and how people get tricked into being there

Gavesh’s journey started, seemingly innocently, with a job ad on Facebook promising work he desperately needed.

Instead, he found himself trafficked into a business commonly known as “pig butchering”—a form of fraud in which scammers form close relationships with targets online and extract money from them. The Chinese crime syndicates behind the scams have netted billions of dollars, and they have used violence and coercion to force their workers, many of them trafficked like Gavesh, to carry out the frauds from large compounds, several of which operate openly in the quasi-lawless borderlands of Myanmar.

Big Tech may hold the key to breaking up the scam syndicates—if these companies can be persuaded or compelled to act. Read the full story.

—Peter Guest & Emily Fishbein

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Blue Monday isn’t real (but it is an absolute banger of a track.) 
+ Some great advice here about how to be productive during the working day.
+ Twelfth Night is one of Shakespeare’s most fun plays—as these top actors can attest
+ If the cold and dark gets to you, try making yourself a delicious bowl of soup

Google’s Core Updates, Explained

Google released another Core Update to its search algorithm over the holidays. It was the most comprehensive update of 2025.

Google changes its algorithm frequently. Some changes are more widespread than others. Unlike Spam Updates, Core Updates generally do not penalize but, instead, alter how the algorithm treats certain queries and their intent.

For example, a Core Update may result in more “best of” listings (rather than product categories) in search results. Ecommerce sites may lose traffic, but not because of anything they’ve done, so no fix is required.

Yet a Core Update may result in higher rankings for certain types of content, which could prompt merchants to add those pages.

Core Updates can affect a wide range of queries. The recent holiday update lowered the listings of large publishers and elevated niche sites. Search Engine Journal reported that Macy’s rankings decreased, while those of Columbia, The North Face, and Fragrance Market increased.

Content helpfulness

Google’s infamous Helpful Content algorithm is now part of its Core Updates and can, in theory, target an “unhelpful” site.

Google provides guidelines to human evaluators for what makes content helpful. They are the best indicator for search optimizers of how Google defines that term. To paraphrase from the guidelines:

  • Websites should place the most useful portions at the top of a page.
  • The amount of effort, originality, and skill determines the quality of the content.
  • Avoid unnecessary fluff or “filler” content that obscures what visitors are looking for.
  • Use clear titles and headings that inform, not oversell.

If a Core Update resulted in lost traffic, scrutinize your content helpfulness and on-page engagement.

How to recover

It’s often difficult to know why a Core Update lowered a site’s rankings. To diagnose, I typically start with the helpfulness of its pages and its overall engagement.

The first step is always to identify what was lost. Search Console will reveal the impacted queries:

  • Go to the full “Performance” report.
  • Choose “Compare” in the “More” filter.
  • Choose “Custom” and set start and end dates to expose the week before the change (early December for the most recent update) and the week after (beginning of January). Click “Apply.”
  • Sort the ensuing “Queries” table by the “Clicks Difference” column to see queries that now generate fewer clicks.

Select a before and after date range in Search Console to identify queries that generate fewer organic clicks.
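The same before-and-after comparison can be sketched outside Search Console once the query data is exported. The snippet below is a minimal illustration, not Search Console's own method: the function name `click_losses` and the sample queries and click counts are made up for the example, and it assumes each period has been reduced to a simple mapping of query to clicks.

```python
def click_losses(before, after):
    """List queries that lost clicks between two periods, biggest loss first.

    `before` and `after` map query text to click counts (hypothetical data,
    standing in for an export of the two date ranges being compared).
    """
    rows = []
    for query in set(before) | set(after):
        clicks_before = before.get(query, 0)
        clicks_after = after.get(query, 0)
        difference = clicks_after - clicks_before
        if difference < 0:  # keep only queries that now generate fewer clicks
            rows.append((query, clicks_before, clicks_after, difference))
    # Most negative difference first; query text breaks ties for stable output.
    rows.sort(key=lambda row: (row[3], row[0]))
    return rows


# Made-up sample data: the week before and the week after an update.
before = {"winter jackets": 120, "best hiking boots": 80, "rain coat": 15}
after = {"winter jackets": 45, "best hiking boots": 85, "rain coat": 5}

losses = click_losses(before, after)
for query, clicks_before, clicks_after, difference in losses:
    print(f"{query}: {clicks_before} -> {clicks_after} ({difference})")
```

Queries missing from one period are treated as zero clicks, so a query that disappeared entirely still shows up as a loss.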

Next, manually search Google for each affected query to determine if results shifted broadly or only for your page. The appearance of many new listings that answer a query in a new way may indicate a broad shift.

Semrush provides monthly snapshots of ranking URLs for each query. Refer to its archive to see how your overall SERPs have changed. If you see a widespread shift (e.g., 80% of listings are new for a given query), there is likely no fix needed. It’s Google changing its algorithm.
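One rough way to put a number on "widespread shift" is to compare two snapshots of ranking URLs for the same query and measure what share of the current listings did not appear before. This is only a sketch under assumptions: the function name `new_listing_share` and the example URLs are hypothetical, the snapshots are plain lists you have collected yourself, and the 0.8 cutoff simply mirrors the 80% example above rather than any official threshold.

```python
def new_listing_share(urls_before, urls_after):
    """Return the fraction of current listings that were absent before."""
    if not urls_after:
        return 0.0
    previous = set(urls_before)
    new_urls = [url for url in urls_after if url not in previous]
    return len(new_urls) / len(urls_after)


# Hypothetical snapshots of the top five listings for one query.
before = ["a.example/1", "b.example/2", "c.example/3", "d.example/4", "e.example/5"]
after = ["a.example/1", "f.example/6", "g.example/7", "h.example/8", "i.example/9"]

share = new_listing_share(before, after)
broad_shift = share >= 0.8  # illustrative cutoff taken from the 80% example
print(f"{share:.0%} of listings are new; broad shift: {broad_shift}")
```

A high share across many affected queries points to Google reinterpreting the query; a low share with only your page missing points back to the page itself.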

If only your site is downranked, review the impacted pages and look for ways to make them more helpful and engaging, such as:

  • Move the main portion, such as a quick answer to a search query, to the top.
  • Improve page structure and subheadings.
  • Remove ads, such as intrusive pop-ups, that block users from interacting with a page.
  • Add jump-to links that help visitors navigate the page.
  • Include social proof on the page.
  • Show the author’s name and bio.
  • Link to trusted sources.
  • Add helpful images and videos.
  • Update the page with recent data, trends, and stats (with sources).
  • Add explanatory sections, such as FAQs and definitions, tailored to the page’s purpose.

Helpfulness is subjective and vague. Nonetheless, consider your target audience and tailor your content accordingly.

Google announces only substantial Core Updates, those that affect many users. Lesser, unannounced updates occur more often and can result in recoveries.

How To Analyze Google Discover

TL;DR

  1. To generate the most value from Discover, view it through an entity-focused lens: people, places, organisations, teams, and so on.
  2. Your best chance of success in Discover with an individual article is to make sure it outperforms its expected performance early. So share, share, share.
  3. Then analyze the type of content you create. What makes it clickable? What resonates? What headline and image combination works?
  4. High CTR is key for success, but “curiosity gap” headlines that fail to deliver kill long-term credibility. User satisfaction trumps clickiness over time.

Discover isn’t a completely black box. We have a decent idea of how it works and can reverse engineer more value with some smart analysis.

Yes, there are always going to be some surprises. It’s a bit mental at times. But we can make the most of the platform without destroying our credibility by publishing articles about vitamin B12 titled:

“Outlive your children with this one secret trick the government don’t want you to know about.”

Key Tenets Of Discover

Before diving in headfirst, let’s check the depth of the pool.

“Sustained presence on search helps maintain your status as a trustworthy publisher.”

  • Discover feeds off fresh content. While evergreen content pops up, the platform is very closely tied to the news.
  • More lifestyle-y, engaging content tends to thrive on the clickless platform.
  • Just like news, Discover is very entity, click, and early engagement driven.
  • The personalized platform groups cohorts of people together. If you satiate one, more of that cohort will likely follow.
  • If your content outperforms its predicted early-stage performance, it is more likely to be boosted.
  • Once the groups of potentially interested doomscrollers have been saturated, content performance naturally drops off.
  • Google is empowering our ability to find individual creators and video content on the platform, because people trust people and like watching stuff. Stunned.

Obviously, loads of people know how to game the system and have become pretty rich by doing so. If you want to laugh and cry in equal measure, see the state of Google’s spam problems here.

No sign of it being fixed either (Image Credit: Harry Clarkson-Bennett)

Most algorithms follow the Golden Hour Rule. Not to be confused with the golden shower rule, it means the first 60 minutes after posting determine whether algorithms will amplify or bury your content.

If you want to go viral, your best bet is to drive early stage engagement.

What Data Points Should You Analyze?

This is focused more on how you, as an SEO or analyst, can get more value out of the platform. So, let’s take conversions and click/impression data as read. We’re going deeper. This isn’t amateur hour.

I think you need to track the below and I’ll explain why.

  • CTR.
  • Entities.
  • Subfolders.
  • Authorship.
  • Headlines and images.
  • Content type (just a simple breakdown of news, how-tos, interviews, evergreen guides, etc.).
  • Publishing performance.

You need to already get traffic from Discover to generate value from this analysis. If you don’t, go back to creating high-quality, unique content in your niche(s) and push it out to the wider world.

Create great content and get the right people sharing it.

Worth noting you can’t accurately identify Discover traffic in analytics platforms. You have to accept that some of it will be misattributed. Most companies make an educated guess of sorts, grouping together traffic that combines a Google source with a mobile/Android device.

CTR

CTR is one of the foundational metrics of news SEO, Top Stories, Discover, and almost any form of real-time SEO. It is far more prevalent in news than traditional SEO because the algorithm is making decisions about what content should be promoted in almost real time.

Evergreen results are altered continuously, based on much longer-term engagement.

This is weighted alongside some kind of traditional Navboost engagement data – clicks, on-page interactions, session duration, et al. – to associate a clickable headline and image with content that serves the user effectively.

It’s also one of the reasons why clickbait has (broadly) started to die a death. Like rampant AI slop, even the mouth breathers will tire of it eventually.

To get the most out of CTR, you need to combine it with:

  • Image type.
  • Headline type (content type too).
  • And entity analysis.
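As a rough illustration of combining CTR with those dimensions, here is a minimal sketch. The rows, field names, and category labels are all made up for the example; swap in whatever your own export looks like.

```python
from collections import defaultdict

# Hypothetical per-article rows; in practice these come from your own exports.
ARTICLES = [
    {"headline_type": "curiosity_gap", "image_type": "person_sad", "clicks": 120, "impressions": 2000},
    {"headline_type": "curiosity_gap", "image_type": "person_happy", "clicks": 30, "impressions": 1000},
    {"headline_type": "how_to", "image_type": "person_happy", "clicks": 50, "impressions": 1000},
    {"headline_type": "how_to", "image_type": "no_person", "clicks": 10, "impressions": 1000},
]


def ctr_by(rows, *keys):
    """Return {group: CTR}, where CTR = total clicks / total impressions."""
    totals = defaultdict(lambda: [0, 0])  # group -> [clicks, impressions]
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group][0] += row["clicks"]
        totals[group][1] += row["impressions"]
    return {g: (c / i if i else 0.0) for g, (c, i) in totals.items()}


if __name__ == "__main__":
    # Group by one dimension, or by a combination of dimensions.
    for group, ctr in sorted(ctr_by(ARTICLES, "headline_type", "image_type").items()):
        print(group, f"{ctr:.1%}")
```

Summing clicks and impressions before dividing (rather than averaging per-article CTRs) stops a handful of low-impression articles from skewing the group figure.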

Entity Analysis

Entities are more important in news than in any other part of SEO. While entity SEO has been growing in popularity for years, news sites have long been obsessed with entities (arguably without knowing it).

Individual entity analysis based on the title and page content is perfect for Discover (Image Credit: Harry Clarkson-Bennett)

While it isn’t as easy to just frontload headlines with relevant entities to get traffic anymore, there’s still a real value in analyzing performance at an entity level.

Particularly in Discover.

You want to know what people, places, and organizations (arguably, these three make up 85%+ of all entities you need to care about) drive value for you and users in Discover.

You cannot run proper entity analysis manually. At least not well or at scale.

My advice is to use a combination of your LLM of choice, an NER (Named Entity Recognition) tool, and either Google’s Knowledge Graph or Wikidata.

You can then extract the entity from the page in question (the title), disambiguate using the on-page content (this helps you assess whether ‘apple’ is the computing company, the fruit, or an idiotic celebrity’s daughter), and confirm it with Wikidata or Google’s Knowledge Graph.
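To make the disambiguation step concrete, here is a toy sketch. It is not an NER tool or a knowledge graph lookup: the candidate list, the IDs, and the context keywords are hypothetical stand-ins for what those services would return, and the scoring is a simple keyword overlap.

```python
import re

# Hypothetical candidates for one ambiguous mention. In a real pipeline these
# would come from an NER tool plus Wikidata or Google's Knowledge Graph.
CANDIDATES = {
    "apple": [
        {"id": "example:apple-company", "context": {"iphone", "mac", "tech", "shares", "ceo"}},
        {"id": "example:apple-fruit", "context": {"orchard", "fruit", "recipe", "pie", "harvest"}},
    ],
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(mention, page_text):
    """Pick the candidate whose context words overlap most with the page text.

    Returns the candidate id, or None if the mention is unknown or nothing
    on the page supports any candidate.
    """
    candidates = CANDIDATES.get(mention.lower(), [])
    if not candidates:
        return None
    words = tokenize(page_text)
    best = max(candidates, key=lambda c: len(c["context"] & words))
    return best["id"] if best["context"] & words else None


if __name__ == "__main__":
    print(disambiguate("Apple", "The iPhone maker's shares rose after the CEO spoke"))
```

Returning None when nothing on the page backs a candidate is deliberate: an unresolved entity is easier to deal with than a confidently wrong one.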

Bubble charts are a fantastic way of quickly visualizing opportunities for content, not just for Discover (Image Credit: Harry Clarkson-Bennett)

Subfolder

Relatively straightforward, but you want to know which subfolders tend to generate more impressions and clicks on average in Discover. This is particularly valuable if you work on larger sites with a lot of subfolders and high content production.

I like to break down entity performance at a subfolder level like so (Image Credit: Harry Clarkson-Bennett)

You want to make sure that everything you do maximizes value.

This becomes far more valuable when you combine this data with the type of headline and entities. If you begin to understand the type of headline (and content) that works for specific subfolders, you can help commissioners and writers make smarter decisions.

Subfolders that tend to perform better in Discover give individual articles a better chance of success.

Generate a list of all of your subfolders (or topics if your site isn’t set up particularly effectively) and track clicks, impressions, and CTR over time. I’d use total clicks, impressions, and CTR, plus an average per article, as a starting point.
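A minimal sketch of that starting point, assuming the first path segment of the URL is the subfolder. The URLs and numbers are invented, and your own site structure may need a different rule.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical Discover rows, one per article URL.
ROWS = [
    {"url": "https://example.com/money/pensions/article-1", "clicks": 100, "impressions": 2000},
    {"url": "https://example.com/money/savings/article-2", "clicks": 50, "impressions": 1000},
    {"url": "https://example.com/news/article-3", "clicks": 30, "impressions": 3000},
]


def subfolder(url):
    """Return the top-level subfolder, e.g. '/money/'; root-level pages map to '/'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return f"/{segments[0]}/" if len(segments) > 1 else "/"


def summarize(rows):
    """Totals, CTR, and per-article averages for each subfolder."""
    stats = defaultdict(lambda: {"articles": 0, "clicks": 0, "impressions": 0})
    for row in rows:
        entry = stats[subfolder(row["url"])]
        entry["articles"] += 1
        entry["clicks"] += row["clicks"]
        entry["impressions"] += row["impressions"]
    for entry in stats.values():
        entry["ctr"] = entry["clicks"] / entry["impressions"] if entry["impressions"] else 0.0
        entry["avg_clicks"] = entry["clicks"] / entry["articles"]
        entry["avg_impressions"] = entry["impressions"] / entry["articles"]
    return dict(stats)


if __name__ == "__main__":
    for folder, entry in sorted(summarize(ROWS).items()):
        print(folder, entry)
```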

Authorship

Google tracks authorship in search. No ifs, no buts. The person who writes the content has significance when it comes to E-E-A-T, and good, reliable authorship makes a difference.

How much significance, I don’t know. And neither do you.

I broke down all the metrics from the leak that mention the word “author,” and the below is how Google appears to perceive and value authorship. As always, this is an imperfect science, but it’s interesting to note that of the 35 categories I reviewed, almost half are related just to identifying the author.

Not just who authored the article, but how clear is their online footprint (Image Credit: Harry Clarkson-Bennett)

Disambiguation is one of the most important components of modern-day search. Semantic SEO. Knowledge graphs. Structured data. E-E-A-T. A huge amount of this is designed to counter false documents, AI slop, and misinformation.

So, it’s really important for search (and Discover) that you provide undeniable clarity.

For Discover specifically, you should see authors through the prism of:

  • How many articles have they written that make it onto Discover (and that perform in Search)?
  • What topic/entities do they perform best with?
  • Ditto headline type.

Headline Type

This is a really good way of viewing the type of content that tends to perform for you. For example, you want to know whether curiosity gap headlines work well for you and whether headlines with numbers have a higher or lower CTR on average.

  • Do headlines with celebrities in the headline work well for you?
  • Does this differ by subfolder?
  • Do first-person headlines have a higher CTR in Money than in News?

These are all questions and hypotheses that you should be asking. Although you can’t scrape Discover directly (trust me, I’ve tried), you can hypothesize which H1, page title, and OG title is the clickiest.

The top headline is a list that piques my curiosity (although I’d add in a number here), and the bottom is more of a straight “how-to.” (Image Credit: Harry Clarkson-Bennett)

What’s interesting in this example is that “how-to” headlines are not portrayed as very Discover-friendly. But it’s the concept that sells it. It’s different.

Start by defining all the types of headlines you use – curiosity gap, localized, numbered lists, questions, how-to or utility type, emotional trigger, first person, et al. – and analyze how effective each one is.

Use a machine learning model (you can absolutely use ChatGPT’s API) to categorize each headline.

  • Train the model to identify place names, numbers, questions, and first-person style patterns.
  • Verify the quality of the categorization.
  • Break this down by subfolder, author, entity, or anything else you choose.
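A cheap rule-based baseline is handy for the “verify the quality” step, because you can compare its labels against whatever the model returns. The sketch below is that baseline, not an LLM or a trained model, and the label names and patterns are my own assumptions.

```python
import re

# Standalone first-person words; "\b" keeps "is" or "mean" from matching.
FIRST_PERSON = re.compile(r"\b(i|my|me|we|our)\b")


def classify_headline(headline):
    """Return a list of simple pattern labels for a headline."""
    text = headline.strip()
    lower = text.lower()
    labels = []
    if re.search(r"\d", text):
        labels.append("number")
    if text.endswith("?"):
        labels.append("question")
    if FIRST_PERSON.search(lower):
        labels.append("first_person")
    if lower.startswith("how to"):
        labels.append("how_to")
    return labels or ["other"]


if __name__ == "__main__":
    for example in ["7 things I wish I knew before retiring", "How to fix a leaking tap"]:
        print(example, "->", classify_headline(example))
```

Curiosity gap and emotional trigger headlines are not captured by patterns like these, which is where the model (and a human check) earns its keep.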

Worth noting that there are five different headlines you and Google can and should be using to determine how content is perceived. Discover is known to use the OG title more frequently than traditional search.

It’s an opportunity to create a “clickier” headline than you would typically use in the H1 or page title.

Images

Images fall into a similar category as headlines. They’re crucial. You can’t definitively prove which image gets pulled through into Discover. But as long as your featured image is 1200 px wide, it’s safe-ish to assume this is the one that’s used.

CTR is arguably the single biggest factor in determining early success. Continued success, I believe, is more Navboost-related – more traditional-ish engagement.

And CTR in Discover is determined by two things:

  1. The headline.
  2. The image.

Well, two things in your control. You could be pedantic and say, “Ooo, your brand is an important factor in CTR, actually. Psychologically, people always click on…”

And I’d tell you to bore off. We’re talking about an individual article. We’ve done a significant amount of image testing and know that in straight news, people like seeing people looking sad. They like real-ness.

In money, they like people looking at the camera, looking happy. It makes them feel safe in a financial decision.

People looking evocatively miserable, looking directly at the camera. Probably clickable, but you need to test (Image Credit: Harry Clarkson-Bennett)

Stupid, I know. But we’re not an intelligent race. Sure, there are a few outliers. Galileo. Einstein. Noel Edmonds. But the rest of us are just trying not to throw stuff at each other outside Yates’s on a Friday night.

It is actually why clickbait headlines have worked for years. It works until it doesn’t.

You’ll need to upload a set of images to help train the model, and please don’t take it as gospel. Check the outputs. For the basics – whether people are present, where they’re looking, color schemes, etc. – great. For more nuanced decisions like trustworthiness or emotional meaning, you’ll need to do that yourself.

Worth noting that lots of publishers trial badges and logos on images. And for good reason. Images with logos consistently click higher for larger brands (to the best of my knowledge), and if you’re a paywalled site, but have set live blogs to free, it’s worth telling people.

You should break down this image analysis into:

  • Human presence and gaze.
  • Facial expression.
  • Emotional resonance.
  • Composition and framing.
  • Colour schemes.
  • Photo-type.

Then you can use machine learning to bucket photos into groups to help determine CTR. For example, people directly looking at a camera + smiling could be one bucket. Not looking at a camera + scowling could be another.
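Once the images are tagged, the bucketing itself is simple. This sketch assumes each image already carries two tags (gaze and expression) and compares each bucket’s CTR to the overall CTR. The tags and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical tagged images; the tags would come from your model plus manual checks.
IMAGES = [
    {"looking_at_camera": True, "expression": "smiling", "clicks": 90, "impressions": 1000},
    {"looking_at_camera": True, "expression": "smiling", "clicks": 60, "impressions": 1000},
    {"looking_at_camera": False, "expression": "scowling", "clicks": 30, "impressions": 1000},
    {"looking_at_camera": False, "expression": "scowling", "clicks": 20, "impressions": 1000},
]


def bucket_name(image):
    """Combine gaze and expression into one label, e.g. 'at_camera+smiling'."""
    gaze = "at_camera" if image["looking_at_camera"] else "away"
    return f"{gaze}+{image['expression']}"


def ctr_index_by_bucket(images):
    """Return {bucket: bucket CTR / overall CTR}; 1.0 means average."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for image in images:
        name = bucket_name(image)
        clicks[name] += image["clicks"]
        impressions[name] += image["impressions"]
    total_impressions = sum(impressions.values())
    overall = sum(clicks.values()) / total_impressions if total_impressions else 0.0
    if not overall:
        return {}
    return {n: (clicks[n] / impressions[n]) / overall for n in clicks if impressions[n]}


if __name__ == "__main__":
    print(ctr_index_by_bucket(IMAGES))
```

An index above 1.0 means the bucket clicks better than your average image. It says nothing about why, so treat it as a prompt for testing rather than a rule.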

Publishing Performance

The more you publish, the more this matters.

Large newsrooms run analysis on publishing volumes, times, and content freshness fairly consistently and at a desk-level. If you only have 50 or fewer articles per month making it into Discover, you probably don’t need to do this.

But if we’re talking about hundreds or thousands of articles, these insights can be really useful to commissioners.

I would focus on:

  • Publishing days.
  • Publishing times.
  • Content freshness.
  • Republishing vs. publishing.

Breaking things down at a subfolder level is always crucial (Image Credit: Harry Clarkson-Bennett)

Day of the week data is always useful for larger publishers to get the most value out of their publishing (Image Credit: Harry Clarkson-Bennett)

Image Credit: Harry Clarkson-Bennett

Your output should give really clear guidance to desks, commissioners, and publishers around when is best to publish for peak Discover performance.
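For the day-of-week cut, something like the sketch below is enough to get going. It assumes you have a publish timestamp and Discover clicks per article; the field names and rows are invented.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical articles with ISO 8601 publish times and Discover clicks.
ARTICLES = [
    {"published": "2025-03-03T07:30:00", "clicks": 400},  # a Monday
    {"published": "2025-03-10T18:00:00", "clicks": 200},  # a Monday
    {"published": "2025-03-08T09:00:00", "clicks": 900},  # a Saturday
]


def avg_clicks_by_weekday(rows):
    """Average Discover clicks per article, grouped by publishing day."""
    buckets = defaultdict(list)
    for row in rows:
        day = WEEKDAYS[datetime.fromisoformat(row["published"]).weekday()]
        buckets[day].append(row["clicks"])
    return {day: sum(values) / len(values) for day, values in buckets.items()}


if __name__ == "__main__":
    print(avg_clicks_by_weekday(ARTICLES))
```

Swap the weekday lookup for the hour of publication to get the same view by publishing time.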

We never make direct recommendations solely for Discover for a number of reasons. Discover is a highly volatile platform and one that does reward nonsense. It can lead you down the garden path with all sorts of thin, curiosity-gap-style content if you just follow the numbers.

And it has limited direct impact on your bottom line.

How Do You Tie This All Together?

You need a clear set of goals. Goals that help you deliver analysis that directly impacts the value of your content in Discover. When you set your analysis, focus on elements you have more control over.

For example, you might not be able to control what commissioners choose to publish, but you can change the headline (H1, title, and/or OG) and image prior to publishing.

  1. Set a clear goal around conversions and traffic.
  2. Understand what you have more control over.
  3. Deliver insights at a desk or subfolder level.

Understanding whether your role is more strategic or tactical is crucial. Strategic roles are more advisory in nature. You can offer some thoughts and advice on the type of headlines and entities to avoid or choose, but you may not be able to change them.

Tactical roles mean you have more say in the implementation of change. Headlines, publish times, entity targeting, etc.

Simple.

This post was originally published on Leadership in SEO.


Featured Image: Master1305/Shutterstock

How Much Of Your Paid Media Budget Should Be Allocated To Upper Funnel?

Determining a budget split between upper and lower-funnel is a recurring topic in paid media.

Upper-funnel campaigns (typically awareness and interest) create future demand, while lower-funnel campaigns capture existing demand and are built to drive action.

Knowing where the sweet spot is with budget allocation is a skill, and requires a sound knowledge of incrementality and how to balance immediate efficiency with long-term demand creation.

In this post, I’m going to explore the data, strategies, and channel considerations to help you find an optimal mix.

The Importance Of Upper Funnel Investment

Within paid media, it’s very tempting to pour the majority of budget into the quickest wins that yield the highest returns. It makes sense on many levels, especially when teams are budgeting (and working to) strict forecasts and targets.

However, neglecting upper-funnel spend can hurt your long-term growth, with research showing that cutting brand awareness campaigns to save money or simply avoiding this type of activity can backfire.

For example, a BCG analysis found that companies that slashed brand marketing saw significantly worse outcomes: to regain their lost market share later, they had to spend $1.85 for every $1 saved from cutting back.

In a roundabout way, this suggests that saving a dollar today on branding can (in some cases) cost nearly two dollars tomorrow.

And it’s not just efficiency; the growth impact of neglecting brand building can be detrimental, too.

In the same study from BCG, bottom-quartile brand spenders had sales growth rates 13% lower than top-quartile brand spenders, indicating brands that underinvest in awareness suffer from lower sales growth in the long term.

They also converted aware consumers to buyers at a lower rate (a 6% weaker conversion from awareness to purchase than top-brand spenders).

Studies like this prove that upper-funnel activity isn’t just a nice-to-have, or a place to use budget left over from lower-funnel spending; it directly influences revenue trajectory, market share, and even shareholder returns.

At this point, you’re probably thinking, “What do you mean by upper-funnel activity?” So let’s have a top-level run-through.

Upper-funnel campaigns plant the seeds by reaching new audiences and generating interest in audiences who may not yet be familiar with your brand.

Think Meta or Pinterest campaigns serving ads to new users as part of broad audiences, interest-based cohorts, or lookalike lists, all excluding your current customer base and/or users who have interacted with your brand.

Think YouTube or GDN campaigns serving ads to in-market, affinity, or custom audiences, again, all while excluding your current customer base.

For this post, we’re focusing specifically on paid search and paid social, with a supporting role from display advertising served through Google and Microsoft.

While programmatic, out-of-home, TV, connected TV, PR, and other channels can all be effective for upper-funnel advertising, they fall outside the scope of this piece.

My aim here is to focus on how to allocate budget toward top-of-funnel activity, specifically through paid search and social platforms.

Balancing Short-Term Performance And Long-Term Brand Building

While the exact percentage will vary by business, a number of frameworks and studies offer guidance on balancing upper vs. lower-funnel spend.

The best known is Les Binet and Peter Field’s research into marketing effectiveness, which suggests roughly a 60/40 split.

This translates into 60% of ad budget for brand building (upper-funnel) and 40% to direct activation (lower-funnel) as a rough starting point.

This 60/40 rule isn’t rigid, but it underscores that at least half (if not more) of your spend should typically go toward awareness and brand in order to maximize long-term growth.

Other models follow suit and emphasize a hefty allocation to upper-funnel activities.

For instance, many marketers use a 70-20-10 rule (adapted from a learning model) to diversify marketing investments: 70% on proven “always-on” channels, 20% on new or emerging channels, and 10% on experimental ideas.

Often, those proven channels include your core lower-funnel performers, while a portion of the 20% and 10% go toward upper-funnel initiatives.

Another approach, specific to paid media funnel stages and widely used in paid social campaign structuring, is a 60-30-10 funnel split: about 60% of budget for prospecting and awareness, 30% for mid- to lower-funnel retargeting, and 10% for closing at the bottom of the funnel.

This model ensures the majority of spend focuses on feeding the funnel with new prospects, while still dedicating budget to nurture them down to conversion.
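To see what these frameworks mean in money terms, the sketch below applies each split to a total budget. The percentages are the ones quoted above; the bucket names and the example budget are illustrative assumptions.

```python
# Percentage splits as described above; bucket names are illustrative labels.
FRAMEWORKS = {
    "60/40": {"brand_building": 60, "activation": 40},
    "70-20-10": {"proven_channels": 70, "emerging_channels": 20, "experimental": 10},
    "60-30-10": {"prospecting_and_awareness": 60, "retargeting": 30, "closing": 10},
}


def allocate(total_budget, framework):
    """Split a total budget according to one of the named frameworks."""
    shares = FRAMEWORKS[framework]
    if sum(shares.values()) != 100:
        raise ValueError(f"{framework} shares must add up to 100%")
    return {bucket: total_budget * pct / 100 for bucket, pct in shares.items()}


if __name__ == "__main__":
    # Hypothetical monthly budget of 50,000.
    for name in FRAMEWORKS:
        print(name, allocate(50_000, name))
```

Treat the output as a starting point for a plan rather than a target, since the right split depends on the factors discussed in this section.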

Is every business other than yours running these exact models? Nope.

Does every business ensure it allocates sufficient media budget for upper funnel? Nope.

A 2024 CMO survey found that, on average, only 31.2% of budget was allocated to long-term brand building vs. 68.8% to short-term performance, the opposite of what industry-leading studies recommend. This imbalance shows how pressure for quick ROI can overshadow brand investment. Having worked in paid media for a decade and a half, I see it time and time again.

Studies and guidelines are great, but in reality, there really isn’t a one-size-fits-all answer to the exact percentage of budget to allocate for upper-funnel, and it depends on factors like your industry, growth goals, and brand maturity.

For example, a new market entrant or a brand in a highly consideration-driven category (like automotive or B2B tech) may need to invest heavily in awareness and education since customers won’t convert without multiple touches and trust-building.

In contrast, a well-known brand in a transactional ecommerce vertical might get by with a lower percentage on upper-funnel, especially if it already benefits from high awareness.

Evaluate your current situation: If you’re in a crowded consumer goods market (e.g., retail fashion), strong branding and broad reach can differentiate you, whereas in a niche B2B service, thought leadership content and awareness efforts might be what fills the pipeline for your sales team.

The one certainty with this topic is that completely ignoring upper-funnel advertising with paid media is not good.

Even if short-term conversion pressures are high, dedicate a healthy portion of your budget to feeding the funnel.

A useful mindset is to treat awareness spend as an investment in future revenue.

As marketing effectiveness veteran Mark Ritson advocates, you must balance “the long and the short of it”: fund the brand for long-term growth and performance marketing for short-term sales.

Many successful companies treat brand marketing as “always-on” (continuous) rather than a luxury to add when times are good.

In practice, this could mean making sure, say, 20-30% of your paid search and social budget is consistently reaching new cold audiences at any given time, even if attribution for those dollars is not immediately obvious (more on that later).

What Does Upper-Funnel In Paid Search And Paid Social Look Like?

Translating budget allocation into channel strategy requires understanding how each paid media channel fits into the funnel.

Paid media is not one-size-fits-all; channels like paid search, paid social, video, and display each serve distinct roles across the funnel, from awareness to conversion.

Here are a few approaches to upper-funnel budget allocation across key channels:

Paid Search (Google & Microsoft Ads)

Paid search is typically considered a lower- or mid-funnel channel because it is often seen as a place to capture users who are actively searching for a product or service, which often indicates intent.

Advertisers frequently split their campaign groupings into brand and non-brand, driving visibility in line with query types across search and shopping networks.

Imagine you run an ecommerce store for sneakers. You may want to serve brand ads for reasons such as tailored messaging, control, brand protection, and incrementality. For non-brand, you may want to serve ads for queries like “black Nike GT Blazer low” or “Asics Novablast 5,” the sole purpose being to drive direct sales.

There’s arguably an element of upper funnel in non-brand search. Advertisers enter auctions for queries that do not contain their brand, and in many cases exclude their website visitor lists. So when a user searches for a query like “black size 10 running shoes” and clicks through, the advertiser gets their brand in front of a new audience. However, the objective of the campaign is not awareness.

Display (Google & Microsoft Ads)

While not always front of mind for upper-funnel strategy, the Google Display Network (GDN) is great for reaching new audiences at scale as it spans over 35 million websites and apps, including YouTube, Gmail, and top-tier publisher inventory.

This breadth gives advertisers the ability to serve visually engaging ads across a vast portion of the open web, tapping into contextual, affinity, and in-market audiences.

For upper-funnel campaigns, display is often used to spark interest through static or video creative, product banners, or lifestyle-led visuals that introduce the brand to users in relevant contexts.

With options like responsive display ads, you can dynamically test creative combinations and reach a broad but targeted audience, freeing up time and money that would otherwise have been spent on creative development.

When allocating budget, display may not command as much as social or video initially, but it serves a valuable supporting role in prospecting and awareness.

Brands in verticals like consumer goods, travel, or SaaS can use Display as a cost-effective way to expand, reach new audiences, and drive visibility and traffic to site.

Paid Social (Meta, Instagram, TikTok, LinkedIn & More)

Paid social is one of the most common types of advertising for upper-funnel marketing.

Platforms like Facebook/Instagram (Meta), TikTok, Pinterest, LinkedIn, and others offer rich targeting options to get your message in front of people who have never heard of you, but who fit the profile of your target customer.

Nearly three-quarters of the U.S. population (73%) were active social media users. For advertisers, this means the audience they want to reach is likely out there scrolling a feed.

For upper-funnel campaigns, social ads shine by allowing you to target based on interests, demographics, behaviors, lookalike audiences, and more, pushing visually engaging content to users who aren’t actively seeking your product yet.

When allocating budget, a significant chunk of your prospecting (new customer) budget will likely go into paid social.

You could use short-form video ads showcasing your brand story or product in use, carousel ads with inspirational lifestyle imagery, or interactive polls that get people interested.

The goal at this stage is not an immediate sale (though it’s great if it happens, and it does), but to introduce your brand, value proposition, or content to a relevant audience as efficiently as possible.

YouTube And Digital Video

No discussion of upper-funnel paid media budget allocation is complete without YouTube and online video platforms.

YouTube is effectively the new prime-time TV for many demographics, blending reach and targeting with the storytelling power of video.

YouTube ads can achieve massive scale, with 53% of marketers using YouTube to achieve various objectives such as reach, awareness, and conversions.

With YouTube’s advanced targeting (by interests, demographics, in-market intent, topics, etc.), you can home in on relevant audiences for your brand messaging at scale, and drive reams of valuable data.

Recent forecasts bolster advertisers’ confidence in YouTube’s ROI, with 44% of marketers planning to increase their YouTube marketing budget.

The momentum is driven by video’s effectiveness in lifting awareness and brand favorability.

Kantar research, for instance, has shown YouTube ads can substantially boost unaided brand awareness and other brand metrics, underlining the platform’s upper-funnel impact.

For practical budgeting, treat YouTube as you’d treat television in a media mix: as a primary reach vehicle.

The difference is, YouTube allows flexible budgets (you can start small and scale) and measurable results (you can track views, clicks, and even use Brand Lift surveys to measure ad recall and brand interest).

If you’re in a consumer-facing vertical like electronics, fashion, or automotive, you might allocate additional budget to YouTube for big awareness pushes around new product launches or campaigns, in addition to always-on brand building.

Even in B2B or niche markets, consider using YouTube for educational top-of-funnel content (e.g., explainer videos, industry thought leadership) targeted to relevant audiences.

Measuring Upper-Funnel Impact And Winning Buy-In

One reason many companies double down on lower-funnel spending is that it’s directly measurable; you see clicks and conversions, which please the performance dashboard and finance team.

Upper-funnel efforts often lack that immediate clarity on attribution, making it harder to justify budget to skeptics.

This is why measuring the impact of upper-funnel campaigns is crucial to determining the right budget allocation (and getting organizational buy-in to maintain and/or scale that spend).

Start by defining key performance indicators (KPIs) for upper-funnel campaigns that tie to your objectives.

These will be different from pure conversion metrics. Common upper-funnel KPIs include:

  • Reach and Impressions: How many unique people saw your ads, and how many times were those ads served?
  • Engagement Metrics: For example, video views (and view-through rates), social shares, comments, likes, or clicks on content. If people are engaging, your message is resonating at least enough to spark interest.
  • Click-Through Rate (CTR): While upper-funnel ads often have lower CTRs than the likes of Search Ads, a healthy CTR indicates the creative and targeting are attracting interest among a cold audience.
  • Brand Search Lift: Track the volume of searches for your brand name and/or direct traffic to your website during and after campaigns. An increase can signal that awareness efforts are causing more people to seek you out.
  • New User Acquisition: Look at the percentage of new visitors or new customers acquired. Upper-funnel campaigns should feed new people into the pipeline.
  • Brand Lift Studies: Use tools like Facebook’s Brand Lift or YouTube Brand Lift surveys, which can directly measure ad recall, brand awareness, and consideration among those exposed vs. a control group.

It’s also important to measure impact on a wider scale, taking a step back and analyzing exactly how your upper-funnel spend impacted the business.

For example, you might find that regions where you ran a heavy awareness campaign see higher conversion rates in the subsequent weeks or months.

Techniques like marketing mix modeling or incrementality testing can help connect the dots.

Incrementality is essentially determining how much extra business an upper-funnel campaign drove that would not have happened otherwise.

You can test this by using holdout groups (e.g., show ads to 90% of your target audience but withhold them from 10% as a control, then compare behaviors), or by pausing campaigns and seeing if sales dip.
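The holdout comparison comes down to a small calculation. The sketch below is a simplified illustration: the function name and the example figures are hypothetical, and it does not test for statistical significance, which a real experiment would also need.

```python
def holdout_lift(exposed_users, exposed_conversions, holdout_users, holdout_conversions):
    """Compare conversion rates between the exposed group and the holdout (control).

    Returns both rates, the estimated incremental conversions in the exposed
    group, and the relative lift over the holdout rate.
    """
    if exposed_users <= 0 or holdout_users <= 0:
        raise ValueError("both groups need at least one user")
    exposed_rate = exposed_conversions / exposed_users
    holdout_rate = holdout_conversions / holdout_users
    rate_gap = exposed_rate - holdout_rate
    return {
        "exposed_rate": exposed_rate,
        "holdout_rate": holdout_rate,
        # Conversions the exposed group produced beyond what the holdout rate predicts.
        "incremental_conversions": rate_gap * exposed_users,
        "relative_lift": rate_gap / holdout_rate if holdout_rate else None,
    }


if __name__ == "__main__":
    # Hypothetical 90/10 split: 90,000 users saw ads, 10,000 were held out.
    print(holdout_lift(90_000, 2_700, 10_000, 250))
```

In this made-up example, the exposed group converts at 3% against 2.5% for the holdout, which works out to roughly 450 incremental conversions and a 20% relative lift.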

Winning buy-in also means reporting beyond vanity metrics. For instance, instead of just saying, “Our video ad got 100,000 views,” translate that into, say, “Our brand lift study indicates an 8-point increase in awareness in our target market, which correlates with a 20% lift in branded search volume the following month.”

By connecting awareness metrics to leading indicators of sales, you make a case that those dollars are working hard.

And finally, adopt a test-and-learn approach.

If uncertainty is high, start by allocating a modest portion (say +5-10% shift) of your budget to upper-funnel campaigns for a period, then measure results.

If you can show that leads or branded searches grew, or cost per acquisition improved downstream, it will be easier to argue for maintaining or even increasing that allocation.

On the flip side, if an upper-funnel tactic isn’t performing, refine the creative or targeting rather than immediately cutting the budget. When it comes to new funnel initiatives, optimization is usually the answer, not abandonment.

Key Takeaways

Determining how much of your paid media budget to devote to the upper-funnel is a strategic decision that should be informed by both evidence and your unique context.

The data is clear that brand awareness and prospecting deserve a significant share of spend, even though many firms today allocate far less to it than they once did.

The exact figure will depend on your goals, industry, and growth stage, but the guiding principle is to invest enough in upper-funnel marketing to continually feed your future customer pipeline.

Underinvesting in awareness may boost short-term efficiency, but it eventually leads to stagnation and higher costs to reignite growth later.

In practice, this means making room in your plans for campaigns that build brand equity, engage new audiences, and create demand, even if they don’t convert immediately.

Whether it’s a YouTube video campaign reaching millions of potential customers, a series of TikTok ads riding the latest trend to put your brand on the map, or a broad Display campaign educating people about a problem your product solves, these efforts ensure your lower-funnel tactics have a steady stream of interested prospects to convert.

The upper-funnel and lower-funnel are interdependent; success comes from funding both appropriately and making them work in tandem.

So, how much of your budget should go to upper-funnel?

Enough that you’re confident you’re driving robust awareness and demand generation, not just scraping the bottom of the barrel.

For many, that will be a considerably larger portion than they currently allocate.

Aim for a balanced mix grounded in research and test data, adjust to your business needs, and then track the results.

With the right allocation, your paid media can both capture the immediate sales and expose your brand to new audiences, fueling both immediate performance and sustainable growth.

Featured Image: Anton Vierietin/Shutterstock

Head Of WordPress AI Team Explains SEO For AI Agents via @sejournal, @martinibuster

James LePage, Director Engineering AI at Automattic, shared his insights into what publishers should be thinking about in terms of SEO. He’s the founder and co-lead of the WordPress Core AI Team, which is tasked with coordinating AI-related projects within WordPress, including how AI agents will interact within the WordPress ecosystem. He covered what’s coming to the web in the context of AI agents and some of the implications for SEO.

AI Agents And Infrastructure

The first observation that he made was that AI agents will use the same web infrastructure as search engines. The main point he makes is that the data that the agents are using comes from the regular classic search indexes.

He writes, somewhat provocatively:

“Agents will use the same infrastructure the web already has.

  • Search to discover relevant entities.
  • “Domain authority” and trust signals to evaluate sources.
  • Links to traverse between entities.
  • Content to understand what each entity offers.

I find it interesting how much money is flowing into AIO and GEO startups when the underlying way agents retrieve information is by using existing search indexes. ChatGPT uses Bing. Anthropic uses Brave. Google uses Google. The mechanics of the web don’t change. What changes is who’s doing the traversing.”

AI SEO = Longtail Optimization

LePage also said that schema structured data, semantic density, and interlinking between pages are essential for optimizing for AI agents. Notably, he said that the AI optimization that AIO and GEO companies are doing is largely basic longtail query optimization.

He explained:

“AI intermediaries doing synthesis need structured, accessible content. Clear schemas, semantic density, good interlinking. This is the challenge most publishers are grappling with now. In fact there’s a bit of FUD in this industry. Billions of dollars flowing into AIO and GEO when much of what AI optimization really is is simply long-tail keyword search optimization.”
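As an illustration of what a “clear schema” can look like in practice, here is a minimal Python sketch that builds schema.org Article markup as JSON-LD. It is not taken from LePage’s article: the function name, the choice of properties, and all the values passed in are assumptions made for the example.

```python
import json


def article_jsonld(headline, author_name, url, description):
    """Return a JSON-LD <script> tag describing a page as a schema.org Article.

    The selection of properties is an assumption for illustration; a real
    page would use whichever schema.org properties fit its content.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "author": {"@type": "Person", "name": author_name},
        "mainEntityOfPage": url,
    }
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical values, used only to show the output shape.
    print(article_jsonld(
        headline="Example headline",
        author_name="Example Author",
        url="https://example.com/example-article",
        description="One-sentence summary of what the page offers.",
    ))
```

The resulting tag would sit in the page’s HTML, giving a crawler or agent a machine-readable summary alongside the human-readable content.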

What Optimized Content Looks Like For AI Agents

LePage, who is involved in AI within the WordPress ecosystem, said that content should be organized in an “intentional” manner for agent consumption, by which he means structured markdown, semantic markup, and content that’s easy to understand.

A little further on, he explains what he believes content should look like for AI agent consumption:

“Presentations of content that prioritize what matters most. Rankings that signal which information is authoritative versus supplementary. Representations that progressively disclose detail, giving agents the summary first with clear paths to depth. All of this still static, not conversational, not dynamic, but shaped with agent traversal in mind.

Think of it as the difference between a pile of documents and a well-organized briefing. Both contain the same information. One is far more useful to someone trying to quickly understand what you offer.”

A little later in the article, he offers a seemingly contradictory prediction about the role of content in an agentic AI future. Reversing today’s preference for a well-organized briefing over a pile of documents, he says that agentic AI will not need a website, just the content: a pile of documents.

Nevertheless, he recommends giving content structure so that information is well organized at the page level, with a clear hierarchy, and at the site level, where interlinking makes the relationships between documents clearer. He emphasizes that the content must communicate what it’s for.

He then adds that in the future websites will have AI agents that communicate with external AI agents, which gets into the paradigm he mentioned of content being split off from the website so that the data can be displayed in ways that make sense for a user, completely separated from today’s concept of visiting a website.

He writes:

“Think of this as a progression. What exists now is essentially Perplexity-style web search with more steps: gather content, generate synthesis, present to user. The user still makes decisions and takes actions. Near-term, users delegate specific tasks with explicit specifications, and agents can take actions like purchases or bookings within bounded authority. Further out, agents operate more autonomously based on standing guidelines, becoming something closer to economic actors in their own right.

The progression is toward more autonomy, but that doesn’t mean humans disappear from the loop. It means the loop gets wider. Instead of approving every action, users set guidelines and review outcomes.

…Before full site delegates exist, there’s a middle ground that matters right now.

The content an agent has access to can be presented in a way that makes sense for how agents work today. Currently, that means structured markdown, clean semantic markup, content that’s easy to parse and understand. But even within static content, there’s room to be intentional about how information is organized for agent consumption.”

His article, titled Agents & The New Internet (3/5), offers useful ideas on how to prepare for the agentic AI future.

Featured Image by Shutterstock/Blessed Stock

Google’s Mueller: Free Subdomain Hosting Makes SEO Harder via @sejournal, @MattGSouthern

Google’s John Mueller warns that free subdomain hosting services create unnecessary SEO challenges, even for sites doing everything else right.

The advice came in response to a Reddit post from a publisher whose site shows up in Google but doesn’t appear in normal search results. The site uses Digitalplat Domains, a free subdomain service on the Public Suffix List.

What’s Happening

Mueller told the site owner that they likely aren’t making technical mistakes. The problem is the environment they chose to publish in.

He wrote:

“A free subdomain hosting service attracts a lot of spam & low-effort content. It’s a lot of work to maintain a high quality bar for a website, which is hard to qualify if nobody’s getting paid to do that.”

The issue comes down to association. Sites on free hosting platforms share infrastructure with whatever else gets published there. Search engines struggle to differentiate quality content from the noise surrounding it.

Mueller added:

“For you, this means you’re basically opening up shop on a site that’s filled with – potentially – problematic ‘flatmates’. This makes it harder for search engines & co to understand the overall value of the site – is it just like the others, or does it stand out in a positive way?”

He also cautioned against cheap TLDs for similar reasons. The same dynamics apply when entire domain extensions become overrun with low-quality content.

Beyond domain choice, Mueller pointed to content competition as a factor. The site in question publishes on a topic already covered extensively by established publishers with years of work behind them.

“You’re publishing content on a topic that’s already been extremely well covered. There are sooo many sites out there which offer similar things. Why should search engines show yours?”

Why This Matters

Mueller’s advice here fits a pattern I’ve covered repeatedly over the years. Previously, Google’s Gary Illyes warned against cheap TLDs for the same reason. Illyes put it bluntly at the time, telling publishers that when a TLD is overrun by spam, search engines might not want to pick up sitemaps from those domains.

The free subdomain situation creates a unique problem. While the Public Suffix List theoretically tells Google to treat these subdomains as separate sites, the neighborhood signal remains strong. If the vast majority of subdomains on that host are spam, Google’s systems may struggle to identify your site as the one diamond in the rough.
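To make the Public Suffix List point concrete, here is a minimal Python sketch of how a suffix entry changes where one “site” ends and another begins. The suffix `freehost.example` and the hostnames are hypothetical, and the lookup ignores the real list’s wildcard and exception rules. It illustrates the idea only and says nothing about how Google’s systems actually use the list.

```python
def registrable_domain(hostname, public_suffixes):
    """Return the public suffix plus one label, or None if there is no match.

    Simplified sketch: exact-match suffixes only, longest match wins.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in public_suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None


# Hypothetical suffix sets, before and after the host is listed.
without_listing = {"example"}
with_listing = {"example", "freehost.example"}

for host in ("alice.freehost.example", "bob.freehost.example"):
    print(host,
          "| unlisted:", registrable_domain(host, without_listing),
          "| listed:", registrable_domain(host, with_listing))
```

Without the listing, both hostnames collapse into the same registrable domain. With it, each subdomain is its own site on paper, even though, as described above, the neighborhood can still weigh on how the site is perceived.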

This matters for anyone considering free hosting as a way to test an idea before investing in a real domain. The test environment itself becomes the test. Search engines evaluate your site in the context of everything else published under that same domain.

The competitive angle also deserves attention. New sites on well-covered topics face a high bar regardless of domain choice. Mueller’s point about established publishers having years of work behind them is a reality check about where the effort needs to go.

Looking Ahead

Mueller suggested that search visibility shouldn’t be the first priority for new publishers.

“If you love making pages with content like this, and if you’re sure that it hits what other people are looking for, then I’d let others know about your site, and build up a community around it directly. Being visible in popular search results is not the first step to becoming a useful & popular web presence, and of course not all sites need to be popular.”

For publishers starting out, focus on building direct traffic through promotion and community engagement. Search visibility tends to follow after a site establishes itself through other channels.


Featured Image: Jozef Micic/Shutterstock

Amazon Rules Product Discovery, for Now

The Amazon marketplace is the world’s most popular product search engine. Yet its dominance faces emerging challenges from AI and social commerce.

For more than 20 years, Amazon has made it easy for shoppers to discover products, compare options, read reviews, and buy.

A 2024 Jungle Scout survey (PDF) of 1,000 U.S. online shoppers found that 56% initiated product searches on the Amazon marketplace, compared to 42% on traditional search engines (such as Google), and 29% on Walmart.com.

Why Amazon?

Amazon’s Prime membership was a stroke of ecommerce genius. The service changes the way some consumers think about prices and shipping.

Products on Amazon’s marketplace are often more expensive than competitors’, and Prime costs $139 per year. But to many shoppers, there’s little reason to look elsewhere when shipping is free, fast, and reliable.

Selection

Moreover, Amazon’s product selection is massive and all-inclusive. Amazon itself sells more than 12 million products. Third-party sellers add upwards of 600 million, according to published reports. A shopper looking for an item will likely find it on Amazon.

Trust

Consumers trust Amazon. They assume products will arrive on time, with returns and refunds issued without hassle.

This trust is worth a lot. A 2025 Salsify report (PDF) found that 87% of shoppers have paid more for a product because they trust the brand. Those same consumers would likely search for products on a trusted marketplace.

Reviews

The volume of reviews on Amazon attracts shoppers.

Reviews serve as decision insurance. They reduce uncertainty and shorten the research cycle, especially for products where use cases matter. Instead of reading a handful of articles, comparing retailer sites, and searching Reddit threads, shoppers can pull social proof from thousands of real buyers without leaving Amazon.

That convenience changes behavior. The marketplace becomes a place for decision-making, not just buying. So why not start a product search where other shoppers can guide you?

Mobile app

Amazon’s mobile app provides an advantage.

Searching for products in a mobile web browser is frustrating, even in 2026. Pages load slowly. Pop-ups appear. Cookie prompts get in the way. Shoppers must pinch and zoom, navigate cluttered menus, and jump between tabs.

Amazon’s app eliminates much of that friction for mobile consumers. The search box is always one tap away, filters are quick to apply, product pages are consistent, and the comparison process happens naturally through scrolling rather than clicking across multiple sites.

It’s a good experience, and shoppers use it.

Search iteration

“Search iteration” is the refinement of a query.

Consumers in the product discovery mode typically have specific needs. Amazon search can route shoppers toward products they are likely to buy.

Brand and mindshare

Amazon is ubiquitous beyond products. Prime Video, Audible, Kindle, Fire TV, Echo devices, and Amazon’s creator and influencer content indirectly contribute to search dominance and habit.

Boston Consulting Group, for example, asserts that such “mindshare” is highly correlated with purchase consideration.

Put another way, the folks who watch Prime Video are likely to search for products on Amazon.

AI and Social

Taken together, these factors serve as a playbook for the leading product search engine and offer both lessons and dilemmas for merchants. A shop can, for example, decide to include products on Amazon solely for discovery benefits.

Another consideration is whether Amazon maintains its lead in product search.

Some 56% of respondents in the Jungle Scout 2024 survey began product searches on Amazon. But that percentage is down from the 61% reported by Jungle Scout in 2022 (PDF).

Something is chipping away at product search and discovery. In 2026, that “something” is likely AI and social.

AI commerce is likely to shift where the first query occurs, thus eroding Amazon’s product-search dominance.

As shoppers ask for “the best” product option, generative AI platforms will increasingly assemble shortlists from multiple sources, reducing the need to start with Amazon. AI will pull discovery and comparison out of the marketplace interface, although Amazon can still win the transaction.

Social commerce on TikTok, Instagram, and YouTube will increasingly resemble search engines for lifestyle-driven categories. Shoppers, especially younger ones, often arrive at Amazon with a product already selected.

In those cases, Amazon becomes the fulfillment destination rather than the discovery engine, which changes the economics of product search and advertising on the platform.

Google On Phantom Noindex Errors In Search Console via @sejournal, @martinibuster

Google’s John Mueller recently answered a question about phantom noindex errors reported in Google Search Console. Mueller asserted that these reports may be real.

Noindex In Google Search Console

A noindex robots directive is one of the few commands that Google must obey, one of the few ways that a site owner can exercise control over Googlebot, Google’s crawler.

And yet it’s not uncommon for Search Console to report being unable to index a page because of a noindex directive when the page seemingly does not have one, at least none that is visible in the HTML code.

When Google Search Console (GSC) reports “Submitted URL marked ‘noindex’,” it is reporting a seemingly contradictory situation:

  • The site asked Google to index the page via an entry in a Sitemap.
  • The page sent Google a signal not to index it (via a noindex directive).

It’s a confusing message: Search Console says the page is preventing Google from indexing it, yet the publisher or SEO can’t see anything at the code level that would cause that.

The person asking the question posted on Bluesky:

“For the past 4 months, the website has been experiencing a noindex error (in ‘robots’ meta tag) that refuses to disappear from Search Console. There is no noindex anywhere on the website nor robots.txt. We’ve already looked into this… What could be causing this error?”

Noindex Shows Only For Google

Google’s John Mueller answered the question, sharing that in the cases he has examined where this kind of thing was happening, there was always a noindex being shown to Google.

Mueller responded:

“The cases I’ve seen in the past were where there was actually a noindex, just sometimes only shown to Google (which can still be very hard to debug). That said, feel free to DM me some example URLs.”

While Mueller didn’t elaborate on possible causes, there are ways to troubleshoot the issue and find out what’s going on.

How To Troubleshoot Phantom Noindex Errors

It’s possible that there is code somewhere that causes a noindex to show just for Google. For example, a page may at one time have had a noindex on it, and a server-side cache (like a caching plugin) or a CDN (like Cloudflare) may have cached the HTTP headers from that time. That would cause the old noindex header to be shown to Googlebot (because it frequently visits the site) while a fresh version is served to the site owner.

Checking the HTTP headers is easy. There are many HTTP header checkers, such as the ones at KeyCDN and SecurityHeaders.com.
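For those who prefer a script to an online checker, here is a minimal Python sketch of the same check. It looks for a noindex directive in the X-Robots-Tag response header, including the crawler-scoped form such as “googlebot: noindex”. The function names and the default user agent string are assumptions made for this example, and a request sent from your own machine will not necessarily receive the same headers that Googlebot does.

```python
import urllib.request


def noindex_header_values(headers):
    """Return the X-Robots-Tag values that carry a noindex directive.

    `headers` is an iterable of (name, value) pairs. The directive "none"
    is treated as a hit because it is equivalent to "noindex, nofollow".
    """
    hits = []
    for name, value in headers:
        if name.lower() != "x-robots-tag":
            continue
        for part in value.lower().split(","):
            # A directive may be scoped to one crawler: "googlebot: noindex".
            directive = part.split(":")[-1].strip()
            if directive in ("noindex", "none"):
                hits.append(value)
                break
    return hits


def fetch_headers(url, user_agent="header-check-sketch/0.1"):
    """Fetch the response headers for `url`. This makes a network request.

    Note that urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return list(response.headers.items())


# Usage (hypothetical URL):
#   print(noindex_header_values(fetch_headers("https://example.com/page")))
```

An empty list means no noindex was found in the headers for that particular request, which is not the same as proving that none is ever served.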

A 520 response code is a Cloudflare-specific status (documented by Cloudflare as “web server returns an unknown error”) that can show up when a request from a particular user agent is being blocked.

Screenshot: 520 Cloudflare Response Code

Below is a screenshot of a 200 server response code generated by Cloudflare:

Screenshot: 200 Server Response Code

I checked the same URL using two different header checkers, with one returning a 520 (blocked) server response code and the other returning a 200 (OK) response code. That shows how differently Cloudflare can respond to something like a header checker. Ideally, try several header checkers to see if there’s a consistent 520 response from Cloudflare.

In the situation where a web page is showing something exclusively to Google that is otherwise not visible to someone looking at the code, what you need to do is to get Google to look at the page for you using an actual Google crawler and from a Google IP address. The way to do this is by dropping the URL into Google’s Rich Results Test. Google will dispatch a crawler from a Google IP address and if there’s something on the server (or a CDN) that’s showing a noindex, this will catch it. In addition to the structured data, the Rich Results test will also provide the HTTP response and a snapshot of the web page showing exactly what the server shows to Google.

When you run a URL through the Google Rich Results Test, the request:

  • Originates from Google’s Data Centers: The bot uses an actual Google IP address.
  • Passes Reverse DNS Checks: If the server, security plugin, or CDN checks the IP, it will resolve back to googlebot.com or google.com.
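The reverse DNS check mentioned above can be sketched in a few lines of Python. This is an assumption about how a server, plugin, or CDN might perform the check, not a description of any specific product. It does a reverse lookup on the IP, confirms the hostname ends in one of the two domains named above, then does a forward lookup to confirm the hostname maps back to the same IP. The function names are hypothetical, and the default forward lookup handles IPv4 only.

```python
import socket

# Domains named in the article; other lists may include more.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """True if the hostname sits under one of the expected Google domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def passes_reverse_dns(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for `ip`.

    The lookups are injectable so the logic can be exercised without
    network access. The defaults use the standard library resolver.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        if not is_google_hostname(hostname):
            return False
        # The forward lookup guards against a spoofed reverse (PTR) record.
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

A request from the Rich Results Test should pass a check like this, which is why the tool can surface a noindex that is only served to Google’s IP addresses.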

If the page is blocked by noindex, the tool will be unable to provide any structured data results. It should provide a status saying “Page not eligible” or “Crawl failed”. If you see that, click the “View Details” link or expand the error section. It should show something like “Robots meta tag: noindex” or “‘noindex’ detected in ‘robots’ meta tag”.

This approach does not send the Googlebot user agent. It uses the Google-InspectionTool/1.0 user agent string. That means this method will catch a block that is keyed to Google’s IP addresses.

Another scenario to check is a rogue noindex tag written specifically to block Googlebot. In that case, you can spoof (mimic) the Googlebot user agent string with Google’s own User Agent Switcher extension for Chrome, or configure an app like Screaming Frog to identify itself with the Googlebot user agent, and that should catch it.

Screenshot: Chrome User Agent Switcher
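The same user agent comparison can be scripted. Below is a minimal Python sketch that fetches a page twice, once with a generic browser-style user agent and once with the publicly documented Googlebot user agent string, and reports whether a robots meta tag with noindex shows up in either response. The function and class names are assumptions for this example. Because the request comes from your own IP address, it only detects rules keyed to the user agent string, not rules keyed to Google’s IP ranges.

```python
from html.parser import HTMLParser
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0"  # deliberately generic stand-in for a browser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_meta_noindex(html):
    """True if the HTML carries a noindex (or equivalent 'none') robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives or "none" in parser.directives


def fetch_html(url, user_agent):
    """Fetch `url` with the given user agent. This makes a network request."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def compare_user_agents(url, fetch=fetch_html):
    """Report whether a noindex meta tag is served to each user agent."""
    return {
        "browser": has_meta_noindex(fetch(url, BROWSER_UA)),
        "googlebot": has_meta_noindex(fetch(url, GOOGLEBOT_UA)),
    }


# Usage (hypothetical URL):
#   print(compare_user_agents("https://example.com/page"))
```

If the two values differ, something on the server, a plugin, or the CDN is treating the Googlebot user agent differently, which is a good place to start digging.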

Phantom Noindex Errors In Search Console

These kinds of errors can be a pain to diagnose, but before you throw your hands up in the air, take some time to see if any of the steps outlined here help identify the hidden cause of the issue.

Featured Image by Shutterstock/AYO Production