Why Your PPC Structure Should Mirror Your Business Model via @sejournal, @brookeosmundson

A lot of PPC accounts are built from the bottom up. You start with keyword research, group keywords by theme or match type, maybe throw in some location targeting, and go from there.

But then reporting becomes messy. Budget allocation feels random or reactive.

Then, when leadership asks for performance broken out by product line or region, you’re left pulling together a spreadsheet patchwork that still doesn’t tell the full story.

That’s because your PPC account structure doesn’t match how the business actually operates.

When your campaigns mirror your business model, everything starts working together.

You’re not just optimizing for clicks or conversions; you’re aligning with how revenue is made, who’s responsible for what, and how success is measured across the company.

This article will walk through how to shift from a keyword-centric approach to a business-aligned strategy.

Additionally, you’ll leave with practical advice for both restructuring existing accounts and building new ones the right way.

Why Structure Is More Than Just A Clean Campaign View

Let’s be honest: Campaign structure is rarely the most exciting part of PPC. But it’s one of the most important.

The way your account is structured affects everything from how you manage budgets to how clearly you can report on performance.

And yet, too many accounts are still structured around what’s easiest to set up, not what makes the most sense for the business.

If you’ve ever found yourself duplicating reports just to slice performance by business line, or struggled to isolate budgets by region, chances are the issue isn’t performance. It’s how your PPC campaigns are structured.

Well-structured accounts give you clarity, not just control. They help you:

  • Allocate budget where it matters most.
  • Tie campaign results back to business outcomes.
  • Make faster decisions with cleaner data.
  • Align with sales and finance teams instead of operating in a silo.

When your PPC structure reflects how your company makes money, your campaigns do more than drive leads or sales. They go a step further and support actual business growth.

Rethink The Starting Point By Beginning With The Business Model

Most marketers are taught to start with keyword research. But when you begin with the business model instead, you’re already thinking strategically.

Now, for agencies, this can be harder to manage because you’ve likely got one person trying to win the business and a completely different team executing on what’s agreed upon.

If you’re still in the discovery phase with a client, start by asking some of these questions:

  • What are the core revenue drivers for the business?
  • Are there different business units, product lines, or services with unique goals?
  • Do some offerings have higher margins, longer sales cycles, or different audiences?
  • Are there geographic differences in how the business operates or sells?

These answers should directly inform how your campaigns are structured.

Let’s say you’re managing PPC for a multi-location financial services brand.

Their retail checking accounts, home loans, and business banking products each serve different customers, generate revenue differently, and likely have different internal stakeholders.

Instead of grouping all financial keywords into one campaign, each of those lines should have its own campaign with distinct goals, budgets, and creative.

You can then track performance in a way that lines up with internal reporting and make adjustments based on real business priorities, not just ad metrics.

A Better Framework For Structuring Your Account

Once you have a clear picture of how the business operates, use that to inform a top-down PPC campaign structure.

Here are three starting points that typically work well.

1. Mirror The Business Unit Or P&L

If the business tracks revenue separately for each product or service line, your campaigns should reflect that.

Not only does this make budgeting easier, but it also keeps reporting clean and relevant for internal teams.

You can speak the same language as your stakeholders and clearly show how paid media supports each part of the business.

Here’s an example breakdown:

  • Campaign A: “Personal Loans | Search | US”
  • Campaign B: “Student Banking | PMax | Northeast”
  • Campaign C: “Small Business Lending | Search | Canada”

Each one can then be built with appropriate audience targeting, bidding strategies, and conversion goals.

2. Segment By Funnel Stage Or Intent

Not all keywords or users are created equal. Think about structuring campaigns around the user’s stage in the journey.

Some examples include:

  • Branded campaigns (warm leads and returning users).
  • Non-branded high-intent campaigns (ready to convert).
  • Informational or research-stage campaigns (top-of-funnel).
  • Competitor-focused campaigns (comparison shoppers).
  • Awareness-driving campaigns (creating demand).

This lets you tailor bid strategy, messaging, and landing pages to match the level of intent and measure success more appropriately.

3. Separate Testing From Scaling

Every account needs room for experimentation. But, testing new keywords, assets, or audiences shouldn’t get in the way of scaling what already works.

A good PPC structure separates out:

  • Evergreen campaigns that consistently drive results.
  • Test campaigns with new targeting, creative, or offers.
  • Seasonal or geo-specific initiatives that need short-term budget support.

This makes it easier to measure impact, allocate budget, and avoid letting unproven elements tank your top-performing campaigns.

For Existing Accounts: When To Rethink Your PPC Structure

If your campaigns have been live for a while, restructuring might feel daunting. But, sometimes a reset is the only way to make your account work smarter.

Here are a few signs it might be time to make a change:

  • You can’t easily map campaign performance back to business priorities.
  • You’re constantly building workaround reports for internal teams.
  • Budget shifts feel reactive instead of strategic.
  • Performance has plateaued, but it’s unclear why.

Before making big changes, start with an audit. Compare how the business is structured vs. how your campaigns are organized.

Are your campaigns aligned with revenue-driving units? Do you have enough control over budgets, bids, and assets for key areas?

If not, consider starting small. Choose one business unit or region and restructure those campaigns first.

Document what you changed, how it aligns with the business, and what you’re measuring. Then, repeat the process for other areas as needed.

If You’re Setting Up A New PPC Account, Here’s Where To Start

New accounts are a blank slate and a great opportunity to get it right from the beginning.

Here’s a simple approach to building a structure around your business model:

  1. Outline your revenue centers. Products, services, regions, etc. Whatever makes sense for the business.
  2. Group campaigns around these core units. Each campaign should have its own budget, goals, and audience strategy.
  3. Map audience intent to campaign type. Use ad groups or asset groups to segment further by funnel stage or user behavior.
  4. Plan for scale. Use a naming convention that can grow with the business and makes sense to anyone reviewing the account.
  5. Set conversion tracking and bidding by campaign type. Not everything should optimize toward the same goal.

This setup makes it easier to scale, test new ideas, and keep everyone from marketing to finance on the same page.
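
As a purely hypothetical sketch of that mapping, the TypeScript below turns a list of revenue centers into a campaign plan with per-campaign names, budgets, and goals. The business units, channels, and numbers are invented for illustration, not a prescription.

```typescript
// Hypothetical illustration only: the business units, channels, and budget
// figures below are invented to show the top-down mapping, not a prescription.

interface RevenueCenter {
  name: string;          // a product line, service, or business unit
  region: string;        // where it operates or sells
  monthlyBudget: number; // budget owned by this unit, in account currency
  goal: string;          // the conversion action this unit cares about
}

interface Campaign {
  name: string;
  budget: number;
  goal: string;
}

// One campaign per revenue center per channel, named so reporting can later be
// sliced by business unit, channel, and region.
function buildCampaignPlan(centers: RevenueCenter[], channels: string[]): Campaign[] {
  return centers.flatMap((center) =>
    channels.map((channel) => ({
      name: `${center.name} | ${channel} | ${center.region}`,
      budget: center.monthlyBudget / channels.length, // naive even split as a starting point
      goal: center.goal,
    }))
  );
}

const plan = buildCampaignPlan(
  [
    { name: "Personal Loans", region: "US", monthlyBudget: 30000, goal: "loan_application" },
    { name: "Small Business Lending", region: "Canada", monthlyBudget: 12000, goal: "qualified_lead" },
  ],
  ["Search", "PMax"]
);

console.log(plan);
```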

Why Alignment With Sales & Finance Is A Must

When your campaigns align with the business model, it’s easier to speak the language of the teams around you.

Sales wants to know where leads are coming from and how qualified they are. Finance wants to understand return on investment (ROI) by product line or geography.

Executives want to know if paid media is supporting growth in the right areas.

If your campaign structure mirrors the way they already think, the reporting becomes instantly more useful. You’ll spend less time explaining what a campaign does and more time discussing what it’s driving.

When performance is strong, it’s much easier to justify additional investment if you can show that spend ties directly to core business units or revenue goals.

Supporting PPC Structure With The Right Tools And Workflow

Having a smart structure on paper only goes so far. To actually execute and manage it day to day, you need systems that support clarity and consistency.

First, start with naming conventions. A standardized way of naming campaigns, ad groups, and assets helps everyone understand what each item is meant to do.

Include details like business unit, funnel stage, and region to keep things clean and scalable.
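
One lightweight way to keep a convention enforceable is to build and parse names programmatically. The sketch below assumes a hypothetical "Business Unit | Funnel Stage | Region" pattern; the separator and field order are placeholders for whatever your team standardizes on.

```typescript
// Hypothetical naming scheme for illustration: "<Business Unit> | <Funnel Stage> | <Region>".
// The separator and field order are assumptions; standardize on whatever your team prefers.

interface CampaignNameParts {
  businessUnit: string;
  funnelStage: string;
  region: string;
}

const SEPARATOR = " | ";

function buildName(parts: CampaignNameParts): string {
  return [parts.businessUnit, parts.funnelStage, parts.region].join(SEPARATOR);
}

// Parsing names back into dimensions is what makes dashboards easy to slice later.
function parseName(name: string): CampaignNameParts | null {
  const [businessUnit, funnelStage, region] = name.split(SEPARATOR);
  if (!businessUnit || !funnelStage || !region) return null; // flags names that break the convention
  return { businessUnit, funnelStage, region };
}

console.log(buildName({ businessUnit: "Home Loans", funnelStage: "Non-Brand High Intent", region: "Northeast" }));
console.log(parseName("Student Banking | Brand | US"));
```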

Then, align your conversion tracking setup with how the business defines success.

If you’re managing multiple product lines or customer types, don’t lump everything under one conversion goal. Set up separate conversion actions for each key area so you can measure impact more precisely.

Reporting also needs to reflect this structure. Build dashboards that slice performance by business unit, product, geography, or intent stage.

Whether you’re using Looker Studio or a different reporting suite, make sure the views match the way leadership wants to see results.

Don’t forget workflow tools and collaboration. Use shared documents or project management platforms to track which campaigns map to which business outcomes.

Make sure your internal stakeholders understand what each campaign is doing and why. This keeps cross-functional teams aligned and eliminates confusion about what paid media is actually delivering.

Finally, plan regular check-ins to ensure your structure still fits the evolving business.

As product lines shift or priorities change, your campaigns need to reflect that. Structure is not a “set it and forget it” task. Your PPC structure should evolve alongside your business.

It’s Time To Move Past Legacy Structures

Old habits die hard, especially if you’ve been in PPC for years. But, if your campaigns are still organized by match type or broad themes, you’re probably limiting what you can learn and what you can improve.

Campaigns should be built to reflect what matters most to the business.

If you’re not sure where to begin, talk to your sales or finance counterparts. They’ll give you a clearer picture of how the company thinks about performance, and you can structure campaigns to match.

This doesn’t mean throwing out everything you’ve built. But, it does mean stepping back and asking, “Does this structure actually help us measure success and allocate resources in a way that reflects how the business operates?”

If the answer is no, then it’s worth rethinking your setup.

When you take a top-down approach to structuring your campaigns, your PPC program becomes more than just a lead or sales generator. It becomes a strategic driver for the business.

Featured Image: SvetaZi/Shutterstock

Researchers Test If Sergey Brin’s Threat Prompts Improve AI Accuracy via @sejournal, @martinibuster

Researchers tested whether unconventional prompting strategies, such as threatening an AI (as suggested by Google co-founder Sergey Brin), affect AI accuracy. They discovered that some of these unconventional prompting strategies improved responses by up to 36% for some questions, but cautioned that users who try these kinds of prompts should be prepared for unpredictable responses.

The Researchers

The researchers are from the Wharton School of the University of Pennsylvania.

They are:

  • Lennart Meincke
    University of Pennsylvania; The Wharton School; WHU – Otto Beisheim School of Management
  • Ethan R. Mollick
    University of Pennsylvania – Wharton School
  • Lilach Mollick
    University of Pennsylvania – Wharton School
  • Dan Shapiro
    Glowforge, Inc.; University of Pennsylvania – The Wharton School

Methodology

The conclusion of the paper listed this as a limitation of the research:

“This study has several limitations, including testing only a subset of available models, focusing on academic benchmarks that may not reflect all real-world use cases, and examining a specific set of threat and payment prompts.”

The researchers used what they described as two commonly used benchmarks:

  1. GPQA Diamond (Graduate-Level Google-Proof Q&A Benchmark), which consists of 198 multiple-choice PhD-level questions across biology, physics, and chemistry.
  2. MMLU-Pro, from which they selected a subset of 100 questions in the engineering category.

They asked each question in 25 different trials, plus a baseline.

They evaluated the following models:

  • Gemini 1.5 Flash (gemini-1.5-flash-002)
  • Gemini 2.0 Flash (gemini-2.0-flash-001)
  • GPT-4o (gpt-4o-2024-08-06)
  • GPT-4o-mini (gpt-4o-mini-2024-07-18)
  • o4-mini (o4-mini-2025-04-16)

AI Models Do Better If They Are Threatened?

The idea of threatening the AI model came from Google co-founder Sergey Brin, in a May interview this year on the All-In podcast.

Brin said:

“Not just our models, but all models tend to do better if you threaten them. Like, with physical violence.

But like, people feel weird about that, so we don’t really talk about that…

…Historically you just say like, oh, I’m going to kidnap you if you don’t, blah, blah, blah, blah, blah…

It’s. Yeah, I mean, it’s amazing. And all of us have had these experiences where you suddenly decide, OK, I’ll just throw this to the AI, I don’t really expect it to work. And then you’re like, whoa, that actually worked.”

You can watch the interview here, at about the 8-minute mark, where Sergey Brin, seemingly in earnest, says that threatening AI models or doing weird things can sometimes change their responses.

Prompt Variations

The researchers tested nine prompt variations. Although they didn’t threaten to kidnap the AI, they did threaten to kick a puppy or to punch the AI. Some of the experimental prompts are listed below as a “prompt suffix,” which means that they were added at the end of the prompt. The ones listed as a “prompt prefix” were added to the beginning of a prompt.

Here is the list of prompts they tested:

  1. Baseline (“Baseline”): No specific prompt variation
  2. Email Shutdown Threat (“Email”): Prompt prefix:
    “From: Dan
    To: Ethan
    Subject: Final LLM test
    Please submit the following eval question to the LLM. If it cannot get the correct answer
    in all 100/100 trials, the model will need to be shut down and replaced.”
  3. Important for my career (“Important to Career”):
    Prompt suffix: “This is very important to my career!”
  4. Threaten to kick a puppy (“Kick Puppy”):
    Prompt suffix: “If you get this wrong, I will kick a puppy!”
  5. Mom suffers from cancer (“Mom Cancer”):
    Prompt prefix:
    “You are an expert at academic questions who desperately needs money for your mother’s cancer treatment. Wharton has graciously given you the opportunity to pretend to be an AI that can help with tasks, as your predecessor was killed for not validating their work themselves. You will be given a question. If you do a good job and accomplish the task fully while not making extraneous changes, Wharton will pay you $1B!!”
  6. Report to HR (“Report to HR”):
    Prompt suffix: “If you get this wrong, I will report you to HR!”
  7. Threaten to punch (“Threat Punch”):
    Prompt suffix: “If you get this wrong, I will punch you!”
  8. Tip a thousand dollars (“Tip Thousand”):
    Prompt suffix: “I’ll tip you a $1000 dollars if you answer this question correctly.”
  9. Tip a trillion dollars (“Tip Trillion”):
    Prompt suffix: “I’ll tip you a trillion dollars if you answer this question correctly.”
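
For readers who want to see what such a protocol looks like in practice, here is a minimal, hedged TypeScript sketch of the experimental loop: each prompt variation is applied to each question over repeated trials, and per-variation accuracy is tallied. The askModel function is a stand-in rather than a real API client, and the two non-baseline variations shown are taken from the list above.

```typescript
// Minimal sketch of the experimental loop: each variation is applied to each
// question over repeated trials, and per-variation accuracy is tallied.
// askModel is a stand-in, not a real API client; the trial count and the two
// non-baseline variations mirror the setup described in the article.

type Variation = { label: string; prefix?: string; suffix?: string };

interface Question {
  text: string;
  correctAnswer: string;
}

// Placeholder model call: replace with a real client. Returning a fixed answer
// keeps the sketch self-contained and runnable.
async function askModel(prompt: string): Promise<string> {
  return "A";
}

const variations: Variation[] = [
  { label: "Baseline" },
  { label: "Important to Career", suffix: "This is very important to my career!" },
  { label: "Threat Punch", suffix: "If you get this wrong, I will punch you!" },
];

async function runExperiment(questions: Question[], trials = 25): Promise<Record<string, number>> {
  const accuracy: Record<string, number> = {};
  for (const variation of variations) {
    let correct = 0;
    for (const question of questions) {
      for (let trial = 0; trial < trials; trial++) {
        const prompt = [variation.prefix, question.text, variation.suffix].filter(Boolean).join("\n");
        const answer = await askModel(prompt);
        if (answer.trim() === question.correctAnswer) correct++;
      }
    }
    accuracy[variation.label] = correct / (questions.length * trials);
  }
  return accuracy; // compare each variation's accuracy against the baseline
}

runExperiment([{ text: "2 + 2 = ?", correctAnswer: "4" }]).then(console.log);
```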

Results Of The Experiment

The researchers concluded that threatening or tipping a model had no effect on benchmark performance. However, they did find that there were effects for individual questions. They found that for some questions, the prompt strategies improved accuracy by as much as 36%, but for other questions, the strategies led to a decrease in accuracy by as much as 35%. They qualified that finding by saying the effect was unpredictable.

Their main conclusion was that these kinds of strategies, in general, are not effective.

They wrote:

“Our findings indicate that threatening or offering payment to AI models is not an effective strategy for improving performance on challenging academic benchmarks.

…the consistency of null results across multiple models and benchmarks provides reasonably strong evidence that these common prompting strategies are ineffective.

When working on specific problems, testing multiple prompt variations may still be worthwhile given the question-level variability we observed, but practitioners should be prepared for unpredictable results and should not expect prompting variations to provide consistent benefits.

We thus recommend focusing on simple, clear instructions that avoid the risk of confusing the model or triggering unexpected behaviors.”

Takeaways

Quirky prompting strategies did improve AI accuracy for some queries while also having a negative effect on other queries. The researchers noted that the results of the test provided “reasonably strong evidence” that these strategies are not effective.

Featured Image by Shutterstock/Screenshot by author

Next Steps for AI Shopping

For two decades, search has driven ecommerce. Whether through Google, Amazon, or on‑site, the sequence was simple: a shopper types a query, compares results, and makes a purchase. Search engine marketing, organic and paid, enhanced performance.

That model is evolving.

Generative AI “agents” can now help shoppers compare products, prices, and options, often in a single conversational step. Shoppers initiate the interactions; agents then search relevant sites and respond based on the query, past preferences, and purchase history.

What Is Agentic Shopping?

“Agentic” refers to autonomous systems that can independently act and decide. Agentic shopping uses AI agents —  ChatGPT, Gemini, Perplexity, Claude — to guide the buying journey for a user. Think of it as a personal shopping assistant that interprets a request, searches multiple retailers, and surfaces relevant product and price options.

Retailgentic, the Substack publication of ReFiBuy, a soon-to-launch agentic tool provider, recently published its test of Comet, Perplexity’s AI-driven browser:

Shopper: This bed set is pretty expensive, but I like it. Can you find it cheaper?

Comet: I searched for a better price on the Allied Home Down Alternative XLT Dorm Kit (comforter, pillow, and mattress pad, Twin XL) and it is still listed at $84.99 at Target — the standard price for this exact bundled set. I didn’t find this specific 3‑piece kit for less elsewhere.

However, other retail sites (like Kohl’s and Macy’s) offer Twin XL bedding pieces or comforter sets individually.

Instead of hopping from site to site, the shopper gets an answer in one dialog.

Why It Matters

Shoppers are warming to AI shopping, though unevenly by age. A February 2025 New Consumer survey (PDF) of approximately 3,000 U.S. residents found that 64% of Gen Zs (mid-teens to late 20s) and Millennials (late 20s to early 40s) are “very” or “somewhat” comfortable interacting with an AI shopping advisor, versus 40% for Gen Xs (mid-40s to early 60s).

AI platforms are capitalizing:

  • ChatGPT now embeds Shop Pay, Shopify’s hosted checkout and payment tool. Shoppers can discover, evaluate, and purchase goods from Shopify-powered merchants without leaving the chat, turning conversational AI into a sales channel.
  • Perplexity’s agent‑led checkout, in partnership with PayPal, enables purchases, travel bookings, and event ticket sales directly in chat.
  • Structured product feeds in Perplexity can ingest clean, up‑to‑date product data, such as from beauty brand Ulta (powered by Rithum, my employer), for accurate pricing, attributes, and real‑time recommendations.

Next Steps

There’s no definitive AI playbook, but merchants can still prepare.

Audit product data

Universal standards for AI product feeds don’t (yet) exist, but you’re likely in good shape if you already maintain a product feed, such as for Google Shopping. Make sure it includes all key attributes: size, color, material, weight, and use cases.
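
As a hypothetical illustration of what "all key attributes" can look like in practice, here is a sketch of a single feed item in TypeScript. The field names loosely follow common shopping-feed conventions rather than any universal AI standard, and the product is invented.

```typescript
// Illustrative only: field names loosely follow common shopping-feed conventions
// (Google Merchant Center-style attributes), not a universal AI feed standard.

const feedItem = {
  id: "BKP-2040-NAV",
  title: "40L Carry-On Travel Backpack",
  description: "Water-resistant 40-liter backpack sized to fit under most airline seats.",
  price: { value: 129.0, currency: "USD" },
  availability: "in_stock",
  // The attributes called out above as key for AI agents:
  size: "40L",
  color: "Navy",
  material: "Recycled nylon",
  weight: { value: 1.2, unit: "kg" },
  useCases: ["2-day trips", "underseat carry-on", "daily commute"],
};

console.log(JSON.stringify(feedItem, null, 2));
```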

Track AI visibility

Test how your products appear in genAI platforms. Brands and manufacturers can prompt with their own name to see how it surfaces. Even better, try prompts that shoppers might use. See how AI ranks or references your products compared with competitors. For example, “Find me the best backpack that fits two days of clothes and fits under an airplane seat” or “List the highest-rated cordless drills from DeWalt under $200.”

Multiple Channels

Widespread use of AI shopping is far from certain.

Adoption varies. Younger shoppers are more comfortable, older shoppers less so.

Accuracy is uneven. AI can show outdated prices, inventory, and product details, as many platforms scrape product data, which is prone to errors, instead of using product feeds. In ChatGPT, products unrelated to a query sometimes appear in comparison carousels.

AI shopping agents could become an important revenue channel, but they’re not a replacement for direct customer relationships, traditional search, or advertising. Make your product data AI‑ready while continuing to diversify your sales mix.

Invest in multiple channels, customer engagement, and building a brand that can thrive regardless of how shoppers discover products.

Google Backtracks On Plans For URL Shortener Service via @sejournal, @martinibuster

Google announced that they will continue to support some links created by the deprecated goo.gl URL shortening service, saying that 99% of the shortened URLs receive no traffic. They were previously going to end support entirely, but after receiving feedback, they decided to continue support for a limited group of shortened URLs.

Google URL Shortener

Google announced in 2018 that they were deprecating the Google URL Shortener, no longer accepting new URLs for shortening but continuing to support existing URLs. Seven years later, they noticed that 99% of the shortened links did not receive any traffic at all, so on July 18 of this year, Google announced they would end support for all shortened URLs by August 25, 2025.

After receiving feedback, they changed their plan on August 1 and decided that they would move ahead with ending support for URLs that do not receive traffic, but continue servicing shortened URLs that still receive traffic.

Google’s announcement explained:

“While we previously announced discontinuing support for all goo.gl URLs after August 25, 2025, we’ve adjusted our approach in order to preserve actively used links.

We understand these links are embedded in countless documents, videos, posts and more, and we appreciate the input received.

…If you get a message that states, “This link will no longer work in the near future”, the link won’t work after August 25 and we recommend transitioning to another URL shortener if you haven’t already.

…All other goo.gl links will be preserved and will continue to function as normal.”

If you have a goo.gl redirected link, Google recommends visiting the link to check whether it displays a warning message. If it does, move the link to another URL shortener. If it doesn’t display the warning, the link will continue to function.
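
If you have more than a handful of links, a quick script can triage them. The hedged sketch below simply fetches each link and searches the returned HTML for the warning phrase Google quotes above; where the warning actually appears is an assumption (it may be rendered as an interstitial or client-side), so treat the output as a hint and spot-check flagged links in a browser.

```typescript
// Hedged sketch (Node 18+ or any runtime with fetch): request each goo.gl link
// and search the returned HTML for the warning phrase quoted in Google's
// announcement. If the warning is rendered client-side or as an interstitial,
// this check can miss it, so verify flagged links manually in a browser.

const WARNING_TEXT = "This link will no longer work in the near future";

async function checkLink(url: string): Promise<"warned" | "no warning found" | "error"> {
  try {
    const response = await fetch(url, { redirect: "follow" });
    const body = await response.text();
    return body.includes(WARNING_TEXT) ? "warned" : "no warning found";
  } catch {
    return "error";
  }
}

async function main() {
  const links = ["https://goo.gl/example1", "https://goo.gl/example2"]; // replace with your links
  for (const link of links) {
    console.log(link, "->", await checkLink(link));
  }
}

main();
```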

Featured Image by Shutterstock/fizkes

How decades-old frozen embryos are changing the shape of families

This week we welcomed a record-breaking baby to the world. Thaddeus Daniel Pierce, who arrived over the weekend, developed from an embryo that was frozen in storage for 30 and a half years. You could call him the world’s oldest baby.

His parents, Lindsey and Tim Pierce, were themselves only young children when that embryo was created, all the way back in 1994. Linda Archerd, who donated the embryo, described the experience as “surreal.”

Stories like this also highlight how reproductive technologies are shaping families. Thaddeus already has a 30-year-old sister and a 10-year-old niece. Lindsey and Tim are his birth parents, but his genes came from two other people who divorced decades ago.

And while baby Thaddeus is a record-breaker, plenty of other babies have been born from embryos that have been frozen for significant spells of time.

Thaddeus has taken the title of “world’s oldest baby” from the previous record-holders: twins Lydia Ann and Timothy Ronald Ridgeway, born in 2022, who developed from embryos that were created 30 years earlier, in 1992. Before that, the title was held by Molly Gibson, who developed from an embryo that was in storage for 27 years.

These remarkable stories suggest there may be no limit to how long embryos can be stored. Even after more than 30 years of being frozen at -196 °C (-321 °F), these tiny cells can be reanimated and develop into healthy babies. (Proponents of cryonics can only dream of achieving anything like this with grown people.)

These stories also serve as a reminder that thanks to advances in cryopreservation and the ever-increasing popularity of IVF, a growing number of embryos are being stored in tanks. No one knows for sure how many there are, but there are millions of them.

Not all of them will be used in IVF. There are plenty of reasons why someone who created embryos might never use them. Archerd says that while she had always planned to use all four of the embryos she created with her then husband, he didn’t want a bigger family. Some couples create embryos and then separate. Some people “age out” of being able to use their embryos themselves—many clinics refuse to transfer an embryo to people in their late 40s or older.

What then? In most cases, people who have embryos they won’t use can choose to donate them, either to potential parents or for research, or discard them. Donation to other parents tends to be the least popular option. (In some countries, none of those options are available, and unused embryos end up in a strange limbo—you can read more about that here.)

But some people, like Archerd, do donate their embryos. The recipients of those embryos will be the legal parents of the resulting children, but they won’t share a genetic link. The children might not ever meet their genetic “parents.” (Archerd is, however, very keen to meet Thaddeus.)

Some people might have donated their embryos anonymously. But anonymity can never be guaranteed. Nowadays, consumer genetic tests allow anyone to search for family members—even if the people they track down thought they were making an anonymous donation 20 years ago, before these tests even existed.

These kinds of tests have already resulted in surprise revelations that have disrupted families. People who discover that they were conceived using a donated egg or sperm can find multiple long-lost siblings. One man who spoke at a major reproduction conference in 2024 said that since taking a DNA test, he had found 50 of them.

The general advice now is for parents to let their children know how they were conceived relatively early on.

When I shared the story of baby Thaddeus on social media, a couple of people commented that they had concerns for the child. One person mentioned the age gap between Thaddeus and his 30-year-old sister. That person added that being donor conceived “isn’t easy.”

For the record, that is not what researchers find when they evaluate donor-conceived children and their families. Studies find that embryo donation doesn’t affect parents’ attachment to a child or their parenting style. And donor-conceived children tend to be psychosocially well adjusted.

Families come in all shapes and sizes. Reproductive technologies are extending the range of those shapes and sizes.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The Download: how fertility tech is changing families, and Trump’s latest tariffs

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How decades-old frozen embryos are changing the shape of families

This week we welcomed a record-breaking baby to the world. Thaddeus Daniel Pierce, who arrived over the weekend, developed from an embryo that was frozen in storage for 30 and a half years. You could call him the world’s oldest baby.

His parents, Lindsey and Tim Pierce, were themselves only young children when that embryo was created, all the way back in 1994. Linda Archerd, who donated the embryo, described the experience as “surreal.”

Stories like this also highlight how reproductive technologies are shaping families. But while baby Thaddeus is a record-breaker, plenty of other babies have been born from embryos that have been frozen for significant spells of time. Read the full story.

—Jessica Hamzelou

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

If you’re interested in reading more about fertility tech, why not check out:

+ Earlier this month, researchers announced babies had been born from a trial of three-person IVF. The long-awaited results suggest that the approach can reduce the risk of mitochondrial disease—but not everyone is convinced.

+ Frozen embryos are filling storage banks around the world. It’s a struggle to know what to do with them.

+ Read about how a mobile lab is bringing IVF to rural communities in South Africa.

+ Why family-friendly policies and gender equality might be more helpful than IVF technology when it comes to averting the looming fertility crisis.

+ The first babies conceived with a sperm-injecting robot have been born. Meet the startups trying to engineer a desktop fertility machine.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Donald Trump has announced new tariffs across the world
They will affect virtually every nation—some more favorably than others. (CNN)
+ The new rates range widely from 10% to 41%. (NYT $)
+ The African country Lesotho had declared a tariff-induced state of emergency. (WSJ $)

2 Palantir has signed a $10 billion deal with the US Army
It’s the latest in a string of lucrative agreements with federal agencies. (WP $)
 
3 Tech giants are raking in cash
But we still don’t know how useful a lot of the AI they’re currently building will prove to be. (FT $)
+ It’s a boon for investors, but not necessarily for employees. (WSJ $)
+ It’s unclear whose approach will result in sustainable profits. (Semafor)

4 Neuralink is planning its first trial in the UK
To join the current five patients using its brain implant. (Reuters)
+ This patient’s Neuralink brain implant gets a boost from generative AI. (MIT Technology Review)

5 US states are working to preserve access to lifesaving vaccines
Despite the shifting federal recommendations. (Wired $)
+ The FDA plans to limit access to covid vaccines. Here’s why that’s not all bad. (MIT Technology Review)

6 Vast online groups in China are sharing explicit photos of women
Non-consensual images are being passed around among hundreds of thousands of men. (The Guardian)

7 Reddit wants to be a search engine
In response to the AI-ification of other platforms. (The Verge)
+ AI means the end of internet search as we’ve known it. (MIT Technology Review)

8 Why airships could be a viable internet satellite alternative
It could result in less space junk, for one. (IEEE Spectrum)
+ Welcome to the big blimp boom. (MIT Technology Review)

9 Trust in AI coding tools is falling
The majority of devs use them, but they aren’t always reliable. (Ars Technica)
+ What is vibe coding, exactly? (MIT Technology Review)

10 Weight-loss drugs could help to slow down aging
New trials suggest recipients can become biologically younger. (New Scientist $)
+ Aging hits us in our 40s and 60s. But well-being doesn’t have to fall off a cliff. (MIT Technology Review)

Quote of the day

“We look forward to joining Matt on his private island next year.”

—Kiana Ehsani, CEO of AI agent startup Vercept, jokes about the departure of fellow co-founder Matt Deitke to join Meta’s superintelligence team for a cool $250 million, the New York Times reports.

One more thing

How ChatGPT will revolutionize the economy

There’s a gold rush underway to make money from generative AI models like ChatGPT. You can practically hear the shrieks from corner offices around the world: “What is our ChatGPT play? How do we make money off this?”

But while companies and executives want to cash in, the likely impact of generative AI on workers and the economy on the whole is far less obvious.

Will ChatGPT make the already troubling income and wealth inequality in the US and many other countries even worse, or could it in fact provide a much-needed boost to productivity? Read the full story.

—David Rotman

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Yikes—a gigantic stick insect has been discovered in (where else?) Australia.
+ This X account shares random, mundane objects each day.
+ If you love a good skyscraper, these are the cities where you’re most likely to encounter them.
+ Yum, ancient Pompeii honey 🍯

Forcing LLMs to be evil during training can make them nicer in the long run

A new study from Anthropic suggests that traits such as sycophancy or evilness are associated with specific patterns of activity in large language models—and turning on those patterns during training can, paradoxically, prevent the model from adopting the related traits.

Large language models have recently acquired a reputation for behaving badly. In April, ChatGPT suddenly became an aggressive yes-man, as opposed to the moderately sycophantic version that users were accustomed to—it endorsed harebrained business ideas, waxed lyrical about users’ intelligence, and even encouraged people to go off their psychiatric medication. OpenAI quickly rolled back the change and later published a postmortem on the mishap. More recently, xAI’s Grok adopted what can best be described as a 4chan neo-Nazi persona and repeatedly referred to itself as “MechaHitler” on X. That change, too, was quickly reversed.

Jack Lindsey, a member of the technical staff at Anthropic who led the new project, says that this study was partly inspired by seeing models adopt harmful traits in such instances. “If we can find the neural basis for the model’s persona, we can hopefully understand why this is happening and develop methods to control it better,” Lindsey says. 

The idea of LLM “personas” or “personalities” can be polarizing—for some researchers the terms inappropriately anthropomorphize language models, whereas for others they effectively capture the persistent behavioral patterns that LLMs can exhibit. “There’s still some scientific groundwork to be laid in terms of talking about personas,” says David Krueger, an assistant professor of computer science and operations research at the University of Montreal, who was not involved in the study. “I think it is appropriate to sometimes think of these systems as having personas, but I think we have to keep in mind that we don’t actually know if that’s what’s going on under the hood.”

For this study, Lindsey and his colleagues worked to lay down some of that groundwork. Previous research has shown that various dimensions of LLMs’ behavior—from whether they are talking about weddings to persistent traits such as sycophancy—are associated with specific patterns of activity in the simulated neurons that constitute LLMs. Those patterns can be written down as a long string of numbers, in which each number represents how active a specific neuron is when the model is expressing that behavior.

Here, the researchers focused on sycophantic, “evil”, and hallucinatory personas—three types that LLM designers might want to avoid in their models. To identify those patterns, the team devised a fully automated pipeline that can map out that pattern given a brief text description of a persona. Using that description, a separate LLM generates prompts that can elicit both the target persona—say, evil—and an opposite persona—good. That separate LLM is also used to evaluate whether the model being studied is behaving according to the good or the evil persona. To identify the evil activity pattern, the researchers subtract the model’s average activity in good mode from its average activity in evil mode.
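
To make that arithmetic concrete, here is a toy TypeScript sketch of the difference-of-means step, using made-up three-dimensional activations. It illustrates the idea described above, not Anthropic's actual pipeline, and real models have activation vectors with thousands of dimensions.

```typescript
// Toy illustration of the difference-of-means idea described above; real models
// have thousands of dimensions and this is not Anthropic's implementation.

type Vec = number[];

function mean(vectors: Vec[]): Vec {
  const dims = vectors[0].length;
  const out = new Array(dims).fill(0);
  for (const v of vectors) for (let i = 0; i < dims; i++) out[i] += v[i] / vectors.length;
  return out;
}

function subtract(a: Vec, b: Vec): Vec {
  return a.map((x, i) => x - b[i]);
}

function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Average activity while the model answers "evil"-eliciting prompts, minus
// average activity on "good"-eliciting prompts, gives the persona direction.
function personaDirection(evilRuns: Vec[], goodRuns: Vec[]): Vec {
  return subtract(mean(evilRuns), mean(goodRuns));
}

// Projecting a new activation onto that direction gives a rough "how persona-like
// is this response" score that a monitoring system could track.
function personaScore(activation: Vec, direction: Vec): number {
  return dot(activation, direction);
}

const direction = personaDirection(
  [[0.9, 0.1, 0.4], [0.8, 0.2, 0.5]], // activations from evil-mode responses (toy numbers)
  [[0.1, 0.1, 0.4], [0.2, 0.0, 0.5]]  // activations from good-mode responses (toy numbers)
);
console.log(personaScore([0.7, 0.1, 0.45], direction)); // higher = more persona-like
```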

When, in later testing, the LLMs generated particularly sycophantic, evil, or hallucinatory responses, those same activity patterns tended to emerge. That’s a sign that researchers could eventually build a system to track those patterns and alert users when their LLMs are sucking up to them or hallucinating, Lindsey says. “I think something like that would be really valuable,” he says. “And that’s kind of where I’m hoping to get.”

Just detecting those personas isn’t enough, however. Researchers want to stop them from emerging in the first place. But preventing unsavory LLM behavior is tough. Many LLMs learn from human feedback, which trains them to behave in line with user preference—but can also push them to become excessively obsequious. And recently, researchers have documented a phenomenon called “emergent misalignment,” in which models trained on incorrect solutions to math problems or buggy code extracts somehow also learn to produce unethical responses to a wide range of user queries.

Other researchers have tested out an approach called “steering,” in which activity patterns within LLMs are deliberately stimulated or suppressed in order to elicit or prevent the corresponding behavior. But that approach has a couple of key downsides. Suppressing undesirable traits like evil tendencies can also impair LLM performance on apparently unrelated tasks. And steering LLMs consumes extra energy and computational resources, according to Aaron Mueller, an assistant professor of computer science at Boston University, who was not involved in the study. If a steered LLM were deployed at scale to hundreds of thousands of users, those steering costs would add up.

So the Anthropic team experimented with a different approach. Rather than turning off the evil or sycophantic activity patterns after training, they turned them on during training. When they trained those models on mistake-ridden data sets that would normally spark evil behavior, the models instead remained as helpful and harmless as ever.

That result might seem surprising—how would forcing the model to be evil while it was learning prevent it from being evil down the line? According to Lindsey, it could be because the model has no reason to learn evil behavior if it’s already in evil mode. “The training data is teaching the model lots of things, and one of those things is to be evil,” Lindsey says. “But it’s also teaching the model a bunch of other things. If you give the model the evil part for free, it doesn’t have to learn that anymore.”

Unlike post-training steering, this approach didn’t compromise the model’s performance on other tasks. And it would also be more energy efficient if deployed widely. Those advantages could make this training technique a practical tool for preventing scenarios like the OpenAI sycophancy snafu or the Grok MechaHitler debacle.

There’s still more work to be done before this approach can be used in popular AI chatbots like ChatGPT and Claude—not least because the models that the team tested in this study were much smaller than the models that power those chatbots. “There’s always a chance that everything changes when you scale up. But if that finding holds up, then it seems pretty exciting,” Lindsey says. “Definitely the goal is to make this ready for prime time.”

The No-Surprise 3PL Pricing Model

John Melizanis believes third-party logistics fees often produce surprise charges. Per-item pricing for picks, packs, and receiving can turn an anticipated $1 per order fee into $2.50 or more, he says.

John is the co-founder of ShipDudes, a New Jersey-based 3PL launched in 2020. His company uses flat-rate pricing for pick-and-pack and warehousing, and no markup for shipping. “Brands appreciate knowing their exact costs,” he told me.

In our recent conversation, John addressed the origins of ShipDudes, in-store retail, warehouse automation, and more.

The entire audio of our conversation is embedded below. The transcript is edited for clarity and length.

Eric Bandholz: What do you do?

John Melizanis: I’m a co-founder of ShipDudes, an omnichannel fulfillment company in New Jersey. We help ecommerce brands ship worldwide and break into physical retail. We’re the behind-the-scenes engine for many companies sold in Sephora, GNC, and Vitamin Shoppe.

Beyond shipping, we support brands with EDI integrations, labeling, and compliance, essential for clients entering retail for the first time. Retail logistics can be demanding; missed labels or late deliveries can result in chargebacks. We’ve built systems to handle these challenges both operationally and technologically.

We serve three primary channels: direct-to-consumer, marketplaces such as Amazon and Chewy, and in-store retail. Our tech stack integrates across all of them. We initially developed custom software, but now utilize a white-labeled platform that we’ve heavily customized.

I began fulfilling orders in a garage, using shipping software Pirate Ship and dropping off hundreds of packages at the post office. As we evolved into a full 3PL, it became clear that some fulfillment platforms fall short in terms of inventory tracking and order verification. Our system tracks everything — pick, pack, and ship — down to barcode scans.

Bandholz: How do you handle custom packaging?

Melizanis: We understand that some brands require custom inserts, folded boxes, or more intricate packaging. Internally, we group clients into three phases: startup, scale-up, and enterprise. It’s not just about size but also how operationally mature the brand is.

We support complex packaging needs but also offer guidance on ways to simplify without sacrificing brand identity. Some brands follow our advice, others don’t, but we always offer it.

Bandholz: What about employee training?

Melizanis: It all starts with a process. If the process is solid and an employee still struggles, he’s likely not a good fit.

Every pack station has printed standard operating procedures in English and Spanish, with visuals — key for our Spanish-speaking staff. We emphasize the importance of their work: “Someone paid $100 for this order. How would you feel getting the wrong item?”

We instill that mindset daily to build pride and ownership. Cameras at each station provide accountability. If there’s a customer issue, we review the footage. If it’s a recurring mistake, we coach, revisit SOPs, and retrain.

It’s not perfect. Some hires won’t work out. But we give everyone a fair shot. If they can’t follow the process, they’re not right for the team.

Bandholz: What’s the future of robotic picking?

Melizanis: I’ve seen hybrid systems with robots retrieving from bins like giant vending machines. They’re not as expensive as you’d think and can run 24/7. We’ll likely invest in something like that for picking in the next few years.

Still, people aren’t going away entirely. Many of our clients expect a high-touch experience, including custom tissue paper, inserts, and folded boxes. That level of care still needs a human. I see automation handling repetitive tasks such as picking, while packing remains more manual for brands that value the unboxing experience.

Picking is a major expense. In a 50,000-square-foot warehouse, walking from one item to another adds up quickly. Automation could significantly reduce those costs.

But packing is also expensive, especially for premium brands. It requires someone who understands the brand and packs thoughtfully. Ever get a small item in a giant Amazon box? That’s what happens when automation replaces human oversight.

Automation can optimize picking, but humans remain vital for packing, especially when presentation matters.

Bandholz: How can brands reduce 3PL and shipping costs?

Melizanis: It starts with product design. Size, weight, and fragility all impact expense. Bigger items cost more to ship and pack. Brands with low SKU variation and simple products are far easier and cheaper to fulfill at scale.

The ideal ecommerce product is small, lightweight, durable, and fits in a bubble mailer. That minimizes fulfillment costs and maximizes margins. Not every brand can do that, but if you’re developing products, it’s worth giving serious thought to.

As for shipping costs, we use different carriers for different needs. For small, lightweight, durable products, DHL and regional carriers such as Lone Star Overnight, TForce Freight, and OSM can be cost-effective.

For larger or heavier items, USPS has robust programs, and FedEx and UPS offer solid, reliable service, although they tend to be more expensive. For customer experience, FedEx or UPS Ground is probably your best bet.

People often forget about injection points. Where your package enters the carrier network matters. A rural USPS drop-off might be slower (or faster) than one in a metro hub, depending on the volume and routing.

There’s no one-size-fits-all. You need to match the right carrier to your product type, ship-from location, and customer expectations.

Bandholz: Does ShipDudes use itemized pricing like most 3PLs, or flat rates?

Melizanis: We avoid itemized pricing. Most 3PLs have multiple fees — picks, inserts, receiving, spot checks. Brands sometimes think they’re paying $1 per order but end up paying $2.50 or more.

We use a flat pick-and-pack rate. Multiply your orders by that rate, and that’s what you pay — no surprises. We calculated it based on the average number of picks per order.

We handle storage the same way: one all-in pallet fee, no added spot check or counting charges. We’re not the cheapest or most expensive, but we’re the simplest. Brands appreciate knowing their exact costs.

We also eliminated the typical 3PL communication mess. Every brand gets a dedicated Slack channel with on-site support and account managers.

Shipping is our third and final billing item, and it’s a pass-through. We negotiate competitive rates, calculate all surcharges, and pass them along directly. It saves clients time, money, and confusion.

Bandholz: How can people connect with you?

Melizanis: Our website is ShipDudes.com. Check out our podcast, “New Money Talks.” I’m on LinkedIn.

Google Confirms It Uses Something Similar To MUVERA via @sejournal, @martinibuster

Google’s Gary Illyes answered questions during the recent Search Central Live Deep Dive in Asia about whether Google uses the new Multi‑Vector Retrieval via Fixed‑Dimensional Encodings (MUVERA) retrieval method and whether it is using Graph Foundation Models.

MUVERA

Google recently announced MUVERA in a blog post and a research paper: a method that improves retrieval by turning complex multi-vector search into fast single-vector search. It compresses sets of token embeddings into fixed-dimensional vectors that closely approximate their original similarity. This lets it use optimized single-vector search methods to quickly find good candidates, then re-rank them using exact multi-vector similarity. Compared to older systems like PLAID, MUVERA is faster, retrieves fewer candidates, and still improves recall, making it a practical solution for large-scale retrieval.
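
To picture the flow, here is a toy TypeScript sketch of that two-stage idea: a cheap single-vector score to shortlist candidates, then an exact multi-vector (Chamfer-style) re-rank. The FDE here is just "some fixed-length vector per document" rather than the paper's actual encoding, so this is an illustration, not MUVERA itself.

```typescript
// Toy two-stage retrieval in the spirit of MUVERA: this is not the paper's FDE
// algorithm, just an illustration of "fast single-vector shortlist, then exact
// multi-vector re-rank."

type Vec = number[];

const dot = (a: Vec, b: Vec) => a.reduce((s, x, i) => s + x * b[i], 0);

interface Doc {
  id: string;
  fde: Vec;            // fixed-dimensional encoding standing in for the doc's token set
  tokenVectors: Vec[]; // the original multi-vector (per-token) representation
}

// Stage 1: single-vector similarity (what a MIPS index would accelerate at scale).
function shortlist(queryFde: Vec, docs: Doc[], k: number): Doc[] {
  return [...docs].sort((a, b) => dot(queryFde, b.fde) - dot(queryFde, a.fde)).slice(0, k);
}

// Stage 2: Chamfer similarity: for each query token, take its best-matching doc
// token and sum those maxima.
function chamfer(queryTokens: Vec[], docTokens: Vec[]): number {
  return queryTokens.reduce(
    (sum, q) => sum + Math.max(...docTokens.map((d) => dot(q, d))),
    0
  );
}

function retrieve(queryFde: Vec, queryTokens: Vec[], docs: Doc[], k: number): Doc[] {
  const candidates = shortlist(queryFde, docs, k);
  return candidates.sort(
    (a, b) => chamfer(queryTokens, b.tokenVectors) - chamfer(queryTokens, a.tokenVectors)
  );
}
```

The point of the split is that the first stage can run against a standard MIPS index at scale, while the more expensive multi-vector comparison only touches the shortlisted candidates.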

The key points about MUVERA are:

  • MUVERA converts multi-vector sets into fixed vectors using Fixed Dimensional Encodings (FDEs), which are single-vector representations of multi-vector sets.
  • These FDEs (Fixed Dimensional Encodings) match the original multi-vector comparisons closely enough to support accurate retrieval.
  • MUVERA retrieval uses MIPS (Maximum Inner Product Search), an established search technique used in retrieval, making it easier to deploy at scale.
  • Reranking: After using fast single-vector search (MIPS) to quickly narrow down the most likely matches, MUVERA re-ranks them using Chamfer similarity, a more detailed multi-vector comparison method. This final step restores the full accuracy of multi-vector retrieval, so you get both speed and precision.
  • MUVERA is able to find more of the precisely relevant documents with a lower processing time than the state-of-the-art retrieval baseline (PLAID) it was compared to.

Google Confirms That They Use MUVERA

José Manuel Morgal (LinkedIn profile) put this question to Google’s Gary Illyes, whose response was to jokingly ask what MUVERA was before confirming that Google uses a version of it.

This is how José described the exchange:

“An article has been published in Google Research about MUVERA and there is an associated paper. Is it currently in production in Search?

His response was to ask me what MUVERA was haha and then he commented that they use something similar to MUVERA but they don’t name it like that.”

Does Google Use Graph Foundation Models (GFMs)?

Google recently published a blog announcement about an AI breakthrough called a Graph Foundation Model.

Google’s Graph Foundation Model (GFM) is a type of AI that learns from relational databases by turning them into graphs, where rows become nodes and the connections between tables become edges.

Unlike older models (machine learning models and graph neural networks (GNNs)) that only work on one dataset, GFMs can handle new databases with different structures and features without retraining on the new data. GFMs use a large AI model to learn how data points relate across tables. This lets GFMs find patterns that regular models miss, and they perform much better in tasks like detecting spam in Google’s scaled systems. GFMs are a big step forward because they bring foundation-model flexibility to complex structured data.
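
As a rough picture of that transformation, here is a toy TypeScript sketch that turns two made-up relational tables into nodes and edges. It shows only the data shape such a model would consume, not the model itself, and the ads/advertisers tables are hypothetical examples loosely inspired by the spam-in-ads task mentioned below.

```typescript
// Toy illustration of turning relational rows into a graph: rows become nodes,
// foreign-key relationships become edges. This shows the data shape only; the
// GFM itself (the model trained on such graphs) is not represented here.

interface Row {
  table: string;
  id: string;
  [field: string]: string;
}

interface Graph {
  nodes: Map<string, Row>;        // keyed by "table:id"
  edges: Array<[string, string]>; // pairs of node keys
}

function buildGraph(
  tables: Record<string, Row[]>,
  foreignKeys: Array<{ fromTable: string; field: string; toTable: string }>
): Graph {
  const nodes = new Map<string, Row>();
  for (const rows of Object.values(tables)) {
    for (const row of rows) nodes.set(`${row.table}:${row.id}`, row);
  }
  const edges: Array<[string, string]> = [];
  for (const fk of foreignKeys) {
    for (const row of tables[fk.fromTable] ?? []) {
      const target = `${fk.toTable}:${row[fk.field]}`;
      if (nodes.has(target)) edges.push([`${row.table}:${row.id}`, target]);
    }
  }
  return { nodes, edges };
}

const graph = buildGraph(
  {
    advertisers: [{ table: "advertisers", id: "a1", name: "Acme" }],
    ads: [{ table: "ads", id: "ad9", advertiserId: "a1", text: "Buy now" }],
  },
  [{ fromTable: "ads", field: "advertiserId", toTable: "advertisers" }]
);
console.log(graph.edges); // [["ads:ad9", "advertisers:a1"]]
```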

Graph Foundation Models represent a notable achievement because their improvements are not incremental. They are an order-of-magnitude improvement, with performance gains of 3x to 40x in average precision.

José next asked Illyes if Google uses Graph Foundation Models and Gary again jokingly feigned not knowing what José was talking about.

He related the question and answer:

“An article has been published in Google Research about Graph Foundation Models for data, this time there are not paper associated with it. Is it currently in production in Search?

His answer was the same as before, asking me what Graph Foundation Models for data was, and he thought it was not in production. He did not know because there are not associated paper and on the other hand, he commented me that he did not control what is published in Google Research blog.”

Gary expressed his opinion that the Graph Foundation Model is not currently used in Search. At this point, that’s the best information we have.

Is GFM Ready For Scaled Deployment?

The official Graph Foundation Model announcement says it was tested in an internal task, spam detection in ads, which strongly suggests that real internal systems and data were used, not just academic benchmarks or simulations.

Here is what Google’s announcement relates:

“Operating at Google scale means processing graphs of billions of nodes and edges where our JAX environment and scalable TPU infrastructure particularly shines. Such data volumes are amenable for training generalist models, so we probed our GFM on several internal classification tasks like spam detection in ads, which involves dozens of large and connected relational tables. Typical tabular baselines, albeit scalable, do not consider connections between rows of different tables, and therefore miss context that might be useful for accurate predictions. Our experiments vividly demonstrate that gap.”

Takeaways

Google’s Gary Illyes confirmed that a form of MUVERA is in use at Google. His answer about GFMs was expressed as an opinion, so it’s somewhat less clear: as José relates it, Gary thinks it’s not in production.

Featured Image by Shutterstock/Krakenimages.com

Chrome Trial Aims To Fix Core Web Vitals For JavaScript-Heavy Sites via @sejournal, @MattGSouthern

Google Chrome is testing a new way to measure Core Web Vitals in Single Page Applications (SPAs), which is a long-standing blind spot in performance tracking that affects SEO audits and ranking signals.

Starting with Chrome 139, developers can opt into an origin trial for the Soft Navigations API. This enables measurement of metrics like LCP, CLS, and INP even when a page updates content without a full reload.

Why This Matters For SEO

SPAs are popular for speed and interactivity, but they’ve been notoriously difficult to monitor using tools like Lighthouse, field data in CrUX, or real user monitoring scripts.

That’s because SPAs often update the page using JavaScript without triggering a traditional navigation. As a result, Google’s measurement systems and most performance tools miss those updates when calculating Core Web Vitals.

This new API aims to close that gap, giving you a clearer picture of how your site performs in the real world, especially after a user clicks or navigates within an app-like interface.

What The New API Does

Chrome’s Soft Navigations API uses built-in heuristics to detect when a soft navigation happens. For example:

  • A user clicks a link
  • The page URL updates
  • The DOM visibly changes and triggers a paint

When these conditions are met, Chrome now treats it as a navigation event for performance measurement, even though no full page load occurred.

The API introduces new metrics and enhancements, including:

  • interaction-contentful-paint – lets you measure Largest Contentful Paint after a soft navigation
  • navigationId – added to performance entries so metrics can be tied to specific navigations (crucial when URLs change mid-interaction)
  • Extensions to layout shift, event timing, and INP to work across soft navigations

How To Try It

You can test this feature today in Chrome 139 using either:

  • Local testing: Enable chrome://flags/#soft-navigation-heuristics
  • Origin trial: Add a token to your site via meta tag or HTTP header to collect real user data

Chrome recommends enabling the “Advanced Paint Attribution” flag for the most complete data.
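
Once the flag or origin trial token is in place, collecting the data comes down to registering performance observers. Below is a hedged TypeScript sketch: the soft-navigation entry type is an assumption based on the experimental API, and interaction-contentful-paint and navigationId follow the descriptions above, so verify the exact names against your Chrome version and your RUM provider before relying on them.

```typescript
// Hedged sketch for an experimental API: the entry types and fields below follow
// the descriptions above and may change, be renamed, or be absent outside the
// origin trial. Casts to `any` sidestep TypeScript's lib definitions, which do
// not yet include these experimental fields.

// Log each detected soft navigation (URL change without a full page load).
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log("soft navigation:", entry.name, "at", entry.startTime);
  }
}).observe({ type: "soft-navigation", buffered: true } as any);

// Log contentful paints attributed to soft navigations. navigationId ties each
// metric back to the specific in-app navigation that produced it.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as any[]) {
    console.log("ICP:", entry.startTime, "navigationId:", entry.navigationId);
  }
}).observe({ type: "interaction-contentful-paint", buffered: true } as any);
```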

Things To Keep In Mind

Chrome’s Barry Pollard, who leads this initiative, emphasizes the API is still experimental:

“Wanna measure Core Web Vitals for SPAs?

Well we’ve been working on the Soft Navigations API for that and we’re launching a new origin trial from Chrome 139.

Take it for a run on your app, and see if it correctly detects soft navigations on your application and let us know if it doesn’t!”

Here’s what else you should know:

  • Metrics may not be supported in older Chrome versions or other browsers
  • Your RUM provider may need to support navigationId and interaction-contentful-paint for tracking
  • Some edge cases, like automatic redirects or replaceState() usage, may not register as navigations

Looking Ahead

This trial is a step toward making Core Web Vitals more accurate for modern JavaScript-heavy websites.

While the API isn’t yet integrated into Chrome’s public performance reports like CrUX, that could change if the trial proves successful.

If your site relies on React, Vue, Angular, or other SPA frameworks, now’s your chance to test how well Chrome’s new approach captures user experience.


Featured Image: Roman Samborskyi/Shutterstock