Is The SEO Job Market Undergoing A Major Shift? via @sejournal, @martinibuster

Anecdotal reports and an SEO jobs study describe a search marketing industry undergoing profound changes, not only in the skills in demand but also in hiring practices that may be making it difficult for experienced SEOs to get the jobs they are well qualified for.

Short History Of SEO Jobs

Twenty-five years ago, getting into SEO and earning a living was relatively easy. Many top corporations across all industries were hiring freelancers and agencies for specialized SEO assistance. I suspect that marketing departments didn’t view SEO as a subset of marketing and that many didn’t have SEO staff. That gradually changed as more organizations hired dedicated SEO staff, with third-party SEOs providing specialized assistance.

What’s Going On With SEO Jobs?

A recent report from SEOJobs.com described the state of SEO jobs in 2024.

The following insights show that the job of SEO continues to evolve:

  • SEO job openings declined in 2024.
  • Median SEO salaries dropped.
  • 65% of SEO jobs are in-house.
  • Remote SEO jobs dropped.
  • SEO job titles related to content strategy and writing dropped by 28%.
  • SEO Analyst job titles dropped by 12%.
  • Technical SEO and related titles dropped by a small percentage.
  • Senior-level titles like manager, director, and VP had the strongest increases.

The report says that job titles related to Technical SEO dropped:

“Positions in the Technical SEO and related title group represented 5.8 percent of all SEO jobs during the first quarter of 2024, falling slightly to 5.4 percent by the end of the fourth quarter – a decrease of seven percent.”

But the report also states that Technical SEO is still an in-demand skill:

“…demand for skill in technical SEO grew at the fastest rate of any skill during the fourth quarter, rising to 75 percent from 71 percent the previous quarter.”

Experienced SEOs Having Trouble Getting Hired… By AI?

Keith Goode read the above-referenced report and commented that he believes many highly experienced SEOs are failing to get hired because of poor implementations of AI in the hiring process.

He shared his insights in a LinkedIn post:

“I have seen superior SEOs languish amongst thousands of candidates, immediately rejected for a lack of experience (??) or funneled through multiple rounds of interviews and work assignments, only to be rudely ghosted by the recruiters.

The cause? I guess you could blame AI if you wanted to shoot the messenger. But the reality is that companies have overinvested in an unproven technology to handle things that it’s not yet ready to handle. I get that recruitment teams are deluged with thousands of resumes for every opening, and I understand they need a way to streamline the screening process.

However, AI has proven to be more of an enemy within than a helper. Anecdotally, I’ve heard about a hiring manager who applied for their own job opening (presumably one they were more than qualified for) only to receive an immediate rejection from the AI-powered ATS. That person fired their hiring team.

(By the way, I’m not anti-AI. I’m anti-foolishness, and a lot of companies are acting like fools.)”

Experienced SEOs Are Getting Ghosted

It may be true that SEOs with decades of experience are being left behind by poor AI vetting. A glaring example is the one shared by Brian Harnish, an SEO with decades of hands-on experience.

Brian recently published the following on LinkedIn and Facebook:

“In this job market, for me it simply appears that nothing matters.

  • You can apply at 6:15 a.m. the day the job posting pops up and be one of the first.
  • You can change your resume 15 times like I have.
  • You can use ResumeWorded.com for an ATS version of your resume.
  • You can write your resume yourself until you’re blue in the face.
  • You can follow up on the interview with thank yous immediately after.
  • You can follow up on interview decisions later.
  • You can agree to their salary ranges exactly. Even when it’s a pay cut for you.
  • A/B testing long vs. short resumes yield the same results.
  • You can tie in all of your achievements with task > impact > website statements on your resume.
  • You write an entirely customized LinkedIn profile.
  • You can know all the right people.
  • You can network up the wazoo.
  • You can have the greatest interview that you feel you’ve ever put forth.

But companies don’t provide feedback. It’s always the same form letter: “while your qualifications are impressive, we went with another candidate.” Or you’re ghosted.

This market is brutal. I really want a job. Not a handout. But nobody appears to want to hire me. At all. Despite doing EVERYthing right. I used to get hired on the spot. Now it’s just crickets.”

What The Heck Is Going On?

I know of other SEOs, also with decades of experience across all areas of SEO, who should have bounced to a new job in a matter of days but instead took months to get hired. I’m talking about people with SEO director-level experience at top Fortune 500 companies.

How does this happen?

Are you experiencing something similar?

Featured Image by Shutterstock/Ollyy

Google’s AI Max Ads Hone Search Intent

I wrote in December that Google would launch keywordless Search ads in 2025. I based my prediction on Google’s evolving assessment of searchers’ intent. Keywords used to be the sole factor. Now, they are one of many variables that dictate the ads a user sees.

Last week, Google introduced a campaign type called “AI Max for Search.” Keywords are still present, but as themes rather than as the leading indicator.

Artificial intelligence drives the new campaign type. Performance Max campaigns and smart bidding already rely on AI. AI Max moves beyond query matching to capture signals of what searchers seek.

The main features of AI Max are already available in Search as options to turn on or off. With AI Max, advertisers go all in. Google determines:

  • Ads that show from the initial broad match keyword list.
  • Ad text to convert the most searchers.
  • The URL for top performance.

Let’s review each of these components.

Screenshot of AI Max settings in Google Ads admin

With AI Max, Google determines the ad text and the final URL.

Search term matching

Existing Search campaigns include an option to use broad match keywords only, pausing phrase and exact match keywords. Google claims the combination of broad match and smart bidding will improve targeting and thus performance.

Search term matching is a logical iteration of pairing queries with ads. A keyword list is only the beginning. Google’s AI analyzes the keywords, assets, and landing pages to determine the ads to show.

Search term matching is akin to “you might like” suggestions on Netflix based on a user’s viewing history. The same principle applies here. Google will show an ad if it’s relevant to a searcher, regardless of the advertiser’s keyword.

Text customization

Formerly called automatically created assets, text customization allows Google’s AI to use the verbiage from ads, landing pages, and assets to produce customized headlines and descriptions.

For example, an advertiser selling picture frames may write a headline of “All Sizes of Picture Frames.” Google may instead show “4 x 6 Picture Frames” if it determines the searcher wants that dimension and the advertiser carries it.

Advertisers can view Google’s headlines and descriptions via “asset performance” at the account or campaign level and filter by “automatically created” to see the exact assets that showed. Advertisers can remove an asset as needed.

Final URL expansion

Google has long altered advertisers’ URLs in Performance Max and Dynamic Search Ads campaigns. Like text customization, Google changes the final URL to improve performance. Combined, the two features produce an entirely new ad.

Advertisers can now provide URL inclusions and exclusions. Inclusions instruct the AI what URLs to target, such as those manually submitted or in a page feed. URL exclusions can target blog pages, for example, if an advertiser doesn’t want to pay for that traffic.

Other features

Google’s announcement of AI Max for Search included a slew of additional features. One is brand settings, wherein advertisers can include or exclude brand names from their ads.

For example, an advertiser selling only Nike and Adidas shoes could designate a brand inclusion for those names and a brand exclusion for “Reebok” queries.

Google is upgrading reporting transparency, a much-needed improvement after the launch of Performance Max campaigns, which did not include a search term report. AI Max campaigns correct this by including an “AI Max” column in the report to show the query and the combination of query, assets, and the final URL.

In short, Google Ads continues to evolve. Keywords are no longer the primary targeting method. AI has and will reshape the platform and performance.

Did solar power cause Spain’s blackout?

At roughly midday on Monday, April 28, the lights went out in Spain. The grid blackout, which extended into parts of Portugal and France, affected tens of millions of people—flights were grounded, cell networks went down, and businesses closed for the day.

Over a week later, officials still aren’t entirely sure what happened, but some (including the US energy secretary, Chris Wright) have suggested that renewables may have played a role, because just before the outage happened, wind and solar accounted for about 70% of electricity generation. Others, including Spanish government officials, insisted that it’s too early to assign blame.

It’ll take weeks to get the full report, but we do know a few things about what happened. And even as we wait for the bigger picture, there are a few takeaways that could help our future grid.

Let’s start with what we know so far about what happened, according to the Spanish grid operator Red Eléctrica:

  • A disruption in electricity generation took place a little after 12:30 p.m. This may have been a power plant flipping off or some transmission equipment going down.
  • A little over a second later, the grid lost another bit of generation.
  • A few seconds after that, the main interconnector between Spain and southwestern France got disconnected as a result of grid instability.
  • Immediately after, virtually all of Spain’s electricity generation tripped offline.

One of the theories floating around is that things went wrong because the grid diverged from its normal frequency. (All power grids have a set frequency: In Europe the standard is 50 hertz, which means the current switches directions 50 times per second.) The frequency needs to be constant across the grid to keep things running smoothly.

There are signs that the outage could be frequency-related. Some experts pointed out that strange oscillations in the grid frequency occurred shortly before the blackout.

Normally, our grid can handle small problems like an oscillation in frequency or a drop that comes from a power plant going offline. But some of the grid’s ability to stabilize itself is tied up in old ways of generating electricity.

Power plants like those that run on coal and natural gas have massive rotating generators. If there are brief issues on the grid that upset the balance, those physical bits of equipment have inertia: They’ll keep moving at least for a few seconds, providing some time for other power sources to respond and pick up the slack. (I’m simplifying here—for more details I’d highly recommend this report from the National Renewable Energy Laboratory.)

Solar panels don’t have inertia—they rely on inverters to change electricity into a form that’s compatible with the grid and matches its frequency. Generally, these inverters are “grid-following,” meaning if frequency is dropping, they follow that drop.
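The role inertia plays can be made concrete with the standard rate-of-change-of-frequency (RoCoF) approximation: after a sudden generation loss, frequency initially falls at roughly f₀·ΔP/(2H), where H is the system inertia constant. This is a textbook back-of-the-envelope calculation, not a model of the Spanish grid; the numbers below are purely illustrative.

```python
def rocof(power_deficit_pu, inertia_h, f_nominal=50.0):
    """Approximate initial rate of change of frequency (Hz/s) after a generation loss.

    power_deficit_pu: lost generation as a fraction of system load.
    inertia_h: system inertia constant in seconds (lower on a grid
               dominated by inverter-based solar and wind).
    f_nominal: grid frequency; 50 Hz is the European standard.
    """
    return f_nominal * power_deficit_pu / (2.0 * inertia_h)

# Same hypothetical 10% generation loss, different amounts of rotating inertia:
thermal_heavy = rocof(0.10, inertia_h=6.0)   # lots of spinning generators
inverter_heavy = rocof(0.10, inertia_h=2.0)  # mostly solar and wind
print(round(thermal_heavy, 3))   # ≈ 0.417 Hz/s
print(round(inverter_heavy, 3))  # 1.25 Hz/s
```

The point of the sketch: with a third of the inertia, the same disturbance pushes frequency away from 50 hertz three times as fast, leaving protection systems and other power sources less time to react.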

In the case of the blackout in Spain, it’s possible that having a lot of power on the grid coming from sources without inertia made it easier for a small problem to cascade into a much bigger one.

Some key questions here are still unanswered. The order matters, for example. During that drop in generation, did wind and solar plants go offline first? Or did everything go down together?

Whether or not solar and wind contributed to the blackout as a root cause, we do know that wind and solar don’t contribute to grid stability in the same way that some other power sources do, says Seaver Wang, climate lead of the Breakthrough Institute, an environmental research organization. Regardless of whether renewables are to blame, more capability to stabilize the grid would only help, he adds.

It’s not that a renewable-heavy grid is doomed to fail. As Wang put it in an analysis he wrote last week: “This blackout is not the inevitable outcome of running an electricity system with substantial amounts of wind and solar power.”

One solution: We can make sure the grid includes enough equipment that does provide inertia, like nuclear power and hydropower. Reversing a plan to shut down Spain’s nuclear reactors beginning in 2027 would be helpful, Wang says. Other options include building massive machines that lend physical inertia and using inverters that are “grid-forming,” meaning they can actively help regulate frequency and provide a sort of synthetic inertia.

Inertia isn’t everything, though. Grid operators can also rely on installing a lot of batteries that can respond quickly when problems arise. (Spain has much less grid storage than other places with a high level of renewable penetration, like Texas and California.)

Ultimately, if there’s one takeaway here, it’s that as the grid evolves, our methods to keep it reliable and stable will need to evolve too.

If you’re curious to hear more on this story, I’d recommend this Q&A from Carbon Brief about the event and its aftermath and this piece from Heatmap about inertia, renewables, and the blackout.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

How to build a better AI benchmark

It’s not easy being one of Silicon Valley’s favorite benchmarks. 

SWE-Bench (pronounced “swee bench”) launched in November 2024 to evaluate an AI model’s coding skill, using more than 2,000 real-world programming problems pulled from the public GitHub repositories of 12 different Python-based projects. 

In the months since then, it’s quickly become one of the most popular tests in AI. A SWE-Bench score has become a mainstay of major model releases from OpenAI, Anthropic, and Google—and outside of foundation models, the fine-tuners at AI firms are in constant competition to see who can rise above the pack. The top of the leaderboard is a pileup of three different fine-tunings of Anthropic’s Claude Sonnet model and Amazon’s Q Developer agent. Auto Code Rover—one of the Claude modifications—nabbed the number two spot in November, and was acquired just three months later.

Despite all the fervor, this isn’t exactly a truthful assessment of which model is “better.” As the benchmark has gained prominence, “you start to see that people really want that top spot,” says John Yang, a researcher on the team that developed SWE-Bench at Princeton University. As a result, entrants have begun to game the system—which is pushing many others to wonder whether there’s a better way to actually measure AI achievement.

Developers of these coding agents aren’t necessarily doing anything as straightforward as cheating, but they’re crafting approaches too neatly tailored to the specifics of the benchmark. The initial SWE-Bench test set was limited to programs written in Python, which meant developers could gain an advantage by training their models exclusively on Python code. Soon, Yang noticed that high-scoring models would fail completely when tested on different programming languages—revealing an approach to the test that he describes as “gilded.”

“It looks nice and shiny at first glance, but then you try to run it on a different language and the whole thing just kind of falls apart,” Yang says. “At that point, you’re not designing a software engineering agent. You’re designing to make a SWE-Bench agent, which is much less interesting.”

The SWE-Bench issue is a symptom of a more sweeping—and complicated—problem in AI evaluation, and one that’s increasingly sparking heated debate: The benchmarks the industry uses to guide development are drifting further and further away from evaluating actual capabilities, calling their basic value into question. Making the situation worse, several benchmarks, most notably FrontierMath and Chatbot Arena, have recently come under fire for an alleged lack of transparency. Nevertheless, benchmarks still play a central role in model development, even if few experts are willing to take their results at face value. OpenAI cofounder Andrej Karpathy recently described the situation as “an evaluation crisis”: the industry has fewer trusted methods for measuring capabilities and no clear path to better ones. 

“Historically, benchmarks were the way we evaluated AI systems,” says Vanessa Parli, director of research at Stanford University’s Institute for Human-Centered AI. “Is that the way we want to evaluate systems going forward? And if it’s not, what is the way?”

A growing group of academics and AI researchers are making the case that the answer is to go smaller, trading sweeping ambition for an approach inspired by the social sciences. Specifically, they want to focus more on testing validity, which for quantitative social scientists refers to how well a given questionnaire measures what it’s claiming to measure—and, more fundamentally, whether what it is measuring has a coherent definition. That could cause trouble for benchmarks assessing hazily defined concepts like “reasoning” or “scientific knowledge”—and for developers aiming to reach the much-hyped goal of artificial general intelligence—but it would put the industry on firmer ground as it looks to prove the worth of individual models.

“Taking validity seriously means asking folks in academia, industry, or wherever to show that their system does what they say it does,” says Abigail Jacobs, a University of Michigan professor who is a central figure in the new push for validity. “I think it points to a weakness in the AI world if they want to back off from showing that they can support their claim.”

The limits of traditional testing

If AI companies have been slow to respond to the growing failure of benchmarks, it’s partially because the test-scoring approach has been so effective for so long. 

One of the biggest early successes of contemporary AI was the ImageNet challenge, a kind of antecedent to contemporary benchmarks. Released in 2010 as an open challenge to researchers, the database held more than 3 million images for AI systems to categorize into 1,000 different classes.

Crucially, the test was completely agnostic to methods, and any successful algorithm quickly gained credibility regardless of how it worked. When an algorithm called AlexNet broke through in 2012, with a then unconventional form of GPU training, it became one of the foundational results of modern AI. Few would have guessed in advance that AlexNet’s convolutional neural nets would be the secret to unlocking image recognition—but after it scored well, no one dared dispute it. (One of AlexNet’s developers, Ilya Sutskever, would go on to cofound OpenAI.)

A large part of what made this challenge so effective was that there was little practical difference between ImageNet’s object classification challenge and the actual process of asking a computer to recognize an image. Even if there were disputes about methods, no one doubted that the highest-scoring model would have an advantage when deployed in an actual image recognition system.

But in the 12 years since, AI researchers have applied that same method-agnostic approach to increasingly general tasks. SWE-Bench is commonly used as a proxy for broader coding ability, while other exam-style benchmarks often stand in for reasoning ability. That broad scope makes it difficult to be rigorous about what a specific benchmark measures—which, in turn, makes it hard to use the findings responsibly. 

Where things break down

Anka Reuel, a PhD student who has been focusing on the benchmark problem as part of her research at Stanford, has become convinced the evaluation problem is the result of this push toward generality. “We’ve moved from task-specific models to general-purpose models,” Reuel says. “It’s not about a single task anymore but a whole bunch of tasks, so evaluation becomes harder.”

Like the University of Michigan’s Jacobs, Reuel thinks “the main issue with benchmarks is validity, even more than the practical implementation,” noting: “That’s where a lot of things break down.” For a task as complicated as coding, for instance, it’s nearly impossible to incorporate every possible scenario into your problem set. As a result, it’s hard to gauge whether a model is scoring better because it’s more skilled at coding or because it has more effectively manipulated the problem set. And with so much pressure on developers to achieve record scores, shortcuts are hard to resist.

For developers, the hope is that success on lots of specific benchmarks will add up to a generally capable model. But the techniques of agentic AI mean a single AI system can encompass a complex array of different models, making it hard to evaluate whether improvement on a specific task will lead to generalization. “There’s just many more knobs you can turn,” says Sayash Kapoor, a computer scientist at Princeton and a prominent critic of sloppy practices in the AI industry. “When it comes to agents, they have sort of given up on the best practices for evaluation.”

In a paper from last July, Kapoor called out specific issues in how AI models were approaching the WebArena benchmark, designed by Carnegie Mellon University researchers in 2024 as a test of an AI agent’s ability to traverse the web. The benchmark consists of more than 800 tasks to be performed on a set of cloned websites mimicking Reddit, Wikipedia, and others. Kapoor and his team identified an apparent hack in the winning model, called STeP. STeP included specific instructions about how Reddit structures URLs, allowing STeP models to jump directly to a given user’s profile page (a frequent element of WebArena tasks).

This shortcut wasn’t exactly cheating, but Kapoor sees it as “a serious misrepresentation of how well the agent would work had it seen the tasks in WebArena for the first time.” Because the technique was successful, though, a similar policy has since been adopted by OpenAI’s web agent Operator. (“Our evaluation setting is designed to assess how well an agent can solve tasks given some instruction about website structures and task execution,” an OpenAI representative said when reached for comment. “This approach is consistent with how others have used and reported results with WebArena.” STeP did not respond to a request for comment.)

Further highlighting the problem with AI benchmarks, late last month Kapoor and a team of researchers wrote a paper that revealed significant problems in Chatbot Arena, the popular crowdsourced evaluation system. According to the paper, the leaderboard was being manipulated; many top foundation models were conducting undisclosed private testing and releasing their scores selectively.

Today, even ImageNet itself, the mother of all benchmarks, has started to fall victim to validity problems. A 2023 study from researchers at the University of Washington and Google Research found that when ImageNet-winning algorithms were pitted against six real-world data sets, the architecture improvement “resulted in little to no progress,” suggesting that the external validity of the test had reached its limit.

Going smaller

For those who believe the main problem is validity, the best fix is reconnecting benchmarks to specific tasks. As Reuel puts it, AI developers “have to resort to these high-level benchmarks that are almost meaningless for downstream consumers, because the benchmark developers can’t anticipate the downstream task anymore.” So what if there was a way to help the downstream consumers identify this gap?

In November 2024, Reuel launched a public ranking project called BetterBench, which rates benchmarks on dozens of different criteria, such as whether the code has been publicly documented. But validity is a central theme, with particular criteria challenging designers to spell out what capability their benchmark is testing and how it relates to the tasks that make up the benchmark.

“You need to have a structural breakdown of the capabilities,” Reuel says. “What are the actual skills you care about, and how do you operationalize them into something we can measure?”

The results are surprising. One of the highest-scoring benchmarks is also the oldest: the Arcade Learning Environment (ALE), established in 2013 as a way to test models’ ability to learn how to play a library of Atari 2600 games. One of the lowest-scoring is the Massive Multitask Language Understanding (MMLU) benchmark, a widely used test for general language skills; by the standards of BetterBench, the connection between the questions and the underlying skill was too poorly defined.

BetterBench hasn’t meant much for the reputations of specific benchmarks, at least not yet; MMLU is still widely used, and ALE is still marginal. But the project has succeeded in pushing validity into the broader conversation about how to fix benchmarks. In April, Reuel quietly joined a new research group hosted by Hugging Face, the University of Edinburgh, and EleutherAI, where she’ll develop her ideas on validity and AI model evaluation with other figures in the field. (An official announcement is expected later this month.) 

Irene Solaiman, Hugging Face’s head of global policy, says the group will focus on building valid benchmarks that go beyond measuring straightforward capabilities. “There’s just so much hunger for a good benchmark off the shelf that already works,” Solaiman says. “A lot of evaluations are trying to do too much.”

Increasingly, the rest of the industry seems to agree. In a paper in March, researchers from Google, Microsoft, Anthropic, and others laid out a new framework for improving evaluations—with validity as the first step. 

“AI evaluation science must,” the researchers argue, “move beyond coarse grained claims of ‘general intelligence’ towards more task-specific and real-world relevant measures of progress.” 

Measuring the “squishy” things

To help make this shift, some researchers are looking to the tools of social science. A February position paper argued that “evaluating GenAI systems is a social science measurement challenge,” specifically unpacking how the validity systems used in social measurements can be applied to AI benchmarking. 

The authors, largely employed by Microsoft’s research branch but joined by academics from Stanford and the University of Michigan, point to the standards that social scientists use to measure contested concepts like ideology, democracy, and media bias. Applied to AI benchmarks, those same procedures could offer a way to measure concepts like “reasoning” and “math proficiency” without slipping into hazy generalizations.

In the social science literature, it’s particularly important that metrics begin with a rigorous definition of the concept measured by the test. For instance, if the test is to measure how democratic a society is, it first needs to establish a definition for a “democratic society” and then establish questions that are relevant to that definition. 

To apply this to a benchmark like SWE-Bench, designers would need to set aside the classic machine learning approach, which is to collect programming problems from GitHub and create a scheme to validate answers as true or false. Instead, they’d first need to define what the benchmark aims to measure (“ability to resolve flagged issues in software,” for instance), break that into subskills (different types of problems or types of program that the AI model can successfully process), and then finally assemble questions that accurately cover the different subskills.
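That workflow, name the capability, break it into subskills, then attach tasks to each one, can be sketched as a simple data structure. This is purely illustrative: the capability, subskill names, and issue IDs below are hypothetical, not drawn from SWE-Bench or any real benchmark.

```python
from dataclasses import dataclass, field

@dataclass
class Subskill:
    """One operationalized component of the capability being measured."""
    name: str
    tasks: list = field(default_factory=list)

@dataclass
class Benchmark:
    capability: str                           # what the benchmark claims to measure
    subskills: list = field(default_factory=list)

    def coverage(self):
        """Count tasks per subskill; a zero flags a validity gap,
        a claimed subskill that no task actually exercises."""
        return {s.name: len(s.tasks) for s in self.subskills}

bench = Benchmark(
    capability="ability to resolve flagged issues in software",
    subskills=[
        Subskill("fix failing unit test", tasks=["issue-101", "issue-204"]),
        Subskill("patch dependency conflict", tasks=["issue-317"]),
        Subskill("refactor without behavior change"),  # no tasks yet: a gap
    ],
)
print(bench.coverage())
# {'fix failing unit test': 2, 'patch dependency conflict': 1, 'refactor without behavior change': 0}
```

The inversion is the point: instead of harvesting whatever problems are available and naming the benchmark afterward, the definition comes first, and the task list is audited against it.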

It’s a profound change from how AI researchers typically approach benchmarking—but for researchers like Jacobs, a coauthor on the February paper, that’s the whole point. “There’s a mismatch between what’s happening in the tech industry and these tools from social science,” she says. “We have decades and decades of thinking about how we want to measure these squishy things about humans.”

Even though the idea has made a real impact in the research world, it’s been slow to influence the way AI companies are actually using benchmarks. 

The last two months have seen new model releases from OpenAI, Anthropic, Google, and Meta, and all of them lean heavily on multiple-choice knowledge benchmarks like MMLU—the exact approach that validity researchers are trying to move past. After all, model releases are, for the most part, still about showing increases in general intelligence, and broad benchmarks continue to be used to back up those claims. 

For some observers, that’s good enough. Benchmarks, Wharton professor Ethan Mollick says, are “bad measures of things, but also they’re what we’ve got.” He adds: “At the same time, the models are getting better. A lot of sins are forgiven by fast progress.”

For now, the industry’s long-standing focus on artificial general intelligence seems to be crowding out a more focused validity-based approach. As long as AI models can keep growing in general intelligence, then specific applications don’t seem as compelling—even if that leaves practitioners relying on tools they no longer fully trust. 

“This is the tightrope we’re walking,” says Hugging Face’s Solaiman. “It’s too easy to throw the system out, but evaluations are really helpful in understanding our models, even with these limitations.”

Russell Brandom is a freelance writer covering artificial intelligence. He lives in Brooklyn with his wife and two cats.

This story was supported by a grant from the Tarbell Center for AI Journalism.

The Download: AI benchmarks, and Spain’s grid blackout

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How to build a better AI benchmark

It’s not easy being one of Silicon Valley’s favorite benchmarks. 

SWE-Bench (pronounced “swee bench”) launched in November 2024 as a way to evaluate an AI model’s coding skill. It has since quickly become one of the most popular tests in AI. A SWE-Bench score has become a mainstay of major model releases from OpenAI, Anthropic, and Google—and outside of foundation models, the fine-tuners at AI firms are in constant competition to see who can rise above the pack.

Despite all the fervor, this isn’t exactly a truthful assessment of which model is “better.” Entrants have begun to game the system—which is pushing many others to wonder whether there’s a better way to actually measure AI achievement. Read the full story.

—Russell Brandom

Did solar power cause Spain’s blackout?

At roughly midday on Monday, April 28, the lights went out in Spain. The grid blackout, which extended into parts of Portugal and France, affected tens of millions of people—flights were grounded, cell networks went down, and businesses closed for the day.

Over a week later, officials still aren’t entirely sure what happened, but some have suggested that renewables may have played a role, because just before the outage happened, wind and solar accounted for about 70% of electricity generation. Others, including Spanish government officials, insist that it’s too early to assign blame.

It’ll take weeks to get the full report, but we do know a few things about what happened. Here are a few takeaways that could help our future grid. 

—Casey Crownhart

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The Trump administration will repeal some global chip curbs 
It’s drawing up new rules that prioritize direct negotiations with various nations. (Bloomberg $)
+ The curbs have always been leaky anyway. (Economist $)

2 India and Pakistan have accused each other of overnight drone attacks
The conflict between the two countries is rapidly escalating. (The Guardian)
+ Pakistan claims to have shot down 25 drones in its airspace. (Reuters)
+ Mass-market military drones have changed the way wars are fought. (MIT Technology Review)

3 The FDA is interested in using AI for drug evaluation
And has met with OpenAI to hear more about how to do it. (Wired $)
+ An AI-driven “factory of drugs” claims to have hit a big milestone. (MIT Technology Review)

4 The US is pushing nations facing its tariffs to adopt Starlink
Government officials in India and other countries have fast-tracked approvals. (WP $)
+ India recently announced new rules for satellite internet providers. (Rest of World)

5 Apple is overhauling its Safari browser to focus on AI search
Its search volume is down for the first time in 22 years. (The Verge)
+ Apple exec Eddy Cue thinks AI search will replace traditional search engines. (Bloomberg $)
+ AI means the end of internet search as we’ve known it. (MIT Technology Review)

6 Mark Zuckerberg is betting big on AI chatbots
He’s on a media charm offensive to convince us that AI friends are the future. (WSJ $)
+ The AI relationship revolution is already here. (MIT Technology Review)

7 Students can’t wean themselves off ChatGPT
And experts fear that they’ll emerge into the workforce essentially illiterate. (NY Mag $)
+ Some educators believe that AI highlights how the ways we teach need to change. (MIT Technology Review)

8 We don’t really know how memory works 🧠
But these researchers are doing their best to find out. (Quanta Magazine)

9 The vast majority of the sea depths are still unexplored
What lies beneath is a mystery. (New Scientist $)
+ Meet the divers trying to figure out how deep humans can go. (MIT Technology Review)

10 Pet psychics are taking over TikTok 🔮
But does your furry friend have anything to say? (NYT $)
+ Humans are still better than AI at futuregazing—for now. (Vox)
+ How DeepSeek became a fortune teller for China’s youth. (MIT Technology Review)

Quote of the day

“It’s like living in hell.”

—Elizabeth Martorana, a Virginia resident, describes what it’s like to live in a development zone for Amazon, Microsoft, and Google data centers, Semafor reports.

One more thing

How Antarctica’s history of isolation is ending—thanks to Starlink

“This is one of the least visited places on planet Earth and I got to open the door,” Matty Jordan, a construction specialist at New Zealand’s Scott Base in Antarctica, wrote in the caption to the video he posted to Instagram and TikTok in October 2023.

In the video, he guides viewers through the hut, pointing out where the men of Ernest Shackleton’s 1907 expedition lived and worked.

The video has racked up millions of views from all over the world. It’s also kind of a miracle: until very recently, those who lived and worked on Antarctic bases had no hope of communicating so readily with the outside world.

That’s starting to change, thanks to Starlink, the satellite constellation developed by Elon Musk’s company SpaceX to service the world with high-speed broadband internet. Read the full story.

—Allegra Rosenberg

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Does Boston still drink? Not in the same way it used to.
+ Where in the US you should set up camp to stargaze right now.
+ Wow: this New Zealand snail lays eggs from its neck. 🐌
+ Jurassic World Rebirth is coming: and it looks suitably bonkers.

Your gut microbes might encourage criminal behavior

A few years ago, a Belgian man in his 30s drove into a lamppost. Twice. Local authorities found that his blood alcohol level was four times the legal limit. Over the space of a few years, the man was apprehended for drunk driving three times. And on all three occasions, he insisted he hadn’t been drinking.

He was telling the truth. A doctor later diagnosed auto-brewery syndrome—a rare condition in which the body makes its own alcohol. Microbes living inside the man’s body were fermenting the carbohydrates in his diet to create ethanol. Last year, he was acquitted of drunk driving.

His case, along with several other scientific studies, raises a fascinating question for microbiology, neuroscience, and the law: How much of our behavior can we blame on our microbes?

Each of us hosts vast communities of tiny bacteria, archaea (which are a bit like bacteria), fungi, and even viruses all over our bodies. The largest collection resides in our guts, which are home to trillions of them. You have more microbial cells than human cells in your body. In some ways, we’re more microbe than human.

Microbiologists are still getting to grips with what all these microbes do. Some seem to help us break down food. Others produce chemicals that are important for our health in some way. But the picture is extremely complicated, partly because of the myriad ways microbes can interact with each other.

But they also interact with the human nervous system. Microbes can produce compounds that affect the way neurons work. They also influence the functioning of the immune system, which can have knock-on effects on the brain. And they seem to be able to communicate with the brain via the vagus nerve.

If microbes can influence our brains, could they also explain some of our behavior, including the criminal sort? Some microbiologists think so, at least in theory. “Microbes control us more than we think they do,” says Emma Allen-Vercoe, a microbiologist at the University of Guelph in Canada.

Researchers have come up with a name for applications of microbiology to criminal law: the legalome. A better understanding of how microbes influence our behavior could not only affect legal proceedings but also shape crime prevention and rehabilitation efforts, argue Susan Prescott, a pediatrician and immunologist at the University of Western Australia, and her colleagues.

“For the person unaware that they have auto-brewery syndrome, we can argue that microbes are like a marionettist pulling the strings in what would otherwise be labeled as criminal behavior,” says Prescott.

Auto-brewery syndrome is a fairly straightforward example (it has been involved in the acquittal of at least two people so far), but other brain-microbe relationships are likely to be more complicated. We do know a little about one microbe that seems to influence behavior: Toxoplasma gondii, a parasite that reproduces in cats and spreads to other animals via cat feces.

The parasite is best known for changing the behavior of rodents in ways that make them easier prey—an infection seems to make mice permanently lose their fear of cats. Research in humans is nowhere near conclusive, but some studies have linked infections with the parasite to personality changes, increased aggression, and impulsivity.

“That’s an example of microbiology that we know affects the brain and could potentially affect the legal standpoint of someone who’s being tried for a crime,” says Allen-Vercoe. “They might say ‘My microbes made me do it,’ and I might believe them.”

There’s more evidence linking gut microbes to behavior in mice, which are some of the most well-studied creatures. One study involved fecal transplants—a procedure that involves inserting fecal matter from one animal into the intestines of another. Because feces contain so much gut bacteria, fecal transplants can go some way to swap out a gut microbiome. (Humans are doing this too—and it seems to be a remarkably effective way to treat persistent C. difficile infections in people.)

Back in 2013, scientists at McMaster University in Canada performed fecal transplants between two strains of mice, one that is known for being timid and another that tends to be rather gregarious. This swapping of gut microbes also seemed to swap their behavior—the timid mice became more gregarious, and vice versa.

Microbiologists have since held up this study as one of the clearest demonstrations of how changing gut microbes can change behavior—at least in mice. “But the question is: How much do they control you, and how much is the human part of you able to overcome that control?” says Allen-Vercoe. “And that’s a really tough question to answer.”

After all, our gut microbiomes, though relatively stable, can change. Your diet, exercise routine, environment, and even the people you live with can shape the communities of microbes that live on and in you. And the ways these communities shift and influence behavior might be slightly different for everyone. Pinning down precise links between certain microbes and criminal behaviors will be extremely difficult, if not impossible. 

“I don’t think you’re going to be able to take someone’s microbiome and say ‘Oh, look—you’ve got bug X, and that means you’re a serial killer,’” says Allen-Vercoe.

Either way, Prescott hopes that advances in microbiology and metabolomics might help us better understand the links between microbes, the chemicals they produce, and criminal behaviors—and potentially even treat those behaviors.

“We could get to a place where microbial interventions are a part of therapeutic programming,” she says.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

A new AI translation system for headphones clones multiple voices simultaneously

Imagine going for dinner with a group of friends who switch in and out of different languages you don’t speak, but still being able to understand what they’re saying. This scenario is the inspiration for a new AI headphone system that translates the speech of multiple speakers simultaneously, in real time.

The system, called Spatial Speech Translation, tracks the direction and vocal characteristics of each speaker, helping the person wearing the headphones to identify who is saying what in a group setting. 

“There are so many smart people across the world, and the language barrier prevents them from having the confidence to communicate,” says Shyam Gollakota, a professor at the University of Washington, who worked on the project. “My mom has such incredible ideas when she’s speaking in Telugu, but it’s so hard for her to communicate with people in the US when she visits from India. We think this kind of system could be transformative for people like her.”

While there are plenty of other live AI translation systems out there, such as the one running on Meta’s Ray-Ban smart glasses, they focus on a single speaker, not multiple people speaking at once, and deliver robotic-sounding automated translations. The new system is designed to work with existing, off-the-shelf noise-canceling headphones that have microphones, plugged into a laptop powered by Apple’s M2 silicon chip, which can support neural networks. The same chip is also present in the Apple Vision Pro headset. The research was presented at the ACM CHI Conference on Human Factors in Computing Systems in Yokohama, Japan, this month.

Over the past few years, large language models have driven big improvements in speech translation. As a result, translation between languages for which lots of training data is available (such as the four languages used in this study) is close to perfect on apps like Google Translate or in ChatGPT. But it’s still not seamless and instant across many languages. That’s a goal a lot of companies are working toward, says Alina Karakanta, an assistant professor at Leiden University in the Netherlands, who studies computational linguistics and was not involved in the project. “I feel that this is a useful application. It can help people,” she says. 

Spatial Speech Translation consists of two AI models, the first of which divides the space surrounding the person wearing the headphones into small regions and uses a neural network to search for potential speakers and pinpoint their direction. 

The second model then translates the speakers’ words from French, German, or Spanish into English text using publicly available data sets. The same model extracts the unique characteristics and emotional tone of each speaker’s voice, such as the pitch and the amplitude, and applies those properties to the text, essentially creating a “cloned” voice. This means that when the translated version of a speaker’s words is relayed to the headphone wearer a few seconds later, it sounds as if it’s coming from the speaker’s direction and the voice sounds a lot like the speaker’s own, not a robotic-sounding computer.
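
The two-stage pipeline described here can be sketched in simplified form. This is a toy illustration only, assuming invented function names and data shapes; it is not the researchers' actual implementation, in which both stages are neural networks.

```python
from dataclasses import dataclass

@dataclass
class SpeakerSegment:
    """What stage 1 hands to stage 2 for one detected speaker."""
    direction_deg: float  # estimated angle of arrival
    text: str             # transcribed source-language speech
    pitch_hz: float       # vocal traits extracted for voice cloning
    amplitude: float

def locate_speakers(energy_by_region, threshold):
    """Stage 1, simplified: treat each angular region's signal energy as a
    detection score and keep regions above a threshold as speaker directions."""
    return [angle for angle, energy in energy_by_region.items() if energy > threshold]

def translate_and_clone(segment, translate):
    """Stage 2, simplified: translate the text, then reattach the speaker's
    direction and vocal traits so playback can be spatialized and voice-matched."""
    return {
        "direction_deg": segment.direction_deg,
        "translated_text": translate(segment.text),
        "voice_profile": {"pitch_hz": segment.pitch_hz, "amplitude": segment.amplitude},
    }

# Toy usage with a stubbed translator and three angular regions.
directions = locate_speakers({30.0: 0.9, 120.0: 0.2, 250.0: 0.7}, threshold=0.5)
seg = SpeakerSegment(direction_deg=directions[0], text="Bonjour",
                     pitch_hz=180.0, amplitude=0.6)
out = translate_and_clone(seg, translate=lambda t: {"Bonjour": "Hello"}[t])
```

The sketch only shows the hand-off: direction and voice characteristics travel alongside the translated text so the output can sound like the original speaker, from the original speaker's direction.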

Given that separating out human voices is hard enough for AI systems, being able to incorporate that ability into a real-time translation system, map the distance between the wearer and the speaker, and achieve decent latency on a real device is impressive, says Samuele Cornell, a postdoctoral researcher at Carnegie Mellon University’s Language Technologies Institute, who did not work on the project.

“Real-time speech-to-speech translation is incredibly hard,” he says. “Their results are very good in the limited testing settings. But for a real product, one would need much more training data—possibly with noise and real-world recordings from the headset, rather than purely relying on synthetic data.”

Gollakota’s team is now focusing on reducing the amount of time it takes for the AI translation to kick in after a speaker says something, which will accommodate more natural-sounding conversations between people speaking different languages. “We want to really get down that latency significantly to less than a second, so that you can still have the conversational vibe,” Gollakota says.

This remains a major challenge, because the speed at which an AI system can translate one language into another depends on the languages’ structure. Of the three languages Spatial Speech Translation was trained on, the system was quickest to translate French into English, followed by Spanish and then German—reflecting how German, unlike the other languages, places a sentence’s verbs and much of its meaning at the end and not at the beginning, says Claudio Fantinuoli, a researcher at the Johannes Gutenberg University of Mainz in Germany, who did not work on the project. 

Reducing the latency could make the translations less accurate, he warns: “The longer you wait [before translating], the more context you have, and the better the translation will be. It’s a balancing act.”

The Download: AI headphone translation, and the link between microbes and our behavior

A new AI translation system for headphones clones multiple voices simultaneously

What’s new: Imagine going for dinner with a group of friends who switch in and out of different languages you don’t speak, but still being able to understand what they’re saying. This scenario is the inspiration for a new AI headphone system that translates the speech of multiple speakers simultaneously, in real time.

How it works: The system tracks the direction and vocal characteristics of each speaker, helping the person wearing the headphones to identify who is saying what in a group setting. Read the full story.

—Rhiannon Williams

Your gut microbes might encourage criminal behavior

A few years ago, a Belgian man in his 30s drove into a lamppost. Twice. Local authorities found that his blood alcohol level was four times the legal limit. Over the space of a few years, the man was apprehended for drunk driving three times. And on all three occasions, he insisted he hadn’t been drinking.

He was telling the truth. A doctor later diagnosed auto-brewery syndrome—a rare condition in which the body makes its own alcohol. Microbes living inside the man’s body were fermenting the carbohydrates in his diet to create ethanol. Last year, he was acquitted of drunk driving.

His case, along with several other scientific studies, raises a fascinating question for microbiology, neuroscience, and the law: How much of our behavior can we blame on our microbes? Read the full story.

—Jessica Hamzelou

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The must-reads


1 How the Gates Foundation will end
Bill Gates will wind it down in 2045, after distributing most of his remaining fortune. (NYT $)
+ He estimates he’ll give away $200 billion in the next 20 years. (Semafor)
+ The foundation is shuttering several decades earlier than he expected. (BBC)

2 US Customs and Border Protection will no longer protect pregnant women
It’s rolled back policies designed to protect vulnerable people, including infants. (Wired $)
+ The US wants to use facial recognition to identify migrant children as they age. (MIT Technology Review)

3 DOGE is readying software to turbo-charge mass layoffs
After some 260,000 government workers have already been let go. (Reuters)
+ DOGE’s math doesn’t add up. (The Atlantic $)
+ One of its biggest inspirations is no fan of the program. (WP $)
+ Can AI help DOGE slash government budgets? It’s complex. (MIT Technology Review)

4 Scientists are using AI to predict cancer survival outcomes
In some cases, it’s outperforming clinicians’ forecasts. (FT $)
+ Why it’s so hard to use AI to diagnose cancer. (MIT Technology Review)

5 Apple is reportedly working on new chips for its smart glasses
But we’ll have to wait a few more years. (Bloomberg $)
+ What’s next for smart glasses. (MIT Technology Review)

6 Silicon Valley has a vision for the future of warfare
Military technologies are no longer solely the preserve of governments. (Bloomberg $)
+ Palmer Luckey on the Pentagon’s future of mixed reality. (MIT Technology Review)

7 AI companies don’t want regulation any more
Just a few short years after they claimed regulation was the best way of making AI safe. (WP $)

8 Forget SEO, GEO is where it’s at these days
Marketers are scrambling to adopt Generative Engine Optimization best practices now that AI is upending how we search the web. (WSJ $)
+ Your most important customer may be AI. (MIT Technology Review)

9 AI-generated recruiters are making job hunting even worse
Avatars can glitch out and stumble over their words. (404 Media)

10 A Soviet-era spacecraft is reentering Earth’s atmosphere
More than 50 years after it misfired on a journey to Venus. (Ars Technica)
+ The world’s next big environmental problem could come from space. (MIT Technology Review)

Quote of the day

“The picture of the world’s richest man killing the world’s poorest children is not a pretty one.”

—Bill Gates lashes out at Elon Musk’s cuts to USAID in an interview with the Financial Times.

One more thing

The great commercial takeover of low Earth orbit

NASA designed the International Space Station to fly for 20 years. It has lasted six years longer than that, though it is showing its age, and NASA is currently studying how to safely destroy the space laboratory by around 2030.

The ISS never really became what some had hoped: a launching point for an expanding human presence in the solar system. But it did enable fundamental research on materials and medicine, and it helped us start to understand how space affects the human body.

To build on that work, NASA has partnered with private companies to develop new, commercial space stations for research, manufacturing, and tourism. If they are successful, these companies will bring about a new era of space exploration: private rockets flying to private destinations. They’re already planning to do it around the moon. One day, Mars could follow. Read the full story.

—David W. Brown

We can still have nice things


+ It’s almost pasta salad time!
+ Who is the better fictional archaeologist: Indiana Jones or Lara Croft?
+ How a good night’s sleep could help to give you a long-lasting memory boost. 😴
+ How millennials became deeply uncool (allegedly).

How cloud and AI transform and improve customer experiences

As AI technologies become increasingly mainstream, there’s mounting competitive pressure to transform traditional infrastructures and technology stacks. Traditional brick-and-mortar companies are finding cloud and data to be the foundational keys to unlocking their paths to digital transformation, and to competing in modern, AI-forward industry landscapes. 

In this exclusive webcast, experts discuss the building blocks for digital transformation, approaches for upskilling employees and putting digital processes in place, and data management best practices. The discussion also looks at what the near future holds and emphasizes the urgency for companies to transform now to stay relevant. 

Learn from the experts

  • Digital transformation, from the ground up, starts by moving infrastructure and data to the cloud
  • AI implementation requires a talent transformation at scale, across the organization
  • AI is a company-wide initiative—everyone in the company will become either an AI creator or consumer

Featured speakers

Mohammed Rafee Tarafdar, Chief Technology Officer, Infosys

Rafee is Infosys’s Chief Technology Officer. He is responsible for technology vision and strategy, sensing and scaling emerging technologies, advising and partnering with clients on their AI transformation journeys, and building high technology talent density. He leads Infosys’s AI-first transformation and has implemented population- and enterprise-scale platforms. He is co-author of the book The Live Enterprise, and was recognized as a top-50 global technology leader by Forbes in 2023 and a Top 25 Tech Wavemaker by Entrepreneur India magazine in 2024.

Sam Jaddi, Chief Information Officer, ADT

Sam Jaddi is the Chief Information Officer for ADT. With more than 26 years of experience in technology innovation, Sam has deep knowledge of the security and smart-home industry. His team drives ADT’s business platforms and processes to improve both customer and employee experiences. Sam has helped set the technology strategy, vision, and direction for the company’s digital transformation. Before joining ADT, he served as Chief Technology Officer at Stanley, overseeing the company’s new security division and leading global integration initiatives, IT strategy, transformation, and international operations.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

Gentleman’s Gazette Thrives on YouTube

Raphael Schneider launched Gentleman’s Gazette in 2010 as a blog for men’s style and apparel. He added his own product line in 2013, selling menswear and accessories.

He emphasizes quality in clothing and content. His YouTube channel, launched in 2015, remains essential for traffic and conversions, owing, he says, to the quality of the videos. “Ten years from now, someone can still benefit from a video we produce today,” he told me.

Raphael first appeared on the podcast in 2018. In this, our latest conversation, he addressed the company’s origins, Google search, and, yes, his focus on YouTube.

Our entire audio is embedded below. The transcript is edited for clarity and style.

Eric Bandholz: Tell us about your journey to Gentleman’s Gazette.

Raphael Schneider: As a teenager in Germany, I earned money selling items on eBay. That’s when I discovered cufflinks. I learned they required French cuff shirts and jackets, which sparked my interest in classic men’s clothing.

I went to law school thinking I could dress well every day, but I realized I hated law, so I moved to the U.S. in 2009 through an exchange program and married my girlfriend, whom I had met earlier as a student. The job market was tough, especially for foreigners, so I turned to clothing and style, my passions.

Blogging was booming then. I launched Gentleman’s Gazette in 2010 to publish articles on men’s style. Readers kept asking where I got my clothes. That led me to create Fort Belvedere, our menswear brand, in 2013. I had no product development experience, and it took time and money to get going.

Early on, I wrongly assumed that building an audience produced easy sales. But I’ve learned a lot, and I love the creativity and independence of entrepreneurship.

Bandholz: Your YouTube channel is impressive.

Schneider: Early on, our main traffic generator was Google organic search. It was our bread and butter for a while, generating around 1.5 million page views annually. Traffic ebbed, so we explored other options.

I dabbled in video as early as 2012, but we fully committed to YouTube in 2015. Being early helped, and video is a strong medium for clothing and style. You can show fabric drape, fit, and personality, which articles can’t always convey.

We chose a personality-driven approach, featuring different hosts to appeal to a broader audience. Some may like me, others might not, so variety helps. YouTube’s landscape has changed. Now there are shorts, algorithms, and more creators, but we adapt. We’re experimenting with travel-style content, allowing viewers to experience places vicariously and inspire their journeys.

Our content isn’t just “look at this pocket square.” It’s about educating and connecting with a niche audience that values classic menswear. While most people wear leisure clothes and aren’t interested in cufflinks, we serve those who are. We continue to produce foundational style content while evolving to keep advanced users engaged.

So, yes, YouTube is an essential marketing tool for us.

Bandholz: Does investing in higher-production travel videos pay off?

Schneider: Last year, we visited London to test travel content. Not everything new pays off immediately, but we track performance carefully — click-throughs via YouTube Shopping, affiliate links in descriptions, and customer feedback. While it’s hard to tie direct sales to a single video, the response from viewers has been positive.

Some videos generate more revenue than others; we analyze patterns and adjust. But we’ve realized it’s about having a range of content: top-of-funnel to raise awareness, mid-funnel like product guides, and bottom-funnel content for ready-to-buy customers, like a deep dive into pocket squares. That mix still works well for us.

Attribution is tricky. A lot gets credited to organic Google search, but we know it’s multi-touch. Someone might discover us through YouTube on mobile, but check out later on desktop via branded search.

With AI and easier video creation on the rise, content production will become cheaper, but we still see value in YouTube. Competing on Instagram is tough. There are millions of creators.

The field is smaller on YouTube, especially with location-specific travel content. Few people can travel to Vienna, speak the language, and do in-depth style content. That’s where we want to stand out — a big fish in a small pond.

Bandholz: How is AI affecting your blog traffic and strategy?

Schneider: Our focus remains on original YouTube content, though we may test YouTube ads since we have an in-house production team. AI is changing things. I’ve always believed in creating timeless value — we make our products and content to last. Ten years from now, someone can still benefit from a video we produce today.

In the past, Google reliably sent traffic if you made comprehensive content. But now, AI tools give people instant answers. They don’t want to click through multiple sites to find what they need. We’ve noticed Google is leaning more on AI Overviews and keeping users on its platform, which doesn’t help small creators like us.

We’re seeing our brand in those AI summaries, which recognize Gentleman’s Gazette as reputable, but we’ve yet to see significant traffic or conversions.

So it’s a major shift. Big players are gaming the system and flooding the web with AI content from old domains. But I still think there’s a market for real, human passion. If you’re into photography, do you want advice from AI or from someone who lives it?

We are using AI in practical ways. For example, I created a voice clone for product videos. For a tie that comes in 14 colors, I recorded just once, and AI handles the rest. Tools like that save time without sacrificing personality. I think that’s the key.
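
The one-recording, many-variants workflow Schneider describes can be sketched as a simple templating loop. Everything here is hypothetical: the template text, the color list, and `synthesize_cloned`, which stands in for whatever voice-cloning service is actually used.

```python
# Hypothetical sketch: narrate many product variants from one recorded voice.
TIE_COLORS = ["navy", "burgundy", "forest green"]  # the real line has 14 colors

SCRIPT_TEMPLATE = "This {color} silk tie pairs a timeless weave with a versatile color."

def build_scripts(colors):
    """One script template; only the color changes per variant video."""
    return {color: SCRIPT_TEMPLATE.format(color=color) for color in colors}

def render_narrations(scripts, synthesize_cloned):
    """Feed each script to the cloned voice instead of re-recording every take."""
    return {color: synthesize_cloned(text) for color, text in scripts.items()}

scripts = build_scripts(TIE_COLORS)
audio = render_narrations(scripts, synthesize_cloned=lambda text: text.encode())  # stubbed TTS
```

The point of the pattern is that the human records once to train the clone; after that, each new variant costs only a script edit, not a studio session.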

People still connect with people. We’re continuing to invest in that, stay curious, and adapt. The danger isn’t change — it’s resisting change.

Bandholz: Is organic search traffic from Google still viable for premium-priced products?

Schneider: I’ve spoken to several search-engine experts, and they all say you’ll struggle to rank for high-purchase-intent keywords if you’re selling premium products. Conversion rates are lower for expensive items, so Google favors cheaper alternatives since it prioritizes click-throughs.

That said, we still get organic search traffic. We analyze landing pages and reverse-engineer what people might be searching for.

If someone searches for a niche item, such as a specific silk necktie, we can still rank because few apparel merchants offer those products. The key is to make clear which specific features a premium product has that a cheaper item lacks.

Ranking for high-intent short-tail keywords is nearly impossible, but long-tail SEO is still viable. For example, “unlined driving gloves in lamb nappa” is specific enough to rank and reach the right buyer.

Ultimately, though, it’s more about brand affinity, like with Beardbrand, your company. People come for the lifestyle, the philosophy — they connect to the identity. That’s where premium brands still have power.

Bandholz: Where can people buy your stuff?

Schneider: Our site is GentlemansGazette.com. Follow us on YouTube, Instagram, and Facebook. I’m on LinkedIn.