Creating psychological safety in the AI era

Rolling out enterprise-grade AI means climbing two steep cliffs at once: first, understanding and implementing the tech itself; and second, creating the cultural conditions where employees can maximize its value. While the technical hurdles are significant, the human element can be even more consequential; fear and ambiguity can stall the momentum of even the most promising initiatives.

Psychological safety—feeling free to express opinions and take calculated risks without worrying about career repercussions—is essential for successful AI adoption. In psychologically safe workspaces, employees are empowered to challenge assumptions and raise concerns about new tools without fear of reprisal. This is nothing short of a necessity when introducing a nascent and profoundly powerful technology that still lacks established best practices.

“Psychological safety is mandatory in this new era of AI,” says Rafee Tarafdar, executive vice president and chief technology officer at Infosys. “The tech itself is evolving so fast—companies have to experiment, and some things will fail. There needs to be a safety net.”

To gauge how psychological safety influences success with enterprise-level AI, MIT Technology Review Insights conducted a survey of 500 business leaders. The findings reveal high self-reported levels of psychological safety, but also suggest that fear still has a foothold. Anecdotally, industry experts highlight a reason for the disconnect between rhetoric and reality: while organizations may publicly promote a “safe to experiment” message, deeper cultural undercurrents can counteract that intent.

Building psychological safety requires a coordinated, systems-level approach, and human resources (HR) alone cannot deliver such transformation. Instead, enterprises must deeply embed psychological safety into their collaboration processes.

Key findings for this report include:

  • Companies with experiment-friendly cultures have greater success with AI projects. The majority of executives surveyed (83%) believe a company culture that prioritizes psychological safety measurably improves the success of AI initiatives. Four in five leaders agree that organizations fostering such safety are more successful at adopting AI, and 84% have observed connections between psychological safety and tangible AI outcomes.
  • Psychological barriers are proving to be greater obstacles to enterprise AI adoption than technological challenges. Encouragingly, nearly three-quarters (73%) of respondents indicated they feel safe to provide honest feedback and express opinions freely in their workplace. Still, a significant share (22%) admit they’ve hesitated to lead an AI project because they might be blamed if it misfires.
  • Achieving psychological safety is a moving target for many organizations. Fewer than half of leaders (39%) rate their organization’s current level of psychological safety as “very high.” Another 48% report a “moderate” degree of it. This may mean that some enterprises are pursuing AI adoption on cultural foundations that are not yet fully stable.

Download the report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

The Top Conversion Barrier in the E.U.

European consumers shop amid strict regulatory transparency and protection. Since it took effect in 2018, the E.U.’s General Data Protection Regulation (GDPR) has produced fines totaling €5.6 billion ($6.6 billion), raising public awareness of data rights and privacy. Consumers expect clear policies, compliant data handling, straightforward returns, and transparent pricing, especially from an unfamiliar seller.

Lack of trust is the top conversion barrier for non-European merchants.

Expectations

The GDPR’s requirements on sellers are explicit: consumers must be able to grant and withdraw consent, and sellers must clearly explain how personally identifiable data is used. These requirements shape shoppers’ expectations.

For example, the E.U.’s Consumer Rights Directive grants shoppers a 14-day right of withdrawal for most online purchases — a de facto baseline for returns across member states.

Under the E.U.’s consumer protection laws, the final price shown to shoppers must include all taxes and fees, including VAT. The Omnibus Directive adds further requirements, such as the rule that any advertised discount must be measured against the lowest price charged in the previous 30 days.

The United States has no comparable federal framework. There’s no nationwide right to withdraw from online purchases, and fewer mandatory disclosures about business identity or tax-inclusive pricing. U.S. shoppers often evaluate trust based on brand familiarity, convenience, and store-specific policies rather than legal guarantees.

Screenshot of Zalando's cookie disclosure, which reads: “We’ll tailor your experience. Zalando, Lounge by Zalando, and Outlets (referred to as “we”) use cookies and other technologies to keep our websites reliable and secure, to measure their performance, and to deliver a personalised shopping experience and personalised advertising. To do this, we collect information about users, their behaviour, and their devices. If you select “Accept all”, you accept this and agree that we share this information with third parties, such as our marketing partners. This may mean that your data is also processed in the USA and China. If you select “Only essential” we will use only the essential cookies and you will not receive any personalised ads. Select “Set preferences” for further details and to manage your options. You can adjust your preferences at any time. For more information, please read our privacy notice and legal notice.” The buttons offered are “Only essential,” “Set preferences,” and “Accept all.”

Berlin-based Zalando, a fashion retailer and marketplace, lets visitors control their cookie settings.

Reviews: Local and Verified

Customer reviews play a significant role in how European shoppers assess an unfamiliar merchant. Cross-border ecommerce is common, and many consumers buy from retailers they don’t know, increasing the reliance on third-party validation. The Omnibus Directive reinforces this behavior by requiring merchants to disclose whether customer reviews are verified and by prohibiting misleading practices related to authenticity.

For example, shoppers in Germany and other parts of Central Europe rely heavily on Trustpilot and Trusted Shops as indicators of merchant reliability.

All shoppers, notably those in France, prefer reviews in their native language.

The volume of reviews matters, too, especially for unfamiliar merchants — the more reviews, the better, particularly when they are clearly verified and in the buyer’s native language.

Conversely, U.S. consumers will purchase even with limited review volume when the seller is recognizable or the experience is convenient.

Policy Pages and Disclosures

For European shoppers, credibility often starts in the footer. Before they buy from an unfamiliar merchant, many will scroll to the bottom of the page to check the company behind the site and what rights they have if something goes wrong. That behavior is reinforced by law.

Under the E-Commerce Directive, online sellers and other service providers must make specific business information “easily, directly, and permanently accessible.” At a minimum, that includes:

  • Legal name,
  • Physical address,
  • Contact details,
  • Applicable trade or VAT registration numbers.

Several countries go further. Germany, Austria, and Switzerland, for example, require an Impressum — a legal statement — that consolidates this information on a single page.

In the U.S., shoppers typically accept limited company details.

Reliable, speedy contact is also a trust signal. Per the E-Commerce Directive, a website can’t rely solely on an email address; sellers must offer a channel for “rapid and effective” communication. Sites that offer no quick way to reach a person raise questions about their reliability.

Payment Security

European shoppers rely on local payment methods. Pay-by-invoice and Klarna’s buy-now-pay-later are common in Germany. iDEAL dominates in the Netherlands, Bancontact is standard in Belgium, and Nordic consumers expect Klarna, MobilePay, or Vipps (pay by phone number).

In parts of Central and Eastern Europe, bank transfers, cash-on-delivery, and marketplace-specific payments remain popular.

For many buyers, familiar payment logos and clear fee transparency are essential to completing a purchase.

U.S. retailers often underestimate the importance of local E.U. payment methods, launching with only credit cards and PayPal.

In short, ecommerce traction in Europe starts with understanding the trust factors that shape the customer journey. Clear disclosures, verified reviews, familiar payment methods, and compliance with regional standards all contribute to credibility and success.

Apple Safari Update Enables Tracking Two Core Web Vitals Metrics

Safari 26.2 adds support for measuring Largest Contentful Paint (LCP) and for the Event Timing API, which is used to calculate Interaction to Next Paint (INP). This enables site owners to collect LCP and INP data from Safari users through the browser Performance API, using their own analytics and real user monitoring tools.

LCP And INP In Apple Safari Browser

LCP is a Core Web Vital and a ranking signal. Interaction To Next Paint (INP), also a Core Web Vitals metric, measures how quickly your website responds to user interactions. Native Safari browser support enables accurate measurement, which closes a long-standing blind spot for performance diagnostics of site visitors using Apple devices.

INP is a particularly critical measurement because it reports the total time between a user’s action (click, tap, or key press) and the next visual update on the screen. It tracks the slowest interaction observed during a user’s visit. INP matters because it tells site owners whether a page feels “frozen” or laggy to visitors. Fast INP scores translate to a positive experience for visitors interacting with the website.

This change will have no effect on public tools like PageSpeed Insights and CrUX data because they are Chrome-based.

However, Safari site visitors can now be included in field performance data where site owners have configured measurement, such as in Google Analytics or other performance monitoring platforms.
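For teams that roll their own field measurement, a minimal sketch of what this collection can look like with the standard PerformanceObserver API is below. The "/rum" endpoint, the 40 ms duration threshold, and the report-on-hidden pattern are illustrative assumptions rather than anything specified in Safari's release notes, and real INP calculation involves more nuance (interaction grouping and high-percentile selection) than this worst-interaction approximation.

```ts
// Minimal sketch: collect LCP and slow-interaction timings in browsers that
// expose these entry types (now including Safari 26.2) and beacon them to a
// hypothetical /rum endpoint for aggregation.
const send = (metric: string, value: number): void => {
  navigator.sendBeacon("/rum", JSON.stringify({ metric, value, page: location.pathname }));
};

// Largest Contentful Paint: the last entry emitted before user input is the
// final LCP candidate for the page load.
new PerformanceObserver((list) => {
  const entries = list.getEntries();
  const last = entries[entries.length - 1];
  if (last) send("LCP", last.startTime);
}).observe({ type: "largest-contentful-paint", buffered: true });

// Event Timing: entries for interactions longer than the duration threshold.
// The slowest one observed is a rough stand-in for Interaction to Next Paint.
let worstInteraction = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    worstInteraction = Math.max(worstInteraction, entry.duration);
  }
}).observe({ type: "event", durationThreshold: 40, buffered: true });

// Report the slowest interaction when the page is hidden (tab close or navigation away).
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden" && worstInteraction > 0) {
    send("INP-approx", worstInteraction);
  }
});
```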

The following analytics packages can now be configured to surface these metrics from Safari browser site visitors:

  • Google Analytics (GA4, via Web Vitals or custom event collection)
  • Adobe Analytics
  • Matomo
  • Amplitude (with performance instrumentation)
  • Mixpanel (with custom event pipelines)
  • Custom / In-House Monitoring
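As a concrete illustration of the GA4 option above, here is a hedged sketch using Google's open-source web-vitals library alongside an existing gtag.js/GA4 tag. It assumes the library is installed (npm install web-vitals) and that a GA4 property is already configured on the page; the event parameter names are arbitrary choices, not required by GA4.

```ts
// Sketch: forward LCP and INP (now measurable in Safari 26.2) to GA4.
// Assumes `web-vitals` is installed and gtag.js is already loaded on the page.
import { onLCP, onINP, type Metric } from "web-vitals";

// gtag is provided globally by the GA4 snippet; declared here for TypeScript.
declare const gtag: (
  command: "event",
  eventName: string,
  params: Record<string, unknown>
) => void;

function reportToGA4(metric: Metric): void {
  gtag("event", metric.name, {
    value: Math.round(metric.value),  // milliseconds, rounded for GA4
    metric_id: metric.id,             // unique per page load; useful for deduplication
    metric_value: metric.value,       // raw value
    metric_rating: metric.rating,     // "good" | "needs-improvement" | "poor"
  });
}

onLCP(reportToGA4);
onINP(reportToGA4);
```

With Safari 26.2 exposing the underlying performance entries, handlers like these should simply start receiving data from Safari visitors without any browser-specific code.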

Apple Safari’s update also enables Real User Monitoring (RUM) platforms to surface this data for site owners:

  • Akamai mPulse
  • Cloudflare Web Analytics
  • Datadog RUM
  • Dynatrace
  • Elastic Observability (RUM)
  • New Relic Browser
  • Raygun
  • Sentry Performance
  • SpeedCurve
  • Splunk RUM

Apple’s official documentation explains:

“Safari 26.2 adds support for two tools that measure the performance of web applications, Event Timing API and Largest Contentful Paint.

The Event Timing API lets you measure how long it takes for your site to respond to user interactions. When someone clicks a button, types in a field, or taps on a link, the API tracks the full timeline — from the initial input through your event handlers and any DOM updates, all the way to when the browser paints the result on screen. This gives you insight into whether your site feels responsive or sluggish to users. The API reports performance entries for interactions that take longer than a certain threshold, so you can identify which specific events are causing delays. It makes measuring “Interaction to Next Paint” (INP) possible.

Largest Contentful Paint (LCP) measures how long it takes for the largest visible element to appear in the viewport during page load. This is typically your main image, a hero section, or a large block of text — whatever dominates the initial view. LCP gives you a clear signal about when your page feels loaded to users, even if other resources are still downloading in the background.”

Safari 26.2 provides new data that is critical for SEO and for monitoring the user experience, information that site owners rely on. Safari traffic represents a significant share of site visits. These improvements make it possible for site owners to have a more complete view of the real user experience across more devices and browsers.

Why Google’s Spam Problem Is Getting Worse

Spam is back in search. And in a big way.

Honestly, I don’t think Google can handle this at all. The scale is unprecedented. They went after publishers manually with the site reputation abuse update. More expired domain abuse is reaching the top of the SERPs than at any time I can remember in recent history. They’re fighting a losing battle, and they’ve taken their eye off the ball.

In a microcosm, this is what’s happening (Image Credit: Harry Clarkson-Bennett)

A few years ago, search was getting on top of the various spam issues “creative” SEOs were trialling. The prospect of being nerfed by a spam update, and Google’s willingness to invest in and care about the quality of search, seemed to be winning the war. Trying to recover from these penalties is nothing short of disastrous. Just ask anybody hit by the Helpful Content update.

But things have shifted. AI is haphazardly rewriting the rules, and big tech has bigger, more poisonous fish to fry. This is not a great time to be a white hat SEO.

TL;DR

  1. Google is currently losing the war against spam, with unprecedented scale driven by AI-generated slop, and expired domain and PBN abuse.
  2. Google’s spam detection monitors four key groups of signals – content, links, reputational, and behavioral.
  3. Data from the Google Leak suggests its most capable detection focuses on link velocity and anchor text.
  4. AI “search” is dozens of times more expensive than traditional search. This enormous cost and focus on new AI products is leading to underinvestment in core spam-fighting.

How Does Google’s Spam Detection System Work?

Via SpamBrain. Previously, the search giant rolled out Penguin, Panda, and RankBrain to make better decisions based on links and keywords.

And right now, badly.

SpamBrain is designed to identify content and websites engaging in spammy activities with apparently “shocking” accuracy. I don’t know whether shocking in this sense is meant in a positive or negative way right now, but I can only parrot what is said.

Over time, the algorithm learns what is and isn’t spam. Once it has clearly established the signals associated with spammy sites, it can build a neural network around them.

Much like the concept of seed sites, if you have the spammiest websites mapped out, you can accurately score everyone else against them. Then you can analyse signals at scale – content, links, behavioral, and reputational signals – to group sites together.

  • Inputs (content, linking, reputational, and behavioral signals).
  • Hidden layer (clustering and comparing each site to known spam ones).
  • Outputs (spam or not spam).

If your site is bucketed in the same group as obviously spammy sites when it comes to any of the above, that is not a good sign. The algorithm works on thresholds. I imagine you need to sail pretty close to the wind for long enough to get hit by a spam update.

But if your content is relatively thin and low value add, you’re probably halfway there. Add some dangerous links into the mix, some poor business decisions (parasite SEO being the most obvious example), and scaled content abuse, and you’re doomed.
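To make the thresholding idea concrete, here is a deliberately crude toy sketch: a weighted score over the four signal groups with a hypothetical cut-off. The signal definitions, weights, and threshold are invented for illustration and bear no resemblance to the scale or sophistication of the real system.

```ts
// Toy illustration of threshold-based spam bucketing. Not Google's implementation;
// the signal definitions, weights, and cut-off below are invented for illustration.
interface SiteSignals {
  content: number;      // e.g. share of thin or scaled pages (0-1)
  links: number;        // e.g. share of exact-match commercial anchors (0-1)
  behavioral: number;   // e.g. engagement-manipulation score (0-1)
  reputational: number; // e.g. similarity to known spam clusters (0-1)
}

const WEIGHTS: SiteSignals = { content: 0.3, links: 0.4, behavioral: 0.15, reputational: 0.15 };
const SPAM_THRESHOLD = 0.7; // hypothetical cut-off

function spamScore(site: SiteSignals): number {
  return (Object.keys(WEIGHTS) as (keyof SiteSignals)[])
    .reduce((sum, key) => sum + site[key] * WEIGHTS[key], 0);
}

function isBucketedAsSpam(site: SiteSignals): boolean {
  // A site is only hit once its combined signals exceed the threshold, which is
  // why you can sail close to the wind for a while before a spam update bites.
  return spamScore(site) >= SPAM_THRESHOLD;
}

// Thin content plus aggressive links plus parasite-SEO behaviour: doomed.
console.log(isBucketedAsSpam({ content: 0.8, links: 0.9, behavioral: 0.6, reputational: 0.7 })); // true
```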

What Type Of Spam Are We Talking About Here?

Google notes the most egregious activities here. We’re talking:

  • Cloaking.
  • Doorway abuse.
  • Expired domain abuse.
  • Hacked content.
  • Hidden text and content.
  • Keyword stuffing.
  • Link spam.
  • Scaled content abuse.
  • Site reputation abuse.
  • Thin affiliate content.
  • UGC spam.

Lots of these are grossly intertwined. Expired domain abuse and PBNs. Keyword stuffing is a little old hat, but link spam is still very much alive and well. Scaled content abuse is at an all-time high across the internet.

The more content you have spread across multiple, semantically similar websites, the more effective you can be. And the more you use exact and partial match anchors to push authority towards “money” pages, the richer you will become.

Let’s dive into the big ones below.

Fake News

Google Discover – Google’s engagement-baiting, social-network-lite platform – has been hit hard by unscrupulous spammers in recent times. There have been several instances of fake, AI-driven content reaching the masses. It’s become so prevalent, it has even reached legacy media sites (woohoo).

Millions of page views have been sent to expired and drop domain abusers (Image Credit: Harry Clarkson-Bennett)

From changing the state pension age to free bus passes and TV licenses, the spammers know the market. They know how to incite emotions. Hell hath no fury like a pensioner scorned, and while you can forgive the odd slip-up, nobody can be this generous.

The people who have been working by the book are being sidelined. But the opportunities in the black hat world are booming. Which is, in fairness, quite fun.

Scaled Content Abuse

At the time of writing, over 50% of the content on the internet is AI slop. Some say more. Of nearly a million pages analyzed this year, Ahrefs says 74% contain AI-generated content. What we see is just what slips through the mammoth-sized cracks.

Not hard to see what the problem is… (Image Credit: Harry Clarkson-Bennett)

Award-winning journalist Jean-Marc Manach has found over 8,300 AI-generated news websites in French and over 300 in English (the tip of the iceberg, trust me).

He estimates two of these site owners have become millionaires.

By leveraging authoritative expired domains and PBNs (more on that next), SEOs – the people still ruining the internet – know how to game the system: faking clicks, manipulating engagement signals, and exploiting past link equity.

Expired Domain Abuse

The big daddy. Black hat ground zero.

If you engage even a little bit with a black hat community, you’ll know how easy it is right now to leverage expired domains. In the example below, someone had bought the London Road Safety website (a once highly authoritative domain) and turned it into a single-page “best betting sites not on GamStop” site.

This is just one example of many (Image Credit: Harry Clarkson-Bennett)

Betting and crypto are ground zero for all things black hat, just because there’s so much money involved.

I’m not an expert here, but I believe the process is as follows:

  1. Purchase an expired, valuable domain with a strong, clean backlink history (no manual penalties). Ideally, a few of them.
  2. Then you can begin to create your own PBN with unique hosting providers, nameservers, and IP addresses, with a variety of authoritative, aged, and newer domains.
  3. This domain(s) then becomes your equity/authority stronghold.
  4. Spin up multiple TLD variations of the domain, i.e., instead of .com it becomes .org.uk.
  5. Add a mix of exact and partial match anchors from a PBN to the money site to signal its new focus.
  6. Either add a 301 redirect for a short period of time to the money variation of the domain or canonicalize to the variation.

These scams are always short-term plays. But they can be worth tens or hundreds of thousands of pounds when done well. And they are back, and I believe more valuable than ever.

Right now, I think it’s as simple as buying an old charity domain, adding a quick reskin, and voila. A 301 or equity-passing tactic and your single-page site about “best casinos not on GamStop” is printing money. Even in the English-speaking market.

According to notorious black hat fella Charles Floate, some of these companies are laundering hundreds of thousands of pounds a month.

PBNs

A PBN (or Private Blog Network) is a network of websites that someone controls, all linking back to the money site: the variation of the site designed to generate revenue, typically from advertising or affiliates.

The sites in a private blog network have to be completely separate from one another. They cannot share breadcrumbs that Google can trace. Each site needs a standalone:

  • Hosting provider.
  • IP address.
  • Nameserver.

The reason PBNs are so valuable is that you can build up an enormous amount of link equity and falsified topical authority while mitigating risk. Expired domains are risky because they’re expensive, and once they get a penalty, they’re doomed. PBNs spread the risk. Like the heads of a Hydra: one dies, another rises up.

Protecting the tier 1 asset (the purchased aged or expired domain) is paramount. Instead of pointing links directly to the money site, you can link to the sites that link to the money site.

This indirectly boosts the value of the money site, protecting it from Google’s prying eyes.

What Does The Google Leak Show About Spam?

As always, this is an inexact science. Barely even pseudo-science really. I’ve got the tinfoil hat on and a lot of string connecting wild snippets of information around the room to make this work. You should follow Shaun Anderson here.

If I take every mention of the word “spam” in the module names and descriptions, there are around 115, once I’ve removed any nonsense. Then we can categorize those into content, links, reputational, and behavioral signals.

Taking it one step further, these modules can be classified as relating to things like link building, anchor text, content quality, and so on. This gives us a rough sense of what matters in terms of scale.

Anchor text makes up the lion’s share of spammy modules, based on data from the Google Leak (and my own flawed categorization) (Image Credit: Harry Clarkson-Bennett)

A few examples:

  • spambrainTotalDocSpamScore calculates a document’s overall spam score.
  • IndexingDocjoinerAnchorPhraseSpamInfo and IndexingDocjoinerAnchorSpamInfo modules identify spammy anchor phrases by looking at the number, velocity, the days the links were discovered, and the time the spike ended.
  • GeostoreSourceTrustProto helps evaluate the trustworthiness of a source.
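The mechanics behind that categorization are simple enough to reproduce: filter the module names and descriptions for spam-related terms, then bucket each one by keyword. A rough sketch (with the module strings and category patterns as illustrative stand-ins, not a dump of the leak itself) could look like this:

```ts
// Rough sketch of the categorization exercise described above. The module list and
// the regex buckets are illustrative stand-ins, not the actual leak data or taxonomy.
const modules: string[] = [
  "spambrainTotalDocSpamScore: calculates a document's overall spam score",
  "IndexingDocjoinerAnchorSpamInfo: identifies spammy anchor phrases",
  "GeostoreSourceTrustProto: helps evaluate the trustworthiness of a source",
];

const categories: Record<string, RegExp> = {
  links: /anchor|link|outlink/i,
  content: /doc|content|quality/i,
  reputational: /trust|reputation|authority/i,
  behavioral: /click|nav|impression|behav/i,
};

const counts: Record<string, number> = {};
for (const moduleName of modules.filter((m) => /spam|trust/i.test(m))) {
  for (const [category, pattern] of Object.entries(categories)) {
    if (pattern.test(moduleName)) {
      counts[category] = (counts[category] ?? 0) + 1;
      break; // first matching bucket wins, so each module is counted once
    }
  }
}
console.log(counts); // -> { content: 1, links: 1, reputational: 1 }
```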

Really, the takeaway is how important links are from a spam perspective. Particularly anchor text. The velocity at which you gain links matters, as does the text and surrounding content. Linking seems to be where Google’s algorithm is most capable of identifying red and amber flags.

If your link velocity graph spiked with exact match anchors to highly commercial pages, that’s a flag. Once a site is pinged for this type of content or link-related abuse, the behavioral and reputational signals are analysed as part of SpamBrain.

If these corroborate and your site exceeds certain thresholds, you’re doomed. It’s why this has (until recently) been a relatively fine art.

Ultimately, They’re Just Investing Less In Traditional Search

As Martin McGarry pointed out, they just care a bit less … They have bigger, more hallucinogenic fish to fry.

Image Credit: Harry Clarkson-Bennett

In 2025, we have had four updates, with a duration of c. 70 days. In 2024, we had seven that lasted almost 130 days. Productivity levels we can all aspire to.

It’s Not Hard To Guess Why…

The bleeding-edge search experience is changing. Google is rolling out preferred publisher sources globally and inline linking more effectively in its AI products. Much-needed changes.

I think we’re seeing the real-time moulding of the new search experience in the form of The Google Web Guide. A personalized mix of trusted sources, AI Mode, a more classic search interface, and something inspirational. I suspect this might be a little like a Discover-lite feed. A place in the traditional search interface where content you will almost certainly like is fed to you to keep you engaged.

Unconfirmed, but apparently, Google has added persona-driven recommendation signals and a private publisher entity layer, among other things. Grouping users into cohorts is, I believe, a fundamental part of Discover. It’s what allows content to go viral.

Once you understand enough about a user to bucket them into specific groups, you can saturate a market over the course of a few days on Discover. Less, even. But the problem is the economics of it all. Ten blue links are cheap. AI is not. At any level.

According to Google, when someone chooses a preferred source, they click through to that site twice as often on average. So I suspect it’s worth taking seriously.

Why Are AI Searches So Much More Expensive?

Google is going to spend $10 billion more this year than expected due to the growing demand for cloud services. YoY, Google’s CAPEX spend is nearly double 2024’s $52.5 billion.

It’s not just Google. It’s a Silicon Valley race to the bottom.

2025 has been extrapolated, but on course for $92 billion this year (Image Credit: Harry Clarkson-Bennett)

While Google hasn’t released public information on this, it’s no secret that AI searches are significantly more expensive than the classic 10 blue links. Traditional search is largely static and retrieval-based. It relies on pre-indexed pages to serve a list of links and is very cheap to run.

An AI Overview is generative. Google has to run a large language model to summarize and generate a natural language answer. AI Mode is significantly worse. The multi-turn, conversational interface processes the entire dialogue in addition to the new query.

Given the query fan-out technique – where dozens of searches are run in parallel – this process demands significantly more computational power.

Custom chips, efficiencies, and caching can reduce the cost of this. But this is one of Google’s biggest challenges, and I suspect it is exactly why Barry believes AI Mode won’t be the default search experience. I’d be surprised if it isn’t just applied at a search/personalization level, too. There are plenty of branded and navigational searches where this would be an enormous waste of money.

And these guys really love money.

According to The IET, if the population of London (>9.7 million) asked ChatGPT to write a 100-word email, it would require 4,874,000 litres of water to cool the servers – equivalent to filling over seven 25m swimming pools.

LLMs Already Have A Spam Problem

This is pretty well documented. LLMs seem to be driven at least in part by the sheer volume of mentions in the training data. Everything is ingested and taken as read.

Image Credit: Harry Clarkson-Bennett

When you add a line in your footer describing something you or your business did, it’s taken as read. Spammy, low-quality tactics work more effectively than heavy lifting.

Ideally, we wouldn’t live in a world where low-lift shit outperforms proper marketing efforts. But here we are.

Like in 2012, “best” lists are on the tip of everyone’s tongue. Basic SEO is making a comeback because that’s what is currently working in LLMs. Paid placements, reciprocal link exchanges. You name it.

Image Credit: Harry Clarkson-Bennett

If it’s half-arsed, it’s making a comeback.

As these models rely on Google’s index for searches that the model cannot confidently answer (RAG), Google’s spam engine matters more than ever. In the same way that I think publishers need to take a stand against big tech and AI, Google needs to step up and take this seriously.

I’m Not Sure Anyone Is Going To…

I’m not even sure they want to right now. OpenAI has signed some pretty extraordinary contracts, and its revenue is light-years away from where it needs to be. And Google’s CAPEX is through the roof.

So, things like quality and accuracy are not at the top of the list. Consumer and investor confidence is not that high. They need to make some money. And private companies can be a bit laissez-faire when it comes to reporting on revenue and profits.

According to HSBC, OpenAI needs to raise at least $207 billion by 2030 so it can continue to lose money. Being described as ‘a money pit with a website on top’ isn’t a great look.

New funding has to be thrown at data centres (Image Credit: Harry Clarkson-Bennett)

Let’s see them post-hoc rationalize their way out of this one. That’s it. Thank you for reading and subscribing to my last update of the year. Certainly been a year.



This post was originally published on Leadership in SEO.


Featured Image: Khaohom Mali/Shutterstock

A new SEO Task list to guide you inside Yoast SEO

Doing SEO well often means knowing what to focus on and when to do so. That is not always easy, especially when you are juggling content, updates, and day-to-day site management. That is why we are introducing a new SEO task list in the Yoast plugin. 
 
The Task List helps you improve your SEO step by step, directly inside your dashboard. It turns best practices into clear, actionable tasks, so you can make progress with confidence and without second-guessing your work. 
 

Why the SEO checklist matters: 

Turn SEO advice into clear actions 
 

Instead of vague recommendations or long documentation, the Task List shows you exactly what to do next. Each item focuses on a crucial SEO fundamental, helping you take meaningful action rather than getting lost in details that don’t move the needle. 
 
This makes SEO more approachable, especially if you are not an expert. You do not need to keep up with every update or technique. The Task List guides you through what matters most. 

Build better SEO habits over time

The Task List is not just about finishing tasks. By following it regularly, you start to recognize patterns and best practices that lead to stronger content and a healthier site. Over time, this helps you build better SEO habits that carry over into everything you publish. 
 
For teams, the Task List also brings consistency. It helps everyone follow the same SEO standards, regardless of skill level or experience. 

SEO guidance where you already work 

Because the Task List lives inside Yoast SEO, you can improve your SEO without switching tools or breaking your workflow. It supports you where the work happens, making SEO a natural part of creating and maintaining your content. 
 
The foundational version of the SEO Task List is available in Yoast SEO, and a more comprehensive list is available for Yoast SEO Premium users.

Eight Overlooked Reasons Why Sites Lose Rankings In Core Updates

There are multiple reasons why a site can drop in rankings due to a core algorithm update. The reasons may reflect specific changes to the way Google interprets content, a search query, or both. The change could also be subtle, like an infrastructure update that enables finer relevance and quality judgments. Here are eight commonly overlooked reasons why a site may have lost rankings after a Google core update.

Ranking Where It’s Supposed To Rank?

If the site was previously ranking well and now it doesn’t, it could be what I call “it’s ranking where it’s supposed to rank.” That means that some part of Google’s algorithm has caught up to a loophole that the page was intentionally or accidentally taking advantage of and is currently ranking it where it should have been ranking in the first place.

This is difficult to diagnose because a publisher might believe that the web pages or links were perfect the way they previously were, but in fact there was an issue.

Topic Theming Defines Relevance

A part of the ranking process is determining what the topic of a web page is. Google admitted a year ago that a core topicality system is a part of the ranking process. The concept of topicality as part of the ranking algorithm is real.

The so-called Medic Update of 2018 brought this part of Google’s algorithm into sharp focus. Suddenly, sites that were previously relevant for medical keywords were nowhere to be found because they dealt in folk remedies, not medical ones. What happened was that Google’s understanding of what keyword phrases were about became more topically focused.

Bill Slawski wrote about a Google patent (Website representation vector) that describes a way to classify websites by knowledge domains and expertise levels that sounds like a direct match to what the Medic Update was about.

The patent describes part of what it’s doing:

“The search system can use information for a search query to determine a particular website classification that is most responsive to the search query and select only search results with that particular website classification for a search results page. For example, in response to receipt of a query about a medical condition, the search system may select only websites in the first category, e.g., authored by experts, for a search results page.”

Google’s interpretation of what it means to be relevant became increasingly about topicality in 2018 and continued to be refined in successive updates over the years. Instead of relying on links and keyword similarity, Google introduced a way to identify and classify sites by knowledge domain (the topic) in order to better understand how search queries and content are relevant to each other.

Returning to the medical queries, the reason many sites lost rankings during the Medic Update was that their topics were outside the knowledge domain of medical remedies and science. Sites about folk and alternative healing were permanently locked out of ranking for medical phrases, and no amount of links could ever restore their rankings. The same thing happened across many other topics and continues to affect rankings as Google’s ability to understand the nuances of topical relevance is updated.

Example Of Topical Theming

A way to think of topical theming is to consider that keyword phrases can be themed by topic. For example, the keyword phrase “bomber jacket” is related to military clothing, flight clothing, and men’s jackets. At the time of writing, Alpha Industries, a manufacturer of military clothing, is ranked number one in Google. Alpha Industries is closely related to military clothing because the company not only focuses on selling military-style clothing but also started out as a military contractor producing clothing for America’s military, so consumers closely identify it with military clothing.

Screenshot Showing Topical Theming

Screenshot of SERPs showing how Google interprets a keyword phrase and web pages

So it’s not surprising that Alpha Industries ranks #1 for bomber jacket because it ticks both boxes for the topicality of the phrase Bomber Jacket:

  • Shopping > Military clothing
  • Shopping > Men’s clothing

If your page was previously ranking and now it isn’t, then it’s possible that the topical theme was redefined more sharply. The only way to check this is to review the top ranked sites, focusing, for example, on the differences between ranges such as position one and two, or sometimes positions one through three or positions one through five. The range depends on how the topic is themed. In the example of the Bomber Jacket rankings, positions one through three are themed by “military clothing” and “Men’s clothing.” Position three in my example is held by the Thursday Boot Company, which is themed more closely with “men’s clothing” than it is with military clothing. Perhaps not coincidentally, the Thursday Boot Company is closely identified with men’s fashion.

This is a way to analyze the SERPs to understand why sites are ranking and why others are not.

Topic Personalization

Sometimes the topical themes are not locked into place because user intents can change. In that case, opening a new browser or searching a second time in a different tab might cause Google to change the topical theme to a different topical intent.

In the case of the “bomber jacket” search results, the hierarchy of topical themes can change to:

  • Informational > Article About Bomber Jackets
  • Shopping > Military clothing
  • Shopping > Men’s clothing

The reason for that is directly related to the user’s information need, which informs the intent and the correct topic. In the above case, it looks like the military clothing theme may be the dominant user intent for this topic, but the informational/discovery intent may be a close tie that’s triggered by personalization. This can vary by previous searches but also by geographic location, a user’s device, and even by the time of day.

The takeaway is that there may not be anything wrong with a site. It’s just ranking for a more specific topical intent. So if the topic is getting personalized so that your page no longer ranks, a solution may be to create another page to focus on the additional topic theme that Google is ranking.

Authoritativeness

In one sense, authoritativeness can be seen as an external validation of expertise of a website as a go-to source for a product, service, or content topic. While the expertise of the author contributes to authoritativeness and authoritativeness in a topic can be inherent to a website, ultimately it’s third-party recognition from readers, customers, and other websites (in the form of citations and links) that communicate a website’s authoritativeness back to Google as a validating signal.

The above can be reduced to these four points:

  1. Expertise and topical focus originate within the website.
  2. Authoritativeness is the recognition of that expertise.
  3. Google does not assess that recognition directly.
  4. Third-party signals can validate a site’s authoritativeness.

To that we can add the previously discussed Website Representation Vector patent that shows how Google can identify expertise and authoritativeness.

What’s going on then is that Google selects relevant content and then winnows that down by prioritizing expert content.

Here’s how Google explains how it uses E-E-A-T:

“Google’s automated systems are designed to use many different factors to rank great content. After identifying relevant content, our systems aim to prioritize those that seem most helpful. To do this, they identify a mix of factors that can help determine which content demonstrates aspects of experience, expertise, authoritativeness, and trustworthiness, or what we call E-E-A-T.”

Authoritativeness is not about how often a site publishes about a topic; any spammer can do that. It has to be about more than that. E-E-A-T is a standard to hold your site up to.

Stuck On Page Two Of Search Results? Try Some E-E-A-T

Speaking of E-E-A-T, many SEOs have the mistaken idea that it’s something they can add to websites. That’s not how it works. At the 2025 New York City Search Central Live event, Google’s John Mueller confirmed that E-E-A-T is not something you add to web pages.

He said:

“Sometimes SEOs come to us or like mention that they’ve added EEAT to their web pages. That’s not how it works. Sorry, you can’t sprinkle some experiences on your web pages. It’s like, that doesn’t make any sense.”

Clearly, content reflects qualities of authoritativeness, trustworthiness, expertise, and experience, but it’s not something that you add to content. So what is it?

E-E-A-T is just a standard to hold your site up to. It’s also a subjective judgment made by site visitors. A subjective judgment is like how a sandwich can taste great, with the “great” part being the subjective judgment. It is a matter of opinion.

One thing that is difficult for SEOs to diagnose is when their content is missing that extra something to push their site onto the first page of the SERPs. It can feel unfair to see competitors ranking on the first page of the SERPs even though your content is just as good as theirs.

Often, the difference is that the top-ranked web pages are optimized for people. Another reason is that more people know about them because they have a multimodal approach to content, whereas the site on page two of the SERPs mainly communicates via textual content.

In SERPs where Google prefers to rank government and educational sites for a particular keyword phrase, with only one commercial site among them, I almost always find evidence that the commercial site’s content and outreach resonate with site visitors in ways that its competitors’ do not. Websites that focus on multimodal, people-optimized content and experiences are usually what I find in those weird outlier rankings.

So if your site is stuck on page two, revisit the top-ranked web pages and identify ways that those sites are optimized for people and multimodal content. You may be surprised to see what makes those sites resonate with users.

Temporary Rankings

Some rankings are not made to last. This is the case with a new site or new page ranking boost. Google has a thing where it tastes a new site to see how it fits with the rest of the Internet. A lot of SEOs crow about their client’s new website conquering the SERPs right out of the gate. What you almost never hear about is when those same sites drop out of the SERPs.

This isn’t a bad thing. It’s normal. It simply means that Google has tried the site and now it’s time for the site to earn its place in the SERPs.

There’s Nothing Wrong With The Site?

Many site publishers find it frustrating to be told that there’s nothing wrong with their site even though it lost rankings. What’s going on may be that the site and web page are fine, but that the competitors’ pages are finer. These kinds of issues are typically where the content is fine and the competitors’ content is about the same but is better in small ways.

This is the one form of ranking drop that many SEOs and publishers easily overlook because SEOs generally try to identify what’s “wrong” with a site, and when nothing obvious jumps out at them, they try to find something wrong with the backlinks or something else.

This inability to find something wrong leads to recommendations like filing link disavows to get rid of spam links or removing content to fix perceived but not actual problems (like duplicate content). They’re basically grasping at straws to find something to fix.

But sometimes it’s not that something is wrong with the site. Sometimes it’s just that there’s something right with the competitors.

What can be right with competitors?

  • Links
  • User experience
  • Image content (for example, site visitors are reflected in image content).
  • Multimodal approach
  • Strong outreach to potential customers
  • In-person marketing
  • Word-of-mouth promotion
  • Better advertising
  • Optimized for people

SEO Secret Sauce: Optimized For People

Optimizing for people is a common blind spot. Optimizing for people is a subset of conversion optimization. Conversion optimization is about subtle signals that indicate a web page contains what the site visitor needs.

Sometimes that need is to be recognized and acknowledged. It can be reassurance that you’re available right now or that the business is trustworthy.

For example, a client’s site featured a badge at the top of the page that said something like “Trusted by over 200 of the Fortune 500.” That badge whispered, “We’re legitimate and trustworthy.”

Another example is how a business identified that most of their site visitors were mothers of boys, so their optimization was to prioritize images of mothers with boys. This subtly recognized the site visitor and confirmed that what’s being offered is for them.

Nobody loves a site because it’s heavily SEO’d, but people do love sites that acknowledge the site visitor in some way. This is the secret sauce that’s invisible to SEO tools but helps sites outrank their competitors.

It may be helpful to avoid mimicking what competitors are doing and instead differentiate the site and its outreach in ways that make people like your site more. When I say outreach, I mean actively seeking out places where your typical customer might be hanging out and figuring out how you can make your pitch there. Third-party signals have long been strong ranking factors at Google, and now, with AI Search, what people and other sites say about your site plays an increasingly important role in rankings.

Takeaways

  1. Core updates sometimes correct over-ranking, not punish sites
    Ranking drops sometimes reflect Google closing loopholes and placing pages where they should have ranked all along rather than identifying new problems.
  2. Topical theming has become more precise
    Core updates sometimes make existing algorithms more precise. Google increasingly ranks content based on topical categories and intent, not just keywords or links.
  3. Topical themes can change dynamically
    Search results may shift between informational and commercial themes depending on context such as prior searches, location, device, or time of day.
  4. Authoritativeness is externally validated
    Recognition from users, citations, links, and broader awareness can be the difference between a site that ranks and one that does not.
  5. SEO does not control E-E-A-T and can’t be reduced to an on-page checklist
    While qualities of expertise and authoritativeness are inherent in content, they’re still subjective judgments inferred from external signals, not something that SEOs can directly add to content.
  6. Temporary ranking boosts are normal
    New pages and sites are tested briefly, then must earn long-term placement through sustained performance and reception.
  7. Competitors may simply be better for users
    Ranking losses often occur because competitors outperform in subtle but meaningful ways, not because the losing site is broken.
  8. People-first optimization is a competitive advantage
    Sites that resonate emotionally, visually, and practically with visitors often outperform purely SEO-optimized pages.

Ranking changes after a core update sometimes reflect clearer judgments about relevance, authority, and usefulness rather than newly discovered web page flaws. As Google sharpens how it understands topics, pages increasingly compete on how well they align with what users are actually trying to accomplish and which sources people already recognize and trust. The lasting advantage comes from building a site that resonates with actual visitors, earns attention beyond search, and gives Google consistent evidence that users prefer it over alternatives. Marketing, the old-fashioned tell-people-about-a-business approach to promoting it, should not be overlooked.

Featured Image by Shutterstock/Silapavet Konthikamee

AI coding is now everywhere. But not everyone is convinced.

Depending on who you ask, AI-powered coding is either giving software developers an unprecedented productivity boost or churning out masses of poorly designed code that saps their attention and sets software projects up for serious long-term maintenance problems.

The problem is that, right now, it’s not easy to know which is true.

As tech giants pour billions into large language models (LLMs), coding has been touted as the technology’s killer app. Both Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have claimed that around a quarter of their companies’ code is now AI-generated. And in March, Anthropic’s CEO, Dario Amodei, predicted that within six months 90% of all code would be written by AI. It’s an appealing and obvious use case. Code is a form of language, we need lots of it, and it’s expensive to produce manually. It’s also easy to tell if it works—run a program and it’s immediately evident whether it’s functional.


This story is part of MIT Technology Review’s Hype Correction package, a series that resets expectations about what AI is, what it makes possible, and where we go next.


Executives enamored with the potential to break through human bottlenecks are pushing engineers to lean into an AI-powered future. But after speaking to more than 30 developers, technology executives, analysts, and researchers, MIT Technology Review found that the picture is not as straightforward as it might seem.  

For some developers on the front lines, initial enthusiasm is waning as they bump up against the technology’s limitations. And as a growing body of research suggests that the claimed productivity gains may be illusory, some are questioning whether the emperor is wearing any clothes.

The pace of progress is complicating the picture, though. A steady drumbeat of new model releases means these tools’ capabilities and quirks are constantly evolving. And their utility often depends on the tasks they are applied to and the organizational structures built around them. All of this leaves developers navigating confusing gaps between expectation and reality.

Is it the best of times or the worst of times (to channel Dickens) for AI coding? Maybe both.

A fast-moving field

It’s hard to avoid AI coding tools these days. There is a dizzying array of products available, both from model developers like Anthropic, OpenAI, and Google and from companies like Cursor and Windsurf, which wrap these models in polished code-editing software. And according to Stack Overflow’s 2025 Developer Survey, they’re being adopted rapidly, with 65% of developers now using them at least weekly.

AI coding tools first emerged around 2016 but were supercharged with the arrival of LLMs. Early versions functioned as little more than autocomplete for programmers, suggesting what to type next. Today they can analyze entire code bases, edit across files, fix bugs, and even generate documentation explaining how the code works. All this is guided through natural-language prompts via a chat interface.

“Agents”—autonomous LLM-powered coding tools that can take a high-level plan and build entire programs independently—represent the latest frontier in AI coding. This leap was enabled by the latest reasoning models, which can tackle complex problems step by step and, crucially, access external tools to complete tasks. “This is how the model is able to code, as opposed to just talk about coding,” says Boris Cherny, head of Claude Code, Anthropic’s coding agent.

These agents have made impressive progress on software engineering benchmarks—standardized tests that measure model performance. When OpenAI introduced the SWE-bench Verified benchmark in August 2024, offering a way to evaluate agents’ success at fixing real bugs in open-source repositories, the top model solved just 33% of issues. A year later, leading models consistently score above 70%.

In February, Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, coined the term “vibe coding”—meaning an approach where people describe software in natural language and let AI write, refine, and debug the code. Social media abounds with developers who have bought into this vision, claiming massive productivity boosts.

But while some developers and companies report such productivity gains, the hard evidence is more mixed. Early studies from GitHub, Google, and Microsoft—all vendors of AI tools—found developers completing tasks 20% to 55% faster. But a September report from the consultancy Bain & Company described real-world savings as “unremarkable.”

Data from the developer analytics firm GitClear shows that most engineers are producing roughly 10% more durable code—code that isn’t deleted or rewritten within weeks—since 2022, likely thanks to AI. But that gain has come with sharp declines in several measures of code quality. Stack Overflow’s survey also found trust and positive sentiment toward AI tools falling significantly for the first time. And most provocatively, a July study by the nonprofit research organization Model Evaluation & Threat Research (METR) showed that while experienced developers believed AI made them 20% faster, objective tests showed they were actually 19% slower.

Growing disillusionment

For Mike Judge, principal developer at the software consultancy Substantial, the METR study struck a nerve. He was an enthusiastic early adopter of AI tools, but over time he grew frustrated with their limitations and the modest boost they brought to his productivity. “I was complaining to people because I was like, ‘It’s helping me but I can’t figure out how to make it really help me a lot,’” he says. “I kept feeling like the AI was really dumb, but maybe I could trick it into being smart if I found the right magic incantation.”

When asked by a friend, Judge had estimated the tools were providing a roughly 25% speedup. So when he saw similar estimates attributed to developers in the METR study, he decided to test his own. For six weeks, he guessed how long a task would take, flipped a coin to decide whether to use AI or code manually, and timed himself. To his surprise, AI slowed him down by a median of 21%—mirroring the METR results.

This got Judge crunching the numbers. If these tools were really speeding developers up, he reasoned, you should see a massive boom in new apps, website registrations, video games, and projects on GitHub. He spent hours and several hundred dollars analyzing all the publicly available data and found flat lines everywhere.

“Shouldn’t this be going up and to the right?” says Judge. “Where’s the hockey stick on any of these graphs? I thought everybody was so extraordinarily productive.” The obvious conclusion, he says, is that AI tools provide little productivity boost for most developers. 

Developers interviewed by MIT Technology Review generally agree on where AI tools excel: producing “boilerplate code” (reusable chunks of code repeated in multiple places with little modification), writing tests, fixing bugs, and explaining unfamiliar code to new developers. Several noted that AI helps overcome the “blank page problem” by offering an imperfect first stab to get a developer’s creative juices flowing. It can also let nontechnical colleagues quickly prototype software features, easing the load on already overworked engineers.

These tasks can be tedious, and developers are typically  glad to hand them off. But they represent only a small part of an experienced engineer’s workload. For the more complex problems where engineers really earn their bread, many developers told MIT Technology Review, the tools face significant hurdles.

Perhaps the biggest problem is that LLMs can hold only a limited amount of information in their “context window”—essentially their working memory. This means they struggle to parse large code bases and are prone to forgetting what they’re doing on longer tasks. “It gets really nearsighted—it’ll only look at the thing that’s right in front of it,” says Judge. “And if you tell it to do a dozen things, it’ll do 11 of them and just forget that last one.”

Image Credit: Derek Brahney

LLMs’ myopia can lead to headaches for human coders. While an LLM-generated response to a problem may work in isolation, software is made up of hundreds of interconnected modules. If these aren’t built with consideration for other parts of the software, it can quickly lead to a tangled, inconsistent code base that’s hard for humans to parse and, more important, to maintain.

Developers have traditionally addressed this by following conventions—loosely defined coding guidelines that differ widely between projects and teams. “AI has this overwhelming tendency to not understand what the existing conventions are within a repository,” says Bill Harding, the CEO of GitClear. “And so it is very likely to come up with its own slightly different version of how to solve a problem.”

The models also just get things wrong. Like all LLMs, coding models are prone to “hallucinating”—it’s an issue built into how they work. But because the code they output looks so polished, errors can be difficult to detect, says James Liu, director of software engineering at the advertising technology company Mediaocean. Put all these flaws together, and using these tools can feel a lot like pulling a lever on a one-armed bandit. “Some projects you get a 20x improvement in terms of speed or efficiency,” says Liu. “On other things, it just falls flat on its face, and you spend all this time trying to coax it into granting you the wish that you wanted and it’s just not going to.”

Judge suspects this is why engineers often overestimate productivity gains. “You remember the jackpots. You don’t remember sitting there plugging tokens into the slot machine for two hours,” he says.

And it can be particularly pernicious if the developer is unfamiliar with the task. Judge remembers getting AI to help set up a Microsoft cloud service called Azure Functions, which he’d never used before. He thought it would take about two hours, but nine hours later he threw in the towel. “It kept leading me down these rabbit holes and I didn’t know enough about the topic to be able to tell it ‘Hey, this is nonsensical,’” he says.

The debt begins to mount up

Developers constantly make trade-offs between speed of development and the maintainability of their code—creating what’s known as “technical debt,” says Geoffrey G. Parker, professor of engineering innovation at Dartmouth College. Each shortcut adds complexity and makes the code base harder to manage, accruing “interest” that must eventually be repaid by restructuring the code. As this debt piles up, adding new features and maintaining the software becomes slower and more difficult.

Accumulating technical debt is inevitable in most projects, but AI tools make it much easier for time-pressured engineers to cut corners, says GitClear’s Harding. And GitClear’s data suggests this is happening at scale. Since 2020, the company has seen a significant rise in the amount of copy-pasted code—an indicator that developers are reusing more code snippets, most likely based on AI suggestions—and an even bigger decline in the amount of code moved from one place to another, which happens when developers clean up their code base.

And as models improve, the code they produce is becoming increasingly verbose and complex, says Tariq Shaukat, CEO of Sonar, which makes tools for checking code quality. This is driving down the number of obvious bugs and security vulnerabilities, he says, but at the cost of increasing the number of “code smells”—harder-to-pinpoint flaws that lead to maintenance problems and technical debt. 

Recent research by Sonar found that these make up more than 90% of the issues found in code generated by leading AI models. “Issues that are easy to spot are disappearing, and what’s left are much more complex issues that take a while to find,” says Shaukat. “That’s what worries us about this space at the moment. You’re almost being lulled into a false sense of security.”

If AI tools make it increasingly difficult to maintain code, that could have significant security implications, says Jessica Ji, a security researcher at Georgetown University. “The harder it is to update things and fix things, the more likely a code base or any given chunk of code is to become insecure over time,” says Ji.

There are also more specific security concerns, she says. Researchers have discovered a worrying class of hallucinations where models reference nonexistent software packages in their code. Attackers can exploit this by creating packages with those names that harbor vulnerabilities, which the model or developer may then unwittingly incorporate into software. 
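One partial defense, offered here purely as an illustrative sketch rather than a recommendation from the researchers quoted above, is to treat any dependency that an AI-generated change introduces as unreviewed until a human approves it. The file names in the sketch below are assumptions, and an allowlist is only a partial guard; it narrows the problem rather than solving it.

```python
# Illustrative sketch (not from the researchers quoted above): flag any
# dependency an AI-generated change introduces that is not already on a
# human-approved allowlist, as a partial guard against "hallucinated" or
# attacker-registered package names. File names are assumptions.
import sys


def read_names(path: str) -> set[str]:
    """Read bare package names, ignoring comments and version pins."""
    names = set()
    with open(path) as f:
        for line in f:
            name = line.split("#")[0].split("==")[0].split(">=")[0].strip()
            if name:
                names.add(name.lower())
    return names


def main() -> int:
    approved = read_names("approved-packages.txt")  # maintained by humans
    requested = read_names("requirements.txt")      # possibly AI-edited
    unreviewed = requested - approved
    if unreviewed:
        print("Dependencies needing human review:", sorted(unreviewed))
        return 1
    print("All dependencies are on the approved list.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```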

LLMs are also vulnerable to “data-poisoning attacks,” where hackers seed the publicly available data sets models train on with data that alters the model’s behavior in undesirable ways, such as generating insecure code when triggered by specific phrases. In October, research by Anthropic found that as few as 250 malicious documents can introduce this kind of back door into an LLM regardless of its size.

The converted

Despite these issues, though, there’s probably no turning back. “Odds are that writing every line of code on a keyboard by hand—those days are quickly slipping behind us,” says Kyle Daigle, chief operating officer at the Microsoft-owned code-hosting platform GitHub, which produces a popular AI-powered tool called Copilot (not to be confused with the Microsoft product of the same name).

The Stack Overflow report found that despite growing distrust in the technology, usage has increased rapidly and consistently over the past three years. Erin Yepis, a senior analyst at Stack Overflow, says this suggests that engineers are taking advantage of the tools with a clear-eyed view of the risks. The report also found that frequent users tend to be more enthusiastic, and that more than half of developers are not yet using the latest coding agents, which may explain why many remain underwhelmed by the technology.

Those latest tools can be a revelation. Trevor Dilley, CTO at the software development agency Twenty20 Ideas, says he had found some value in AI editors’ autocomplete functions, but when he tried anything more complex it would “fail catastrophically.” Then in March, while on vacation with his family, he set the newly released Claude Code to work on one of his hobby projects. It completed a four-hour task in two minutes, and the code was better than what he would have written.

“I was like, Whoa,” he says. “That, for me, was the moment, really. There’s no going back from here.” Dilley has since cofounded a startup called DevSwarm, which is creating software that can marshal multiple agents to work in parallel on a piece of software.

The challenge, says Armin Ronacher, a prominent open-source developer, is that the learning curve for these tools is shallow but long. Until March he’d remained unimpressed by AI tools, but after leaving his job at the software company Sentry in April to launch a startup, he started experimenting with agents. “I basically spent a lot of months doing nothing but this,” he says. “Now, 90% of the code that I write is AI-generated.”

Getting to that point involved extensive trial and error, to figure out which problems tend to trip the tools up and which they can handle efficiently. Today’s models can tackle most coding tasks with the right guardrails, says Ronacher, but these can be very task and project specific.

To get the most out of these tools, developers must surrender control over individual lines of code and focus on the overall software architecture, says Nico Westerdale, chief technology officer at the veterinary staffing company IndeVets. He recently built a data science platform of some 100,000 lines of code almost exclusively by prompting models rather than writing the code himself.

Westerdale’s process starts with an extended conversation with the agent to develop a detailed plan for what to build and how. He then guides it through each step. It rarely gets things right on the first try and needs constant wrangling, but if you force it to stick to well-defined design patterns, it can produce high-quality, easily maintainable code, says Westerdale. He reviews every line, and the code is as good as anything he’s ever produced, he says: “I’ve just found it absolutely revolutionary. It’s also frustrating, difficult, a different way of thinking, and we’re only just getting used to it.”

But while individual developers are learning how to use these tools effectively, getting consistent results across a large engineering team is significantly harder. AI tools amplify both the good and bad aspects of your engineering culture, says Ryan J. Salva, senior director of product management at Google. With strong processes, clear coding patterns, and well-defined best practices, these tools can shine. 

But if your development process is disorganized, they’ll only magnify the problems. It’s also essential to codify that institutional knowledge so the models can draw on it effectively. “A lot of work needs to be done to help build up context and get the tribal knowledge out of our heads,” he says.

The cryptocurrency exchange Coinbase has been vocal about its adoption of AI tools. CEO Brian Armstrong made headlines in August when he revealed that the company had fired staff unwilling to adopt AI tools. But Coinbase’s head of platform, Rob Witoff, tells MIT Technology Review that while they’ve seen massive productivity gains in some areas, the impact has been patchy. For simpler tasks like restructuring the code base and writing tests, AI-powered workflows have achieved speedups of up to 90%. But gains are more modest for other tasks, and the disruption caused by overhauling existing processes often counteracts the increased coding speed, says Witoff.

One factor is that AI tools let junior developers produce far more code. As in almost all engineering teams, this code has to be reviewed by others, normally more senior developers, to catch bugs and ensure it meets quality standards. But the sheer volume of code now being churned out is quickly saturating the ability of midlevel staff to review changes. “This is the cycle we’re going through almost every month, where we automate a new thing lower down in the stack, which brings more pressure higher up in the stack,” he says. “Then we’re looking at applying automation to that higher-up piece.”

Developers also spend only 20% to 40% of their time coding, says Jue Wang, a partner at Bain, so even a significant speedup there often translates to more modest overall gains. Developers spend the rest of their time analyzing software problems and dealing with customer feedback, product strategy, and administrative tasks. To get significant efficiency boosts, companies may need to apply generative AI to all these other processes too, says Wang, and that is still in the works.
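The arithmetic behind that point is straightforward. As a rough, hypothetical example (the figures below are illustrative, not Bain’s): if coding takes up 30% of a developer’s time and an AI tool makes that slice twice as fast, the overall gain is only about 18%.

```python
# Rough, hypothetical arithmetic; the figures are illustrative, not Bain's.
# If coding is 30% of a developer's time and AI doubles coding speed,
# the overall speedup is far smaller than 2x (Amdahl's-law-style reasoning).
coding_share = 0.30    # fraction of working time spent writing code
coding_speedup = 2.0   # assumed speedup on that coding portion

remaining_time = (1 - coding_share) + coding_share / coding_speedup
overall_speedup = 1 / remaining_time
print(f"Overall speedup: {overall_speedup:.2f}x")  # prints ~1.18x
```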

Rapid evolution

Programming with agents is a dramatic departure from previous working practices, though, so it’s not surprising companies are facing some teething issues. These are also very new products that are changing by the day. “Every couple months the model improves, and there’s a big step change in the model’s coding capabilities and you have to get recalibrated,” says Anthropic’s Cherny.

For example, in June Anthropic introduced a built-in planning mode to Claude; it has since been replicated by other providers. In October, the company also enabled Claude to ask users questions when it needs more context or faces multiple possible solutions, which Cherny says helps it avoid the tendency to simply assume which path is the best way forward.

Most significant, Anthropic has added features that make Claude better at managing its own context. When it nears the limits of its working memory, it summarizes key details and uses them to start a new context window, effectively giving it an “infinite” one, says Cherny. Claude can also invoke sub-agents to work on smaller tasks, so it no longer has to hold all aspects of the project in its own head. The company claims that its latest model, Claude Sonnet 4.5, can now code autonomously for more than 30 hours without major performance degradation.
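To make the idea concrete, here is a heavily simplified sketch of what a summarize-and-restart loop can look like. It is not Anthropic’s implementation; the llm() helper, the token estimate, and the thresholds are all invented for illustration.

```python
# Heavily simplified sketch of context compaction. NOT Anthropic's code:
# llm(), the token estimate, and the thresholds are hypothetical stand-ins.

CONTEXT_LIMIT_TOKENS = 200_000
COMPACT_AT = int(CONTEXT_LIMIT_TOKENS * 0.8)


def llm(prompt: str) -> str:
    """Hypothetical call to a language model; returns its text response."""
    raise NotImplementedError("stand-in for a real model API")


def estimate_tokens(text: str) -> int:
    """Crude estimate for the sketch: roughly four characters per token."""
    return len(text) // 4


def run_agent(task: str, max_steps: int) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        if estimate_tokens(history) > COMPACT_AT:
            # Compress the transcript into key decisions and open items,
            # then continue with that summary as the new working memory.
            summary = llm("Summarize this work, keeping key decisions "
                          f"and unfinished tasks:\n{history}")
            history = f"Summary of earlier work:\n{summary}\n"
        history += llm(f"{history}\nDo the next step and report the result.") + "\n"
    return history
```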

Novel approaches to software development could also sidestep coding agents’ other flaws. MIT professor Max Tegmark has introduced something he calls “vericoding,” which could allow agents to produce entirely bug-free code from a natural-language description. It builds on an approach known as “formal verification,” where developers create a mathematical model of their software that can prove incontrovertibly that it functions correctly. This approach is used in high-stakes areas like flight-control systems and cryptographic libraries, but it remains costly and time-consuming, limiting its broader use.

Rapid improvements in LLMs’ mathematical capabilities have opened up the tantalizing possibility of models that produce not only software but the mathematical proof that it’s bug free, says Tegmark. “You just give the specification, and the AI comes back with provably correct code,” he says. “You don’t have to touch the code. You don’t even have to ever look at the code.”

When tested on about 2,000 vericoding problems in Dafny—a language designed for formal verification—the best LLMs solved over 60%, according to non-peer-reviewed research by Tegmark’s group. This was achieved with off-the-shelf LLMs, and Tegmark expects that training specifically for vericoding could improve scores rapidly.
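To see what “provably correct code” means in practice, here is a toy example written in Lean 4 (Tegmark’s benchmark uses Dafny, so this illustrates the general idea rather than the benchmark itself). The program and a machine-checked proof that it meets its specification travel together; if the proof breaks, the code does not compile.

```lean
-- Toy illustration of a program shipped with a machine-checked proof.
-- (Illustrative only; the vericoding benchmark itself uses Dafny.)
def double (n : Nat) : Nat :=
  n + n

-- Specification: double really does multiply its input by two.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The appeal of vericoding is that a model could generate both halves, the program and the proof, so a human would not need to read the code to trust it.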

And counterintuitively, the speed at which AI generates code could actually ease maintainability concerns. Alex Worden, principal engineer at the business software giant Intuit, notes that maintenance is often difficult because engineers reuse components across projects, creating a tangle of dependencies where one change triggers cascading effects across the code base. Reusing code used to save developers time, but in a world where AI can produce hundreds of lines of code in seconds, that imperative has gone, says Worden.

Instead, he advocates for “disposable code,” where each component is generated independently by AI without regard for whether it follows design patterns or conventions. They are then connected via APIs—sets of rules that let components request information or services from each other. Each component’s inner workings are not dependent on other parts of the code base, making it possible to rip them out and replace them without wider impact, says Worden. 
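A minimal sketch of that design, with invented names: the calling code depends only on a narrow interface, so the AI-generated implementation behind it can be thrown away and regenerated without touching anything else. Real systems would typically draw the boundary at a network API rather than a single class, but the principle is the same.

```python
# Minimal sketch of "disposable" components behind a stable interface.
# Names are invented for illustration; callers depend only on the Protocol,
# so the implementation can be regenerated wholesale without wider impact.
from typing import Protocol


class TaxCalculator(Protocol):
    def tax_due(self, amount_cents: int) -> int:
        """Return the tax owed, in cents, for a given amount."""
        ...


class FlatRateCalculator:
    """One AI-generated implementation; could be replaced entirely tomorrow."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def tax_due(self, amount_cents: int) -> int:
        return round(amount_cents * self.rate)


def invoice_total(amount_cents: int, calc: TaxCalculator) -> int:
    """Caller code: it only knows the interface, never the implementation."""
    return amount_cents + calc.tax_due(amount_cents)


print(invoice_total(10_000, FlatRateCalculator(rate=0.08)))  # prints 10800
```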

“The industry is still concerned about humans maintaining AI-generated code,” he says. “I question how long humans will look at or care about code.”

A narrowing talent pipeline

For the foreseeable future, though, humans will still need to understand and maintain the code that underpins their projects. And one of the most pernicious side effects of AI tools may be a shrinking pool of people capable of doing so. 

Early evidence suggests that fears around the job-destroying effects of AI may be justified. A recent Stanford University study found that employment among software developers aged 22 to 25 fell nearly 20% between 2022 and 2025, coinciding with the rise of AI-powered coding tools.

Experienced developers could face difficulties too. Luciano Nooijen, an engineer at the video-game infrastructure developer Companion Group, used AI tools heavily in his day job, where they were provided for free. But when he began a side project without access to those tools, he found himself struggling with tasks that previously came naturally. “I was feeling so stupid because things that used to be instinct became manual, sometimes even cumbersome,” says Nooijen.

Just as athletes still perform basic drills, he thinks the only way to maintain an instinct for coding is to regularly practice the grunt work. That’s why he’s largely abandoned AI tools, though he admits that deeper motivations are also at play. 

Part of the reason Nooijen and other developers MIT Technology Review spoke to are pushing back against AI tools is a sense that they are hollowing out the parts of their jobs that they love. “I got into software engineering because I like working with computers. I like making machines do things that I want,” Nooijen says. “It’s just not fun sitting there with my work being done for me.”

A brief history of Sam Altman’s hype

Each time you’ve heard a borderline outlandish idea of what AI will be capable of, it often turns out that Sam Altman was, if not the first to articulate it, at least the most persuasive and influential voice behind it. 

For more than a decade he has been known in Silicon Valley as a world-class fundraiser and persuader. OpenAI’s early releases around 2020 set the stage for a mania around large language models, and the launch of ChatGPT in November 2022 granted Altman a world stage on which to present his new thesis: that these models mirror human intelligence and could swing the doors open to a healthier and wealthier techno-utopia.


This story is part of MIT Technology Review’s Hype Correction package, a series that resets expectations about what AI is, what it makes possible, and where we go next.


Throughout, Altman’s words have set the agenda. He has framed a prospective superintelligent AI as either humanistic or catastrophic, depending on what effect he was hoping to create, what he was raising money for, or which tech giant seemed like his most formidable competitor at the moment. 

Examining Altman’s statements over the years reveals just how much his outlook has powered today’s AI boom. Even among Silicon Valley’s many hypesters, he’s been especially willing to speak about open questions—whether large language models contain the ingredients of human thought, whether language can also produce intelligence—as if they were already answered. 

What he says about AI is rarely provable when he says it, but it persuades us of one thing: This road we’re on with AI can go somewhere either great or terrifying, and OpenAI will need epic sums to steer it toward the right destination. In this sense, he is the ultimate hype man.

To understand how his voice has shaped our understanding of what AI can do, we read almost everything he’s ever said about the technology (we requested an interview with Altman, but he was not made available). 

His own words trace how we arrived here.

In conclusion … 

Altman didn’t dupe the world. OpenAI has ushered in a genuine tech revolution, with increasingly impressive language models that have attracted millions of users. Even skeptics would concede that LLMs’ conversational ability is astonishing.

But Altman’s hype has always hinged less on today’s capabilities than on a philosophical tomorrow—an outlook that quite handily doubles as a case for more capital and friendlier regulation. Long before large language models existed, he was imagining an AI powerful enough to require wealth redistribution, just as he imagined humanity colonizing other planets. Again and again, promises of a destination—abundance, superintelligence, a healthier and wealthier world—have come first, and the evidence second. 

Even if LLMs eventually hit a wall, there’s little reason to think his faith in a techno-utopian future will falter. The vision was never really about the particulars of the current model anyway. 

The AI doomers feel undeterred

It’s a weird time to be an AI doomer.

This small but influential community of researchers, scientists, and policy experts believes, in the simplest terms, that AI could get so good it could be bad—very, very bad—for humanity. Though many of these people would be more likely to describe themselves as advocates for AI safety than as literal doomsayers, they warn that AI poses an existential risk to humanity. They argue that absent more regulation, the industry could hurtle toward systems it can’t control. They commonly expect such systems to follow the creation of artificial general intelligence (AGI), a slippery concept generally understood as technology that can do whatever humans can do, and better. 


Though this is far from a universally shared perspective in the AI field, the doomer crowd has had some notable success over the past several years: helping shape AI policy coming from the Biden administration, organizing prominent calls for international “red lines” to prevent AI risks, and getting a bigger (and more influential) megaphone as some of its adherents win science’s most prestigious awards.

But a number of developments over the past six months have put them on the back foot. Talk of an AI bubble has overwhelmed the discourse as tech companies continue to invest in multiple Manhattan Projects’ worth of data centers without any certainty that future demand will match what they’re building. 

And then there was the August release of OpenAI’s latest foundation model, GPT-5, which proved something of a letdown. Maybe that was inevitable, since it was the most hyped AI release of all time; OpenAI CEO Sam Altman had boasted that GPT-5 felt “like a PhD-level expert” in every topic and told the podcaster Theo Von that the model was so good, it had made him feel “useless relative to the AI.” 

Many expected GPT-5 to be a big step toward AGI, but whatever progress the model may have made was overshadowed by a string of technical bugs and the company’s mystifying, quickly reversed decision to shut off access to every old OpenAI model without warning. And while the new model achieved state-of-the-art benchmark scores, many people felt, perhaps unfairly, that in day-to-day use GPT-5 was a step backward.

All this would seem to threaten some of the very foundations of the doomers’ case. In turn, a competing camp of AI accelerationists, who fear AI is actually not moving fast enough and that the industry is constantly at risk of being smothered by overregulation, is seeing a fresh chance to change how we approach AI safety (or, maybe more accurately, how we don’t). 

This is particularly true of the industry types who’ve decamped to Washington: “The Doomer narratives were wrong,” declared David Sacks, the longtime venture capitalist turned Trump administration AI czar. “This notion of imminent AGI has been a distraction and harmful and now effectively proven wrong,” echoed the White House’s senior policy advisor for AI and tech investor Sriram Krishnan. (Sacks and Krishnan did not reply to requests for comment.) 

(There is, of course, another camp in the AI safety debate: the group of researchers and advocates commonly associated with the label “AI ethics.” Though they also favor regulation, they tend to think the speed of AI progress has been overstated and have often written off AGI as a sci-fi story or a scam that distracts us from the technology’s immediate threats. But any potential doomer demise wouldn’t exactly give them the same opening the accelerationists are seeing.)

So where does this leave the doomers? As part of our Hype Correction package, we decided to ask some of the movement’s biggest names to see if the recent setbacks and general vibe shift had altered their views. Are they angry that policymakers no longer seem to heed their threats? Are they quietly adjusting their timelines for the apocalypse? 

Recent interviews with 20 people who study or advocate AI safety and governance—including Nobel Prize winner Geoffrey Hinton, Turing Award winner Yoshua Bengio, and high-profile experts like former OpenAI board member Helen Toner—reveal that rather than feeling chastened or lost in the wilderness, they’re still deeply committed to their cause, believing that AGI remains not just possible but incredibly dangerous.

At the same time, they seem to be grappling with a near contradiction. While they’re somewhat relieved that recent developments suggest AGI is further out than they previously thought (“Thank God we have more time,” says AI researcher Jeffrey Ladish), they also feel frustrated that some people in power are pushing policy against their cause (Daniel Kokotajlo, lead author of a cautionary forecast called “AI 2027,” says “AI policy seems to be getting worse” and calls the Sacks and Krishnan tweets “deranged and/or dishonest”).

Broadly speaking, these experts see the talk of an AI bubble as no more than a speed bump, and disappointment in GPT-5 as more distracting than illuminating. They still generally favor more robust regulation and worry that progress on policy—the implementation of the EU AI Act; the passage of the first major American AI safety bill, California’s SB 53; and new interest in AGI risk from some members of Congress—has become vulnerable as Washington overreacts to what doomers see as short-term failures to live up to the hype. 

Some were also eager to correct what they see as the most persistent misconceptions about the doomer world. Though their critics routinely mock them for predicting that AGI is right around the corner, they claim that’s never been an essential part of their case: It “isn’t about imminence,” says Berkeley professor Stuart Russell, the author of Human Compatible: Artificial Intelligence and the Problem of Control. Most people I spoke with say their timelines to dangerous systems have actually lengthened slightly in the last year—an important change given how quickly the policy and technical landscapes can shift. 

“If someone said there’s a four-mile-diameter asteroid that’s going to hit the Earth in 2067, we wouldn’t say, ‘Remind me in 2066 and we’ll think about it.’”

Many of them, in fact, emphasize the importance of changing timelines. And even if they are just a tad longer now, Toner tells me that one big-picture story of the ChatGPT era is the dramatic compression of these estimates across the AI world. For a long while, she says, AGI was expected in many decades. Now, for the most part, the predicted arrival is sometime in the next few years to 20 years. So even if we have a little bit more time, she (and many of her peers) continue to see AI safety as incredibly, vitally urgent. She tells me that if AGI were possible anytime in even the next 30 years, “It’s a huge fucking deal. We should have a lot of people working on this.”

So despite the precarious moment doomers find themselves in, their bottom line remains that no matter when AGI is coming (and, again, they say it’s very likely coming), the world is far from ready. 

Maybe you agree. Or maybe you think this future is far from guaranteed. Or that it’s the stuff of science fiction. You may even think AGI is a great big conspiracy theory. You’re not alone, of course—this topic is polarizing. But whatever you think about the doomer mindset, there’s no getting around the fact that these people have a lot of influence. So here are some of the most prominent people in the space, reflecting on this moment in their own words.

Interviews have been edited and condensed for length and clarity. 


The Nobel laureate who’s not sure what’s coming

Geoffrey Hinton, winner of the Turing Award and the Nobel Prize in physics for pioneering deep learning

The biggest change in the last few years is that there are people who are hard to dismiss who are saying this stuff is dangerous. Like, [former Google CEO] Eric Schmidt, for example, really recognized this stuff could be really dangerous. He and I were in China recently talking to someone on the Politburo, the party secretary of Shanghai, to make sure he really understood—and he did. I think in China, the leadership understands AI and its dangers much better because many of them are engineers.

I’ve been focused on the longer-term threat: When AIs get more intelligent than us, can we really expect that humans will remain in control or even relevant? But I don’t think anything is inevitable. There’s huge uncertainty on everything. We’ve never been here before. Anybody who’s confident they know what’s going to happen seems silly to me. I think this is very unlikely but maybe it’ll turn out that all the people saying AI is way overhyped are correct. Maybe it’ll turn out that we can’t get much further than the current chatbots—we hit a wall due to limited data. I don’t believe that. I think that’s unlikely, but it’s possible. 

I also don’t believe people like Eliezer Yudkowsky, who say if anybody builds it, we’re all going to die. We don’t know that. 

But if you go on the balance of the evidence, I think it’s fair to say that most experts who know a lot about AI believe it’s very probable that we’ll have superintelligence within the next 20 years. [Google DeepMind CEO] Demis Hassabis says maybe 10 years. Even [prominent AI skeptic] Gary Marcus would probably say, “Well, if you guys make a hybrid system with good old-fashioned symbolic logic … maybe that’ll be superintelligent.” [Editor’s note: In September, Marcus predicted AGI would arrive between 2033 and 2040.]

And I don’t think anybody believes progress will stall at AGI. I think more or less everybody believes a few years after AGI, we’ll have superintelligence, because the AGI will be better than us at building AI.

So while I think it’s clear that the winds are getting more difficult, simultaneously, people are putting in many more resources [into developing advanced AI]. I think progress will continue just because there’s many more resources going in.

The deep learning pioneer who wishes he’d seen the risks sooner

Yoshua Bengio, winner of the Turing Award, chair of the International AI Safety Report, and founder of LawZero

Some people thought that GPT-5 meant we had hit a wall, but that isn’t quite what you see in the scientific data and trends.

There have been people overselling the idea that AGI is tomorrow morning, which commercially could make sense. But if you look at the various benchmarks, GPT-5 is just where you would expect the models at that point in time to be. By the way, it’s not just GPT-5, it’s Claude and Google models, too. In some areas where AI systems weren’t very good, like Humanity’s Last Exam or FrontierMath, they’re getting much better scores now than they were at the beginning of the year.

At the same time, the overall landscape for AI governance and safety is not good. There’s a strong force pushing against regulation. It’s like climate change. We can put our head in the sand and hope it’s going to be fine, but it doesn’t really deal with the issue.

The biggest disconnect with policymakers is a misunderstanding of the scale of change that is likely to happen if the trend of AI progress continues. A lot of people in business and governments simply think of AI as just another technology that’s going to be economically very powerful. They don’t understand how much it might change the world if trends continue, and we approach human-level AI. 

Like many people, I had been blinding myself to the potential risks to some extent. I should have seen it coming much earlier. But it’s human. You’re excited about your work and you want to see the good side of it. That makes us a little bit biased in not really paying attention to the bad things that could happen.

Even a small chance—like 1% or 0.1%—of creating an accident where billions of people die is not acceptable. 

The AI veteran who believes AI is progressing—but not fast enough to prevent the bubble from bursting

Stuart Russell, distinguished professor of computer science, University of California, Berkeley, and author of Human Compatible

I hope the idea that talking about existential risk makes you a “doomer” or is “science fiction” comes to be seen as fringe, given that most leading AI researchers and most leading AI CEOs take it seriously. 

There have been claims that AI could never pass a Turing test, or you could never have a system that uses natural language fluently, or one that could parallel-park a car. All these claims just end up getting disproved by progress.

People are spending trillions of dollars to make superhuman AI happen. I think they need some new ideas, but there’s a significant chance they will come up with them, because many significant new ideas have happened in the last few years. 

My fairly consistent estimate for the last 12 months has been that there’s a 75% chance that those breakthroughs are not going to happen in time to rescue the industry from the bursting of the bubble. Because the investments are consistent with a prediction that we’re going to have much better AI that will deliver much more value to real customers. But if those predictions don’t come true, then there’ll be a lot of blood on the floor in the stock markets.

However, the safety case isn’t about imminence. It’s about the fact that we still don’t have a solution to the control problem. If someone said there’s a four-mile-diameter asteroid that’s going to hit the Earth in 2067, we wouldn’t say, “Remind me in 2066 and we’ll think about it.” We don’t know how long it takes to develop the technology needed to control superintelligent AI.

Looking at precedents, the acceptable level of risk for a nuclear plant melting down is about one in a million per year. Extinction is much worse than that. So maybe set the acceptable risk at one in a billion. But the companies are saying it’s something like one in five. They don’t know how to make it acceptable. And that’s a problem.

The professor trying to set the narrative straight on AI safety

David Krueger, assistant professor in machine learning at the University of Montreal and Yoshua Bengio’s Mila Institute, and founder of Evitable

I think people definitely overcorrected in their response to GPT-5. But there was hype. My recollection was that there were multiple statements from CEOs at various levels of explicitness who basically said that by the end of 2025, we’re going to have an automated drop-in replacement remote worker. But it seems like it’s been underwhelming, with agents just not really being there yet.

I’ve been surprised how much these narratives predicting AGI in 2027 capture the public attention. When 2027 comes around, if things still look pretty normal, I think people are going to feel like the whole worldview has been falsified. And it’s really annoying how often when I’m talking to people about AI safety, they assume that I think we have really short timelines to dangerous systems, or that I think LLMs or deep learning are going to give us AGI. They ascribe all these extra assumptions to me that aren’t necessary to make the case. 

I’d expect we need decades for the international coordination problem. So even if dangerous AI is decades off, it’s already urgent. That point seems really lost on a lot of people. There’s this idea of “Let’s wait until we have a really dangerous system and then start governing it.” Man, that is way too late.

I still think people in the safety community tend to work behind the scenes, with people in power, not really with civil society. It gives ammunition to people who say it’s all just a scam or insider lobbying. That’s not to say that there’s no truth to these narratives, but the underlying risk is still real. We need more public awareness and a broad base of support to have an effective response.

If you actually believe there’s a 10% chance of doom in the next 10 years—which I think a reasonable person should, if they take a close look—then the first thing you think is: “Why are we doing this? This is crazy.” That’s just a very reasonable response once you buy the premise.

The governance expert worried about AI safety’s credibility

Helen Toner, acting executive director of Georgetown University’s Center for Security and Emerging Technology and former OpenAI board member

When I got into the space, AI safety was more of a set of philosophical ideas. Today, it’s a thriving set of subfields of machine learning, filling in the gulf between some of the more “out there” concerns about AI scheming, deception, or power-seeking and real concrete systems we can test and play with. 

“I worry that some aggressive AGI timeline estimates from some AI safety people are setting them up for a boy-who-cried-wolf moment.”

AI governance is improving slowly. If we have lots of time to adapt and governance can keep improving slowly, I feel not bad. If we don’t have much time, then we’re probably moving too slow.

I think GPT-5 is generally seen as a disappointment in DC. There’s a pretty polarized conversation around: Are we going to have AGI and superintelligence in the next few years? Or is AI actually just totally all hype and useless and a bubble? The pendulum had maybe swung too far toward “We’re going to have super-capable systems very, very soon.” And so now it’s swinging back toward “It’s all hype.”

I worry that some aggressive AGI timeline estimates from some AI safety people are setting them up for a boy-who-cried-wolf moment. When the predictions about AGI coming in 2027 don’t come true, people will say, “Look at all these people who made fools of themselves. You should never listen to them again.” That’s not the intellectually honest response, if maybe they later changed their mind, or their take was that they only thought it was 20 percent likely and they thought that was still worth paying attention to. I think that shouldn’t be disqualifying for people to listen to you later, but I do worry it will be a big credibility hit. And that’s applying to people who are very concerned about AI safety and never said anything about very short timelines.

The AI security researcher who now believes AGI is further out—and is grateful

Jeffrey Ladish, executive director at Palisade Research

In the last year, two big things updated my AGI timelines. 

First, the lack of high-quality data turned out to be a bigger problem than I expected. 

Second, the first “reasoning” model, OpenAI’s o1 in September 2024, showed reinforcement learning scaling was more effective than I thought it would be. And then months later, you see the o1 to o3 scale-up and you see pretty crazy impressive performance in math and coding and science—domains where it’s easier to sort of verify the results. But while we’re seeing continued progress, it could have been much faster.

All of this bumps up my median estimate to the start of fully automated AI research and development from three years to maybe five or six years. But those are kind of made up numbers. It’s hard. I want to caveat all this with, like, “Man, it’s just really hard to do forecasting here.”

Thank God we have more time. We have a possibly very brief window of opportunity to really try to understand these systems before they are capable and strategic enough to pose a real threat to our ability to control them.

But it’s scary to see people think that we’re not making progress anymore when that’s clearly not true. I just know it’s not true because I use the models. One of the downsides of the way AI is progressing is that how fast it’s moving is becoming less legible to normal people. 

Now, this is not true in some domains—like, look at Sora 2. It is so obvious to anyone who looks at it that Sora 2 is vastly better than what came before. But if you ask GPT-4 and GPT-5 why the sky is blue, they’ll give you basically the same answer. It is the correct answer. It’s already saturated the ability to tell you why the sky is blue. So the people who I expect to most understand AI progress right now are the people who are actually building with AIs or using AIs on very difficult scientific problems.

The AGI forecaster who saw the critics coming

Daniel Kokotajlo, executive director of the AI Futures Project; an OpenAI whistleblower; and lead author of “AI 2027,” a vivid scenario where—starting in 2027—AIs progress from “superhuman coders” to “wildly superintelligent” systems in the span of months

AI policy seems to be getting worse, like the “Pro-AI” super PAC [launched earlier this year by executives from OpenAI and Andreessen Horowitz to lobby for a deregulatory agenda], and the deranged and/or dishonest tweets from Sriram Krishnan and David Sacks. AI safety research is progressing at the usual pace, which is excitingly rapid compared to most fields, but slow compared to how fast it needs to be.

We said on the first page of “AI 2027” that our timelines were somewhat longer than 2027. So even when we launched AI 2027, we expected there to be a bunch of critics in 2028 triumphantly saying we’ve been discredited, like the tweets from Sacks and Krishnan. But we thought, and continue to think, that the intelligence explosion will probably happen sometime in the next five to 10 years, and that when it does, people will remember our scenario and realize it was closer to the truth than anything else available in 2025. 

Predicting the future is hard, but it’s valuable to try; people should aim to communicate their uncertainty about the future in a way that is specific and falsifiable. This is what we’ve done and very few others have done. Our critics mostly haven’t made predictions of their own and often exaggerate and mischaracterize our views. They say our timelines are shorter than they are or ever were, or they say we are more confident than we are or were.

I feel pretty good about having longer timelines to AGI. It feels like I just got a better prognosis from my doctor. The situation is still basically the same, though.

This story has been updated to clarify some of Kokotajlo’s views on AI policy.

Garrison Lovely is a freelance journalist and the author of Obsolete, an online publication and forthcoming book on the discourse, economics, and geopolitics of the race to build machine superintelligence (out spring 2026). His writing on AI has appeared in the New York Times, Nature, Bloomberg, Time, the Guardian, The Verge, and elsewhere.

The great AI hype correction of 2025

Some disillusionment was inevitable. When OpenAI released a free web app called ChatGPT in late 2022, it changed the course of an entire industry—and several world economies. Millions of people started talking to their computers, and their computers started talking back. We were enchanted, and we expected more.

We got it. Technology companies scrambled to stay ahead, putting out rival products that outdid one another with each new release: voice, images, video. With nonstop one-upmanship, AI companies have presented each new product drop as a major breakthrough, reinforcing a widespread faith that this technology would just keep getting better. Boosters told us that progress was exponential. They posted charts plotting how far we’d come since last year’s models: Look how the line goes up! Generative AI could do anything, it seemed.

Well, 2025 has been a year of reckoning. 


For a start, the heads of the top AI companies made promises they couldn’t keep. They told us that generative AI would replace the white-collar workforce, bring about an age of abundance, make scientific discoveries, and help find new cures for disease. FOMO across the world’s economies, at least in the Global North, made CEOs tear up their playbooks and try to get in on the action.

That’s when the shine started to come off. Though the technology may have been billed as a universal multitool that could revamp outdated business processes and cut costs, a number of studies published this year suggest that firms are failing to make the AI pixie dust work its magic. Surveys and trackers from a range of sources, including the US Census Bureau and Stanford University, have found that business uptake of AI tools is stalling. And when the tools do get tried out, many projects stay stuck in the pilot stage. Without broad buy-in across the economy it is not clear how the big AI companies will ever recoup the incredible amounts they’ve already spent in this race. 

At the same time, updates to the core technology are no longer the step changes they once were.

The highest-profile example of this was the botched launch of GPT-5 in August. Here was OpenAI, the firm that had ignited (and to a large extent sustained) the current boom, set to release a brand-new generation of its technology. OpenAI had been hyping GPT-5 for months: “PhD-level expert in anything,” CEO Sam Altman crowed. On another occasion Altman posted, without comment, an image of the Death Star from Star Wars, which OpenAI stans took to be a symbol of ultimate power: Coming soon! Expectations were huge.

And yet, when it landed, GPT-5 seemed to be—more of the same? What followed was the biggest vibe shift since ChatGPT first appeared three years ago. “The era of boundary-breaking advancements is over,” Yannic Kilcher, an AI researcher and popular YouTuber, announced in a video posted two days after GPT-5 came out: “AGI is not coming. It seems very much that we’re in the Samsung Galaxy era of LLMs.”

A lot of people (me included) have made the analogy with phones. For a decade or so, smartphones were the most exciting consumer tech in the world. Today, new products drop from Apple or Samsung with little fanfare. While superfans pore over small upgrades, to most people this year’s iPhone now looks and feels a lot like last year’s iPhone. Is that where we are with generative AI? And is it a problem? Sure, smartphones have become the new normal. But they changed the way the world works, too.

To be clear, the last few years have been filled with genuine “Wow” moments, from the stunning leaps in the quality of video generation models to the problem-solving chops of so-called reasoning models to the world-class competition wins of the latest coding and math models. But this remarkable technology is only a few years old, and in many ways it is still experimental. Its successes come with big caveats.

Perhaps we need to readjust our expectations.

The big reset

Let’s be careful here: The pendulum from hype to anti-hype can swing too far. It would be rash to dismiss this technology just because it has been oversold. The knee-jerk response when AI fails to live up to its hype is to say that progress has hit a wall. But that misunderstands how research and innovation in tech work. Progress has always moved in fits and starts. There are ways over, around, and under walls.

Take a step back from the GPT-5 launch. It came hot on the heels of a series of remarkable models that OpenAI had shipped in the previous months, including o1 and o3 (first-of-their-kind reasoning models that introduced the industry to a whole new paradigm) and Sora 2, which raised the bar for video generation once again. That doesn’t sound like hitting a wall to me.

AI is really good! Look at Nano Banana Pro, the new image generation model from Google DeepMind that can turn a book chapter into an infographic, and much more. It’s just there—for free—on your phone.

And yet you can’t help but wonder: When the wow factor is gone, what’s left? How will we view this technology a year or five from now? Will we think it was worth the colossal costs, both financial and environmental? 

With that in mind, here are four ways to think about the state of AI at the end of 2025: The start of a much-needed hype correction.

01: LLMs are not everything

In some ways, it is the hype around large language models, not AI as a whole, that needs correcting. It has become obvious that LLMs are not the doorway to artificial general intelligence, or AGI, a hypothetical technology that some insist will one day be able to do any (cognitive) task a human can.

Even an AGI evangelist like Ilya Sutskever, chief scientist and cofounder at the AI startup Safe Superintelligence and former chief scientist and cofounder at OpenAI, now highlights the limitations of LLMs, a technology he had a huge hand in creating. LLMs are very good at learning how to do a lot of specific tasks, but they do not seem to learn the principles behind those tasks, Sutskever said in an interview with Dwarkesh Patel in November.

It’s the difference between learning how to solve a thousand different algebra problems and learning how to solve any algebra problem. “The thing which I think is the most fundamental is that these models somehow just generalize dramatically worse than people,” Sutskever said.

It’s easy to imagine that LLMs can do anything because their use of language is so compelling. It is astonishing how well this technology can mimic the way people write and speak. And we are hardwired to see intelligence in things that behave in certain ways—whether it’s there or not. In other words, we have built machines with humanlike behavior and cannot resist seeing a humanlike mind behind them.

That’s understandable. LLMs have been part of mainstream life for only a few years. But in that time, marketers have preyed on our shaky sense of what the technology can really do, pumping up expectations and turbocharging the hype. As we live with this technology and come to understand it better, those expectations should fall back down to earth.  

02: AI is not a quick fix to all your problems

In July, researchers at MIT published a study that became a tentpole talking point in the disillusionment camp. The headline result was that a whopping 95% of businesses that had tried using AI had found zero value in it.  

The general thrust of that claim was echoed by other research, too. In November, a study by researchers at Upwork, a company that runs an online marketplace for freelancers, found that agents powered by top LLMs from OpenAI, Google DeepMind, and Anthropic failed to complete many straightforward workplace tasks by themselves.

This is miles off Altman’s prediction: “We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies,” he wrote on his personal blog in January.

But what gets missed in that MIT study is that the researchers’ measure of success was pretty narrow. That 95% failure rate accounts for companies that had tried to implement bespoke AI systems but had not yet scaled them beyond the pilot stage after six months. It shouldn’t be too surprising that a lot of experiments with experimental technology don’t pan out straight away.

That number also does not include the use of LLMs by employees outside of official pilots. The MIT researchers found that around 90% of the companies they surveyed had a kind of AI shadow economy where workers were using personal chatbot accounts. But the value of that shadow economy was not measured.  

When the Upwork study looked at how well agents completed tasks together with people who knew what they were doing, success rates shot up. The takeaway seems to be that a lot of people are figuring out for themselves how AI might help them with their jobs.

That fits with something the AI researcher and influencer (and coiner of the term “vibe coding”) Andrej Karpathy has noted: Chatbots are better than the average human at a lot of different things (think of giving legal advice, fixing bugs, doing high school math), but they are not better than an expert human. Karpathy suggests this may be why chatbots have proved popular with individual consumers, helping non-experts with everyday questions and tasks, but they have not upended the economy, which would require outperforming skilled employees at their jobs.

That may change. For now, don’t be surprised that AI has not (yet) had the impact on jobs that boosters said it would. AI is not a quick fix, and it cannot replace humans. But there’s a lot to play for. The ways in which AI could be integrated into everyday workflows and business pipelines are still being tried out.   

03: Are we in a bubble? (If so, what kind of bubble?)

If AI is a bubble, is it like the subprime mortgage bubble of 2008 or the internet bubble of 2000? Because there’s a big difference.

The subprime bubble wiped out a big part of the economy, because when it burst it left nothing behind except debt and overvalued real estate. The dot-com bubble wiped out a lot of companies, which sent ripples across the world, but it left behind the infant internet—an international network of cables and a handful of startups, like Google and Amazon, that became the tech giants of today.  

Then again, maybe we’re in a bubble unlike either of those. After all, there’s no real business model for LLMs right now. We don’t yet know what the killer app will be, or if there will even be one. 

And many economists are concerned about the unprecedented amounts of money being sunk into the infrastructure required to build capacity and serve the projected demand. But what if that demand doesn’t materialize? Add to that the weird circularity of many of those deals—with Nvidia paying OpenAI to pay Nvidia, and so on—and it’s no surprise everybody’s got a different take on what’s coming. 

Some investors remain sanguine. In an interview with the Technology Business Programming Network podcast in November, Glenn Hutchins, cofounder of Silver Lake Partners, a major international private equity firm, gave a few reasons not to worry. “Every one of these data centers—almost all of them—has a solvent counterparty that is contracted to take all the output they’re built to suit,” he said. In other words, it’s not a case of “Build it and they’ll come”—the customers are already locked in. 

And, he pointed out, one of the biggest of those solvent counterparties is Microsoft. “Microsoft has the world’s best credit rating,” Hutchins said. “If you sign a deal with Microsoft to take the output from your data center, Satya is good for it.”

Many CEOs will be looking back at the dot-com bubble and trying to learn its lessons. Here’s one way to see it: The companies that went bust back then didn’t have the money to last the distance. Those that survived the crash thrived.

With that lesson in mind, AI companies today are trying to pay their way through what may or may not be a bubble. Stay in the race; don’t get left behind. Even so, it’s a desperate gamble.

But there’s another lesson too. Companies that might look like sideshows can turn into unicorns fast. Take Synthesia, which makes avatar generation tools for businesses. Nathan Benaich, cofounder of the VC firm Air Street Capital, admits that when he first heard about the company a few years ago, back when fear of deepfakes was rife, he wasn’t sure what its tech was for and thought there was no market for it.

“We didn’t know who would pay for lip-synching and voice cloning,” he says. “Turns out there’s a lot of people who wanted to pay for it.” Synthesia now has around 55,000 corporate customers and brings in around $150 million a year. In October, the company was valued at $4 billion.

04: ChatGPT was not the beginning, and it won’t be the end

ChatGPT was the culmination of a decade’s worth of progress in deep learning, the technology that underpins all of modern AI. The seeds of deep learning itself were planted in the 1980s. The field as a whole goes back at least to the 1950s. If progress is measured against that backdrop, generative AI has barely got going.

Meanwhile, research is at a fever pitch. There are more high-quality submissions to the world’s major AI conferences than ever before. This year, organizers of some of those conferences resorted to turning down papers that reviewers had already approved, just to manage numbers. (At the same time, preprint servers like arXiv have been flooded with AI-generated research slop.)

“It’s back to the age of research again,” Sutskever said in that Dwarkesh interview, talking about the current bottleneck with LLMs. That’s not a setback; that’s the start of something new.

“There’s always a lot of hype beasts,” says Benaich. But he thinks there’s an upside to that: Hype attracts the money and talent needed to make real progress. “You know, it was only like two or three years ago that the people who built these models were basically research nerds that just happened on something that kind of worked,” he says. “Now everybody who’s good at anything in technology is working on this.”

Where do we go from here?

The relentless hype hasn’t come just from companies drumming up business for their vastly expensive new technologies. There’s a large cohort of people—inside and outside the industry—who want to believe in the promise of machines that can read, write, and think. It’s a wild decades-old dream.

But the hype was never sustainable—and that’s a good thing. We now have a chance to reset expectations and see this technology for what it really is—assess its true capabilities, understand its flaws, and take the time to learn how to apply it in valuable (and beneficial) ways. “We’re still trying to figure out how to invoke certain behaviors from this insanely high-dimensional black box of information and skills,” says Benaich.

This hype correction was long overdue. But know that AI isn’t going anywhere. We don’t even fully understand what we’ve built so far, let alone what’s coming next.