How AI video games can help reveal the mysteries of the human mind

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 

This week I’ve been thinking about thought. It was all brought on by reading my colleague Niall Firth’s recent cover story about the use of artificial intelligence in video games. The piece describes how game companies are working to incorporate AI into their products to create more immersive experiences for players.

These companies are applying large language models to generate new game characters with detailed backstories—characters that could engage with a player in any number of ways. Enter in a few personality traits, catchphrases, and other details, and you can create a background character capable of endless unscripted, never-repeating conversations with you.

This is what got me thinking. Neuroscientists and psychologists have long been using games as research tools to learn about the human mind. Numerous video games have been either co-opted or especially designed to study how people learn, navigate, and cooperate with others, for example. Might AI video games allow us to probe more deeply, and unravel enduring mysteries about our brains and behavior?

I decided to call up Hugo Spiers to find out. Spiers is a neuroscientist at University College London who has been using a game to study how people find their way around. In 2016, Spiers and his colleagues worked with Deutsche Telekom and the games company Glitchers to develop Sea Hero Quest, a mobile video game in which players have to navigate a sea in a boat. They have since been using the game to learn more about how people lose navigational skills in the early stages of Alzheimer’s disease.

The use of video games in neuroscientific research kicked into gear in the 1990s, Spiers tells me, following the release of 3D games like Wolfenstein 3D and Duke Nukem. “For the first time, you could have an entirely simulated world in which to test people,” he says.

Scientists could observe and study how players behaved in these games: how they explored their virtual environment, how they sought rewards, how they made decisions. And research volunteers didn’t need to travel to a lab—their gaming behavior could be observed from wherever they happened to be playing, whether that was at home, at a library, or even inside an MRI scanner.

For scientists like Spiers, one of the biggest advantages of using games in research is that people want to play them. The use of games allows scientists to explore fundamental experiences like fun and curiosity. Researchers often offer a small financial incentive to volunteers who take part in their studies. But they don’t have to pay people to play games, says Spiers.

You’re much more likely to have fun if you’re motivated. It’s just not quite the same when you’re doing something purely for the money. And not having to pay participants allows researchers to perform huge studies on smaller budgets. Spiers has been able to collect data on over 4 million people from 195 countries, all of whom have willingly played Sea Hero Quest.  

AI could help researchers go even further. A rich, immersive world filled with characters that interact in realistic ways could help them study how our minds respond to various social settings and how we relate to other individuals. By observing how players interact with AI characters, scientists can learn more about how we cooperate—and compete—with others. It would be far cheaper and easier than hiring actors to engage with research volunteers, says Spiers.

Spiers himself is interested in learning how people hunt, whether for food, clothes, or a missing pet. “We still use these bits of our brain that our ancestors would have used daily, and of course some traditional communities still hunt,” he tells me. “But we know almost nothing about how the brain does this.” He envisions using AI-driven nonplayer characters to learn more about how humans cooperate for hunting.

There are other, newer questions to explore. At a time when people are growing attached to “virtual companions,” and an increasing number of AI girlfriends and boyfriends are being made available, AI video-game characters could also help us understand these novel relationships. “People are forming a relationship with an artificial agent,” says Spiers. “That’s inherently interesting. Why would you not want to study that?”


Now read the rest of The Checkup

Read more from MIT Technology Review’s archive:

My fellow London-based colleagues had a lot of fun generating an AI game character based on Niall. He turned out to be a sarcastic, smug, and sassy monster.

Google DeepMind has developed a generative AI model that can generate a basic but playable video game from a short description, a hand-drawn sketch, or a photo, as my colleague Will Heaven wrote earlier this year. The resulting games look a bit like Super Mario Bros.

Today’s world is undeniably gamified, argues Bryan Gardiner. He explores how we got here in another article from the Play issue of the magazine.

Large language models behave in unexpected ways. And no one really knows why, as Will wrote in March.

Technologies can be used to study the brain in lots of different ways—some of which are much more invasive than others. Tech that aims to read your mind and probe your memories is already being used, as I wrote in a previous edition of The Checkup.

From around the web:

Bad night of sleep left you needing a pick-me-up? Scientists have designed an algorithm to deliver tailored sleep-and-caffeine-dosing schedules to help tired individuals “maximize the benefits of limited sleep opportunities and consume the least required amount of caffeine.” (Yes, it may have been developed with the US Army in mind, but surely we all stand to benefit?) (Sleep)

Is dog cloning a sweet way to honor the memory of a dearly departed pet, or a “frivolous and wasteful and ethically obnoxious” pursuit in which humans treat living creatures as nothing more than their own “stuff”? This feature left me leaning toward the latter view, especially after learning that people tend to like having dogs with health problems … (The New Yorker)

States that have enacted the strongest restrictions to abortion access have also seen prescriptions for oral contraceptives plummet, according to new research. (Mother Jones)

And another study has linked Texas’s 2021 ban on abortion in early pregnancy with an increase in the number of infant deaths recorded in the state. In 2022, across the rest of the US, the number of infant deaths ascribed to anomalies present at birth decreased by 3.1%. In Texas, this figure increased by 22.9%. (JAMA Pediatrics)

We are three months into the bird flu outbreak in US dairy cattle. But the country still hasn’t implemented a sufficient testing infrastructure and doesn’t fully understand how the virus is spreading. (STAT)

The Download: AI video games’ research potential, and US government website redesigns

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How AI video games can help reveal the mysteries of the human mind

Video gaming companies are applying large language models to generate new game characters with detailed backstories—characters that could engage with a player in any number of ways. Enter in a few personality traits, catchphrases, and other details, and you can create a character capable of endless unscripted, never-repeating conversations with you. (You can read our story all about that here.)

Beyond just gaming, however, it’s a development that raises a tantalizing prospect: might AI video games allow neuroscientists and psychologists to probe more deeply, and unravel enduring mysteries about our brains and behavior? Our senior reporter Jessica Hamzelou decided to find out. Here’s what she learned.

This story is from The Checkup, our weekly newsletter all about biotech and health. Sign up to receive it in your inbox every Thursday.

Inside the US government’s brilliantly boring websites

Before the internet, Americans may have interacted with the federal government by stepping into grand buildings adorned with impressive stone columns and gleaming marble floors. 

Today, the neoclassical architecture of those physical spaces has been (at least partially) replaced by the digital architecture of website design—HTML code, tables, forms, and buttons. 

There are about 26,000 federal websites in the US. And for a long time, they were buggy or poorly designed. That all started changing in 2014, when President Obama created two new teams to help improve government tech. Read about what they’ve achieved since.

This story is from the latest issue of MIT Technology Review, which explores the theme of Play. Subscribe to read the whole thing, if you don’t already!

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Trump-Biden debate conspiracies are already all over the internet
And plenty of them are being pushed by Trump himself. (Wired $)
Election misinformation is being repeated by AI tools like ChatGPT and Copilot too. (NBC)
Spare a thought for pollsters. Their job is only getting harder and harder these days. (Ars Technica)

2 The voices of AI can tell us a lot
It’s new technology, but stereotypes of a compliant, endlessly empathetic female assistant are as old as it gets. (NYT $)

3 An effort is underway to encourage responsible use of AI in music
But of course, it relies on getting enough adoption—and that’s a big ask. (CNET)
Especially as there’s a giant legal battle underway over getting AI companies to pay to use records for training data. (MIT Technology Review)
Content-licensing sellers have formed the first AI dataset trade body. (Reuters $)
Time is the latest publisher to strike a licensing deal with OpenAI. (Axios)

4 We’re getting a better idea of how weight loss drugs work 
Researchers have zeroed in on two groups of neurons in the brain that seem to regulate the feeling of fullness. (Nature)

5 Google says Gemini AI is 20% faster than ChatGPT
And execs say it can now cite its sources, which is arguably even more important. (Quartz $)
It’s not just Nvidia: here are the AI stocks to watch. (WP $)

6 Amazon is investigating AI search startup Perplexity
Over whether it violated its rules by scraping its websites. (Wired $)
Perplexity’s CEO openly admitted to some pretty dodgy data practices when they were getting off the ground. (404 Media)

7 ISS astronauts had to take shelter after a Russian satellite disintegrated
It broke up into over 100 pieces, raising speculation it could’ve been subject to an anti-satellite missile test. (Gizmodo)
Why the first-ever space junk fine is such a big deal. (MIT Technology Review)

8 A lot of Gen Zs describe themselves as content creators
Passively lurking online is just not the vibe anymore. (WP $)

9 Would you clone your dog? 
It’d set you back $50,000—and in a way, you have to ask what you’re really getting for that. (New Yorker $)
These scientists are working to extend the life span of pet dogs—and their owners. (MIT Technology Review)

10 Why the internet’s going wild for Nerds Gummy Clusters
No joke—people are getting tattoos. (Slate $)

Quote of the day

“Let’s not go overboard on this. Datacentres are, in the most extreme case, a 6% addition [in energy demand] but probably only 2% to 2.5%. The question is, will AI accelerate a more than 6% reduction? And the answer is: certainly.”

—Bill Gates claims AI will be more of a help than a hindrance in achieving climate goals, amid rising concern about its energy footprint, The Guardian reports.

The big story

Inside NASA’s bid to make spacecraft as small as possible


October 2023

Since the 1970s, we’ve sent a lot of big things to Mars. But when NASA successfully sent twin Mars Cube One spacecraft, the size of cereal boxes, in November 2018, it was the first time we’d ever sent something so small.

Just making it this far heralded a new age in space exploration. NASA and the community of planetary science researchers caught a glimpse of a future long sought: a pathway to much more affordable space exploration using smaller, cheaper spacecraft. Read the full story.

—David W. Brown

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or tweet ’em at me.)

+ The Bear probably wouldn’t exist if it weren’t for the late, great Bourdain.
+ Exhausted? Remember your energy is a finite resource. Use it wisely.
+ Always late to everything? This has to be one of the funniest excuses I’ve heard yet.

Successful Entrepreneurs Don’t Overthink

Corey Wilks is a clinical psychologist turned business coach. He helps entrepreneurs and creators improve their mindset and overcome self-imposed obstacles. He says a common obstacle is overthinking a problem. Others involve perfectionism, doubt, and even fear.

In our recent conversation, he addressed those mental hurdles and more. The transcript is edited for clarity and length.

Eric Bandholz: Tell our listeners who you are.

Corey Wilks: I am a licensed clinical psychologist specializing in cognitive behavior therapy, the practice of helping folks overcome unhealthy thinking. I now coach entrepreneurs and creators to build a values-aligned life and business and overcome beliefs holding them back.

I write a lot of articles on my website and for Psychology Today. I’ve appeared on a couple of prominent podcasts.

I received an early coaching boost from Ali Abdaal. He’s a prominent productivity expert, YouTuber, and entrepreneur. He put out a 45-minute video about the eight things he struggled with in life and business. It was exactly what I help people with, so I dropped everything and, in a day, wrote an article on how I would approach his pain points.

I posted it on X, and enough people shared it that he finally saw it. We worked together for about six months. I didn’t know him before then.

Many good things have happened since then. When I see somebody struggling with something, I send them either a video or an article that might help. If they want to work with me, that’s cool. If not, they still have that resource to check back on.

Bandholz: What does your coaching look like?

Wilks: Being a psychologist, all of my work is around mindset and prioritizing strategies. Common obstacles of entrepreneurs are limiting beliefs and personal narratives. Most of us know what we want or how to get what we want, but we second-guess ourselves and listen to what others think. We question our intelligence or worthiness of success.

Many entrepreneurs struggle with self-doubt and perfectionism. They blame a lack of money, resources, and intelligence for not getting the things they claim to want. But that is very rarely the case, especially with how accessible information is online.

They talk about it, but they never take action. And that typically comes down to fear. So much of my work with business owners is about identifying and overcoming these limiting beliefs or clarifying what matters. We then figure out where the disconnect is.

Bandholz: What’s your advice to people with expectations that exceed their capabilities?

Wilks: Intelligence after a certain point can be a hindrance because you overthink everything. Entrepreneurship is about identifying problems and creating solutions. Smart people are good at solving problems. But there are an infinite number of problems, and it’s easy to get paralyzed.

I know many more successful entrepreneurs who are intelligent but do not necessarily have a high IQ. They don’t overthink. Plenty of mediocre people are high performers because they’re not overthinking. That’s a big thing.

Bandholz: I’m curious why you left psychology.

Wilks: I got fired from my job as a behavioral health provider — a psychologist. I specialized in addiction treatment, working in rural West Virginia. I had peak job security. During Covid, I accepted a remote telehealth position in Kentucky. Two months into that new contract, I got an email stating I would be fired in 30 days.

In the U.S., the patient has to reside in the state where the therapist is licensed. I’m licensed in West Virginia, not Kentucky. I couldn’t find another remote job out of West Virginia and wasn’t willing to move back.

Getting licensed in Kentucky would have taken four to six months. It’s a lot of red tape. I had 30 days and three paychecks to figure out my life. I spent 12 years getting my doctorate, and I couldn’t practice therapy anymore. What do I do with my life? I had to take all this knowledge and apply it to something else.

I got certified as an executive coach, which is like a four-letter word among therapists. Coaching is unregulated. A 14-year-old with a TikTok can call himself a life coach.

I decided to pursue coaching because the therapy world defines wellness as the absence of illness. Coaching is about helping healthy people flourish, thrive, and reach their potential.

I had to learn how to set up a business and create a WordPress site. I did a lot of Googling and YouTubing. I met some kind and helpful entrepreneurs on X. We became friends. They took me under their wing and showed me the ropes.

I’ve taught myself. I produce valuable content to help folks attract friends and customers. I tell people to just start and then iterate.

Bandholz: Where can people follow you and learn more?

Wilks: My site is CoreyWilksPsyD.com. I’m @CoreyWilksPsyD on X, or add me on LinkedIn.

Google’s E-E-A-T & The Myth Of The Perfect Ranking Signal via @sejournal, @MattGSouthern

Few concepts have generated as much buzz and speculation in SEO as E-E-A-T.

Short for Experience, Expertise, Authoritativeness, and Trustworthiness, this framework has been a cornerstone of Google’s Search Quality Evaluator Guidelines for years.

But despite its prominence, more clarity about how E-E-A-T relates to Google’s ranking algorithms is still needed.

In a recent episode of Google’s Search Off The Record podcast, Search Director & Product Manager Elizabeth Tucker addressed this complex topic.

Her comments offer insights into how Google evaluates and ranks content.

No Perfect Match

One key takeaway from Tucker’s discussion of E-E-A-T is that no single ranking signal perfectly aligns with all four elements.

Tucker explained:

“There is no E-E-A-T ranking signal. But this really is for people to remember it’s a shorthand, something that should always be a consideration, although, you know, different types of results arguably need different levels of E-E-A-T.”

This means that while Google’s algorithms do consider factors like expertise, authoritativeness, and trustworthiness when ranking content, there isn’t a one-to-one correspondence between E-E-A-T and any specific signal.

The PageRank Connection

However, Tucker did offer an example of how one classic Google ranking signal – PageRank – aligns with at least one aspect of E-E-A-T.

Tucker said:

“PageRank, one of our classic Google ranking signals, probably is sort of along the lines of authoritativeness. I don’t know that it really matches up necessarily with some of those other letters in there.”

For those unfamiliar, PageRank is an algorithm that measures the importance and authority of a webpage based on the quantity and quality of links pointing to it.

In other words, a page with many high-quality inbound links is seen as more authoritative than one with fewer or lower-quality links.
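
To make the idea concrete, here is a minimal sketch of the classic textbook PageRank calculation in Python. It is an illustration only, not Google’s production ranking code: the `pagerank` function, the damping factor of 0.85, the iteration count, and the three-page link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: pages "a" and "b" both link to "c"; "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

In this toy graph, page "c" has two inbound links and ends up with the highest score, while page "b" has none and ends up with the lowest.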

Tucker’s comments suggest that while PageRank may be a good proxy for authoritativeness, it doesn’t necessarily capture the other elements of E-E-A-T, like expertise or trustworthiness.

Why SEJ Cares

While it’s clear that E-E-A-T matters, Tucker’s comments underscore that it’s not a silver bullet for ranking well.

Instead of chasing after a mythical “E-E-A-T score,” websites should create content that demonstrates their expertise and builds user trust.

This means investing in factors like:

  • Accurate, up-to-date information
  • Clear sourcing and attribution
  • Author expertise and credentials
  • User-friendly design and navigation
  • Secure, accessible web infrastructure

By prioritizing these elements, websites can send strong signals to users and search engines about the quality and reliability of their content.

The E-E-A-T Evolution

It’s worth noting that E-E-A-T isn’t a static concept.

Tucker explained in the podcast that Google’s understanding of search quality has evolved over the years, and the Search Quality Evaluator Guidelines have grown and changed along with it.

Today, E-E-A-T is just one of the factors that Google considers when evaluating and ranking content.

However, the underlying principles – expertise, authoritativeness, and trustworthiness – will likely remain key pillars of search quality for the foreseeable future.


Google Warns Of Soft 404 Errors And Their Impact On SEO via @sejournal, @MattGSouthern

In a recent LinkedIn post, Google Analyst Gary Illyes raised awareness about two issues plaguing web crawlers: soft 404 and other “crypto” errors.

These seemingly innocuous mistakes can negatively affect SEO efforts.

Understanding Soft 404s

Soft 404 errors occur when a web server returns a standard “200 OK” HTTP status code for pages that don’t exist or contain error messages. This misleads web crawlers, causing them to waste resources on non-existent or unhelpful content.

Illyes likened the experience to visiting a coffee shop where every item is unavailable despite being listed on the menu. While this scenario might be frustrating for human customers, it poses a more serious problem for web crawlers.

As Illyes explains:

“Crawlers use the status codes to interpret whether a fetch was successful, even if the contents of the page is basically just an error message. They might happily go back to the same page again and again wasting your resources, and if there are many such pages, exponentially more resources.”

The Hidden Costs Of Soft Errors

The consequences of soft 404 errors extend beyond the inefficient use of crawler resources.

According to Illyes, these pages are unlikely to appear in search results because they are filtered out during indexing.

To combat this issue, Illyes advises serving the appropriate HTTP status code when the server or client encounters an error.

This allows crawlers to understand the situation and allocate their resources more effectively.

Illyes also cautioned against rate-limiting crawlers with messages like “TOO MANY REQUESTS SLOW DOWN,” as crawlers cannot interpret such text-based instructions.
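
As a rough sketch of what serving the appropriate status code can look like, the Python example below uses the standard library’s `http.server`. It is not taken from Illyes’s post. The page lists, the `resolve` and `is_over_limit` helpers, the port, and the 120-second `Retry-After` value are hypothetical, and a real site would normally configure this in its web server or framework.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content, for illustration only.
PAGES = {"/": "Welcome", "/menu": "Coffee and tea"}
REMOVED = {"/old-menu"}  # pages that were deliberately taken down


def resolve(path, over_limit=False):
    """Return (status, extra_headers, body) for a request path."""
    if over_limit:
        # Signal rate limiting with a status code and a Retry-After header,
        # not with a text message inside a 200 response.
        return HTTPStatus.TOO_MANY_REQUESTS, {"Retry-After": "120"}, "Too many requests."
    if path in PAGES:
        return HTTPStatus.OK, {}, PAGES[path]
    if path in REMOVED:
        return HTTPStatus.GONE, {}, "This page was permanently removed."
    # Unknown pages get a real 404 instead of a 200 with an error message.
    return HTTPStatus.NOT_FOUND, {}, "Page not found."


def is_over_limit(client_ip):
    """Placeholder: a real site would track request rates per client."""
    return False


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        over_limit = is_over_limit(self.client_address[0])
        status, headers, body = resolve(self.path, over_limit)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(payload)


def run(port=8000):
    """Start the demo server; call run() manually to try it locally."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

The point of the design is that the body text is only for human readers. The status line and headers carry the signal a crawler can act on.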

Why SEJ Cares

Soft 404 errors can impact a website’s crawlability and indexing.

When these issues are addressed, crawlers can focus on fetching and indexing pages with valuable content, potentially improving the site’s visibility in search results.

Eliminating soft 404 errors can also lead to more efficient use of server resources, as crawlers won’t waste bandwidth repeatedly visiting error pages.

How This Can Help You

To identify and resolve soft 404 errors on your website, consider the following steps:

  1. Regularly monitor your website’s crawl reports and logs to identify pages returning HTTP 200 status codes despite containing error messages.
  2. Implement proper error handling on your server to ensure that error pages are served with the appropriate HTTP status codes (e.g., 404 for not found, 410 for permanently removed).
  3. Use tools like Google Search Console to monitor your site’s coverage and identify any pages flagged as soft 404 errors.

Proactively addressing soft 404 errors can improve your website’s crawlability, indexing, and SEO.
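
The monitoring step can be approximated with a simple heuristic: flag any response that returned 200 but whose body reads like an error page. The Python sketch below is one assumed way to do that. The phrase list, the `find_soft_404s` helper, and the sample records are hypothetical, and a heuristic like this will produce both false positives and false negatives, so treat its output as a list of pages to review by hand.

```python
import re

# Hypothetical phrases that suggest a page is really an error page.
ERROR_PATTERN = re.compile(
    r"page not found|\b404\b|no longer available|too many requests",
    re.IGNORECASE,
)


def find_soft_404s(responses):
    """Return URLs that answered 200 OK but whose body looks like an error.

    responses is an iterable of (url, status_code, body_text) tuples,
    for example collected from your own crawl of the site.
    """
    return [
        url
        for url, status, body in responses
        if status == 200 and ERROR_PATTERN.search(body)
    ]


# Hypothetical sample data.
sample = [
    ("/menu", 200, "Coffee and tea"),
    ("/latte", 200, "Sorry, page not found."),
    ("/mocha", 404, "Page not found."),
    ("/espresso", 200, "TOO MANY REQUESTS SLOW DOWN"),
]
suspects = find_soft_404s(sample)
```

Here `/latte` and `/espresso` are flagged because they return 200 with error text, while `/mocha` is not, since it already returns a proper 404.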



WordPress Plugin Supply Chain Attacks Escalate via @sejournal, @martinibuster

WordPress plugins continue to be attacked by hackers who use stolen credentials (from other data breaches) to gain direct access to plugin code. These supply chain attacks are of particular concern because the compromised plugin reaches users as what appears to be a normal update.

Supply Chain Attack

The most common kind of vulnerability is a flaw in the software’s code that allows an attacker to inject malicious code or launch some other kind of attack. A supply chain attack, by contrast, is when the software itself or a component of that software (like a third-party script used within the software) is directly altered with malicious code. This creates a situation where the software itself delivers the malicious files.

The United States Cybersecurity and Infrastructure Security Agency (CISA) defines a supply chain attack (PDF):

“A software supply chain attack occurs when a cyber threat actor infiltrates a software vendor’s network and employs malicious code to compromise the software before the vendor sends it to their customers. The compromised software then compromises the customer’s data or system.

Newly acquired software may be compromised from the outset, or a compromise may occur through other means like a patch or hotfix. In these cases, the compromise still occurs prior to the patch or hotfix entering the customer’s network. These types of attacks affect all users of the compromised software and can have widespread consequences for government, critical infrastructure, and private sector software customers.”

For this specific attack on WordPress plugins, the attackers are using stolen password credentials to gain access to developer accounts that have direct access to plugin code. They then add malicious code to the plugins, which creates administrator-level user accounts on every website that uses the compromised plugins.

Today, Wordfence announced that additional WordPress plugins have been identified as compromised. More plugins may already be compromised or may be compromised later, so it’s good to understand what is going on and to be proactive about protecting sites under your control.

More WordPress Plugins Attacked

Wordfence issued an advisory that more plugins were compromised, including a highly popular podcasting plugin called PowerPress Podcasting plugin by Blubrry.

These are the newly discovered compromised plugins announced by Wordfence:

  • WP Server Health Stats (wp-server-stats): 1.7.6
    Patched Version: 1.7.8
    10,000 active installations
  • Ad Invalid Click Protector (AICP) (ad-invalid-click-protector): 1.2.9
    Patched Version: 1.2.10
    30,000+ active installations
  • PowerPress Podcasting plugin by Blubrry (powerpress): 11.9.3 – 11.9.4
    Patched Version: 11.9.6
    40,000+ active installations
  • Latest Infection – Seo Optimized Images (seo-optimized-images): 2.1.2
    Patched Version: 2.1.4
    10,000+ active installations
  • Latest Infection – Pods – Custom Content Types and Fields (pods): 3.2.2
    Patched Version: No patched version needed currently.
    100,000+ active installations
  • Latest Infection – Twenty20 Image Before-After (twenty20): 1.6.2, 1.6.3, 1.5.4
    Patched Version: No patched version needed currently.
    20,000+ active installations
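
As a quick way to compare a site against the list above, the following Python sketch checks installed plugin versions against the affected versions Wordfence reported. The `find_compromised` helper and the sample `installed` data are hypothetical, and the PowerPress range "11.9.3 – 11.9.4" is assumed to mean exactly those two versions. The installed list could come from something like WP-CLI’s `wp plugin list --format=json`, but that step is left out here.

```python
# Affected versions as listed in the Wordfence advisory quoted above.
COMPROMISED_VERSIONS = {
    "wp-server-stats": {"1.7.6"},
    "ad-invalid-click-protector": {"1.2.9"},
    "powerpress": {"11.9.3", "11.9.4"},  # assumption: the range covers these two
    "seo-optimized-images": {"2.1.2"},
    "pods": {"3.2.2"},
    "twenty20": {"1.6.2", "1.6.3", "1.5.4"},
}


def find_compromised(installed):
    """Return the slugs of installed plugins running an affected version.

    installed maps plugin slug to the installed version string.
    """
    return sorted(
        slug
        for slug, version in installed.items()
        if version in COMPROMISED_VERSIONS.get(slug, set())
    )


# Hypothetical site inventory.
installed = {"powerpress": "11.9.4", "pods": "3.2.3", "akismet": "5.3"}
affected = find_compromised(installed)
```

A match means the site should be treated as compromised and checked for rogue accounts, not just updated.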

This is the first group of compromised plugins:

  • Social Warfare
  • Blaze Widget
  • Wrapper Link Element
  • Contact Form 7 Multi-Step Addon
  • Simply Show Hooks

More information about the WordPress Plugin Supply Chain Attack here.

What To Do If Using A Compromised Plugin

Some of the plugins have been updated to fix the problem, but not all of them. Regardless of whether the compromised plugin has been patched to remove the malicious code and the developer password updated, site owners should check their database to make sure there are no rogue admin accounts that have been added to the WordPress website.

The attack creates administrator accounts with the user names of “Options” or “PluginAuth” so those are the user names to watch for. However, it’s probably a good idea to look for any new admin level user accounts that are unrecognized in case the attack has evolved and the hackers are using different administrator accounts.
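
A check along these lines can be sketched in Python. The example assumes you have already exported the site’s administrator accounts as a list of records with a `user_login` field (for instance from WP-CLI’s `wp user list --role=administrator --format=json`). The `flag_admins` helper, the expected-admin list, and the sample accounts are hypothetical.

```python
# User names the article reports this attack creating, compared case-insensitively.
KNOWN_ROGUE_LOGINS = {"options", "pluginauth"}


def flag_admins(admins, expected_logins):
    """Return (login, reason) pairs for administrator accounts worth reviewing.

    admins is a list of dicts with a "user_login" key.
    expected_logins is the set of administrator logins you recognize.
    """
    flagged = []
    for user in admins:
        login = user["user_login"]
        if login.lower() in KNOWN_ROGUE_LOGINS:
            flagged.append((login, "known rogue name"))
        elif login not in expected_logins:
            # Covers the case where the attackers switch to different names.
            flagged.append((login, "unrecognized admin"))
    return flagged


# Hypothetical export of administrator accounts.
admins = [
    {"user_login": "editor_in_chief"},
    {"user_login": "PluginAuth"},
    {"user_login": "helper2024"},
]
flagged = flag_admins(admins, expected_logins={"editor_in_chief"})
```

Anything flagged still needs a human decision. The script only narrows down which accounts to look at.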

Site owners who use the free or Pro version of the Wordfence WordPress security plugin are notified when a compromised plugin is discovered. Pro-level users also receive malware signatures for immediately detecting infected plugins.

The official Wordfence warning announcement about these new infected plugins advises:

“If you have any of these plugins installed, you should consider your installation compromised and immediately go into incident response mode. We recommend checking your WordPress administrative user accounts and deleting any that are unauthorized, along with running a complete malware scan with the Wordfence plugin or Wordfence CLI and removing any malicious code.

Wordfence Premium, Care, and Response users, as well as paid Wordfence CLI users, have malware signatures to detect this malware. Wordfence free users will receive the same detection after a 30 day delay on July 25th, 2024. If you are running a malicious version of one of the plugins, you will be notified by the Wordfence Vulnerability Scanner that you have a vulnerability on your site and you should update the plugin where available or remove it as soon as possible.”

Read more:

WordPress Plugins Compromised At The Source – Supply Chain Attack

3 More Plugins Infected in WordPress.org Supply Chain Attack Due to Compromised Developer Passwords


Google’s Search Dilemma: The Battle With ‘Not’ & Prepositions via @sejournal, @MattGSouthern

While Google has made strides in understanding user intent, Director & Product Manager Elizabeth Tucker says specific queries remain challenging.

In a recent episode of Google’s Search Off The Record podcast, Tucker discussed some lingering pain points in the company’s efforts to match users with the information they seek.

Among the top offenders were searches containing the word “not” and queries involving prepositions, Tucker revealed:

“Prepositions, in general, are another hard one. And one of the really big, exciting breakthroughs was the BERT paper and transformer-based machine learning models when we started to be able to get some of these complicated linguistic issues right in searches.”

BERT, or Bidirectional Encoder Representations from Transformers, is a neural network-based technique for natural language processing that Google began leveraging in search in 2019.

The technology is designed to understand the nuances and context of words in searches rather than treating queries as a bag of individual terms.
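
To see why a bag of individual terms falls short, consider this small Python sketch. It says nothing about how Google or BERT work internally. It only shows what information is lost when word order is discarded, and the `bag_of_words` helper and the sample queries are assumptions made for the illustration.

```python
from collections import Counter


def bag_of_words(query):
    """Reduce a query to unordered term counts, discarding word order."""
    return Counter(query.lower().split())


# Two queries with opposite meanings that hinge on the prepositions.
query_a = "flights from london to paris"
query_b = "flights from paris to london"
same_bag = bag_of_words(query_a) == bag_of_words(query_b)

# A negated query differs from its opposite by a single extra term.
negated = bag_of_words("shoes not made in china")
plain = bag_of_words("shoes made in china")
extra_terms = negated - plain
```

Both flight queries collapse to the same bag of terms, and the negated query differs from its opposite only by the single token "not", with nothing to indicate which words it applies to.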

‘Not’ There Yet

Despite the promise of BERT and similar advancements, Tucker acknowledged that Google’s ability to parse complex queries is still a work in progress.

Searches with the word “not” remain a thorn in the search engine’s side, Tucker explains:

“It’s really hard to know when ‘not’ means that you don’t want the word there or when it has a different kind of semantic meaning.”

For example, Google’s algorithms could interpret a search like “shoes not made in China” in multiple ways.

Does the user want shoes made in countries other than China, or are they looking for information on why some shoe brands have moved their manufacturing out of China?

This ambiguity poses a challenge for websites trying to rank for such queries. If Google can’t match the searcher’s intent with the content on a page, it may struggle to surface the most relevant results.

The Preposition Problem

Another area where Google’s algorithms can stumble is prepositions, which show the relationship between words in a sentence.

Queries like “restaurants with outdoor seating” or “hotels near the beach” rely on prepositions to convey key information about the user’s needs.

For SEO professionals, this means that optimizing for queries with prepositions may require some extra finesse.

It’s not enough to include the right keywords on a page; the content needs to be structured to communicate the relationships between those keywords.

The Long Tail Challenge

The difficulties Google faces with complex queries are particularly relevant to long-tail searches—those highly specific, often multi-word phrases that make up a significant portion of all search traffic.

Long-tail keywords are often seen as a golden opportunity for SEO, as they tend to have lower competition and can signal a high level of user intent.

However, if Google can’t understand these complex queries, it may be harder for websites to rank for them, even with well-optimized content.

The Road Ahead

Tucker noted that Google is actively improving its handling of these linguistically challenging queries, but a complete solution may still be a way off.

Tucker said:

“I would not say this is a solved problem. We’re still working on it.”

In the meantime, users may need to rephrase their searches or try different query formulations to find the information they’re looking for – a frustrating reality in an age when many have come to expect Google to understand their needs intuitively.

Why SEJ Cares

While BERT and similar advancements have helped Google understand user intent, the search giant’s struggles with “not” queries and prepositions remind us that there’s still plenty of room for improvement.

As Google continues to invest in natural language processing and other AI-driven technologies, it remains to be seen how long these stumbling blocks will hold back the search experience.

What It Means For SEO

So, what can SEO professionals and website owners do in light of this information? Here are a few things to keep in mind:

  1. Focus on clarity and specificity in your content. The more you can communicate the relationships between key concepts and phrases, the easier it will be for Google to understand and rank your pages.
  2. Use structured data and other technical SEO best practices to help search engines parse your content more effectively.
  3. Monitor your search traffic and rankings for complex queries, and be prepared to adjust your strategy if you see drops or inconsistencies.
  4. Monitor Google’s efforts to improve its natural language understanding and be ready to adapt as new algorithms and technologies emerge.


What Is Bounce Rate & How To Audit It via @sejournal, @vahandev

Many people talk about how important it is to have a “low bounce rate.”

But bounce rate is one of the most misunderstood metrics in SEO and digital marketing.

This article will explore the complexities of bounce rate and why it’s not as straightforward as you might think.

You’ll also learn how to analyze your bounce using Google Analytics 4 exploration reports.

In order to understand what bounce rate is, we need to define what engaged sessions are according to GA4.

What Is An Engaged Session?

An engaged session in GA4 is a session that meets any of the following criteria:

  • Lasts at least 10 seconds.
  • Has a key event (formerly called a conversion).
  • Has at least two screen views (or pageviews).

Simply put, if a user lands on your homepage and leaves within 10 seconds without converting (key event) or viewing a second page, that would produce a 100 percent bounce rate for that session.

If a user lands and visits a second page or signs up for your newsletter (and you have defined that as a key event), the bounce rate for that session is 0 percent.
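
The engaged-session criteria can be written as a small predicate. This is a minimal Python sketch for illustration only, not GA4's actual implementation; the function and parameter names are assumptions made up for this example.

```python
def is_engaged_session(duration_seconds, key_event_count, page_views,
                       engaged_threshold_seconds=10):
    """Return True if a session meets any of the engaged-session criteria.

    All names here are hypothetical; GA4 evaluates this internally.
    """
    return (
        duration_seconds >= engaged_threshold_seconds  # lasted long enough
        or key_event_count >= 1                        # had a key event
        or page_views >= 2                             # viewed two or more pages
    )


# Leaves after 4 seconds on one page without converting: a bounce.
print(is_engaged_session(4, 0, 1))  # False
# Signs up for the newsletter (a key event) on the landing page: engaged.
print(is_engaged_session(4, 1, 1))  # True
```

The threshold is a parameter because the 10-second default can be changed in GA4.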

What Is Bounce Rate In Google Analytics?

Bounce rate is a percentage of unengaged sessions, and it is calculated with the following formula:

(unengaged sessions / total sessions) * 100.

So, it’s not only visiting a second page that brings the bounce rate down but also when key events occur.
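
As a quick sanity check, bounce rate is the share of all sessions that were not engaged. The sketch below assumes you already have total and engaged session counts; the function name and the sample numbers are hypothetical.

```python
def bounce_rate(total_sessions, engaged_sessions):
    """Percentage of sessions that were not engaged."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    if not 0 <= engaged_sessions <= total_sessions:
        raise ValueError("engaged_sessions must be between 0 and total_sessions")
    unengaged_sessions = total_sessions - engaged_sessions
    return 100 * unengaged_sessions / total_sessions


# 200 sessions, 114 of them engaged, leaves 86 unengaged sessions.
print(bounce_rate(200, 114))  # 43.0
```

Note that bounce rate and engagement rate always add up to 100 percent under this definition.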

You can set up any event, either built-in or custom-defined in Google Analytics 4 (GA4), to count as a key event (formerly conversion), and in cases when it occurs during the session, it will be counted as a non-bounce visit.

Here is how to define any event as a key event:

  • Navigate to Admin.
  • Under Data display, navigate to Events.
  • Find the event you are interested in and toggle Mark as key event to turn it blue.
How to mark events as key events in GA.

How To Change The Default Engaged Session Timer In GA4

As a marketer, you may want to adjust the default 10-second timer for engaged sessions based on your project needs.

For example, if you have a blog article, you may want to set the timer as high as 20 seconds, but if you have a product page where users typically take more time to explore details, you might increase the timer to 30 seconds to better reflect user engagement.

To change:

  • Navigate to Data streams and click on the stream.
  • In the slide popup, navigate to Configure tag settings.
  • In the second slide popup, click Show more at the bottom.
  • Click on the Adjust session timeout setting.
  • Change Adjust timer for engaged sessions to the value of your choice.


What Is A Good Bounce Rate?

So, it’s not as straightforward as saying, “Example.com has a bounce rate of 43 percent, and example2.com has a bounce rate of 20 percent; therefore, example2.com performs better.”

For example, if you search [what’s on at the cinema…], then land on a website and have to dig through five pages of the site to find what’s showing, the website might have a low bounce rate but will have a poor user experience.

In this case, treating a low bounce rate as good would be misleading.

On top of that, what use is there in measuring the bounce rate for the whole website when you have lots of different templates that are laid out and designed in different ways, and you track ‘key events,’ aka conversions, differently?

In most cases, though, a low bounce rate shows that your marketing is effective and well-targeted, and visitors are engaging with your content and wanting to know more.

Remember, bounce rate is not a ranking factor, but when users navigate deeper into your pages, it is an engagement ranking signal that Google may take into account, according to what Google’s Pandu Nayak said during hearings.

That said, it may make sense to track the number of sessions with two or more pageviews in GA4, which you may want to consider as a KPI when reporting.

How To Set Up A Custom Audience With Multiple Pageviews Per Session

If you want to know how many visitors have two or more page views in a session, you can easily set it up in GA4.

To do that:

  • Navigate to Admin.
  • Under Data display, navigate to Audiences.
  • Click the New Audience blue button on the top right corner.
  • Click Create custom audience.
  • Set up a name for your audience.
  • Set the scope to “Within the same session.”
  • Select session_start.
  • Click And and select “page_view” with the “Event count” parameter set to greater than one.

This simply tells GA4 to add to your audience all users who viewed two or more pages within the same session.

You can set up audiences with any granularity, like sessions with exactly two or three pageviews and greater than three pageviews.

Later, you can filter your standard reports using your custom audiences.
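
To make the idea of audience granularity concrete, here is a small Python sketch that buckets sessions by pageview count. It runs on a made-up list of per-session pageview counts and is not tied to any GA4 API; the bucket labels and sample data are assumptions.

```python
from collections import Counter


def pageview_bucket(page_views):
    """Map a session's pageview count to a hypothetical audience bucket."""
    if page_views <= 1:
        return "1 pageview"
    if page_views == 2:
        return "2 pageviews"
    if page_views == 3:
        return "3 pageviews"
    return "more than 3 pageviews"


# Hypothetical pageview counts, one entry per session.
sessions_pageviews = [1, 1, 2, 5, 3, 2, 1, 4]

bucket_counts = Counter(pageview_bucket(n) for n in sessions_pageviews)
multi_page_sessions = sum(1 for n in sessions_pageviews if n >= 2)

print(bucket_counts)
print(multi_page_sessions)  # 5 sessions with two or more pageviews
```

The `multi_page_sessions` figure corresponds to the "two or more pageviews" KPI, while the buckets mirror the finer-grained audiences.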

How To Do Bounce Rate Reporting And Audit

Next time your boss or client asks you, “Why is my bounce rate so high?” – first, send them this article.

Second, conduct an in-depth bounce rate audit to understand what’s going on.

Here’s how I do it.

Bounce Rate by Date Range

Look at bounce rates on your website for a particular period. This is the simplest bounce rate report.

To do that:

  • Navigate to Explorations on the right-side menu.
  • Click Blank report.
  • From Metrics, choose Bounce rate.
  • Set Values to Bounce rate.
  • Under Settings (second column), choose visualization type Line chart.
  • Select the date period of your choice.
How to set up a bounce rate report for the entire website by date range.

If you see spikes in the chart, it may indicate a change you made to the website that influenced the bounce rate.

How To Analyze Bounce Rate On A Page Level

When running a lead generation campaign on many different landing pages, evaluating which pages convert well or poorly is vital to optimize them for better performance.

Another example use case of page-level bounce reports is A/B testing.

To do that:

  • Navigate to Explorations on the right-side menu.
  • Click Blank report.
  • From Metrics, choose Bounce rate and Sessions.
  • From Dimensions, choose Landing page + query string.
  • Under Settings (second column), choose visualization type “Table.”
  • Set Rows to “Landing page + query string.”
  • Set Values to “Bounce rate” and “Sessions.”
  • Set the filter to include pages with more than 100 sessions (to ensure the data you’re mining is statistically significant).
  • Select the date period of your choice.

Tip: You don’t need to create a new blank exploration report; instead, add another tab to the same report and change only the configuration.

How to set up a page-level bounce rate report in GA4.

If you don’t filter by number of sessions, you’ll be looking at bounce rates on some pages with only one or two sessions, which doesn’t tell you anything.
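
The same page-level logic can be reproduced outside GA4, for example on exported session data. The sketch below is an illustration under assumptions: the input format (one `(landing_page, engaged)` pair per session), the function name, and the sample data are all hypothetical.

```python
from collections import defaultdict


def page_bounce_report(sessions, min_sessions=100):
    """Bounce rate per landing page, keeping pages with more than min_sessions.

    sessions: iterable of (landing_page, engaged) pairs, one per session.
    Returns (page, session_count, bounce_rate) rows, highest bounce rate first.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for landing_page, engaged in sessions:
        totals[landing_page] += 1
        if not engaged:
            bounces[landing_page] += 1
    report = [
        (page, count, 100 * bounces[page] / count)
        for page, count in totals.items()
        if count > min_sessions  # drop pages with too little data
    ]
    return sorted(report, key=lambda row: row[2], reverse=True)


# Hypothetical sample: two busy pages and one page with only two sessions.
sample = (
    [("/pricing", False)] * 90 + [("/pricing", True)] * 60
    + [("/blog/post", False)] * 30 + [("/blog/post", True)] * 90
    + [("/about", False)] * 2
)

for page, count, rate in page_bounce_report(sample):
    print(f"{page}: {count} sessions, {rate:.1f}% bounce rate")
```

Here `/about` would show a 100 percent bounce rate, but with two sessions it is excluded by the threshold, which is exactly the noise the session filter is meant to remove.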

Once you’ve done the above, repeat the process per channel to gain an even more rounded understanding of what content/source combinations produce the most or least engaged visits.

How To Analyze Your Bounce Rates By Traffic Channel

Bounce rates can be wildly different depending on the source of traffic.

For example, it’s likely that search traffic will produce a low bounce rate while social and display traffic might produce a high bounce rate.

So, you have to consider bounce rate on a channel level as well as on a page level.

The bounce rate from social and display is almost always higher than that from “inbound” channels for these reasons:

  • When a user is on social media looking through their news feed, they are (often) not actively looking for what we are promoting.
  • When a user sees a banner ad on another website, they are (often) not actively looking for what we are promoting.

However, for inbound channels like organic and paid search, it’s logical that the bounce rate is lower as these users are actively searching for what you are promoting.

So, you capture their attention during the “doing” phase of their buyer’s journey (depending on the search term in question).

To dig deeper into each one:

  • From Metrics, choose Bounce rate and Sessions.
  • From Dimensions, choose Session default channel group.
  • Under Settings (second column), choose visualization type Table.
  • Set Rows to Session default channel group.
  • Set Values to Bounce rate and Sessions.
  • Select the date period of your choice.
How to set up a bounce rate report by traffic channels in GA4.

A little homework: Try to plot a line graph based on the bounce rate for your organic traffic.
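
If you prefer to work from exported data, the series behind such a line graph could be computed as below. This is a hedged sketch: the `(date, channel, engaged)` input format, the function name, and the sample rows are assumptions, and "Organic Search" is used as the channel label on the assumption that your export uses GA4's default channel group names.

```python
from collections import defaultdict


def daily_bounce_rate(sessions, channel="Organic Search"):
    """Daily bounce rate for one traffic channel.

    sessions: iterable of (date, channel, engaged) tuples, with dates as
    ISO strings (YYYY-MM-DD) so that sorting them is chronological.
    Returns a list of (date, bounce_rate) pairs ordered by date.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for date, session_channel, engaged in sessions:
        if session_channel != channel:
            continue  # ignore other channels
        totals[date] += 1
        if not engaged:
            bounces[date] += 1
    return [(date, 100 * bounces[date] / totals[date]) for date in sorted(totals)]


# Hypothetical exported sessions.
sample_sessions = [
    ("2024-06-02", "Organic Search", False),
    ("2024-06-02", "Organic Search", True),
    ("2024-06-02", "Organic Search", True),
    ("2024-06-02", "Organic Search", True),
    ("2024-06-01", "Organic Search", True),
    ("2024-06-01", "Organic Search", False),
    ("2024-06-01", "Paid Social", False),
]

print(daily_bounce_rate(sample_sessions))
```

The resulting list of date and rate pairs can be passed to any charting tool to draw the line graph.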

Now, you can dig deeper into the data and look for patterns or reasons that one page or set of pages/source or set of sources has a higher or lower bounce rate.

Compile the information in an easy-to-read format, ping it to the powers that be, and head for a congratulatory coffee.

Do You Have The Right Intent?

Sometimes, you’ll find pages that rank in search engines for terms that have more than one meaning.

For example, a recent one I discovered was a page on a website I manage that ranks first for the search term ‘Alang Alang’ (the name of a villa), but Alang Alang is also the name of a film.

The villa page had a high bounce rate, and one reason for this is that some of the visitors landing on that page were actually looking for the film, not the villa.

By doing keyword and competition research to see what results your target keywords produce, you can quickly understand if you have any pages that rank well for terms that could be intended for other topics.

When you identify such pages, you have three options:

  • Completely change your keyword targeting.
  • Remove the page from the SERPs.
  • Overhaul your title and meta description, so searchers know explicitly what the page is about before they click.

How To Increase Website Engagement

Now you’ve figured out what’s going wrong, you’re all set to make some changes.

All of this depends on your study’s findings, so not all of these points are relevant to every scenario, but this should be a good starting point.

Most importantly, track custom events as “key events” (conversions) so that actions like newsletter sign-ups are classified by Google Analytics as non-bounces even if the user didn’t visit a second page.

Is High Bounce Rate Bad?

Hopefully, you now understand why bounce rate isn’t simply “high” or “low”. It depends on many factors, and there is no single answer to the question, “Is high bounce rate bad?”

If you defined your ‘key events’ (conversions) and GA4 settings correctly for your goals, a high bounce rate (90%+) is definitely concerning because it means your visitors don’t engage enough with your webpages.

But if you have GA4 on default settings, you can never rely on the data, for the reasons we discussed above.

Never assume anything. Do your research and make sure you configure your GA4 account properly to track ‘key events.’

Now, go forth and conquer your bounce rate!



Featured Image: eamesBot/Shutterstock

44 Pinterest Statistics And Facts For 2024 via @sejournal, @annabellenyst

While it may not have the reach or revenue of big-hitters like Facebook and Instagram, Pinterest is absolutely a social platform worth exploring.

With its focus on visual discovery and inspiration, Pinterest occupies its own niche space in the social media landscape, which offers unique opportunities for marketers and brands.

Pinterest’s users are active, devoted, and take action – a powerful combination that not all social networks, even the biggest, can boast.

In this article, we’ll explore some of the latest facts and statistics highlighting Pinterest’s reach, user behavior, advertising potential, and more in 2024.

Let’s get started.

Pinterest Overview

1. Pinterest is the world’s 15th most-used social platform in 2024, with over 518 million global active users.

2. Global users spend an average of 1 hour and 45 minutes on Pinterest’s Android app per month.

3. Approximately 27.3% of Pinterest Android users open the app every day.

4. The same users open the Pinterest app approximately 48 times per month.

5. Pinterest is the seventh most visited social network in the US, with an estimated 266 million monthly visits in April 2024.

6. Of its monthly US visitors, roughly 62.5% are desktop users.

7. US users spend an average of 11 minutes and 25 seconds on Pinterest per visit.

8. Pinterest is the 28th most searched query globally, with a search volume of 72,120,000.

9. Compared to other traditional social media platforms, Gen Z rates Pinterest more highly for promoting and preserving well-being metrics such as “self-worth, belonging, and purpose.”

(Source) (Source) (Source) (Source) (Source)

Pinterest Company Background

10. Pinterest was founded in March 2010 by Ben Silbermann, Evan Sharp, and Paul Sciarra. It evolved from an earlier app called Tote, which was designed as a virtual substitute for paper catalogs.

11. The current CEO of Pinterest is Bill Ready.

12. Pinterest is headquartered in San Francisco, California.

13. The company has approximately 1,001 to 5,000 employees.

(Source) (Source) (Source)

Pinterest Financial Performance

14. As of May 2024, Pinterest has a market cap of more than $28 billion.

15. Pinterest generated $740 million in revenue in Q1 of 2024, reflecting a 23% increase year-over-year.

16. In 2023, Pinterest generated more than $3 billion in revenue.

(Source) (Source)

Pinterest User Statistics

17. Pinterest has 518 million monthly active users (MAUs) in 2024, an increase of 12% year-over-year. This puts Pinterest well into the coveted “half a billion users” club.

18. More than 40% of Pinterest users are Gen Z, and this demographic saves more content than any other.

19. Pinterest has 98 million MAUs in the US and Canada alone.

20. In Europe, the platform has 140 million MAUs.

21. The remaining 279 million MAUs are dispersed around the world.

22. Pinterest’s audience is skewed towards women, who make up 79.5% of its user base.

23. In the US, 35% of adults use Pinterest.

24. Women are significantly more likely to use Pinterest than men in the US, with usage rates of 50% compared to 19%.

25. People aged 25-34 years old make up the bulk of Pinterest’s users, with 81.9 million in 2023 (accounting for 30.9% of the company’s total ad audience).

(Source) (Source) (Source) (Source) (Source)

Pinterest Statistics By Location

26. The United States is the most prominent global audience for Pinterest, with more than 90.1 million active users.

27. As of April 2023, Pinterest’s global audience size includes:

| Country | Active Pinterest Users |
| --- | --- |
| US | 90.1 million |
| Brazil | 34.2 million |
| Mexico | 23.6 million |
| Germany | 16.8 million |
| France | 12.7 million |
| The UK | 10.1 million |
| Canada | 9.7 million |
| Italy | 9.5 million |
| Spain | 8.1 million |
| Colombia | 7.4 million |

(Source)

Pinterest Advertising

28. Advertisers can reach 317 million users on Pinterest in 2024.

29. The top reason people use Pinterest is to find new products and brands.

30. About 36.8% of active Pinterest users say they use the platform to follow or research brands and products, making it the most popular activity on the platform (which isn’t true for any other social network).

31. Approximately 7.67% of web traffic to third-party websites arrives via Pinterest links, a 25.9% increase year-over-year.

32. Pinterest’s ad impressions grew by 38% in Q1 of 2024.

33. Ads on Pinterest deliver a cost per conversion that is 2.3 times more efficient than those on other social media platforms.

34. Retail brands experience twice the return on ad spend (ROAS) with Pinterest ads compared to ads on other social media platforms.

35. Female users aged 25-34 years old constitute the biggest cohort of Pinterest’s advertising audience, at 20.3%. Close behind are female users aged 18-24 years old, at 19.8%.

36. About 80% of weekly Pinterest users report feeling inspired by the shopping experience on the platform.

37. Almost all (96%) of Pinterest’s top searches are unbranded, indicating that users are open to discovering new ideas.

38. Pinterest shoppers spend twice as much per month compared to users on other platforms.

(Source) (Source) (Source) (Source) (Source) (Source)

Pinterest Content and Engagement

39. Pinterest users save 1.5 billion Pins every week.

40. About 85% of weekly Pinterest users say the network is their go-to platform when starting a new project.

41. Pinterest is seen as a positive online space by 80% of its users.

42. Pinterest users are actively shopping on the platform; 85% of users have made purchases directly from Pins.

43. About 80% of weekly users have found new brands or products on the platform.

(Source)

Most Followed Pinterest Boards

44. Here are some of the most followed Pinterest boards in 2024:

| Rank | Board | Followers* |
| --- | --- | --- |
| 1 | Joy Cho / Oh Joy! | 15.1 million |
| 2 | Poppytalk | 10.4 million |
| 3 | BuzzFeed’s Tasty | 10.3 million |
| 4 | Etsy | 9.77 million |
| 5 | Maryann Rizzo | 9 million |
| 6 | Mamas Uncut | 8.5 million |
| 7 | Cathie Hong Interiors | 7.9 million |
| 8 | Jane Wang | 7.7 million |
| 9 | Erica Chan Coffman | 7.2 million |
| 10 | Bonnie Tsang | 7 million |

*Pinterest followers as of May 2024

(Source)

In Summary

Though it serves a slightly more niche audience than some social media platforms, Pinterest has a highly loyal, dedicated audience that regularly uses the platform to shop, discover brands, and garner inspiration for their day-to-day lives.

This is all to say: Pinterest possesses a ton of potential as a marketing tool for brands and marketers who are savvy and can use it to their advantage.

Hopefully, these facts and statistics will help you leverage Pinterest’s platform to benefit your business in 2024.



Featured Image: Kaspars Grinvalds/Shutterstock

Training AI music models is about to get very expensive

AI music is suddenly in a make-or-break moment. On June 24, Suno and Udio, two leading AI music startups that make tools to generate complete songs from a prompt in seconds, were sued by major record labels. Sony Music, Warner Music Group, and Universal Music Group claim the companies made use of copyrighted music in their training data “at an almost unimaginable scale,” allowing the AI models to generate songs that “imitate the qualities of genuine human sound recordings.”

Two days later, the Financial Times reported that YouTube is pursuing a comparatively aboveboard approach. Rather than training AI music models on secret data sets, the company is reportedly offering unspecified lump sums to top record labels in exchange for licenses to use their catalogues for training. 

In response to the lawsuits, both Suno and Udio released statements mentioning efforts to ensure that their models don’t imitate copyrighted works, but neither company has specified whether their training sets contain them. Udio said its model “has ‘listened’ to and learned from a large collection of recorded music,” and two weeks before the lawsuits, Suno CEO Mikey Shulman told me its training set is “both industry standard and legal” but the exact recipe is proprietary.

While the ground here is changing fast, none of these moves should be all that surprising: litigious training-data battles have become something like a rite of passage for generative AI companies. The trend has led many of those companies, including OpenAI, to pay for licensing deals while the cases unfold. 

However, the stakes are higher for AI music than for image generators or chatbots. Generative AI companies working in text or photos have options to work around lawsuits; for example, they can cobble together open-source corpuses to train models. In contrast, music in the public domain is much more limited (and not exactly what most people want to listen to). 

Other AI companies can also more easily cut licensing deals with interested publishers and creators, of which there are many; but rights in music are far more concentrated than those in film, images, or text, industry experts say. They’re largely managed by the three biggest record labels—the new plaintiffs—whose publishing arms collectively own more than 10 million songs and much of the music that has defined the last century. (The filing names a long list of artists who the labels allege were wrongfully included in training data, ranging from ABBA to those on the Hamilton soundtrack.) 

On top of all this, it’s also just more difficult to create music worth listening to—generating a readable poem or passable illustration with AI is one technical challenge, but infusing a model with the taste required to create music we like is another. 

It’s of course possible that the AI companies will win the case, and none of this will matter; they would have carte blanche to train on a century of copyrighted music. But experts say the case from the record labels is strong, and it’s more likely that AI companies will soon have to pay up—and pay a lot—if they want to survive. If a court were to rule that AI music companies could not train for free on these labels’ catalogues, then expensive licensing deals, like the one YouTube is reportedly pursuing, would seem to be the only path forward. This would effectively ensure that the company with the deepest pockets ends up on top.

More than any training-data case yet, the outcome of this one will determine the shape of a big slice of AI—and whether there is a future for it at all. 

Merits of the case

Suno’s music generator has been public for less than a year, but the company has already garnered 12 million users, a $125 million funding round last month, and a partnership with Microsoft Copilot. Udio is even newer to the scene, having launched in April with $10 million in seed funding from musician-investors like will.i.am and Common. 

The record labels allege that both of the startups are engaging in copyright infringement on the training and the output sides of their models.

“The plaintiffs here have the best odds of almost anyone suing an AI company,” says James Grimmelmann, a professor of digital and information law at Cornell Law School. He draws comparisons to the ongoing New York Times case against OpenAI, which he says offered, until now, the best example of a rights holder with a strong case against an AI company. But the suit against Suno and Udio “is worse for a bunch of reasons.”

The Times has accused OpenAI of copyright infringement in its model training by using the publication’s articles without consent. Grimmelmann says OpenAI has a bit of plausible deniability in this accusation, because the company could say that it scraped much of the internet for a training corpus and copies of New York Times articles appeared in places without the company’s knowledge. 

For Suno and Udio, that defense is far less believable. “This is not like, ‘We scraped the web for all audio and we couldn’t tell the commercially produced songs apart from everything else,’” Grimmelmann says. “It’s pretty clear that they had to have been pulling in large databases of commercial recordings.” 

In addition to complaints about training, the new case alleges that tools like Suno and Udio are more imitative than generative AI, meaning that their output mimics the style of artists and songs protected by copyright. 

While Grimmelmann notes that the Times cited examples in which ChatGPT reproduced entire copies of its articles, record labels claim they were able to generate problematic responses from the AI music models with much simpler prompts. For instance, prompting Udio with “my tempting 1964 girl smokey sing hitsville soul pop,” the plaintiffs say, yielded a song that “any listener familiar with the Temptations would instantly recognize as resembling the copyrighted sound recording ‘My Girl.’” (The court documents include links to examples on Udio, but the songs appear to have been removed.) The plaintiffs mention similar examples from Suno, including an ABBA-adjacent song called “Prancing Queen” that was generated with the prompt “70s pop” and the lyrics for “Dancing Queen.”

What’s more, Grimmelmann explains, there is more copyrightable information in a song than a news article. “There’s just a lot more information density in capturing the way that Mariah Carey’s voice works than there is in words,” he says, which is perhaps part of the reason past lawsuits navigating music copyright have sometimes been so drawn-out and complex. 

In a statement, Shulman wrote that Suno prioritizes originality and that the model is “designed to generate completely new outputs, not to memorize and regurgitate preexisting content.” He added, “That is why we don’t allow user prompts that reference specific artists.” Udio’s statement similarly mentioned “state-of-the-art filters to ensure our model does not reproduce copyrighted works or artists’ voices.”

Indeed, the tools will block a request if it names an artist. But the record labels allege that the safeguards have significant loopholes. Following the news of the lawsuits, for instance, social media users shared examples suggesting that if users separate an artist’s name with spaces, the request may go through. My own request for “a song like Kendrick” was blocked by Suno, citing an artist’s name, but “a song like k e n d r i c k” resulted in a “hip-hop rhythmic beat-driven” track and “a song like k o r n” resulted in “nu-metal heavy aggressive.” (To be fair, they didn’t resemble the respective artists’ unique styles, but to even respond in the right tightly defined genre seems to suggest that the model is in fact familiar with each artist’s work.) Similar workarounds were blocked on Udio. 

Possible outcomes

There are three ways the case could go, Grimmelmann says. One is wholly in favor of the AI startups: the lawsuits fail and the court determines that companies did not violate fair use or imitate copyrighted works too closely in their outputs. If the models are found to fall under fair use, it would mean songwriters and rights holders would need to find a different legal mechanism to pursue compensation. 

Another possibility is a mixed bag: the court finds the AI companies did not violate fair use in their training but must better control their models’ output to make sure it does not improperly imitate copyrighted works. Grimmelmann says this would be similar to one of the initial rulings against Napster, in which the company was forced to ban searches for copyrighted works in its libraries (though users quickly found workarounds). 

The third and essentially nuclear option is that the court finds fault on both the training and the output sides of the AI models. This would mean the companies could not train on copyrighted works without licenses, and also could not allow outputs that closely imitate copyrighted works. The companies could be ordered to pay damages for infringement, which could run into the hundreds of millions for each company. If they aren’t bankrupted by such a ruling, it would force them to completely restructure their training through licensing deals, which could also be cost-prohibitive. 


To license or not to license

Though the immediate goals of the plaintiffs are to get the AI companies to cease training and pay damages, the chairman of the Recording Industry Association of America, Mitch Glazier, is already looking ahead toward a future of licensing. “As in the past, music creators will enforce their rights to protect the creative engine of human artistry and enable the development of a healthy and sustainable licensed market that recognizes the value of both creativity and technology,” he wrote in a recent op-ed in Billboard.

Such a market for licenses could mirror what has already unfolded for text generators. OpenAI has struck licensing deals with a number of news publishers, including Politico, the Atlantic, and the Wall Street Journal. The deals promise to make content from the publishers discoverable in OpenAI’s products, though the ability for the models to transparently cite where they’re getting information from is limited at best.

If AI music companies follow that pattern, the only ones with the means to create powerful music models might be those with the most cash. That’s perhaps exactly what YouTube is thinking. The company did not immediately respond to questions from MIT Technology Review about the details of its negotiations, but given the massive amount of data required to train AI models and the concentration of rights owners in music, it’s fair to assume the price of deals with record labels would be eye-popping. 

In theory, an AI company could bypass the licensing process altogether by building its model exclusively on music in the public domain, but it would be a Herculean task. There have been similar efforts in the realm of text and image generation, including a legal consultancy in Chicago that created a model trained on dense regulatory documents, and a model from Hugging Face that trained on images of Mickey Mouse from the 1920s. But the models are small and unremarkable. If Suno or Udio is forced to train on only what’s in the public domain—think military march music and the royalty-free songs found in corporate videos—the resulting model would be a far cry from what they have today.

If AI companies do move forward with licensing agreements, negotiations may be tricky, says Grimmelmann. Music licensing is complicated by the fact that two different copyrights are at play: one for the song, which generally covers the composition, like the music and lyrics, and one for the master, which covers the recording—like what you’d hear if you streamed the song. 

Some artists, like Taylor Swift and Frank Ocean, have come to own the masters of their catalogues after drawn-out legal battles, and would therefore be in the driver’s seat for any potential licensing deal. Many others, though, retain only the song copyright, while the record labels retain the masters. In these cases, the record label might theoretically be able to grant AI companies a license to use the music without an artist’s permission—but at the risk of burning relationships with artists and sparking more legal battles. 

The question of whether to license their music to such companies has divided musician groups. In contract rules adopted in April by SAG-AFTRA, which represents recording artists as well as actors, AI clones of member voices are allowed, though there are minimum rates for compensation. Back in December, a group called the Indie Musicians Caucus expressed frustration that the leading instrumental musicians’ union, the 70,000-member American Federation of Musicians (AFM), was not doing enough in contracts to protect its rank and file against AI companies. The caucus wrote that it would vote against any agreement “obligating AFM members to dig [their] own graves by participating—without a right to consent, compensation, or credit—in the training of our permanent Generative AI replacements.”

But at this point, AFM does not appear eager to facilitate any deals. I asked Kenneth Shirk, international secretary-treasurer at AFM, whether he thought musicians should engage with AI companies and push to be fairly compensated, whatever that means, or instead resist licensing deals completely. 

“Looking at those questions makes me think, would you rather have a swarm of fire ants crawling all over you, or roll around in a bed of broken glass?” he told me. “We want musicians to get paid. But we also want to ensure that there’s a career in music to be had for those that are going to come after us.”