AI Integration in Marketing: Strategic Insights For SEO & Agency Leaders via @sejournal, @CallRail

This edited extract is from Data Storytelling in Marketing by Caroline Florence ©2024 and is reproduced and adapted with permission from Kogan Page Ltd.

Storytelling is an integral part of the human experience. People have been communicating observations and data to each other for millennia using the same principles of persuasion that are being used today.

However, the means by which we can generate data and insights and tell stories has shifted significantly and will continue to do so, as technology plays an ever-greater role in our ability to collect, process, and find meaning from the wealth of information available.

So, what is the future of data storytelling?

I think we’ve all talked about data being the engine that powers business decision-making. And there’s no escaping the role that AI and data are going to play in the future.

So, I think the more data literate and aware you are, the more informed and evidence-led you can be about your decisions, regardless of what field you are in – because that is the future we’re all working towards and going to embrace, right?

It’s about relevance and being at the forefront of cutting-edge technology.

Sanica Menezes, Head of Customer Analytics, Aviva

The Near Future Scenario

Imagine simply applying a generative AI tool to your marketing data dashboards to create audience-ready copy. The tool creates a clear narrative structure, synthesized from the relevant datasets, with actionable and insightful messages relevant to the target audience.

The tool isn’t just producing vague and generic output with questionable accuracy but is sophisticated enough to help you co-author technically robust and compelling content that integrates a level of human insight.

Writing stories from vast and complex datasets will not only drive efficiency and save time, but free up the human co-author to think more creatively about how they deliver the end story to land the message, gain traction with recommendations and influence decisions and actions.

There is still a clear role for the human to play as co-author, including the quality of the prompts given, expert interpretation, nuance of language, and customization for key audiences.

But the human co-author is no longer bogged down by the complex and time-consuming process of gathering different data sources and analysing data for insights. The human co-author can focus on synthesizing findings to make sense of patterns or trends and perfect their insight, judgement, and communication.

In my conversations with expert contributors, the consensus was that AI would have a significant impact on data storytelling but would never replace the need for human intervention.

This vision for the future of storytelling is (almost) here. Tools like this already exist and are being further improved, enhanced, and rolled out to market as I write this book.

But the reality is that the skills involved in leveraging these tools are no different from the skills needed to build, create, and deliver great data stories today. If anything, the risks involved in not having human co-authors mean that acquiring the skills covered in this book becomes even more valuable.

In the AI storytelling exercise WIN conducted, the tool came up with “80 per cent of people are healthy” as its key point. Well, it’s just not an interesting fact.

Whereas the humans looking at the same data were able to see a trend of increasing stress, which is far more interesting as a story. AI could analyse the data in seconds, but my feeling is that it needs a lot of really good prompting in order for it to seriously help with the storytelling bit.

I’m much more positive about it being able to create 100 slides for me from the data and that may make it easier for me to pick out what the story is.

Richard Colwell, CEO, Red C Research & Marketing Group

We did a recent experiment with the Inspirient AI platform taking a big, big, big dataset, and in three minutes, it was able to produce 1,000 slides with decent titles and design.

Then you can ask it a question about anything, and it can produce 110 slides, 30 slides, whatever you want. So, there is no reason why people should be wasting time on the data in that way.

AI is going to make a massive difference – and then we bring in the human skill which is contextualization, storytelling, thinking about the impact and the relevance to the strategy and all that stuff the computer is never going to be able to do.

Lucy Davison, Founder And CEO, Keen As Mustard Marketing

Other Innovations Impacting On Data Storytelling

Besides AI, there are a number of other key trends that are likely to have an impact on our approach to data storytelling in the future:

Synthetic Data

Synthetic data is data that has been created artificially through computer simulation to take the place of real-world data. Whilst already used in many data models to supplement real-world data or when real-world data is not available, the incidence of synthetic data is likely to grow in the near future.

According to Gartner (2023), by 2024, 60 per cent of the data used in training AI models will be synthetically generated.

Speaking in Marketing Week (2023), Mark Ritson cites academic studies to date showing around 90 per cent accuracy for AI-derived consumer data when triangulated with data generated from primary human sources.

This means synthetic data has huge potential to help create data stories that inform strategies and plans.

Virtual And Augmented Reality

Virtual and augmented reality will enable us to generate more immersive and interactive experiences as part of our data storytelling. Audiences will be able to step into the story world, interact with the data, and influence the narrative outcomes.

This technology is already being used in the world of entertainment to blur the lines between traditional linear television and interactive video games, creating a new form of content consumption.

Within data storytelling, we can easily imagine simulated customer conversations taking place as the audience navigates a website or retail environment.

Instead of static visualizations and charts showing data, the audience will be able to overlay data onto their physical environment and embed data from different sources accessed at the touch of a button.

Transmedia Storytelling

Transmedia storytelling will continue to evolve, with narratives spanning multiple platforms and media. Data storytellers will be expected to create interconnected storylines across different media and channels, enabling audiences to engage with the data story in different ways.

We are already seeing these tools being used in data journalism where embedded audio and video, on-the-ground eyewitness content, live-data feeds, data visualization and photography sit alongside more traditional editorial commentary and narrative storytelling.

For a great example of this in practice, look at the Pulitzer Prize-winning “Snow Fall: The Avalanche at Tunnel Creek” (Branch, 2012), which changed the way The New York Times approached data storytelling.

In the marketing world, some teams are already investing in high-end knowledge share portals or embedding tools alongside their intranet and internet to bring multiple media together in one place to tell the data story.

User-Generated Content

User-generated content will also have a greater influence on data storytelling. With the rise of social media and online communities, audiences will actively participate in creating and sharing stories.

Platforms will emerge that enable collaboration between storytellers and audiences, allowing for the co-creation of narratives and fostering a sense of community around storytelling.

Tailoring narratives to the individual audience member based on their preferences, and even their emotional state, will lead to greater expectations of customization in data storytelling to enhance engagement and impact.

Moving beyond the traditional “You said, so we did” communication with customers to demonstrate how their feedback has been actioned, user-generated content will enable customers to play a more central role in sharing their experiences and expectations.

These advanced tools are a complement to, and not a substitution for, the human creativity and critical thinking that great data storytelling requires. If used appropriately, they can enhance your data storytelling, but they cannot do it for you.

Whether you work with Microsoft Excel or access reports from more sophisticated business intelligence tools, such as Microsoft Power BI, Tableau, Looker Studio, or Qlik, you will still need to take those outputs and use your skills as a data storyteller to curate them in ways that are useful for your end audience.

There are some great knowledge-sharing platforms out there that can integrate outputs from existing data storytelling tools and help curate content in one place. Some can be built into existing platforms that might be accessible within your business, like Confluence.

Some can be custom-built using external tools for a bespoke need, such as creating a micro-site for your data story using WordPress. And some can be brought in at scale to integrate with existing Microsoft or Google tools.

The list of what is available is extensive but will typically be dependent on what is available IT-wise within your own organization.

The Continuing Role Of The Human In Data Storytelling

In this evolving world, the role of the data storyteller doesn’t disappear but becomes ever more critical.

The human data storyteller still has many important roles to play, and the skills necessary to influence and engage cynical, discerning, and overwhelmed audiences become even more valuable.

Now that white papers, marketing copy, internal presentations, and digital content can all be generated faster than humans could ever manage on their own, the risk of information overload becomes inevitable without a skilled storyteller to curate the content.

Today, the human data storyteller is crucial for:

  • Ensuring we are not telling “any old story” just because we can and that the story is relevant to the business context and needs.
  • Understanding the inputs being used by the tool, including limitations and potential bias, as well as ensuring data is used ethically and that it is accurate, reliable, and obtained with the appropriate permissions.
  • Framing queries in the right way to incorporate the relevant context, issues, and target audience needs to inform the knowledge base.
  • Cross-referencing and synthesizing AI-generated insights or synthetic data with human expertise and subject domain knowledge to ensure the relevance and accuracy of recommendations.
  • Leveraging the different VR, AR, and transmedia tools available to ensure the right one is used for the job.

To read the full book, SEJ readers have an exclusive 25% discount code and free shipping to the US and UK. Use promo code SEJ25 at koganpage.com.

Featured Image: PopTika/Shutterstock

Search GPT – Can Search GPT Disrupt Google Search? via @sejournal, @Kevin_Indig

Despite initial concerns, Chat GPT has not replaced search. Record Q2 earnings show Google Search is doing better than ever. That’s why OpenAI’s new search engine, Search GPT, only makes sense after a second look.

Boost your skills with Growth Memo’s weekly expert insights. Subscribe for free!

$5b USD

Why would OpenAI launch a search engine if its main product poses one of the biggest threats to Google?

Image Credit: Kevin Indig

Searches for “LLM Search” are growing, but it’s not consumer demand that pulls OpenAI in that direction. There are six good reasons (in order of importance):

1/ OpenAI’s problem is that Chat GPT is not perceived as a search engine despite similar capabilities, so the company positions Search GPT as a direct Google alternative to gain more Search market share.

Rumors about launching a search engine just before Google I/O in 2024 and the impact of the actual announcement on Alphabet’s stock show the ambition to compete directly.

The Information reports that OpenAI could lose $5b this year.1 Capturing just 3% of Google’s $175b Search business would allow OpenAI to recoup those expenses.

Image Credit: Kevin Indig

Searches for ChatGPT on Google are growing so much that they’re getting close to searches for “Google.” They’ve already surpassed searches for other search engines by a lot.

To be fair, people search less for “Google” on Google (when they do, it’s likely typed into the browser bar to get to the Google homepage), and traffic numbers between Google (465b, according to Similarweb) and Chat GPT (660M) are still orders of magnitude apart.

Image Credit: Kevin Indig

OpenAI has a strategic advantage over Google: Search GPT can provide a very different, maybe less noisy, user experience than Google because it’s not reliant on ad revenue. In any decision regarding Search, Google needs to take ads into account.

2/ OpenAI crawls the web for training data and already has half the ingredients for a search engine on the table. Consumers are already familiar with the concept of a search engine, making adoption more likely.

I have no doubt that OpenAI will see a lot of curious sign-ups for Search GPT but the bigger challenge will be retaining users.

It’s also important to point out that the market hasn’t found the final form of LLMs yet. Chatbots made sense because of their prompting nature, but voice devices will likely become a much better fit for LLMs.

3/ Search can deliver better user signals than prompting because it’s a more specific use case.

The beauty of prompting is that it’s an open field. You can do whatever you want. But that’s also a disadvantage because most people have no idea what they want to do and where to start.

As a result, success and failure are harder to measure at scale for chatbots than search engines.

A search engine, despite being versatile, has clearer use cases, which could drive more adoption and deliver better signals for LLMs to learn. In return, those learnings could transfer to chatbot answers, which are a big part of Search GPT.

4/ OpenAI wants to throw publishers a lifeline to secure a content pipeline. LLM developers need fresh content to train models and serve timely answers.

Search is the biggest source of publisher traffic2, but publishers are growing more frustrated with Google due to algorithm updates, site reputation abuse penalties, and AI Overviews.

It’s good timing for OpenAI to offer another source of revenue and get publishers “on their side”, especially after OpenAI itself has received a lot of criticism from publishers and a lawsuit from the NY Times.

The launch of SearchGPT follows a long list of publisher licensing deals:

  1. News Corp (+$250 million over five years): WSJ, New York Post, The Times, The Sun
  2. Associated Press (AP)
  3. Axel Springer: Bild, Politico, Business Insider
  4. Financial Times
  5. Le Monde
  6. Reuters
  7. Prisa Media
  8. Dotdash Meredith
  9. Time magazine
  10. Vox media
  11. Wiley (one-time fee of $23 million for previously published academic articles and books)

But even the best deals don’t help if publishers cannot sustain the creation of fresh content. If Search GPT can become a new traffic and revenue source for publishers, it would be a way to keep the critical ecosystem alive and get on the good side of publishers.

5/ Perplexity is a small challenger to OpenAI, but even a small challenger can take away mind share, and you never want to underestimate the competition. A search engine would conveniently fence in their growth. Why use Perplexity when Search GPT, which looks very similar, can do the same thing?

6/ OpenAI might bet on regulators breaking up Google’s exclusive search engine deal with Apple and hope to become part of a search engine choice set on Apple devices.

Granted, we’re talking about a very small chance, and certainly not the decisive factor for building a search engine, but it could be a small factor nonetheless.

Publisher GPT

Search GPT is clearly the sibling of Chat GPT. Besides SERP features like weather charts and table-stakes features like auto-suggest, the experience feels like Chat GPT.

The differences are hard to spot at first but meaningful in their potential to drive revenue, compete with Google and strengthen OpenAI’s data mining.

But one change stands out: Search GPT has more pronounced links to web results, a clear hat tip to publishers.

The Search GPT landing page mentions the word publisher 14 times and underlines how important publishers are for the open web and how dedicated OpenAI is to working with them.

OpenAI uses a different user agent to crawl websites for its search engine than for LLM training and strongly separates the two.

Importantly, SearchGPT is about search and is separate from training OpenAI’s generative AI foundation models. Sites can be surfaced in search results even if they opt out of generative AI training.
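
In practice, this separation shows up at the robots.txt level. As a hedged illustration (the user-agent tokens below are the ones OpenAI documented around the SearchGPT announcement; verify them against OpenAI’s current crawler documentation before relying on this), a publisher could stay eligible for search results while opting out of model training:

# Illustrative robots.txt only – confirm current user-agent names in OpenAI’s crawler docs
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /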

It’s not an accident that OpenAI tries to regain its grip on the web. A recently published study3 found that 25% of words (tokens) in Common Crawl stem from domains that have now excluded AI crawlers, with OpenAI at the top of the list, in their robots.txt or ToS.

SEO Implications

The two questions every SEO is asking are whether they should care about Search GPT and how it might work.

Search GPT has a chance to become relevant for SEO quickly, given Chat GPT’s adoption. The Apple Intelligence integration and a potential phone would spur adoption even more.

However, OpenAI might integrate Search GPT into Chat GPT, which could change the relevance as a traffic source.

We cannot yet know how Search GPT works because it’s not live, but one big differentiator will be whether Search GPT includes results from the broad web or only from publishers OpenAI made a deal with.

If it’s the broad web, Search GPT has a high chance of being relevant. If it’s limited to partnering publishers, SEO won’t make sense for anyone who isn’t a partner because the answer set is limited.

If Search GPT uses RAG and ranks results similar to Google’s AI Overviews, we could use AIO performance as an indicator and predictor of Search GPT performance.

There is also a chance that an answer from Chat GPT for queries that don’t require QDF (query deserves freshness) is the same on Search GPT, which would give us a way to understand what works before Search GPT launches publicly. That’s hard to validate without access to Search GPT, though.

Search GPT could gain the web’s favor by sending relevant traffic, making it easy for sites to submit content, for example, through XML sitemaps, and providing some sort of webmaster console. As a result, Search GPT would position itself even stronger against Google.

A New Way To Search

If the main benefit of Search GPT for OpenAI is a revenue stream and access to more user data, the next logical step for OpenAI is to build an (AI-powered) browser.

Browser data is incredibly valuable for understanding user behavior, personalization and LLM training. Best of all, it’s app-agnostic, so OpenAI could learn from users even when they use Perplexity or Google.

We’ve seen the power of browser data in the Google lawsuit, where it turned out Google relied on Chrome data all along for ranking. The only layer that’s more powerful is the operating system and device layer.

Image Credit: Kevin Indig

There is already news that Sam Altman is working with Jony Ive on building a phone. No wonder, since Apple holds immense power over other ecosystems and platforms.

Remember when Apple blew a $10b hole into Meta’s annual revenue? Apple could develop its own models and surface them on the OS level—a critical threat to OpenAI. A browser could alleviate at least some of that threat.

Bing released its own update to Search, giving us an idea of what Search GPT could look like. The new Bing prominently features AI answers at the top, with search results off to the side. A fitting metaphor for the fate of classic blue links.

Image Credit: Kevin Indig

  1. Why OpenAI Could Lose $5 Billion This Year
  2. Who Sends Traffic on the Web and How Much? New Research from Datos & SparkToro
  3. Consent in Crisis: The Rapid Decline of the AI Data Commons

Cracking Open The Local SEO Bucket: Expert Strategies To Shape Success via @sejournal, @hethr_campbell

Are your local SEO efforts yielding the results you expected?

This is the perfect time to get a closer look at the strategies that drive real impact in local search rankings and user experience for the big brands. 

Local SEO isn’t just about appearing in local search results; it’s about ensuring your brand is visible, relevant, and engaging to your community. In our upcoming webinar, we’ll break down five key strategies that can significantly boost your local visibility and drive more foot traffic to your business.

On July 24, join industry expert Matt Coghlan, Director of SEO Partnerships at Uberall, as he shares practical insights and proven tactics that have led to success for big brands like KFC.

Get actionable tips to help navigate the dynamic landscape of local SEO in 5 core areas:

  1. Discovery: Learn how to ensure your business is easily found by local customers.
  2. Relevance: Discover how to make your business stand out by personalizing your content and SEO strategies to meet local search goals.
  3. Experience: Improve user experience to keep visitors engaged and satisfied.
  4. Engagement: Boost engagement through reviews, social media, and local content.
  5. Conversions: Convert local traffic into paying customers with effective CTAs and optimized landing pages.

We will also delve into:

  • KFC’s Proven Strategies: Gain insights into how KFC enhanced their local SEO to drive significant results.
  • Quick Visibility Wins: Learn tactics that can immediately boost your online presence.
  • Adapting to Changes: Understand the recent shifts in local search and how to adjust your strategies accordingly.

Whether you’re a seasoned SEO professional or just starting out, this webinar is packed with info that will help you master the dynamics of local SEO.

Stay until the end for a live Q&A session where you can get your specific questions answered by our expert.

Can’t make it to the live session? No worries! Register now, and we’ll send you the recording so you won’t miss out.

Strengthen your local SEO game, enhance your digital presence, and drive impactful results for your business. Sign up today!

Google Hints Lowering SEO Value Of Country Code Top-Level Domains via @sejournal, @MattGSouthern

In a recent episode of Google’s Search Off The Record podcast, the company’s Search Relations team hinted at potential changes in how country-code top-level domains (ccTLDs) are valued for SEO.

This revelation came during a discussion on internationalization and hreflang implementation.

The Fading Importance Of ccTLDs

Gary Illyes, a senior member of Google’s Search Relations team, suggested that the localization boost traditionally associated with ccTLDs may soon be over.

Illyes stated:

“I think eventually, like in years’ time, that [ccTLD benefit] will also fade away.”

He explained that ccTLDs are becoming less reliable indicators of a website’s geographic target audience.

Creative Use Of ccTLDs For Branding

According to Illyes, the primary reason for this shift is the creative use of ccTLDs for branding purposes rather than geographic targeting.

He elaborated:

“Think about the all the funny domain names that you can buy nowadays like the .ai. I think that’s Antigua or something… It doesn’t say anything anymore about the country… it doesn’t mean that the content is for the country.”

Illyes further explained the historical context and why this change is occurring:

“One of the main algorithms that do the whole localization thing… is called something like LDCP – language demotion country promotion. So basically if you have like a .de, then for users in Germany you would get like a slight boost with your .de domain name. But nowadays, with .co or whatever .de, which doesn’t relate to Germany anymore, it doesn’t really make sense for us to like automatically apply that little boost because it’s ambiguous what the target is.”

The Impact On SEO Strategies

This change in perspective could have implications for international SEO strategies.

Traditionally, many businesses have invested in ccTLDs to gain a perceived advantage in local search results.

If Google stops using ccTLDs as a strong signal for geographic relevance, this could alter how companies approach their domain strategy for different markets.

Marketing Value Of ccTLDs

However, Illyes also noted that from a marketing perspective, there might still be some value in purchasing ccTLDs:

“I think from a marketing perspective there’s still some value in buying the ccTLDs and if I… if I were to run some… like a new business, then I would try to buy the country TLDs when I can, when like it’s monetarily feasible, but I would not worry too much about it.”

What This Means For You

As search engines become more capable of understanding content and context, traditional signals like ccTLDs may carry less weight.

This could lead to a more level playing field for websites, regardless of their domain extension.

Here are some top takeaways:

  1. If you’ve invested heavily in country-specific domains for SEO purposes, it may be time to reassess this strategy.
  2. Should the importance of ccTLDs decrease, proper implementation of hreflang tags becomes crucial for indicating language and regional targeting.
  3. While the SEO benefits may diminish, ccTLDs can still have branding and marketing value.
  4. Watch for official announcements or changes in Google’s documentation regarding using ccTLDs and international SEO best practices.

While no immediate changes were announced, this discussion provides valuable insight into the potential future direction of international SEO.

Listen to the full podcast episode below:

Google Advises Caution With AI Generated Answers via @sejournal, @martinibuster

Google’s Gary Illyes cautioned about the use of Large Language Models (LLMs), affirming the importance of checking authoritative sources before accepting any answers from an LLM. His answer was given in the context of a question, but curiously, he didn’t publish what that question was.

LLM Answer Engines

Based on what Gary Illyes said, it’s clear that the context of his recommendation is the use of AI for answering queries. The statement comes in the wake of OpenAI’s announcement of SearchGPT, an AI search engine prototype the company is testing. It may be that his statement is not related to that announcement and is just a coincidence.

Gary first explained how LLMs craft answers to questions and mentioned how a technique called “grounding” can improve the accuracy of AI-generated answers, but that it’s not 100% perfect and mistakes still slip through. Grounding is a way to connect a database of facts, knowledge, and web pages to an LLM. The goal is to ground the AI-generated answers in authoritative facts.
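
To make “grounding” concrete, here is a minimal Python sketch of the idea. The retrieve() helper is hypothetical and the model name is only an example; the point is that the model is asked to answer from supplied sources rather than from memory alone, and the output still needs a human fact-check:

import openai

openai.api_key = "your-api-key-goes-here"

def retrieve(question):
    # Hypothetical helper: look up passages from sources you trust
    # (your own documentation, a curated knowledge base, authoritative sites).
    return ["Passage from trusted source A...", "Passage from trusted source B..."]

def grounded_answer(question):
    sources = "\n".join(retrieve(question))
    prompt = (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    # Pre-1.0 openai SDK call style; model name is an example
    response = openai.ChatCompletion.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]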

This is what Gary posted:

“Based on their training data LLMs find the most suitable words, phrases, and sentences that align with a prompt’s context and meaning.

This allows them to generate relevant and coherent responses. But not necessarily factually correct ones. YOU, the user of these LLMs, still need to validate the answers based on what you know about the topic you asked the LLM about or based on additional reading on resources that are authoritative for your query.

Grounding can help create more factually correct responses, sure, but it’s not perfect; it doesn’t replace your brain. The internet is full of intended and unintended misinformation, and you wouldn’t believe everything you read online, so why would you LLM responses?

Alas. This post is also online and I might be an LLM. Eh, you do you.”

AI Generated Content And Answers

Gary’s LinkedIn post is a reminder that LLMs generate answers that are contextually relevant to the questions that are asked but that contextual relevance isn’t necessarily factually accurate.

Authoritativeness and trustworthiness are important qualities of the kind of content Google tries to rank. Therefore, it is in publishers’ best interest to consistently fact-check content, especially AI-generated content, to avoid inadvertently becoming less authoritative. The need to verify facts also holds true for those who use generative AI for answers.

Read Gary’s LinkedIn Post:

Answering something from my inbox here

Featured Image by Shutterstock/Roman Samborskyi

5 Automated And AI-Driven Workflows To Scale Enterprise SEO via @sejournal, @seomeetsdesign

That’s where Ahrefs’ in-built AI translator may be a better fit for your project, solving both problems in one go:

GIF from Ahrefs Keywords Explorer, July 2024

It offers automatic translations for 40+ languages and dialects in 180+ countries, with more coming soon.

However, the biggest benefit is that you’ll get a handful of alternative translations to select from, giving you greater insight into the nuances of how people search in local markets.

For example, there are over a dozen ways to say ‘popcorn’ across all Spanish-speaking countries and dialects. The AI translator is able to detect the most popular variation in each country.

Screenshot from Ahrefs Keywords Explorer, July 2024

This, my friends, is quality international SEO on steroids.
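
If you don’t have access to Ahrefs’ translator, a rough Python sketch of the same idea with a general-purpose LLM is shown below. The prompt, model name, and country list are illustrative assumptions, and the output would still need review by native speakers plus volume checks in a keyword tool:

import openai

openai.api_key = "your-api-key-goes-here"

def local_keyword_variants(keyword, country):
    # Ask the model how people in a given country phrase a search term locally
    prompt = (
        f"List the most common ways people in {country} search for '{keyword}' "
        "in their local variety of Spanish. Return one variant per line, most popular first."
    )
    response = openai.ChatCompletion.create(  # pre-1.0 openai SDK call style
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"].splitlines()

for country in ["Mexico", "Argentina", "Spain"]:
    print(country, local_keyword_variants("popcorn", country))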

2.   Identify The Dominant Search Intent Of Any Keyword

Search intent is the internal motivator that leads someone to look for something online. It’s the reason why they’re looking and the expectations they have about what they’d like to find.

The intent behind many keywords is often obvious. For example, it’s not rocket science to infer that people expect to purchase a product when searching any of these terms:

Screenshot from Ahrefs Keywords Explorer, July 2024

However, there are many keywords where the intent isn’t quite so clear-cut.

For instance, take the keyword “waterbed.” We could try to guess its intent, or we could use AI to analyze the top-ranking pages and give us a breakdown of the type of content most users seem to be looking for.

Gif from Ahrefs Keywords Explorer, July 2024

For this particular keyword, 89% of results skew toward purchase intent. So, it makes sense to create or optimize a product page for this term.
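
If you wanted to approximate this kind of intent breakdown yourself, one hedged approach is to hand the top-ranking titles for a keyword to an LLM and ask it to bucket them. The titles below are placeholders; in a real workflow, they would come from a SERP export:

import openai

openai.api_key = "your-api-key-goes-here"

# Placeholder SERP titles - replace with the real top-ranking results for your keyword
top_ranking_titles = [
    "Buy Waterbeds Online - Free Delivery",
    "Waterbed Mattresses | Shop The Range",
    "Are Waterbeds Still Worth It? A Buyer's Guide",
]

prompt = (
    "Classify each of these top-ranking page titles as transactional, informational, "
    "or navigational, then estimate the share of each intent:\n- " + "\n- ".join(top_ranking_titles)
)
response = openai.ChatCompletion.create(  # pre-1.0 openai SDK call style
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])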

For the keyword “arrow fletchings,” there is a mix of different types of content ranking, like informational posts, product pages, and how-to guides.

Screenshot from Ahrefs Identify Intents, July 2024

If your brand or product lends itself to one of the popular content types, that’s what you could plan in your content calendar.

Or, you could use the data here to outline a piece of content that covers all the dominant intents in a similar proportion to what’s already ranking:

  • ~40% providing information and answers to common questions.
  • ~30% providing information on fletching products and where to buy them.
  • ~20% providing a process for a reader to make their own fletchings.
  • And so on.

For enterprises, the value of outsourcing this to AI is simple. If you guess and get it wrong, you’ll have to allocate your limited SEO funds toward fixing the mistake instead of working on new content.

It’s better to have data on your side confirming the intent of any keyword before you publish content with an intent misalignment, let alone rolling it out over multiple websites or languages!

3.   Easily Identify Missing Topics Within Your Content

Topical gap analysis is very important in modern SEO. We’ve evolved well beyond the times when simply adding keywords to your content was enough to make it rank.

However, it’s not always quick or easy to identify missing topics within your content. Generative AI can help plug gaps beyond what most content-scoring tools can identify.

For example, ChatGPT can analyze your text against competitors’ to find missing topics you can include. You could prompt it to do something like the following:

Screenshot from ChatGPT, July 2024

SIDENOTE. You’ll need to add your content and competitors’ content to complete the prompt.
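
The exact prompt from the screenshot isn’t reproduced here, but the sketch below shows one hedged way to phrase this kind of gap-analysis request through the API. Your article text and the competitor text are placeholders you would paste in yourself:

import openai

openai.api_key = "your-api-key-goes-here"

my_content = "..."          # paste your article text here
competitor_content = "..."  # paste one or more competitor articles here

prompt = (
    "Compare MY ARTICLE against the COMPETITOR ARTICLE. "
    "List the topics and subtopics the competitor covers that my article is missing, "
    "and score my article's topic coverage out of 100 with a short justification.\n\n"
    f"MY ARTICLE:\n{my_content}\n\nCOMPETITOR ARTICLE:\n{competitor_content}"
)
response = openai.ChatCompletion.create(  # pre-1.0 openai SDK call style
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])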

Here’s an example of the list of topics it identifies:

Screenshot from ChatGPT, July 2024

And the scores and analysis it can provide for your content:

Screenshot from ChatGPT, July 2024

This goes well beyond adding words and entities, like what most content scoring tools suggest.

The scores on many of these tools can easily be manipulated, providing higher scores the more you add certain terms, even if, from a conceptual standpoint, your content doesn’t do a good job of covering a topic.

If you want the detailed analysis offered by ChatGPT but available in bulk and near-instantly… then good news. We’re working on Content Master, a content grading solution that automates topic gap analysis.

I can’t reveal too much about this yet, but it has a big USP compared to most existing content optimization tools: its content score is based on topic coverage—not just keywords.

Screenshot from Ahrefs Content Master, July 2024

You can’t just lazily copy and paste related keywords or entities into the content to improve the score.

If you rely on a pool of freelancers to create content at scale for your enterprise company, this tool will provide you with peace of mind that they aren’t taking any shortcuts.

4.   Update Search Engines With Changes On Your Website As They Happen

Have you ever made a critical change on your website, but search engines haven’t picked up on it for ages? There’s now a fix for that.

If you aren’t already aware of IndexNow, it’s time to check it out.

It tells participating search engines when a change, any change, has been made on a website. If you add, update, remove, or redirect pages, participating search engines can pick up on the changes faster.

Not all search engines have adopted this yet, and Google is a notable holdout. However, Microsoft Bing, Yandex, Naver, Seznam.cz, and Yep all have. Once one partner is pinged, all the information is shared with the other partners, making it very valuable for international organizations.

Most content management systems and delivery networks already use IndexNow and will ping search engines automatically for you. However, since many enterprise websites are built on custom ERP platforms or tech stacks, it’s worth looking into whether this is happening for the website you’re managing or not.

You could partner with the dev team to implement the free IndexNow API. Ask them to try these steps, as shared by Bing, if your website tech stack doesn’t already use IndexNow (a minimal sketch of the submission step follows the list):

  1. Get your free IndexNow API key
  2. Place the key in your site’s root directory as a .txt file
  3. Submit your key as a URL parameter
  4. Track URL discoveries by search engines
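
As a rough sketch of step 3, submitting changed URLs to the shared IndexNow endpoint is a single JSON POST. The host, key, and URLs below are placeholders; see indexnow.org for the full protocol details:

import requests

# Placeholder values - replace with your own host, key, and changed URLs
payload = {
    "host": "www.example.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://www.example.com/your-indexnow-key.txt",
    "urlList": [
        "https://www.example.com/updated-page",
        "https://www.example.com/new-page",
    ],
}

# api.indexnow.org forwards the ping to all participating search engines
response = requests.post("https://api.indexnow.org/indexnow", json=payload, timeout=10)
print(response.status_code)  # 200 or 202 means the submission was accepted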

You could also use Ahrefs instead of involving developers. You can easily connect your IndexNow API directly within Site Audit and configure your desired settings.

Here’s a quick snapshot of how IndexNow works with Ahrefs:

In short, it’s an actual real-time monitoring and alerting system, a dream come true for technical SEOs worldwide. Check out Patrick Stox’s update for all the details.

Paired with our always-on crawler, no matter what changes you’re making, you can trust search engines will be notified of any changes you want, automatically. It’s the indexing shortcut you’ve been looking for.

5.   Automatically Fix Common Technical SEO Issues

Creative SEO professionals get stuff done with or without support from other departments. Unfortunately, in many enterprise organizations, relationships between the SEO team and devs can be tenuous, affecting how many technical fixes are implemented on a website.

If you’re a savvy in-house SEO, you’ll love this new enterprise feature we’re about to drop. It’s called Patches.

It’s designed to automatically fix common technical issues with the click of a button. You will be able to launch these fixes directly from our platform using Cloudflare workers or JavaScript snippets.

Picture this:

  1. You run a technical SEO crawl.
  2. You identify key issues to fix across one page, a subset of pages, or all affected pages.
  3. With the click of a button, you fix the issue across your selected pages.
  4. Then you instantly re-crawl these pages to check the fixes are working as expected.

For example, you can make page-level fixes for pesky issues like re-writing page titles, descriptions, and headings:

Screenshot from Ahrefs Site Audit, July 2024

You can also make site-wide fixes. For example, fixing internal links to broken pages can be challenging without support from developers on large sites. With Patches, you’ll be able to roll out automatic fixes for issues like this yourself:

Screenshot from Ahrefs Site Audit, July 2024

As we grow this tool, we plan to automate over 95% of technical fixes via JavaScript snippets or Cloudflare workers, so you don’t have to rely on developers as much as you may right now. We’re also integrating AI to help you speed up the process of fixing fiddly tasks even more.

Get More Buy-In For Enterprise SEO With These Workflows

Now, as exciting and helpful as these workflows may be for you, the key is to get your boss and your boss’ boss on board.

If you’re ever having trouble getting buy-in for SEO projects or budgets for new initiatives, try using the cost savings you can deliver as leverage.

For instance, you can show how, usually, three engineers would dedicate five sprints to fixing a particular issue, costing the company illions of dollars—millions, billions, bajillions, whatever it is. But with your proposed solution, you can reduce costs and free up the engineers’ time to work on high-value tasks.

You can also share the Ultimate Enterprise SEO Playbook with them. It’s designed to show executives how your team is strategically valuable and can solve many other challenges within the organization.

Google Cautions On Blocking GoogleOther Bot via @sejournal, @martinibuster

Google’s Gary Illyes answered a question about the non-search features that the GoogleOther crawler supports, then added a caution about the consequences of blocking GoogleOther.

What Is GoogleOther?

GoogleOther is a generic crawler created by Google for various purposes that fall outside those of the bots that specialize in Search, Ads, Video, Images, News, Desktop, and Mobile. It can be used by internal teams at Google for research and development in relation to various products.

The official description of GoogleOther is:

“GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”

Something that may be surprising is that there are actually three kinds of GoogleOther crawlers.

Three Kinds Of GoogleOther Crawlers

  1. GoogleOther
    Generic crawler for public URLs
  2. GoogleOther-Image
    Optimized to crawl public image URLs
  3. GoogleOther-Video
    Optimized to crawl public video URLs

All three GoogleOther crawlers can be used for research and development purposes. That’s the one purpose Google publicly acknowledges all three versions of GoogleOther could be used for.

What Non-Search Features Does GoogleOther Support?

Google doesn’t say what specific non-search features GoogleOther supports, probably because it doesn’t really “support” a specific feature. It exists for research and development crawling, which could be in support of a new product or an improvement to a current product; it’s a highly open and generic purpose.

This is the question that Gary narrated:

“What non-search features does GoogleOther crawling support?”

Gary Illyes answered:

“This is a very topical question, and I think it is a very good question. Besides what’s in the public I don’t have more to share.

GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.

Historically Googlebot was used for this, but that kind of makes things murky and less transparent, so we launched GoogleOther so you have better controls over what your site is crawled for.

That said GoogleOther is not tied to a single product, so opting out of GoogleOther crawling might affect a wide range of things across the Google universe; alas, not Search, search is only Googlebot.”

It Might Affect A Wide Range Of Things

Gary is clear that blocking GoogleOther wouldn’t have an effect on Google Search because Googlebot is the crawler used for indexing content. So if blocking any of the three versions of GoogleOther is something a site owner wants to do, then it should be okay to do that without a negative effect on search rankings.

But Gary also cautioned about the outcome of blocking GoogleOther, saying that it could affect other products and services across Google. He didn’t state which other products it could affect, nor did he elaborate on the pros or cons of blocking GoogleOther.

Pros And Cons Of Blocking GoogleOther

Whether or not to block GoogleOther doesn’t necessarily have a straightforward answer. There are several considerations to whether doing that makes sense.

Pros

Inclusion in research for a future Google product that’s related to search (maps, shopping, images, a new feature in search) could be useful. Being included in that kind of research might mean being one of the few sites chosen to test a feature that could benefit the site or even increase its earnings.

Another consideration is that blocking GoogleOther to save on server resources is not necessarily a valid reason because GoogleOther doesn’t seem to crawl so often that it makes a noticeable impact.

If blocking Google from using site content for AI is a concern then blocking GoogleOther will have no impact on that at all. GoogleOther has nothing to do with crawling for Google Gemini apps or Vertex AI, including any future products that will be used for training associated language models. The bot for that specific use case is Google-Extended.

Cons

On the other hand it might not be helpful to allow GoogleOther if it’s being used to test something related to fighting spam and there’s something the site has to hide.

It’s possible that a site owner might not want to participate if GoogleOther comes crawling for market research or for training machine learning models (for internal purposes) that are unrelated to public-facing products like Gemini and Vertex.

Allowing GoogleOther to crawl a site for unknown purposes is like giving Google a blank check to use your site data in any way they see fit outside of training public-facing LLMs or purposes related to named bots like GoogleBot.
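
For reference, if you did decide to opt out, the user-agent tokens Google documents for these crawlers can be targeted in robots.txt with a minimal rule like the one below (Googlebot, and therefore Search, is unaffected):

User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
Disallow: /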

Takeaway

Should you block GoogleOther? It’s a coin toss. There are potential benefits, but in general, there isn’t enough information to make an informed decision.

Listen to the Google SEO Office Hours podcast at the 1:30 minute mark:

Featured Image by Shutterstock/Cast Of Thousands

What Can AI Do For Healthcare Marketing In 2024? via @sejournal, @CallRail

This post was sponsored by CallRail. The opinions expressed in this article are the sponsor’s own.

Artificial intelligence (AI) has huge potential for healthcare practices. It can assist with diagnosis and treatment, as well as administrative and marketing tasks. Yet, many practices are still wary of using AI, especially regarding marketing.

The reality is that AI is here to stay, and many healthcare practices are beginning to use the technology. According to one recent study, 89% of healthcare professionals surveyed said they were at least evaluating or experimenting with AI products, or had already implemented AI.

To help you determine whether using AI is right for your healthcare practice, let’s take a look at some of the pros and cons of using AI while marketing.

The Pros And Cons Of AI For Healthcare Practices

Healthcare practices that choose to implement AI in safe and appropriate ways to help them with their marketing and patient experience efforts can reap many benefits, including more leads, conversions, and satisfied patients. In fact, 41% of healthcare organizations say their marketing team already uses AI.

Patients also expect healthcare practices to begin to implement AI in a number of ways. In one dentistry study, patients overall showed a positive attitude toward using AI. So, what’s holding your practice back from adding new tools and finding new use cases for AI? Let’s take a look at common concerns.

Con #1: Data Security And Privacy Concerns

Let’s get one of the biggest concerns with AI and healthcare out of the way first. Healthcare practices must follow all privacy and security regulations related to patients’ protected health information (PHI) to maintain HIPAA compliance.

So, concerns over whether AI can be used in a way that doesn’t interfere with HIPAA compliance are valid. In addition, there are also concerns about the open-source nature of popular GenAI models, which means sensitive practice data might be exposed to competitors or even hackers.

Pro #1: AI Can Help You Get More Value From Your Data Securely

While there are valid concerns about how AI algorithms make decisions and data privacy concerns, AI can also be used to enrich data to help you achieve your marketing goals while still keeping it protected.

With appropriate guardrails and omission procedures in place, you can apply AI to gain insights from data that matters to you without putting sensitive data at risk.

For example, our CallRail Labs team is helping marketers remove their blind spots by using AI to analyze and detect critical context clues that help you qualify which calls are your best leads so you can follow up promptly.

At the same time, we know how important it is for healthcare companies to keep PHI secure, which is why we integrate with healthcare privacy platforms like Freshpaint. It can help you bridge the gap between patient privacy and digital marketing.

In addition, our AI-powered Healthcare Plan automatically redacts sensitive patient-protected health information from call transcripts, enforces obligatory log-outs to prevent PHI from becoming public, provides full audit trail logging, and even features unique logins and credentials for every user, which helps eliminate the potential for PHI to be accidentally exposed to employees who don’t need access to that information.
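
To illustrate the general idea of redaction (this is a toy sketch only, not CallRail’s implementation; real PHI redaction covers far more identifiers and requires compliance review), stripping obvious identifiers from a transcript before storing or analyzing it might look like this:

import re

def redact_transcript(text):
    # Toy example: mask phone numbers and email addresses only
    text = re.sub(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b", "[REDACTED PHONE]", text)
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[REDACTED EMAIL]", text)
    return text

print(redact_transcript("Call me back at 404-555-0123 or jane.doe@example.com"))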

Con #2: AI Is Impersonal

Having a good patient experience is important to almost all patients, and according to one survey, 52% of patients said a key part of a good patient experience is being treated with respect. Almost as many (46%) said they want to be addressed as a person. Given these concerns, handing over content creation or customer interactions to AI can feel daunting. While an AI-powered chatbot might be more efficient than a human in a call center, you also don’t want patients to feel like you’ve delegated customer service to a robot. Trust is the key to building patient relationships.

Pro #2: AI Can Improve The Patient Experience

Worries over AI making patient interactions feel impersonal are reasonable, but just like any other type of tool, it’s how you use AI that matters. There are ways to deploy AI that can actually enhance the patient experience and, by doing so, give your healthcare practice an advantage over your competitors.

The answer isn’t in offloading customer interaction to chatbots. But AI can help you analyze customer interactions to make customer service more efficient and helpful.

With CallRail’s AI-powered Premium Conversation Intelligence™, which transcribes, summarizes, and analyzes each call, you can quickly assess your patients’ needs and concerns and respond appropriately with a human touch. For instance, Premium Conversation Intelligence can identify and extract common keywords and topics from call transcripts. This data reveals recurring themes, such as frequently asked questions, common complaints, and popular services. A healthcare practice could then use these insights to tailor their marketing campaigns to address the most pressing patient concerns.

Con #3: AI Seems Too Complicated To Use

Let’s face it: new technology is risky, and for healthcare practices especially, risk is scary. With AI, some of the risk comes from its perceived complexity. Identifying the right use cases for your practice, selecting the right tools, training your staff, and changing workflows can all feel quite daunting. Figuring this out takes time and money. And, if there aren’t clear use cases and ROI attached, the long-term benefits may not be worth the short-term impact on business.

Pro #3: AI Can Save Time And Money

Using a computer or a spreadsheet for the first time probably also felt complicated – and on the front end, took some time to learn. However, you know that using these tools, compared to pen, paper, and calculators, has saved an enormous amount of time, making the upfront investment clearly worth it. Compared to many technologies, AI tools are often intuitive and only require you to learn a few simple things like writing prompts, refining prompts, reviewing reports, etc. Even if it takes some time to learn new AI tools, the time savings will be worth it once you do.

To get the greatest return on investment, focus on AI solutions that take care of time-intensive tasks to free up time for innovation. With the right use cases and tools, AI can help solve complexity without adding complexity. For example, with Premium Conversation Intelligence, our customers spend 60% less time analyzing calls each week, and they’re using that time to train staff better, increase their productivity, and improve the patient experience.

Con #4: AI Marketing Can Hurt Your Brand

Many healthcare practices are excited to use GenAI tools to accelerate creative marketing efforts, like social media image creation and article writing. But consumers are less excited. In fact, consumers are more likely to say that the use of AI makes them distrusting (40%), rather than trusting (19%), of a brand. In a market where trust is the most important factor for patients when choosing healthcare providers, there is caution and hesitancy around using GenAI for marketing.

Pro #4: AI Helps Make Your Marketing Better

While off-brand AI images shared on social media can be bad brand marketing, there are many ways AI can elevate your marketing efforts without impacting the brand perception. From uncovering insights to improving your marketing campaigns and maximizing the value of each marketing dollar spent to increasing lead conversion rates and decreasing patient churn, AI can help you tackle these problems faster and better than ever.

At CallRail, we’re using AI to tackle complex challenges like multi-conversation insights. CallRail can give marketers instant access to a 3-6 sentence summary for each call, average call sentiment, notable trends behind positive and negative interactions, and a summary of commonly asked questions. Such analysis would take hours and hours for your marketing team to do manually, but with AI, you have call insights at your fingertips to help drive messaging and keyword decisions that can improve your marketing attribution and the patient experience.

Con #5: Adapting AI Tools Might Cause Disruption

As a modern healthcare practice, your tech stack is the engine that runs your business. When onboarding any new technology, there are always concerns about how well it will integrate with existing technology and tools you use and whether it supports HIPAA compliance. There may also be concern about how AI tools can fit into your existing workflows without causing disruption.

Pro #5: AI Helps People Do Their Jobs Better

Pairing the right AI tool for roles with repetitive tasks can be a win for your staff and your practice. For example, keeping up with healthcare trends is important for marketers to improve messaging and campaigns.

An AI-powered tool that analyzes conversations and provides call highlights can help healthcare marketers identify keyword and Google Ad opportunities so they can focus on implementing the most successful marketing strategy rather than listening to hours of call recordings. In addition, CallRail’s new AI-powered Convert Assist helps healthcare marketers provide a better patient experience. With AI-generated call coaching, marketers can identify what went well and what to improve after every conversation.

What’s more, with a solution like CallRail, which offers a Healthcare Plan and will sign a business associate agreement (BAA), you are assured that we will comply with HIPAA controls within our service offerings to ensure that your call tracking doesn’t expose you to potential fines or litigation. Moreover, we also integrate with other marketing tools, like Google Ads, GA4, and more, making it easy to integrate our solution into your existing technologies and workflows.

Let CallRail Show You The Pros Of AI

If you’re still worried about using AI in your healthcare practice, start with a trusted solution like CallRail that has proven ROI for AI-powered tools and a commitment to responsible AI development. You can talk to CallRail’s experts or test the product out for yourself with a 14-day free trial.


Image Credits

Featured Image: Image by CallRail. Used with permission.

Find Keyword Cannibalization Using OpenAI’s Text Embeddings With Examples via @sejournal, @vahandev

This new series of articles focuses on working with LLMs to scale your SEO tasks. We hope to help you integrate AI into SEO so you can level up your skills.

We hope you enjoyed the previous article and understand what vectors, vector distance, and text embeddings are.

Following this, it’s time to flex your “AI knowledge muscles” by learning how to use text embeddings to find keyword cannibalization.

We will start with OpenAI’s text embeddings and compare them.

Model | Dimensionality | Pricing | Notes
text-embedding-ada-002 | 1536 | $0.10 per 1M tokens | Great for most use cases.
text-embedding-3-small | 1536 | $0.002 per 1M tokens | Faster and cheaper but less accurate.
text-embedding-3-large | 3072 | $0.13 per 1M tokens | More accurate for complex, long text-related tasks; slower.

(*Tokens can be thought of as roughly equivalent to words.)
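
As a quick refresher before the installation steps and the full script later on, cosine similarity is the measure we’ll use to compare embeddings. Here is a toy example with made-up three-dimensional vectors (real embeddings have 1,536 or 3,072 dimensions, as per the table above):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up, tiny "embeddings" just to show the mechanics
title_a = np.array([[0.1, 0.3, 0.5]])
title_b = np.array([[0.1, 0.29, 0.52]])

print(cosine_similarity(title_a, title_b))  # close to 1.0 means very similar meaning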

But before we start, you need to install Python and Jupyter on your computer.

Jupyter is a web-based tool for professionals and researchers. It allows you to perform complex data analysis and machine learning model development using any programming language.

Don’t worry – it’s really easy and takes little time to finish the installations. And remember, ChatGPT is your friend when it comes to programming.

In a nutshell:

  • Download and install Python.
  • Open your Windows command line or terminal on Mac.
  • Type these commands: pip install jupyterlab and pip install notebook
  • Run Jupyter with this command: jupyter lab

We will use Jupyter to experiment with text embeddings; you’ll see how fun it is to work with!

But before we start, you must sign up for OpenAI’s API and set up billing by funding your balance.

OpenAI API billing settings

Once you’ve done that, set up email notifications to inform you when your spending exceeds a certain amount under Usage limits.

Then, obtain API keys under Dashboard > API keys, which you should keep private and never share publicly.

OpenAI API keys

Now, you have all the necessary tools to start playing with embeddings.

  • Open your computer command terminal and type jupyter lab.
  • You should see something like the below image pop up in your browser.
  • Click on Python 3 under Notebook.
Jupyter Lab

In the opened window, you will write your code.

As a small task, let’s group similar URLs from a CSV. The sample CSV has two columns: URL and Title. Our script’s task will be to group URLs with similar semantic meanings based on the title so we can consolidate those pages into one and fix keyword cannibalization issues.

Here are the steps you need to do:

Install the required Python libraries with the following command in your PC’s terminal (or in a Jupyter notebook):

pip install pandas openai scikit-learn numpy unidecode

The ‘openai’ library is required to interact with the OpenAI API to get embeddings, and ‘pandas’ is used for data manipulation and handling CSV file operations.

The ‘scikit-learn’ library is necessary for calculating cosine similarity, and ‘numpy’ is essential for numerical operations and handling arrays. Lastly, unidecode is used to clean text.

Then, download the sample sheet as a CSV, rename the file to pages.csv, and upload it to your Jupyter folder where your script is located.

Set your OpenAI API key to the key you obtained in the step above, and copy-paste the code below into the notebook.

Run the code by clicking the play triangle icon at the top of the notebook.


import pandas as pd
import openai
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import csv
from unidecode import unidecode

# Function to clean text
def clean_text(text: str) -> str:
    # First, replace known problematic characters with their correct equivalents
    replacements = {
        'â€“': '–',   # mojibake for en dash
        'â€™': '’',   # mojibake for right single quotation mark
        'â€œ': '“',   # mojibake for left double quotation mark
        'â€\x9d': '”',  # mojibake for right double quotation mark
        'â€˜': '‘',   # mojibake for left single quotation mark
        'â€': '—'     # bare 'â€' (trailing byte lost) treated as an em dash
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    # Then, use unidecode to transliterate any remaining problematic Unicode characters
    text = unidecode(text)
    return text

# Load the CSV file with UTF-8 encoding from the root folder of your Jupyter project
df = pd.read_csv('pages.csv', encoding='utf-8')

# Clean the 'Title' column to remove unwanted symbols
df['Title'] = df['Title'].apply(clean_text)

# Set your OpenAI API key
openai.api_key = 'your-api-key-goes-here'

# Function to get embeddings
def get_embedding(text):
    response = openai.Embedding.create(input=[text], engine="text-embedding-ada-002")
    return response['data'][0]['embedding']

# Generate embeddings for all titles
df['embedding'] = df['Title'].apply(get_embedding)

# Create a matrix of embeddings
embedding_matrix = np.vstack(df['embedding'].values)

# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(embedding_matrix)

# Define similarity threshold
similarity_threshold = 0.9  # titles with cosine similarity of 0.9 or higher (i.e., dissimilarity of 0.1 or less) are grouped together

# Create a list to store groups
groups = []

# Keep track of visited indices
visited = set()

# Group similar titles based on the similarity matrix
for i in range(len(similarity_matrix)):
    if i not in visited:
        # Find all similar titles
        similar_indices = np.where(similarity_matrix[i] >= similarity_threshold)[0]
        
        # Log comparisons
        print(f"nChecking similarity for '{df.iloc[i]['Title']}' (Index {i}):")
        print("-" * 50)
        for j in range(len(similarity_matrix)):
            if i != j:  # Ensure that a title is not compared with itself
                similarity_value = similarity_matrix[i, j]
                comparison_result = 'greater' if similarity_value >= similarity_threshold else 'less'
                print(f"Compared with '{df.iloc[j]['Title']}' (Index {j}): similarity = {similarity_value:.4f} ({comparison_result} than threshold)")

        # Add these indices to visited
        visited.update(similar_indices)
        # Add the group to the list
        group = df.iloc[similar_indices][['URL', 'Title']].to_dict('records')
        groups.append(group)
        print(f"nFormed Group {len(groups)}:")
        for item in group:
            print(f"  - URL: {item['URL']}, Title: {item['Title']}")

# Check if groups were created
if not groups:
    print("No groups were created.")

# Define the output CSV file
output_file = 'grouped_pages.csv'

# Write the results to the CSV file with UTF-8 encoding
with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['Group', 'URL', 'Title']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    
    writer.writeheader()
    for group_index, group in enumerate(groups, start=1):
        for page in group:
            cleaned_title = clean_text(page['Title'])  # Ensure no unwanted symbols in the output
            writer.writerow({'Group': group_index, 'URL': page['URL'], 'Title': cleaned_title})
            print(f"Writing Group {group_index}, URL: {page['URL']}, Title: {cleaned_title}")

print(f"Output written to {output_file}")

This code reads a CSV file, ‘pages.csv,’ containing titles and URLs, which you can easily export from your CMS or get by crawling a client website using Screaming Frog.

Then, it cleans the titles from non-UTF characters, generates embedding vectors for each title using OpenAI’s API, calculates the similarity between the titles, groups similar titles together, and writes the grouped results to a new CSV file, ‘grouped_pages.csv.’
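
One caveat: the get_embedding function above uses the legacy interface of the openai Python package (pre-1.0). If you have openai 1.x installed, the equivalent function looks roughly like this (a minimal sketch; the rest of the script stays the same):

from openai import OpenAI

client = OpenAI(api_key='your-api-key-goes-here')

def get_embedding(text):
    # openai>=1.0 uses a client object and 'model' instead of 'engine'
    response = client.embeddings.create(input=[text], model="text-embedding-ada-002")
    return response.data[0].embedding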

In the keyword cannibalization task, we use a similarity threshold of 0.9, which means if cosine similarity is less than 0.9, we will consider articles as different. To visualize this in a simplified two-dimensional space, it will appear as two vectors with an angle of approximately 25 degrees between them.


In your case, you may want to use a different threshold, like 0.85 (approximately 31 degrees between them), and run it on a sample of your data to evaluate the results and the overall quality of matches. If it is unsatisfactory, you can increase the threshold to make it more strict for better precision.
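
If you want to check what angle a given threshold corresponds to, the conversion is just the arccosine of the similarity value:

import numpy as np

# Convert cosine similarity thresholds into the angle between the two vectors
for threshold in (0.9, 0.85, 0.5):
    print(f"cosine similarity {threshold} ~ {np.degrees(np.arccos(threshold)):.1f} degrees")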

You can install ‘matplotlib’ via terminal.

pip install matplotlib

And use the Python code below in a separate Jupyter notebook to visualize cosine similarities in two-dimensional space on your own. Try it; it’s fun!


import matplotlib.pyplot as plt
import numpy as np

# Define the angle for cosine similarity of 0.9. Change here to your desired value. 
theta = np.arccos(0.9)

# Define the vectors
u = np.array([1, 0])
v = np.array([np.cos(theta), np.sin(theta)])

# Define the 45 degree rotation matrix
rotation_matrix = np.array([
    [np.cos(np.pi/4), -np.sin(np.pi/4)],
    [np.sin(np.pi/4), np.cos(np.pi/4)]
])

# Apply the rotation to both vectors
u_rotated = np.dot(rotation_matrix, u)
v_rotated = np.dot(rotation_matrix, v)

# Plotting the vectors
plt.figure()
plt.quiver(0, 0, u_rotated[0], u_rotated[1], angles='xy', scale_units='xy', scale=1, color='r')
plt.quiver(0, 0, v_rotated[0], v_rotated[1], angles='xy', scale_units='xy', scale=1, color='b')

# Setting the plot limits to only positive ranges
plt.xlim(0, 1.5)
plt.ylim(0, 1.5)

# Adding labels and grid
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.title('Visualization of Vectors with Cosine Similarity of 0.9')

# Show the plot
plt.show()

I usually use 0.9 and higher for identifying keyword cannibalization issues, but you may need to lower it to around 0.5 when dealing with old article redirects, as an old article often has no nearly identical fresher counterpart, only one that is partially close.

In the redirect case, it may also be better to embed the meta description concatenated with the title, rather than the title alone.
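
As a rough illustration of that idea, assuming your CSV also has a 'Meta Description' column (adjust the column name to match your export), you could build the embedding input from both fields, reusing the helpers from the script above:

# Assumes a 'Meta Description' column exists in pages.csv alongside 'Title'
df['Meta Description'] = df['Meta Description'].fillna('').apply(clean_text)

# Embed the title and meta description together instead of the title alone
df['embedding_text'] = df['Title'] + '. ' + df['Meta Description']
df['embedding'] = df['embedding_text'].apply(get_embedding)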

So, it depends on the task you are performing. We will review how to implement redirects in a separate article later in this series.

Now, let’s review the results with the three models mentioned above and see how they were able to identify close articles from our data sample from Search Engine Journal’s articles.

Data sample

From the list, we already see that the 2nd and 4th articles cover the same topic on ‘meta tags.’ The articles in the 5th and 7th rows are pretty much the same – discussing the importance of H1 tags in SEO – and can be merged.

The article in the 3rd row doesn’t have any similarities with any of the articles in the list but has common words like “Tag” or “SEO.”

The article in the 6th row is again about H1s, but not about their importance to SEO. Instead, it covers Google’s opinion on whether the H1 and the title tag should match.

Articles on the 8th and 9th rows are quite close but still different; they can be combined.

text-embedding-ada-002

Using ‘text-embedding-ada-002,’ we correctly matched the 2nd and 4th articles with a cosine similarity of 0.92, and the 5th and 7th articles with a similarity of 0.91.

Screenshot from Jupyter log showing cosine similarities

And it generated output with grouped URLs, using the same group number for similar articles (colors are applied manually for visualization purposes).

Output sheet with grouped URLs

For the 2nd and 3rd articles, which have common words “Tag” and “SEO” but are unrelated, the cosine similarity was 0.86. This shows why a high similarity threshold of 0.9 or greater is necessary. If we set it to 0.85, it would be full of false positives and could suggest merging unrelated articles.

text-embedding-3-small

Quite surprisingly, ‘text-embedding-3-small’ didn’t find any matches at our similarity threshold of 0.9 or higher.

For the 2nd and 4th articles, cosine similarity was 0.76, and for the 5th and 7th articles, it was 0.77.

To better understand this model through experimentation, I’ve added a slightly modified version of the 1st row with ’15’ vs. ’14’ to the sample.

  1. “14 Most Important Meta And HTML Tags You Need To Know For SEO”
  2. “15 Most Important Meta And HTML Tags You Need To Know For SEO”
An example which shows text-embedding-3-small results

In contrast, ‘text-embedding-ada-002’ gave a cosine similarity of 0.98 between those versions.

Title 1 | Title 2 | Cosine Similarity
14 Most Important Meta And HTML Tags You Need To Know For SEO | 15 Most Important Meta And HTML Tags You Need To Know For SEO | 0.92
14 Most Important Meta And HTML Tags You Need To Know For SEO | Meta Tags: What You Need To Know For SEO | 0.76

Here, we see that this model is not quite a good fit for comparing titles.
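
If you want to reproduce these spot checks yourself, a small helper that embeds two titles with a chosen model and prints their cosine similarity is enough (a sketch that reuses the legacy openai interface from the main script; swap the model name to test each one):

import numpy as np
import openai
from sklearn.metrics.pairwise import cosine_similarity

openai.api_key = 'your-api-key-goes-here'

def compare_titles(title_1, title_2, model="text-embedding-3-small"):
    # Embed both titles in a single request, then compare them
    response = openai.Embedding.create(input=[title_1, title_2], engine=model)
    a = np.array(response['data'][0]['embedding']).reshape(1, -1)
    b = np.array(response['data'][1]['embedding']).reshape(1, -1)
    return cosine_similarity(a, b)[0][0]

print(compare_titles(
    "14 Most Important Meta And HTML Tags You Need To Know For SEO",
    "15 Most Important Meta And HTML Tags You Need To Know For SEO"
))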

text-embedding-3-large

This model’s dimensionality is 3072, twice that of ‘text-embedding-3-small’ and ‘text-embedding-ada-002’, which both have 1536 dimensions.

As it has more dimensions than the other models, we could expect it to capture semantic meaning with higher precision.

However, it gave the 2nd and 4th articles cosine similarity of 0.70 and the 5th and 7th articles similarity of 0.75.

I’ve tested it again with slightly modified versions of the first article with ’15’ vs. ’14’ and without ‘Most Important’ in the title.

  1. “14 Most Important Meta And HTML Tags You Need To Know For SEO”
  2. “15 Most Important Meta And HTML Tags You Need To Know For SEO”
  3. “14 Meta And HTML Tags You Need To Know For SEO”
Title 1 | Title 2 | Cosine Similarity
14 Most Important Meta And HTML Tags You Need To Know For SEO | 15 Most Important Meta And HTML Tags You Need To Know For SEO | 0.95
14 Most Important Meta And HTML Tags You Need To Know For SEO | 14 Meta And HTML Tags You Need To Know For SEO | 0.93
14 Most Important Meta And HTML Tags You Need To Know For SEO | Meta Tags: What You Need To Know For SEO | 0.70
15 Most Important Meta And HTML Tags You Need To Know For SEO | 14 Meta And HTML Tags You Need To Know For SEO | 0.86

So we can see that ‘text-embedding-3-large’ is underperforming compared to ‘text-embedding-ada-002’ when we calculate cosine similarities between titles.

I want to note that the accuracy of ‘text-embedding-3-large’ increases with the length of the text, but ‘text-embedding-ada-002’ still performs better overall.

Another approach could be to strip away stop words from the text. Removing these can sometimes help focus the embeddings on more meaningful words, potentially improving the accuracy of tasks like similarity calculations.

The best way to determine whether removing stop words improves accuracy for your specific task and dataset is to empirically test both approaches and compare the results.
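
As a simple sketch of that idea, scikit-learn (already installed above) ships with a built-in English stop word list that you could apply to the titles before generating embeddings; whether this helps is exactly the kind of thing to verify empirically on your own data:

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def remove_stop_words(text):
    # Drop words that appear in scikit-learn's built-in English stop word list
    return ' '.join(word for word in text.split() if word.lower() not in ENGLISH_STOP_WORDS)

# Embed stop-word-free titles instead of the raw titles (df and get_embedding come from the main script)
df['Title_no_stop'] = df['Title'].apply(remove_stop_words)
df['embedding'] = df['Title_no_stop'].apply(get_embedding)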

Conclusion

With these examples, you have learned how to work with OpenAI’s embedding models and can already perform a wide range of tasks.

For similarity thresholds, you need to experiment with your own datasets and see which thresholds make sense for your specific task by running it on smaller samples of data and performing a human review of the output.

Please note that the code in this article is not optimal for large datasets, since it regenerates text embeddings for every article each time there is a change in your dataset in order to evaluate it against the other rows.

To make it efficient, we must use vector databases and store embedding information there once generated. We will cover how to use vector databases very soon and change the code sample here to use a vector database.
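
Until then, even a simple local cache keyed by title avoids re-embedding rows that haven’t changed between runs (a minimal sketch, not a substitute for a vector database; the cache file name is arbitrary):

import json
import os

CACHE_FILE = 'embedding_cache.json'  # arbitrary local cache file
cache = json.load(open(CACHE_FILE, encoding='utf-8')) if os.path.exists(CACHE_FILE) else {}

def get_embedding_cached(text):
    # Only call the API for titles we haven't embedded before
    if text not in cache:
        cache[text] = get_embedding(text)
    return cache[text]

# df and get_embedding come from the main script
df['embedding'] = df['Title'].apply(get_embedding_cached)

with open(CACHE_FILE, 'w', encoding='utf-8') as f:
    json.dump(cache, f)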

Featured Image: BestForBest/Shutterstock

Study Backs Google’s Claims: AI Search Boosts User Satisfaction via @sejournal, @MattGSouthern

A new study finds that despite concerns about AI in online services, users are more satisfied with search engines and social media platforms than before.

The American Customer Satisfaction Index (ACSI) conducted its annual survey of search and social media users, finding that satisfaction has either held steady or improved.

This comes at a time when major tech companies are heavily investing in AI to enhance their services.

Search Engine Satisfaction Holds Strong

Google, Bing, and other search engines have rapidly integrated AI features into their platforms over the past year. While critics have raised concerns about potential negative impacts, the ACSI study suggests users are responding positively.

Google maintains its position as the most satisfying search engine with an ACSI score of 81, up 1% from last year. Users particularly appreciate its AI-powered features.

Interestingly, Bing and Yahoo! have seen notable improvements in user satisfaction, notching 3% gains to reach scores of 77 and 76, respectively. These are their highest ACSI scores in over a decade, likely due to their AI enhancements launched in 2023.

The study hints at the potential of new AI-enabled search functionality to drive further improvements in the customer experience. Bing has seen its market share improve by small but notable margins, rising from 6.35% in the first quarter of 2023 to 7.87% in Q1 2024.

Customer Experience Improvements

The ACSI study shows improvements across nearly all benchmarks of the customer experience for search engines. Notable areas of improvement include:

  • Ease of navigation
  • Ease of using the site on different devices
  • Loading speed performance and reliability
  • Variety of services and information
  • Freshness of content

These improvements suggest that AI enhancements positively impact various aspects of the search experience.

Social Media Sees Modest Gains

For the third year in a row, user satisfaction with social media platforms is on the rise, increasing 1% to an ACSI score of 74.

TikTok has emerged as the new industry leader among major sites, edging past YouTube with a score of 78. This underscores the platform’s effective use of AI-driven content recommendations.

Meta’s Facebook and Instagram have also seen significant improvements in user satisfaction, showing 3-point gains. While Facebook remains near the bottom of the industry at 69, Instagram’s score of 76 puts it within striking distance of the leaders.

Challenges Remain

Despite improvements, the study highlights ongoing privacy and advertising challenges for search engines and social media platforms. Privacy ratings for search engines remain relatively low but steady at 79, while social media platforms score even lower at 73.

Advertising experiences emerge as a key differentiator between higher- and lower-satisfaction brands, particularly in social media. New ACSI benchmarks reveal user concerns about advertising content’s trustworthiness and personal relevance.

Why This Matters For SEO Professionals

This study provides an independent perspective on how users are responding to the AI push in online services. For SEO professionals, these findings suggest that:

  1. AI-enhanced search features resonate with users, potentially changing search behavior and expectations.
  2. The improving satisfaction with alternative search engines like Bing may lead to a more diverse search landscape.
  3. The continued importance of factors like content freshness and site performance in user satisfaction aligns with long-standing SEO best practices.

As AI becomes more integrated into our online experiences, SEO strategies may need to adapt to changing user preferences.


Featured Image: kate3155/Shutterstock