Google is rolling out two enhancements to AI Mode in Labs: Gemini 2.5 Pro and Deep Search.
These capabilities are exclusive to users subscribed to Google’s AI Pro and AI Ultra plans.
Gemini 2.5 Pro Now Available In AI Mode
Subscribers can now access Gemini 2.5 Pro from a dropdown menu within the AI Mode tab.
Screenshot from: blog.google/products/search/deep-search-business-calling-google-search, July 2025.
While the default model remains available for general queries, the 2.5 Pro model is designed to handle more complex prompts, particularly those involving reasoning, mathematics, or coding.
In an example shared by Google, the model walks through a multi-step physics problem involving gravitational fields, showing how it can solve equations and explain its reasoning with supporting links.
Screenshot from: blog.google/products/search/deep-search-business-calling-google-search, July 2025.
Deep Search Offers AI-Assisted Research
Today’s update also introduces Deep Search, which Google describes as a tool for conducting more comprehensive research.
The feature can generate detailed, citation-supported reports by processing multiple searches and aggregating information across sources.
Google stated in its announcement:
“Deep Search is especially useful for in-depth research related to your job, hobbies, or studies.”
Availability & Rollout
These features are currently limited to users in the United States who subscribe to Google’s AI Pro or AI Ultra plans and have opted into AI Mode through Google Labs.
Google hasn’t provided a firm timeline for when all eligible users will receive access, but rollout has begun.
The “experimental” label on Gemini 2.5 Pro suggests continued adjustments based on user testing.
What This Means
The launch of Deep Search and Gemini 2.5 Pro reflects Google’s broader effort to incorporate generative AI into the search experience.
For marketers, the shift raises questions about visibility at a time when AI-generated summaries and reports may increasingly shape user behavior.
If Deep Search becomes a commonly used tool for information gathering, the structure and credibility of content could play a larger role in discoverability.
Gemini 2.5 Pro’s focus on reasoning and code-related queries makes it relevant for more technical users. Google has positioned it as capable of helping with debugging, code generation, and explanation of advanced concepts, similar to tools like ChatGPT’s coding features or GitHub Copilot.
Its integration into Search may appeal to users who want technical assistance without leaving the browser environment.
Looking Ahead
The addition of these features behind a paywall continues Google’s movement toward monetizing AI capabilities through subscription services.
While billed as experimental, these updates may provide early insight into how the company envisions the future of AI in search: more automated, task-oriented, and user-specific.
Search professionals will want to monitor how these features evolve, as tools like Deep Search could become more widely adopted.
Google has introduced a new AI-powered calling feature in Search that contacts local businesses on a user’s behalf to gather pricing and availability details.
The feature, rolling out to all U.S. Search users this week, allows people to request information from multiple businesses with a single query.
When searching for services like pet grooming or dry cleaning, users may now see a new option to “Have AI check pricing.”
How It Works
After selecting the AI option, users are guided through a form to provide details about the service they need.
Google’s AI then calls relevant local businesses to gather information such as pricing, appointment availability, and service options. The responses are consolidated and presented to the user.
The experience starts with a typical local search, such as “pet groomers near me.” If the AI calling feature is available, users can specify details like:
Pet type, breed, and size
Requested services (e.g., bath, nail trim, haircut)
Time preferences (e.g., within 48 hours)
Preferred method of communication (SMS or email)
According to a Google spokesperson, the AI determines which businesses to contact based on traditional local search rankings. Only those that appear in results for the relevant query and match the user’s criteria will be contacted.
What It Looks Like
Examples show a multi-step process where users enter information and confirm their request.
Google displays responses from participating businesses, including prices and availability, all gathered through automated calls.
Before submitting a request, users must confirm that Google can call businesses and share the submitted details. The process is governed by Google’s privacy policy, and users are informed of how their data will be used.
Business Participation & Control
Businesses can manage whether they receive these AI-driven calls via their Business Profile settings.
Google describes the feature as creating “new opportunities” to connect with potential customers, while also giving businesses control over participation.
Available to All (With Premium Perks)
The AI calling feature is available to all users in the U.S., though Google AI Pro and AI Ultra subscribers benefit from higher usage limits.
Google says more agentic AI features will debut for these subscribers before expanding globally.
What This Means
Because the AI selects businesses using standard local search rankings, maintaining strong local SEO becomes even more important.
Businesses with optimized listings and higher rankings are more likely to receive calls and capture leads.
This could also shift how businesses handle inbound requests. Those that rely on phone calls may want to prepare staff or systems to handle more frequent, possibly scripted, AI-initiated inquiries.
Looking Ahead
By automating time-consuming tasks like gathering service quotes, Google aims to make Search more actionable.
Adoption will depend on how well the AI handles real-world complexity, as well as how many businesses opt in.
For marketers and local service providers, it’s another sign that search visibility directly connects to lead generation. Keeping Business Profile data accurate and staying visible in local results could increasingly determine whether a business gets contacted at all.
Google Search Console Core Web Vitals (CWV) reporting for mobile is experiencing a dip that has been confirmed to be related to the Chrome User Experience Report (CrUX). Search Console CWV reports for mobile performance show a marked drop beginning around July 10, at which point the reporting appears to stop completely.
The issue was raised in a post addressed to Google’s John Mueller on Bluesky:
“Hey @johnmu.com is there a known issue or bug with Core Web Vitals reporting in Search Console? Seeing a sudden massive drop in reported URLs (both “good” and “needs improvement”) on mobile as of July 14.”
The person referred to July 14, but that’s the date the reporting hit zero. The drop actually begins closer to July 10, which you can see by hovering a cursor over the point where the decline starts.
Google’s John Mueller responded:
“These reports are based on samples of what we know for your site, and sometimes the overall sample size for a site changes. That’s not indicative of a problem. I’d focus on the samples with issues (in your case it looks fine), rather than the absolute counts.”
The person who started the discussion responded to inform Mueller that the issue wasn’t limited to his site; the same drop in reporting was happening on other sites.
Mueller was unaware of any problem with CWV reporting, so he naturally assumed this was an artifact of normal changes in internet traffic and user behavior. His next response continued under the assumption that this wasn’t a widespread issue:
“That can happen. The web is dynamic and alive – our systems have to readjust these samples over time.”
Then Jamie Indigo responded to confirm she’s seeing it, too.
“Hey John! Thanks for responding 🙂 It seems like … everyone beyond the usual ebb and flow. Confirming nothing in the mechanics have changed?”
At this point it was becoming clear that the behavior wasn’t isolated to a single site, and Mueller’s response to Jamie reflected that growing awareness. He confirmed that nothing had changed on the Search Console side, while leaving open the question of the CrUX side of Core Web Vitals reporting.
His response:
“Correct, nothing in the mechanics changed (at least with regards to Search Console — I’m also not aware of anything on the Chrome / CrUX side, but I’m not as involved there).”
CrUX CWV Field Data
CrUX is the acronym for the Chrome User Experience Report. It provides CWV data based on real website visits, collected from Chrome users who have opted in to sharing their browsing data for the report. Google’s documentation describes it:
“The Chrome User Experience Report (also known as the Chrome UX Report, or CrUX for short) is a dataset that reflects how real-world Chrome users experience popular destinations on the web.
CrUX is the official dataset of the Web Vitals program. All user-centric Core Web Vitals metrics are represented.
CrUX data is collected from real browsers around the world, based on certain browser options which determine user eligibility. A set of dimensions and metrics are collected which allow site owners to determine how users experience their sites.”
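For those who want to look at the underlying CrUX field data directly rather than through Search Console, the same dataset is queryable through the CrUX API. Below is a minimal sketch, assuming an API key with the Chrome UX Report API enabled; the origin URL is a placeholder.

```python
import json
import urllib.request

# Chrome UX Report API endpoint; requires an API key with the CrUX API enabled.
CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"
API_KEY = "YOUR_API_KEY"  # placeholder


def query_crux(origin: str, form_factor: str = "PHONE") -> dict:
    """Fetch CrUX field data (including Core Web Vitals) for an origin."""
    body = json.dumps({"origin": origin, "formFactor": form_factor}).encode("utf-8")
    request = urllib.request.Request(
        f"{CRUX_ENDPOINT}?key={API_KEY}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    record = query_crux("https://www.example.com")
    # Each metric includes a 75th-percentile value and a histogram of experiences.
    for metric, data in record["record"]["metrics"].items():
        print(metric, data.get("percentiles", {}).get("p75"))
```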
Core Web Vitals Reporting Outage Is Widespread
At this point more people joined the conversation, with Alan Bleiweiss offering a comment and a screenshot showing the same complete drop-off in Search Console CWV reporting for other websites.
He posted:
“oooh Google had to slow down server requests to set aside more power to keep the swimming pools cool as the summer heats up.”
Here’s a closeup detail of Alan’s screenshot of a Search Console CWV report:
Screenshot Of CWV Report Showing July 10 Drop
I searched the Chrome Lighthouse changelog to see if there’s anything there that corresponds to the drop but nothing stood out.
So what is going on?
CWV Reporting Outage Is Confirmed
I next checked the X and Bluesky accounts of Googlers who work on the Chrome team and found that Barry Pollard, Web Performance Developer Advocate on Google Chrome, had posted about this issue last week.
Barry posted a note about a reporting outage on Bluesky:
“We’ve noticed another dip on the metrics this month, particularly on mobile. We are actively investigating this and have a potential reason and fix rolling out to reverse this temporary dip. We’ll update further next month. Other than that, there are no further announcements this month.”
Takeaways
Google Search Console Core Web Vitals (CWV) data drop: A sudden stop in CWV reporting was observed in Google Search Console around July 10, especially on mobile.
Issue is widespread, not site-specific: Multiple users confirmed the drop across different websites, ruling out individual site problems.
Origin of issue is not at Search Console: John Mueller confirmed there were no changes on the Search Console side.
Possible link to CrUX data pipeline: Barry Pollard from the Chrome team confirmed a reporting outage and said a potential fix is rolling out to reverse the dip.
We now know that this is a confirmed issue. Google Search Console’s Core Web Vitals reports began showing a reporting outage around July 10, leading users to suspect a bug. The issue was later acknowledged by Barry Pollard as a reporting outage affecting CrUX data, particularly on mobile.
Featured Image by Shutterstock/Mix and Match Studio
Wordfence published an advisory on the WordPress Malcure Malware Scanner plugin, which was discovered to have a vulnerability rated at a severity level of 8.1. At the time of publishing, there is no patch to fix the problem.
Screenshot Showing 8.1 Severity Rating
Malcure Malware Scanner Vulnerability
The Malcure Malware Scanner plugin, installed on over 10,000 WordPress websites, is vulnerable to “Arbitrary File Deletion due to a missing capability check on the wpmr_delete_file() function” by authenticated attackers. The need for authentication makes exploitation somewhat less likely, but only slightly, because the attack requires just subscriber-level access, the lowest level of authentication. The subscriber role is the default registration level on a WordPress website (if registration is allowed). Wordfence explains:
“This makes it possible for authenticated attackers, with Subscriber-level access and above, to delete arbitrary files making remote code execution possible. This is only exploitable when advanced mode is enabled on the site.”
There is no known patch available for the plugin and users are cautioned to take necessary actions such as uninstalling the plugin to mitigate risk.
The plugin is currently unavailable for download with a notice showing that it is under review.
Screenshot Of Malcure Plugin At WordPress Repository
Anthropic announced a new Financial Analysis Solution powered by its Claude 4 and Claude Code models. This is Anthropic’s first foray into a major vertical-focused platform, signaling a shift toward AI providers building tools that directly address common pain points in business workflows and productivity.
Claude For Financial Services
Anthropic’s new service is an AI-powered financial analysis tool targeted at financial professionals. It offers data integration via MCP (Model Context Protocol), secure data handling, and privacy: no user data is used to train Claude’s generative models.
According to the announcement:
“Claude has real-time access to comprehensive financial information including:
Box enables secure document management and data room analysis
Daloopa supplies high-quality fundamentals and KPIs from SEC filings
Databricks offers unified analytics for big data and AI workloads
FactSet provides comprehensive equity prices, fundamentals, and consensus estimates
Morningstar contributes valuation data and research analytics
PitchBook delivers industry-leading private capital market data and research, empowering users to source investment and fundraising opportunities, conduct due diligence and benchmark performance, faster and with greater confidence
S&P Global enables access to Capital IQ Financials, earnings call transcripts, and more–essentially your entire research workflow”
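Anthropic hasn’t published implementation details for these connectors, but MCP itself is an open protocol with a public Python SDK. The sketch below is a hypothetical, minimal MCP server exposing a single made-up get_quote tool (not one of the integrations listed above), just to show the general shape of how a data source can be surfaced to Claude over MCP.

```python
# Hypothetical sketch of an MCP server exposing a financial-data tool.
# Assumes the official MCP Python SDK (pip install mcp); the data is faked.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-financial-data")

# Stand-in data; a real connector would query a provider such as those listed above.
FAKE_QUOTES = {"ACME": {"price": 123.45, "currency": "USD"}}


@mcp.tool()
def get_quote(ticker: str) -> dict:
    """Return a (fake) latest quote for a ticker symbol."""
    return FAKE_QUOTES.get(ticker.upper(), {"error": f"unknown ticker: {ticker}"})


if __name__ == "__main__":
    # Runs over stdio so an MCP-capable client, such as Claude, can connect.
    mcp.run()
```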
Takeaway:
This launch may signal a shift among AI providers toward building industry-specific tools that solve problems for professionals, rather than offering only general-purpose models that others use to provide the same solutions. Generative AI companies have the ability to stitch together solutions from big data providers in ways that smaller companies can’t.
WordPress released maintenance update 6.8.2, which contains twenty changes to core and fifteen fixes for issues in the Gutenberg block editor. WordPress also announced that it is dropping security support for WordPress versions 4.1 to 4.6.
Short-Cycle Maintenance Release
This is a maintenance release that incrementally makes WordPress a smoother experience.
Some of the fixes that are representative of what’s in this release:
Dropping Security Support
WordPress announced that it is dropping support for versions 4.1 through 4.6. According to the official WordPress stats, only 0.9% of websites are using those versions of WordPress.
Statement on release page:
“Dropping security updates for WordPress versions 4.1 through 4.6

This is not directly related to the 6.8.2 maintenance release, but branches 4.1 to 4.6 had their final release today. These branches won’t receive any security update anymore.”
“As of July 2025, the WordPress Security Team will no longer provide security updates for WordPress versions 4.1 through 4.6.
These versions were first released nine or more years ago and over 99% of WordPress installations run a more recent version. The chances this will affect your site, or sites, is very small.”
Google has added a new metadata field to the Search Analytics API, making it easier for developers and SEO professionals to identify when they’re working with incomplete or still-processing data.
The update introduces new transparency into the freshness of query results, an improvement for marketers who rely on up-to-date metrics to inform real-time decisions.
What’s New In The API
The metadata field appears when requests include the dataState parameter set to all or hourly_all, enabling access to data that may still be in the process of being collected.
Two metadata values are now available:
first_incomplete_date: Indicates the earliest date for which data is still incomplete. Only appears when data is grouped by date.
first_incomplete_hour: Indicates the first hour where data remains incomplete. Only appears when data is grouped by hour.
Both values help clarify whether recent metrics can be considered stable or if they may still change as Google finalizes its processing.
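As a rough sketch of how the new field might surface in practice (assuming an authenticated google-api-python-client setup; the property URL and credentials file are placeholders):

```python
# Sketch: request fresh Search Analytics data and check the new metadata field.
# Assumes google-api-python-client and a service account with Search Console access.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES  # placeholder credentials file
)
service = build("searchconsole", "v1", credentials=credentials)

request_body = {
    "startDate": "2025-07-10",
    "endDate": "2025-07-17",
    "dimensions": ["date"],
    "dataState": "all",  # include fresh data that may still be incomplete
}
response = service.searchanalytics().query(
    siteUrl="https://www.example.com/", body=request_body  # placeholder property
).execute()

# The metadata object only appears when the response includes incomplete data.
# Field naming here follows the announcement; check the API reference for exact casing.
first_incomplete = response.get("metadata", {}).get("first_incomplete_date")
if first_incomplete:
    print(f"Rows on or after {first_incomplete} may still change.")
```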
Why It Matters For SEO Reporting
This enhancement allows you to better distinguish between legitimate changes in search performance and temporary gaps caused by incomplete data.
To help reduce the risk of misinterpreting short-term fluctuations, Google’s documentation states:
“All values after the first_incomplete_date may still change noticeably.”
For those running automated reports, the new metadata enables smarter logic, such as flagging or excluding fresh but incomplete data to avoid misleading stakeholders.
Time Zone Consistency
All timestamps provided in the metadata field use the America/Los_Angeles time zone, regardless of the request origin or property location. Developers may need to account for this when integrating the data into local systems.
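For instance, here is a minimal sketch of converting one of these Pacific-time values to UTC with Python’s standard library (the timestamp shown is illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Illustrative metadata value; actual values are reported in America/Los_Angeles.
raw_value = "2025-07-17T08:00:00"

pacific_time = datetime.fromisoformat(raw_value).replace(
    tzinfo=ZoneInfo("America/Los_Angeles")
)
utc_time = pacific_time.astimezone(timezone.utc)
print(utc_time.isoformat())  # 2025-07-17T15:00:00+00:00 (PDT is UTC-7 in July)
```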
Backward-Compatible Implementation
The new metadata is returned as an optional object and doesn’t alter existing API responses unless requested. This means no breaking changes for current implementations, and developers can begin using the feature as needed.
Best Practices For Implementation
To take full advantage of this update:
Include logic to check for the metadata object when requesting recent data.
Consider displaying warnings or footnotes in reports when metadata indicates incomplete periods.
Schedule data refreshes after the incomplete window has passed to ensure accuracy.
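As a sketch of the first two points, here is one way a reporting pipeline might split date-grouped rows into final and provisional sets; the row structure mirrors a Search Analytics response grouped by date, and the sample values are illustrative.

```python
# Sketch: separate stable rows from rows that may still change, based on the
# first_incomplete_date metadata value (if present). Sample data is illustrative.
def split_rows(rows: list[dict], first_incomplete_date: str | None):
    """Return (final_rows, provisional_rows) for a date-grouped response."""
    if not first_incomplete_date:
        return rows, []
    final = [row for row in rows if row["keys"][0] < first_incomplete_date]
    provisional = [row for row in rows if row["keys"][0] >= first_incomplete_date]
    return final, provisional


rows = [
    {"keys": ["2025-07-15"], "clicks": 120, "impressions": 3400},
    {"keys": ["2025-07-17"], "clicks": 40, "impressions": 900},
]
final, provisional = split_rows(rows, "2025-07-17")
print(f"{len(final)} final rows; {len(provisional)} provisional rows to flag in reports")
```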
Google also reminds users that the Search Analytics API continues to return only top rows, not a complete dataset, due to system limitations.
Looking Ahead
This small but meaningful addition gives SEO teams more clarity around data freshness, a frequent pain point when working with hourly or near-real-time performance metrics.
It’s a welcome improvement for anyone building tools or dashboards on top of the Search Console API.
The metadata field is available now through standard API requests. Full implementation details are available in the Search Analytics API documentation.
Google’s John Mueller and Martin Splitt discussed the question of whether AI will replace the need for SEO. Mueller expressed a common-sense opinion about the reality of the web ecosystem and AI chatbots as they exist today.
Context Of Discussion
The context of the discussion was about SEO basics that a business needs to know. Mueller then mentioned that businesses might want to consider hiring an SEO who can help navigate the site through its SEO journey.
Mueller observed:
“…you also need someone like an SEO as a partner to give you updates along the way and say, ‘Okay, we did all of these things,’ and they can list them out and tell you exactly what they did, ‘These things are going to take a while, and I can show you when Google crawls, we can follow along to see like what is happening there.’”
Is There Value In Learning SEO?
It was at this point that Martin Splitt asked if generative AI will make having to learn SEO obsolete or whether entering a prompt will give all the answers a business person needs to know. Mueller’s answer was tethered to how things are right now and avoided speculating about how things will change in a year or more.
Splitt asked:
“Okay, I think that’s pretty good. Last but not least, with generative AI and chatbot AI things happening. Do you think there’s still a value in learning these kind of things? Or can I just enter a prompt and it’ll figure things out for me?”
Mueller affirmed that knowing SEO will still be needed as long as there are websites because search engines and chatbots need the information that exists on websites. He offered examples of local businesses and ecommerce sites that still need to be found, regardless of whether that’s through an AI chatbot or search.
He answered:
“Absolutely value in learning these things and in making a good website. I think there are lots of things that all of these chatbots and other ways to get information, they don’t replace a website, especially for local search and ecommerce.
So, especially if you’re a local business, maybe it’s fine if a chatbot mentions your business name and tells people how to get there. Maybe that’s perfectly fine, but oftentimes, they do that based on web content that they found.
Having a website is the basis for being visible in all of these systems, and for a lot of other things where you offer a service or something, some other kind of functionality on a website where you have products to sell, where you have subscriptions or anything, a chat response can’t replace that.
If you want a t shirt, you don’t want a description of how to make your own t-shirt. You want a link to a store where it’s like, ‘Oh, here’s t-shirt designs,’ maybe t-shirt designs in that specific style that you like, but you go to this website and buy those t-shirts there.”
Martin acknowledged the common sense of that answer and they joked around a bit about Mueller hoping that an AI will be able to do his job once he retires.
That’s the context for this part of their conversation:
“Okay. That’s very fair. Yeah, that makes sense. Okay, so you think AI is not going to take it all away from us?”
And Mueller answers with the comment about AI replacing him after he retires:
“Well, we’ll see. I can’t make any promises. I think, at some point, I would like to retire, and then maybe AI takes over my work then. But, like, there’s lots of stuff to be done until then. There are lots of things that I imagine AI is not going to just replace.”
What About CMS Platforms With AI?
Something that wasn’t discussed is the trend of AI within content management systems. Many web hosts and WordPress plugins are already integrating AI into the workflow of creating and optimizing websites. Wix has already integrated AI into its workflow, and it won’t be much longer until AI has a stronger presence within WordPress, which is what the new WordPress AI team is working on.
Screenshot Of ChatGPT Choosing Number 27
Will AI ever replace the need for SEO? Many easy things that can be scaled are already automated. However, many of the best ideas for marketing and communicating with humans are still best handled by humans, not AI. The nature of generative AI, which is to generate the most likely answer or series of words in a sentence, precludes it from ever having an original idea. AI is so locked into being average that if you ask it to pick a number between one and fifty, it will choose the number 27 because the AI training binds it to picking the likeliest number, even when instructed to randomize the choice.
Listen to Search Off The Record at about the 24 minute mark:
Meta announced that it will implement stronger measures against accounts sharing “unoriginal” content on Facebook.
This marks the second major platform policy update in days following YouTube’s similar announcement about mass-produced and repetitive content.
Meta revealed it has removed approximately 10 million profiles impersonating large content creators, and taken action against 500,000 accounts involved in “spammy behavior or fake engagement”.
A Platform-Wide Movement Against Content Farms
Meta’s announcement closely follows YouTube’s monetization update, which clarified its stance on “inauthentic” content.
Both platforms are addressing the growing problem of accounts profiting from reposting others’ work without permission or meaningful additions.
According to Meta, accounts that repeatedly reuse someone else’s videos, photos, or text posts will lose access to Facebook’s monetization programs and face reduced visibility across all content.
Facebook is also testing a system that adds links on duplicate videos to direct viewers to the original creator.
Here’s an example of what that will look like on a reposted video:
Screenshot from: creators.facebook.com/blog/combating-unoriginal-content, July 2025.
Meta stated in its official blog post:
“We believe that creators should be celebrated for their unique voices and perspectives, not drowned out by copycats and impersonators.”
What Counts As Unoriginal Content?
Both Meta and YouTube distinguish between unoriginal content and transformative content, like reaction videos or commentary.
Meta emphasizes that content becomes problematic when creators repost others’ material without permission or meaningful enhancements, such as editing or voiceover.
YouTube’s creator liaison, Rene Ritchie, offered a similar clarification ahead of its own update, stating:
“This is a minor update to YouTube’s long-standing YPP policies to help better identify when content is mass-produced or repetitive”.
How AI & Automation Factor In
Neither platform bans AI-generated content outright. However, their recent updates appear designed to address a wave of low-quality, automated material that offers little value to viewers.
YouTube affirms that creators may use AI tools as long as the final product includes original commentary or educational value, with proper disclosure for synthetic content.
Meta’s guidelines similarly caution against simply “stitching together clips” or relying on recycled content, and encourage “authentic storytelling.”
These concerns implicitly target AI-assisted compilations that lack originality.
Potential Impact
For content creators, the updates from Meta and YouTube reinforce the importance of originality and creative input.
Those who produce reaction videos, commentary, or curated media with meaningful additions are unlikely to be affected. They may even benefit as spammy accounts lose visibility.
On the other hand, accounts that rely on reposting others’ content with minimal editing or variation could see reduced reach and loss of monetization.
To support creators, Meta introduced new post-level insights in its Professional Dashboard and a tool to check if a page is at risk of distribution or monetization penalties. YouTube is similarly offering guidance through its Creator Liaison and support channels.
Best Practices For Staying Compliant
To maintain monetization eligibility, Meta recommends:
Posting primarily original content filmed or created by the user.
Making meaningful enhancements such as editing, narration, or commentary when using third-party content.
Prioritizing storytelling over short, low-effort posts.
Avoiding recycled content with watermarks or low production value.
Writing high-quality captions with minimal hashtags and capitalization.
Looking Ahead
Meta and YouTube’s updates indicate a wider industry move against unoriginal content, especially AI-generated “slop” and content farms.
While the enforcement rollout may not affect every creator equally, these moves indicate a shift in priorities. Originality and value-added content are becoming the new standard.
The era of effortless monetization through reposting is being phased out. Moving forward, success on platforms like Facebook and YouTube will depend on creative input, storytelling, and a commitment to original expression.
Google published details of a new kind of graph-based AI called a Graph Foundation Model (GFM), which generalizes to previously unseen graphs and delivers a three to forty times boost in precision over previous methods. It has already been tested successfully in scaled applications such as spam detection in ads.
The announcement describes the new technology as expanding the boundaries of what has been possible up to today:
“Today, we explore the possibility of designing a single model that can excel on interconnected relational tables and at the same time generalize to any arbitrary set of tables, features, and tasks without additional training. We are excited to share our recent progress on developing such graph foundation models (GFM) that push the frontiers of graph learning and tabular ML well beyond standard baselines.”
Graph Neural Networks Vs. Graph Foundation Models
Graphs are representations of related data: the objects are called nodes, and the connections between them are called edges. In SEO, the most familiar type of graph is arguably the link graph, a map of the entire web formed by the links that connect one web page to another.
Current technology uses Graph Neural Networks (GNNs) to represent data like web page content and can be used to identify the topic of a web page.
A Google Research blog post about GNNs explains their importance:
“Graph neural networks, or GNNs for short, have emerged as a powerful technique to leverage both the graph’s connectivity (as in the older algorithms DeepWalk and Node2Vec) and the input features on the various nodes and edges. GNNs can make predictions for graphs as a whole (Does this molecule react in a certain way?), for individual nodes (What’s the topic of this document, given its citations?)…
Apart from making predictions about graphs, GNNs are a powerful tool used to bridge the chasm to more typical neural network use cases. They encode a graph’s discrete, relational information in a continuous way so that it can be included naturally in another deep learning system.”
The downside to GNNs is that they are tethered to the graph on which they were trained and can’t be used on a different kind of graph. To use it on a different graph, Google has to train another model specifically for that other graph.
To make an analogy, it would be like having to train a separate generative AI model for each language just to get it to work in that language. LLMs don’t have that limitation because they can generalize across languages, but models that work with graphs traditionally cannot generalize across graphs. This is the problem the invention solves: creating a model that generalizes to other graphs without having to be trained on them first.
The breakthrough Google announced is that, with the new Graph Foundation Models, it can now train a model that generalizes to graphs it hasn’t been trained on and understands the patterns and connections within them, and it can do so three to forty times more precisely.
Announcement But No Research Paper
Google’s announcement does not link to a research paper. It has been variously reported that Google has decided to publish fewer research papers, and this may be an example of that policy change. Is it because this innovation is so significant that Google wants to keep it as a competitive advantage?
How Graph Foundation Models Work
In a conventional graph, let’s say a graph of the Internet, web pages are the nodes. The links between the nodes (web pages) are called the edges. In that kind of graph, you can see similarities between pages because the pages about a specific topic tend to link to other pages about the same specific topic.
In very simple terms, a Graph Foundation Model turns every row in every table into a node and connects related nodes based on the relationships in the tables. The result is a single large graph that the model uses to learn from existing data and make predictions (like identifying spam) on new data.
Screenshot Of Five Tables
Image by Google
Transforming Tables Into A Single Graph
The announcement says this about the following images, which illustrate the process:
“Data preparation consists of transforming tables into a single graph, where each row of a table becomes a node of the respective node type, and foreign key columns become edges between the nodes. Connections between five tables shown become edges in the resulting graph.”
Screenshot Of Tables Converted To Edges
Image by Google
What makes this new model exceptional is that the process of creating it is “straightforward” and it scales. The part about scaling is important because it means that the invention is able to work across Google’s massive infrastructure.
“We argue that leveraging the connectivity structure between tables is key for effective ML algorithms and better downstream performance, even when tabular feature data (e.g., price, size, category) is sparse or noisy. To this end, the only data preparation step consists of transforming a collection of tables into a single heterogeneous graph.
The process is rather straightforward and can be executed at scale: each table becomes a unique node type and each row in a table becomes a node. For each row in a table, its foreign key relations become typed edges to respective nodes from other tables while the rest of the columns are treated as node features (typically, with numerical or categorical values). Optionally, we can also keep temporal information as node or edge features.”
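To make that data-preparation step concrete, here is a minimal sketch of the same idea using networkx; the tables, columns, and foreign keys are made up, and Google’s internal implementation isn’t described in the announcement.

```python
# Sketch: turn a collection of relational tables into a single heterogeneous graph.
# Each table becomes a node type, each row a node, and foreign keys become typed edges.
import networkx as nx

# Toy tables: each is a list of row dicts; orders.user_id is a foreign key to users.
tables = {
    "users": [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}],
    "orders": [
        {"id": 10, "user_id": 1, "amount": 25.0},
        {"id": 11, "user_id": 2, "amount": 99.0},
    ],
}
foreign_keys = {("orders", "user_id"): "users"}  # (table, column) -> referenced table

graph = nx.MultiDiGraph()

# Each row becomes a node; non-key columns are stored as node features.
for table, rows in tables.items():
    fk_columns = {col for (tbl, col) in foreign_keys if tbl == table}
    for row in rows:
        features = {k: v for k, v in row.items() if k != "id" and k not in fk_columns}
        graph.add_node((table, row["id"]), node_type=table, **features)

# Each foreign-key value becomes a typed edge to the referenced table's row.
for (table, column), referenced_table in foreign_keys.items():
    for row in tables[table]:
        graph.add_edge((table, row["id"]), (referenced_table, row[column]), edge_type=column)

print(graph)  # e.g., "MultiDiGraph with 4 nodes and 2 edges"
```

A model trained on a graph built this way can learn from the connectivity between tables as well as from the node features, which is the signal the announcement credits for the precision gains.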
Tests Are Successful
Google’s announcement says that they tested it in identifying spam in Google Ads, which was difficult because it’s a system that uses dozens of large graphs. Current systems are unable to make connections between unrelated graphs and miss important context.
Google’s new Graph Foundation Model was able to make the connections between all the graphs and improved performance.
The announcement described the achievement:
“We observe a significant performance boost compared to the best tuned single-table baselines. Depending on the downstream task, GFM brings 3x – 40x gains in average precision, which indicates that the graph structure in relational tables provides a crucial signal to be leveraged by ML models.”
Is Google Using This System?
It’s notable that Google successfully tested the system on Google Ads spam detection and reported upsides and no downsides, which means it can be used in a live environment for real-world tasks. Because it’s a flexible model, it can be applied to other tasks that rely on multiple graphs, from identifying content topics to identifying link spam.
Normally, when something falls short, research papers and announcements say that it points the way for future work, but that’s not how this new invention is presented. It’s presented as a success, and it ends with a statement saying that these results can be further improved, meaning it can get even better than these already spectacular results.
“These results can be further improved by additional scaling and diverse training data collection together with a deeper theoretical understanding of generalization.”