OpenAI Releases Shared Project Feature To All Users via @sejournal, @martinibuster

OPenAI announced that it is upgrading all ChatGPT accounts to be eligible for the project sharing feature, which enables users to share a ChatGPT project with others who can then participate and make changes. The feature was previously available only to users on OpenAI’s Business, Enterprise, and Edu plans.

The new feature is available to users globally in the Free, Plus, Pro, and budget Go plans, whether accessed on the web, iOS, or Android devices.

There are limits specific to each plan according to the announcement:

  • Free users can collaborate on up to 5 files and with 5 collaborators
  • Plus and Go users can share up to 25 files with up to 10 collaborators
  • Pro users can share up to 40 files with up to 100 collaborators

    OpenAI suggested the following use cases:

“Group work: Upload notes, proposals, and contracts so collaborators can draft deliverables faster and stay in sync.

Content creation: Apply project-specific instructions to keep tone and style consistent across contributors.

Reporting: Store datasets and reports in one project, and return each week to generate updates without starting over.

Research: Keep transcripts, survey results, and market research in one place, so anyone in the project can query and build on the findings.

Project owners can choose to share a project with “Only those invited” or “Anyone with a link,” and can change visibility settings at any time including switching back to invite-only.”

Read more at OpenAI: Shared Projects

Featured Image by Shutterstock/LedyX

Microsoft Updates Copilot With Memory, Search Connectors, & More via @sejournal, @MattGSouthern

Microsoft announced its Copilot Fall Release, introducing features to make AI more personal and collaborative.

New capabilities include group collaboration, long-term memory, health tools, and voice-enabled learning.

Mustafa Suleyman, head of Microsoft AI, wrote in the announcement that the release represents a shift in how AI supports users.

Suleyman wrote:

“… technology should work in service of people. Not the other way around. Ever.”

What’s New

Search Improvements

Copilot Search combines AI-generated answers with traditional results in one view, providing cited responses for faster discovery.

Microsoft also highlighted its in-house models, including MAI-Voice-1, MAI-1-Preview, and MAI-Vision-1, as groundwork for more immersive Copilot experiences.

Memory & Personalization

Copilot now includes long-term memory that tracks user preferences and information across conversations.

You can ask Copilot to remember specific details like training for a marathon or an anniversary, and the AI can recall this information in future interactions. Users can edit, update, or delete memories at any time.

Search Across Services

New connector features link Copilot to OneDrive, Outlook, Gmail, Google Drive, and Google Calendar so you can search for documents, emails, and calendar events across multiple accounts using natural language.

Microsoft notes this is rolling out gradually and may not yet be available in all regions or languages.

Edge & Windows Integration

Copilot Mode in Edge is evolving into what Microsoft calls an “AI browser.”

With user permission, Copilot can see open tabs, summarize information, and take actions like booking hotels or filling forms.

Voice-only navigation enables hands-free browsing. Journeys and Actions are currently available in the U.S. only.

Shared AI Sessions

The Groups feature turns Copilot into a collaborative workspace for up to 32 people.

You can invite friends, classmates, or teammates to shared sessions. Start a session by sending a link, and anyone with the link can join and see the same conversation in real time.

This feature is U.S. only at launch.

Health Features

Copilot for health grounds responses in credible sources like Harvard Health for medical questions.

Health features are available only in the U.S. at copilot.microsoft.com and in the Copilot iOS app.

Voice Tutoring

Learn Live provides voice-enabled Socratic tutoring for educational topics.

Interactive whiteboards help you work through concepts for test preparation, language practice, or exploring new subjects. U.S. only.

“Mico” Character

Microsoft introduced Mico, an optional visual character that reacts during voice conversations.

Separately, Copilot adds a “real talk” conversation style that challenges assumptions and adapts to user preferences.

Why This Matters

These features change how Copilot fits into your workflow.

The move from individual to collaborative sessions means teams can use AI together rather than separately synthesizing results.

Long-term memory reduces the need to repeat context, which matters for ongoing projects where Copilot needs to understand your specific situation.

Looking Ahead

Features are live in the U.S. now. Microsoft says updates are rolling out across the UK, Canada, and beyond in the next few weeks.

Some features require a Microsoft 365 Personal, Family, or Premium subscription; usage limits apply. Specific availability varies by market, device, and platform.

Search Engine Journal Is Hiring! via @sejournal, @hethr_campbell

We’re looking for a powerhouse project manager to keep our marketing team inspired and on track.

This is a Philippines-based, fully-remote position working on U.S.-adjacent hours (8 p.m. – 4 a.m. PHT) to be my partner in crime execution on some exciting projects.

We do things a little differently here, and we’ve learned that culture fit is everything. When it’s a match, people tend to stay; nearly half our team has been with us for more than five years.

If you’re the kind of team member who has loads of experience leaning into complex projects, honest conversations, and big ideas, we’d love to meet you.

About SEJ

We help our advertisers communicate with precision and creativity in an AI-driven world. Our campaigns are built on data, empathy, and continuous experimentation. We manage multi-channel strategies across content, email, and social media, and we’re looking for someone who can keep the moving parts aligned without losing sight of the humans behind them.

We’re hiring a Senior Digital Marketing Project Manager to lead strategy execution, client relationships, and team coordination. You’ll help us build marketing systems that are smart, efficient, and grounded in trust.

Why This Role Is Different

This is an AI-first position. You already use tools like ChatGPT, Claude, or Gemini to work smarter, automate workflows, and uncover insights that move the needle. Your success here depends on seeing where AI enhances human creativity … and where it doesn’t.

We’re a team that values autonomy, initiative, and straight talk. We’d rather have one clear, respectful conversation than weeks of confusion. We care deeply about doing great work and making each other better through feedback and shared accountability.

What You’ll Do

  • Manage and optimize complex digital marketing campaigns from strategy to execution.
  • Translate business goals into clear, actionable plans for clients and internal teams.
  • Keep communication flowing: up, down, and across.
  • Identify opportunities to integrate AI tools into analytics and operations.
  • Support a culture of feedback, growth, and curiosity.

Who You Are

  • You’re organized and strategic, but not rigid. You like structure, but you also know when to improvise.
  • You’re skilled at managing both clients and creatives. You can lead with empathy and keep projects on schedule.
  • You don’t shy away from a tough conversation if it means getting to a better outcome.

You’re the kind of team member who says things like:

  • “Let’s make sure we’re solving the right problem.”
  • “I appreciate the feedback! Here’s what I’m hearing.”
  • “How can AI help us work smarter here?”

Why Work With Search Engine Journal?

We’re a remote-first, global team that values:

  • Clarity over chaos.
  • Progress over perfection.
  • Honest collaboration over hierarchy.
  • We’re remote, flexible, and results-focused. You’ll have real ownership, real support, and the chance to do your best work with people who actually care about doing theirs.

If this sounds like your kind of place, see the full job listing and apply here.


Featured Image: PeopleImages/Shutterstock

Google’s New BlockRank Democratizes Advanced Semantic Search via @sejournal, @martinibuster

A new research paper from Google DeepMind  proposes a new AI search ranking algorithm called BlockRank that works so well it puts advanced semantic search ranking within reach of individuals and organizations. The researchers conclude that it “can democratize access to powerful information discovery tools.”

In-Context Ranking (ICR)

The research paper describes the breakthrough of using In-Context Ranking (ICR), a way to rank web pages using a large language model’s contextual understanding abilities.

It prompts the model with:

  1. Instructions for the task (for example, “rank these web pages”)
  2. Candidate documents (the pages to rank)
  3. And the search query.

ICR is a relatively new approach first explored by researchers from Google DeepMind and Google Research in 2024 (Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? PDF). That earlier study showed that ICR could match the performance of retrieval systems built specifically for search.

But that improvement came with a downside in that it requires escalating computing power as the number of pages to be ranked are increased.

When a large language model (LLM) compares multiple documents to decide which are most relevant to a query, it has to “pay attention” to every word in every document and how each word relates to all others. This attention process gets much slower as more documents are added because the work grows exponentially.

The new research solves that efficiency problem, which is why the research paper is called, Scalable In-context Ranking with Generative Models, because it shows how to scale In-context Ranking (ICR) with what they call BlockRank.

How BlockRank Was Developed

The researchers examined how the model actually uses attention during In-Context Retrieval and found two patterns:

  • Inter-document block sparsity:
    The researchers found that when the model reads a group of documents, it tends to focus mainly on each document separately instead of comparing them all to each other. They call this “block sparsity,” meaning there’s little direct comparison between different documents. Building on that insight, they changed how the model reads the input so that it reviews each document on its own but still compares all of them against the question being asked. This keeps the part that matters, matching the documents to the query, while skipping the unnecessary document-to-document comparisons. The result is a system that runs much faster without losing accuracy.
  • Query-document block relevance:
    When the LLM reads the query, it doesn’t treat every word in that question as equally important. Some parts of the question, like specific keywords or punctuation that signal intent, help the model decide which document deserves more attention. The researchers found that the model’s internal attention patterns, particularly how certain words in the query focus on specific documents, often align with which documents are relevant. This behavior, which they call “query-document block relevance,” became something the researchers could train the model to use more effectively.

The researchers identified these two attention patterns and then designed a new approach informed by what they learned. The first pattern, inter-document block sparsity, revealed that the model was wasting computation by comparing documents to each other when that information wasn’t useful. The second pattern, query-document block relevance, showed that certain parts of a question already point toward the right document.

Based on these insights, they redesigned how the model handles attention and how it is trained. The result is BlockRank, a more efficient form of In-Context Retrieval that cuts unnecessary comparisons and teaches the model to focus on what truly signals relevance.

Benchmarking Accuracy Of BlockRank

The researchers tested BlockRank for how well it ranks documents on three major benchmarks:

  • BEIR
    A collection of many different search and question-answering tasks used to test how well a system can find and rank relevant information across a wide range of topics.
  • MS MARCO
    A large dataset of real Bing search queries and passages, used to measure how accurately a system can rank passages that best answer a user’s question.
  • Natural Questions (NQ)
    A benchmark built from real Google search questions, designed to test whether a system can identify and rank the passages from Wikipedia that directly answer those questions.

They used a 7-billion-parameter Mistral LLM and compared BlockRank to other strong ranking models, including FIRST, RankZephyr, RankVicuna, and a fully fine-tuned Mistral baseline.

BlockRank performed as well as or better than those systems on all three benchmarks, matching the results on MS MARCO and Natural Questions and doing slightly better on BEIR.

The researchers explained the results:

“Experiments on MSMarco and NQ show BlockRank (Mistral-7B) matches or surpasses standard fine-tuning effectiveness while being significantly more efficient at inference and training. This offers a scalable and effective approach for LLM-based ICR.”

They also acknowledged that they didn’t test multiple LLMs and that these results are specific to Mistral 7B.

Is BlockRank Used By Google?

The research paper says nothing about it being used in a live environment. So it’s purely conjecture to say that it might be used. Also, it’s natural to try to identify where BlockRank fits into AI Mode or AI Overviews but the descriptions of how AI Mode’s FastSearch and RankEmbed work are vastly different from what BlockRank does. So it’s unlikely that BlockRank is related to FastSearch or RankEmbed.

Why BlockRank Is A Breakthrough

What the research paper does say is that this is a breakthrough technology that puts an advanced ranking system within reach of individuals and organizations that wouldn’t normally be able to have this kind of high quality ranking technology.

The researchers explain:

“The BlockRank methodology, by enhancing the efficiency and scalability of In-context Retrieval (ICR) in Large Language Models (LLMs), makes advanced semantic retrieval more computationally tractable and can democratize access to powerful information discovery tools. This could accelerate research, improve educational outcomes by providing more relevant information quickly, and empower individuals and organizations with better decision-making capabilities.

Furthermore, the increased efficiency directly translates to reduced energy consumption for retrieval-intensive LLM applications, contributing to more environmentally sustainable AI development and deployment.

By enabling effective ICR on potentially smaller or more optimized models, BlockRank could also broaden the reach of these technologies in resource-constrained environments.”

SEOs and publishers are free to their opinions of whether or not this could be used by Google. I don’t think there’s evidence of that but it would be interesting to ask a Googler about it.

Google appears to be in the process of making BlockRank available on GitHub, but it doesn’t appear to have any code available there yet.

Read about BlockRank here:
Scalable In-context Ranking with Generative Models

Featured Image by Shutterstock/Nithid

AI Assistants Show Significant Issues In 45% Of News Answers via @sejournal, @MattGSouthern

Leading AI assistants misrepresented or mishandled news content in nearly half of evaluated answers, according to a European Broadcasting Union (EBU) and BBC study.

The research assessed free/consumer versions of ChatGPT, Copilot, Gemini, and Perplexity, answering news questions in 14 languages across 22 public-service media organizations in 18 countries.

The EBU said in announcing the findings:

“AI’s systemic distortion of news is consistent across languages and territories.”

What The Study Found

In total, 2,709 core responses were evaluated, with qualitative examples also drawn from custom questions.

Overall, 45% of responses contained at least one significant issue, and 81% had some issue. Sourcing was the most common problem area, affecting 31% of responses at a significant level.

How Each Assistant Performed

Performance varied by platform. Google Gemini showed the most issues: 76% of its responses contained significant problems, driven by 72% with sourcing issues.

The other assistants were at or below 37% for major issues overall and below 25% for sourcing issues.

Examples Of Errors

Accuracy problems included outdated or incorrect information.

For instance, several assistants identified Pope Francis as the current Pope in late May, despite his death in April, and Gemini incorrectly characterized changes to laws on disposable vapes.

Methodology Notes

Participants generated responses between May 24 and June 10, using a shared set of 30 core questions plus optional local questions.

The study focused on the free/consumer versions of each assistant to reflect typical usage.

Many organizations had technical blocks that normally restrict assistant access to their content. Those blocks were removed for the response-generation period and reinstated afterward.

Why This Matters

When using AI assistants for research or content planning, these findings reinforce the need to verify claims against original sources.

As a publication, this could impact how your content is represented in AI answers. The high rate of errors increases the risk of misattributed or unsupported statements appearing in summaries that cite your content.

Looking Ahead

The EBU and BBC published a News Integrity in AI Assistants Toolkit alongside the report, offering guidance for technology companies, media organizations, and researchers.

Reuters reports the EBU’s view that growing reliance on assistants for news could undermine public trust.

As EBU Media Director Jean Philip De Tender put it:

“When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.”


Featured Image: Naumova Marina/Shutterstock

YouTube Expands Likeness Detection To All Monetized Channels via @sejournal, @MattGSouthern

YouTube is beginning to expand access to its likeness detection tool to all channels in the YouTube Partner Program over the next few months.

The technology helps you identify unauthorized videos where your facial likeness has been altered or generated with AI.

YouTube announced the expansion after testing the tool with a small group of creators.

The tool addresses a growing concern as AI-generated content becomes more sophisticated and accessible.

How Likeness Detection Works

Channels can access the tool through YouTube Studio’s content detection tab under a new likeness section.

The onboarding process requires identity verification. You scan a QR code with your phone’s camera, then submit a photo ID and record a brief selfie video performing specific motions.

YouTube processes this information on Google servers, typically granting access within a few days.

Once verified, creators see a dashboard displaying videos that match their facial likeness. The interface shows video titles, upload dates, upload channels, view counts, and subscriber numbers. YouTube’s systems flag some matches as higher priority for review.

Taking Action On Detected Content

You have three options when reviewing matches.

You can request removal under YouTube’s privacy guidelines, submit a copyright claim, or archive the video without action. The tool automatically fills legal name and email information when starting a removal request.

Privacy removal requests apply to altered or synthetic content that violates specific criteria. YouTube’s announcement highlighted two examples: AI-generated videos showing creators endorsing political candidates, and infomercials with creators’ faces added through AI.

Copyright claims follow different rules and must consider fair use exceptions. Videos using short clips from a creator’s channel may not qualify for privacy removal but could warrant copyright action.

See a demonstration in the video below:

Policy Differences

YouTube stressed the distinction between privacy and copyright policies.

Privacy policy violations involve altered or synthetic content judged against criteria including whether the content is parody, satire, or includes AI disclosure. Copyright infringement covers unauthorized use of original content, including cropped videos to avoid detection or videos with changed audio.

The tool surfaces some short clips from creators’ own channels. These don’t qualify for privacy removal but may be eligible for copyright claims if fair use doesn’t apply.

Why This Matters

This gives YouTube Partner Program creators direct control over how AI-generated content uses their likeness.

Monetized channels can now monitor unauthorized deepfakes and request removal when videos mislead the audience about endorsements or statements that were never made.

Looking Ahead

The tool will roll out to eligible creators over the next few months. Those who see no matches shouldn’t be concerned. YouTube says this indicates no detected unauthorized use of their likeness on the platform.

Channels can withdraw consent and stop using the tool at any time through the manage likeness detection settings.

Surfer SEO Acquired By Positive Group via @sejournal, @martinibuster

The French technology group Positive acquired Surfer, the popular content optimization tool. The acquisition helps Positive create a “full-funnel” brand visibility solution together with its marketing and CRM tools.

The acquisition of Surfer extends Positive’s reach from marketing software to AI-based brand visibility. Positive described the deal as part of a European AI strategy that supports jobs and protects data. Positive’s revenue has grown fivefold in the past five years, rising from €50 million to an expected €70 million in 2025.

Surfer SEO

Founded in 2017, Surfer developed SEO tools based on language models that help marketers improve visibility on both search engines and AI assistants, which have become a growing source of website traffic and customers.

Sign Of Broader Industry Trends

The acquisition shows that search optimization continues to be an important part of business marketing as AI search and chat play a larger role in how consumers learn about products, services, and brands. This deal enables Positive to offer AI-based visibility solutions alongside its CRM and automation products, expanding its technology portfolio.

What Acquisition Means For Customers

Positive Group, based in France, is a technology solutions company that develops digital tools for marketing, CRM, automation, and data management. It operates through several divisions: User (marketing and CRM), Signitic (email signatures), and now Surfer (AI search optimization). The company is majority-owned by its executives, employs about 400 people, and keeps its servers in France and Germany. Surfer, based in Poland, brings experience in AI content optimization and a strong presence in North America. Together, they combine infrastructure, market knowledge, and product development within one technology-focused group.

Lucjan Suski, CEO and co-founder of Surfer, commented:

“SEO is evolving fast, and it matters more than ever before. We help marketers win the AI SEO era. Positive helps them grow across every other part of their digital strategy. Together, we’ll give marketers the complete toolkit to lead across AI search, email marketing automation, and beyond.”

According to Mathieu Tarnus, Positive’s founding president, and Paul de Fombelle, its CEO:

“Artificial intelligence is at the heart of our value proposition. With the acquisition of Surfer, our customers are moving from optimizing their traditional SEO positioning to optimizing their brand presence in the responses provided by conversational AI assistants. Surfer stands out from established market players by directly integrating AI into content creation and optimization.”

The acquisition adds Surfer’s AI optimization capabilities to Positive’s product ecosystem, helping customers improve visibility in AI-generated answers. For both companies, the deal is an opportunity to expand their capabilities in AI-based brand visibility.

Featured Image by Shutterstock/GhoST RideR 98

Brave Reveals Systemic Security Issues In AI Browsers via @sejournal, @MattGSouthern

Brave disclosed security vulnerabilities in AI browsers that could allow malicious websites to hijack AI assistants and access sensitive user accounts.

The issues affect Perplexity Comet, Fellou, and potentially other AI browsers that can take actions on behalf of users.

The vulnerabilities stem from indirect prompt injection attacks where websites embed hidden instructions that AI browsers process as legitimate user commands. Brave published the findings after reporting the issues to affected companies.

What Brave Found

Perplexity Comet Vulnerability

Comet’s screenshot feature can be exploited by embedding nearly invisible text in webpages.

When users take screenshots to ask questions, the AI extracts hidden text using what appears to be OCR and processes it as commands rather than untrusted content.

Brave notes Comet isn’t open-source, so this behavior is inferred and can’t be verified from source code.

The hidden instructions use faint colors that humans can barely see but AI systems extract and execute. This lets attackers issue commands to the AI assistant without the user’s knowledge.

Fellou Navigation Vulnerability

Fellou browser sends webpage content to its AI system when users navigate to a site.

Asking the AI assistant to visit a webpage causes the browser to pass the page’s visible content to the AI in a way that lets the webpage text override user intent.

This means visiting a malicious site could trigger unintended AI actions without requiring explicit user interaction with the AI assistant.

Access To Sensitive Accounts

The vulnerabilities become dangerous because AI assistants operate with user authentication privileges.

A hijacked AI browser can access banking sites, email providers, work systems, and cloud storage where users remain logged in.

Brave notes that even summarizing a Reddit post could result in attackers stealing money or private data if the post contains hidden malicious instructions.

Industry Context

Brave describes indirect prompt injection as a systemic challenge facing AI browsers rather than an isolated issue.

The problem revolves around AI systems failing to distinguish between trusted user input and untrusted webpage content when constructing prompts.

Brave is withholding details of one additional vulnerability found in another browser until next week.

Why This Matters

Brave argues that traditional web security models break when AI agents act on behalf of users.

Natural language instructions on any webpage can trigger cross-domain actions reaching banks, healthcare providers, corporate systems, and email hosts.

Same-origin policy protections become irrelevant because AI assistants execute with full user privileges across all authenticated sites.

The disclosure arrives the same day OpenAI launched ChatGPT Atlas with agent mode capabilities, highlighting the tension between AI browser functionality and security.

People using AI browsers with agent features face a tradeoff between automation capabilities and exposure to these systemic vulnerabilities.

Looking Ahead

Brave’s research continues with additional findings scheduled for disclosure next week.

The company indicated it’s exploring longer-term solutions to address the trust boundary problems in agentic browsing.


Featured Image: Who is Danny/Shutterstock

OpenAI Launches ChatGPT Atlas Browser For macOS via @sejournal, @MattGSouthern

OpenAI released ChatGPT Atlas today, describing it as “the browser with ChatGPT built in.”

OpenAI announced the launch in a blog post and livestream featuring CEO Sam Altman and team members including Ben Goodger, who previously helped develop Google Chrome and Mozilla Firefox.

Atlas is available now on macOS worldwide for Free, Plus, Pro, and Go users. Windows, iOS, and Android versions are coming soon.

What Does ChatGPT Atlas Do?

Unified New Tab Experience

Opening a new tab creates a starting point where you can ask questions or enter URLs. Results appear with tabs to switch between links, images, videos, and news where available.

OpenAI describes this as showing faster, more useful results in one place. The tab-based navigation keeps ChatGPT answers and traditional search results within the same view.

ChatGPT Sidebar

A ChatGPT sidebar appears in any browser window to summarize content, compare products, or analyze data from the page you’re viewing.

The sidebar provides assistance without leaving the current page.

Cursor

Cursor chat lets you highlight text in emails, calendar invites, or documents and get ChatGPT help with one click.

The feature can rewrite selected text inline without opening a separate chat window.

Agent Mode

Agent mode can open tabs and click through websites to complete tasks with user approval. OpenAI says it can research products, book appointments, or organize tasks inside your browser.

The company describes it as an early experience that may make mistakes on complex workflows, but is rapidly improving reliability and task success rates.

Browser Memories

Browser memories let ChatGPT remember context from sites you visit and bring back relevant details when needed. The feature can continue product research or build to-do lists from recent activity.

Browser memories are optional. You can view all memories in settings, archive ones no longer relevant, and clear browsing history to delete them.

A site-level toggle in the address bar controls which pages ChatGPT can see.

Privacy Controls

Users control what ChatGPT can see and remember. You can clear specific pages, clear entire browsing history, or open an incognito window to temporarily log out of ChatGPT.

By default, OpenAI doesn’t use browsing content to train models. You can opt in by enabling “include web browsing” in data controls settings.

OpenAI added safeguards for agent mode. It cannot run code in the browser, download files, install extensions, or access other apps on your computer or file system. It pauses to ensure you’re watching when taking actions on sensitive sites like financial institutions.

The company acknowledges agents remain susceptible to hidden malicious instructions in webpages or emails that could override intended behavior. OpenAI ran thousands of hours of red-teaming and designed safeguards to adapt to novel attacks, but notes the safeguards won’t stop every attack.

Why This Matters

Atlas blurs the line between browser and search engine by putting ChatGPT responses alongside traditional search results in the same view. This changes the browsing model from ‘visit search engine, then navigate to sites’ to ‘ask questions and browse simultaneously.’

This matters because it’s another major platform where AI-generated answers appear before organic links.

The agent mode also introduces a new variable: AI systems that can navigate sites, fill forms, and complete purchases on behalf of users without traditional click-through patterns.

The privacy controls around site visibility and browser memories create a permission layer that hasn’t existed in traditional browsers. Sites you block from ChatGPT’s view won’t contribute to AI responses or memories, which could affect how your content gets discovered and referenced.

Looking Ahead

OpenAI is rolling out Atlas for macOS starting today. First-run setup imports bookmarks, saved passwords, and browsing history from your current browser.

Windows, iOS, and Android versions are scheduled to launch in the coming months without specific release dates.

The roadmap includes multi-profile support, improved developer tools, and guidance for websites to add ARIA tags to help the agent work better with their content.


Featured Image: Saku_rata160520/Shuterstock

Google Announces A New Era For Voice Search via @sejournal, @martinibuster

Google announced an update to its voice search, which changes how voice search queries are processed and then ranked. The new AI model uses speech as input for the search and ranking process, completely bypassing the stage where voice is converted to text.

The old system was called Cascade ASR, where a voice query is converted into text and then put through the normal ranking process. The problem with that method is that it’s prone to mistakes. The audio-to-text conversion process can lose some of the contextual cues, which can then introduce an error.

The new system is called Speech-to-Retrieval (S2R). It’s a neural network-based machine-learning model trained on large datasets of paired audio queries and documents. This training enables it to process spoken search queries (without converting them into text) and match them directly to relevant documents.

Dual-Encoder Model: Two Neural Networks

The system uses two neural networks:

  1. One of the neural networks, called the audio encoder, converts spoken queries into a vector-space representation of their meaning.
  2. The second network, the document encoder, represents written information in the same kind of vector format.

The two encoders learn to map spoken queries and text documents into a shared semantic space so that related audio and text documents end up close together according to their semantic similarity.

Audio Encoder

Speech-to-Retrieval (S2R) takes the audio of someone’s voice query and transforms it into a vector (numbers) that represents the semantic meaning of what the person is asking for.

The announcement uses the example of the famous painting The Scream by Edvard Munch. In this example, the spoken phrase “the scream painting” becomes a point in the vector space near information about Edvard Munch’s The Scream (such as the museum it’s at, etc.).

Document Encoder

The document encoder does a similar thing with text documents like web pages, turning them into their own vectors that represent what those documents are about.

During model training, both encoders learn together so that vectors for matching audio queries and documents end up near each other, while unrelated ones are far apart in the vector space.

Rich Vector Representation

Google’s announcement says that the encoders transform the audio and text into “rich vector representations.” A rich vector representation is an embedding that encodes meaning and context from the audio and the text. It’s called “rich” because it contains the intent and context.

For S2R, this means the system doesn’t rely on keyword matching; it “understands” conceptually what the user is asking for. So even if someone says “show me Munch’s screaming face painting,” the vector representation of that query will still end up near documents about The Scream.

According to Google’s announcement:

“The key to this model is how it is trained. Using a large dataset of paired audio queries and relevant documents, the system learns to adjust the parameters of both encoders simultaneously.

The training objective ensures that the vector for an audio query is geometrically close to the vectors of its corresponding documents in the representation space. This architecture allows the model to learn something closer to the essential intent required for retrieval directly from the audio, bypassing the fragile intermediate step of transcribing every word, which is the principal weakness of the cascade design.”

Ranking Layer

S2R has a ranking process, just like regular text-based search. When someone speaks a query, the audio is first processed by the pre-trained audio encoder, which converts it into a numerical form (vector) that captures what the person means. That vector is then compared to Google’s index to find pages whose meanings are most similar to the spoken request.

For example, if someone says “the scream painting,” the model turns that phrase into a vector that represents its meaning. The system then looks through its document index and finds pages that have vectors with a close match, such as information about Edvard Munch’s The Scream.

Once those likely matches are identified, a separate ranking stage takes over. This part of the system combines the similarity scores from the first stage with hundreds of other ranking signals for relevance and quality in order to decide which pages should be ranked first.

Benchmarking

Google tested the new system against Cascade ASR and against a perfect-scoring version of Cascade ASR called Cascade Groundtruth. S2R beat Cascade ASR and very nearly matched Cascade Groundtruth. Google concluded that the performance is promising but that there is room for additional improvement.

Voice Search Is Live

Although the benchmarking revealed that there is some room for improvement, Google announced that the new system is live and in use in multiple languages, calling it a new era in search. The system is presumably used in English.

Google explains:

“Voice Search is now powered by our new Speech-to-Retrieval engine, which gets answers straight from your spoken query without having to convert it to text first, resulting in a faster, more reliable search for everyone.”

Read more:

​​Speech-to-Retrieval (S2R): A new approach to voice search

Featured Image by Shutterstock/ViDI Studio