Stability AI announced the release of Stable Video Diffusion, a generative AI video model that can transform static images into video content.
As marketers look for innovative ways to create visual content, we offer a sneak peek into the future of video content creation with generative artificial intelligence (AI).
What Is Stable Video Diffusion?
Stable Video Diffusion is a foundation model in research preview from Stability AI with image-to-video capability.
It was designed to perform tasks like multi-view synthesis from a single image, a capability enhanced by fine-tuning on multi-view datasets.
Stability AI offers two versions, capable of generating 14 and 25 frames respectively, at frame rates ranging from 3 to 30 frames per second.
While the company is enthusiastic about incorporating feedback and updating the models with the latest advancements, it clarified that the model is not intended for real-world or commercial applications at this stage.
Stable Video Diffusion’s code is available on GitHub, and the weights needed to run it locally can be found on its Hugging Face page. The accompanying research paper details the technical capabilities of the new model.
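For readers who want to experiment locally, here is a minimal sketch using the Hugging Face diffusers library (the pipeline class, model ID, resolution, and frame settings below follow the library's published examples for its 0.24 release; a CUDA GPU with ample VRAM is assumed):

```python
# Minimal sketch: image-to-video with Stable Video Diffusion via diffusers.
# Assumes diffusers >= 0.24, a CUDA GPU, and access to the model on Hugging Face.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # 25-frame SVD-XT variant
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Condition the video on a single static image, resized to the model's resolution
image = load_image("logo.png").resize((1024, 576))
frames = pipe(image, num_frames=25, decode_chunk_size=8).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```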
Screenshot from Stability AI, November 2023
How To Create Video From A Static Image: 3 AI Video Generator Demos
The following research demos powered by Stable Video Diffusion offer a glimpse into the future of visual content creation with generative AI.
1. Community Demo For Stable Video Diffusion – Img2Vid – XT On Hugging Face
I used this demo to create a downloadable GIF file from a ChatGPT-generated logo.
Screenshot from Hugging Face, November 2023
The resulting AI-generated video:
2. SVD On Replicate
I used this demo to generate a downloadable MP4 file from the same logo.
Screenshot from Replicate, November 2023
The resulting AI-generated video:
3. Stable Video Diffusion Playground On Fal
I tried this demo to create a downloadable GIF file.
Screenshot from Fal, November 2023
Unfortunately, there were technical difficulties with the logo test. So here is an AI-generated GIF I created a few days ago using this demo and an image generated by DALL-E 3.
Interested individuals and organizations can sign up for a waitlist to access an upcoming web experience from Stability AI featuring a text-to-video interface, which will showcase the practical applications of this technology.
The release of Stable Video Diffusion marks a significant step in the evolution of generative AI technology, paving the way for future innovations in marketing and advertising technology.
Amazon announced the launch of Q, a generative AI chatbot designed to enhance business operations with direct integrations to over 40 enterprise applications, in limited preview.
Screenshot from Amazon, November 2023
This new technology is set to transform how businesses utilize artificial intelligence (AI), promising to streamline tasks, accelerate decision-making, and foster creativity and innovation in the workplace.
With features tailored to specific business needs and a strong emphasis on security and privacy, Amazon Q could become an indispensable tool for many business professionals.
Amazon Q For Businesses
Q offers seamless integration with your company’s data and systems. Direct connections with Adobe, Google, Microsoft, Salesforce, and Slack services make Amazon Q an adaptable and versatile tool for most business needs.
In addition, you can connect to platforms like Jira, ServiceNow, and Zendesk with Q’s third-party plugins.
Screenshot from Amazon, November 2023
Q In Content Marketing
Marketers can use Amazon Q for various tasks, such as transforming a press release into a blog post, summarizing it, or drafting an email based on the release.
Q’s ability to search through company content, including internal style guides, ensures that responses adhere to the company’s brand standards.
Screenshot from Amazon, November 2023
This feature could be invaluable for maintaining a consistent brand voice across various channels.
Additionally, Amazon Q can generate tailored social media prompts, helping promote stories effectively across social media. This ability saves time and ensures that each post is optimized for its respective platform.
Following a campaign, Q can analyze and summarize the results for leadership reviews, offering valuable insights into campaign performance.
Amazon emphasized that it also prioritizes security and privacy, ensuring it will not use customer content from Q to train its models.
Q In Slack
Integrating Amazon Q with Slack takes its capabilities to a new level, bringing them to a platform where many users spend significant time collaborating with colleagues.
This integration allows users to harness the power of Q directly within Slack, facilitating more efficient collaboration and information sharing.
Users can interact with Q through Slack Direct Messages (DM) to ask questions based on company data, seek assistance in creating new content, or get help with various tasks.
Screenshot from Amazon, November 2023
This direct interaction makes Amazon Q easily accessible for individual queries and tasks.
Beyond individual interactions, Q can also be invited to participate in team channels on Slack. In these channels, users can engage with Q in new messages or tag it in ongoing threads.
This feature could be useful for providing additional data points, resolving debates, or summarizing conversations to capture the next steps.
Its ability to understand and contribute to the context of a conversation makes it an invaluable team member in collaborative settings.
Amazon Q For AWS
Developers will appreciate Q for its comprehensive knowledge of AWS. It draws on 17 years of AWS expertise, making it an invaluable resource for any queries related to application development.
Q provides easy access to tools integrated across multiple AWS interfaces, including the AWS Management Console, mobile application, and documentation tools.
Screenshot from Amazon, November 2023
It supports developers through various stages of application development, from initial research to deployment and maintenance, offering personalized recommendations, troubleshooting assistance for AWS services, and network troubleshooting.
A standout feature is the Code Transformation capability, facilitating Java application upgrades and exemplifying Amazon Q’s role as an informational resource and an active participant in application optimization.
Additionally, Q extends its functionalities to IDEs. It enables developers to receive context-specific coding guidance directly in their workflow, thus revolutionizing how developers interact with AWS services and enhancing the overall efficiency of the development process.
Amazon Q In QuickSight
In the realm of Business Intelligence (BI), Q brings its capabilities to QuickSight, offering enhanced productivity for BI users.
This feature allows for the quick creation of visuals and calculations, refining visuals using natural language, empowering business users to self-serve data and insights, and reducing the need for dashboard notifications.
Screenshot from Amazon, November 2023
Stories allow business users to create detailed, formatted narratives directly from QuickSight dashboards using natural language prompts and AI-driven rewriting capabilities. Executive Summaries offer a quick snapshot of key dashboard insights in natural language.
The Q&A experience provides more intuitive query solutions, enabling users to delve deeper into data insights with AI-suggested questions and narrative summaries, enhancing understanding and decision-making in business contexts.
Amazon Q In Connect
Q also extends its generative AI capabilities to Amazon Connect, a contact center service that assists customer service agents in providing improved customer service.
It understands customer needs and recommends responses and actions for agents, significantly reducing customer wait times and enhancing overall satisfaction.
Screenshot from Amazon, November 2023
By leveraging real-time conversations and relevant company content, Amazon Q in Connect suggests responses and actions for agents, thereby enhancing customer satisfaction.
Beyond Q in Connect, Amazon announced additional generative AI enhancements to customer service efficiency.
These improvements include post-contact summarization by Contact Lens, improved chatbot and IVR accuracy through Amazon Lex, streamlined customer profile creation using LLMs, expanded in-app, web, and video features, and two-way SMS capabilities.
Amazon Q In The Supply Chain
Q in AWS Supply Chain is an upcoming feature providing supply chain professionals with an intelligent, conversational interface for analyzing risks and visualizing trade-offs in various scenarios.
Powered by Amazon Bedrock, it integrates with the AWS Supply Chain application, allowing users to query data and receive detailed responses to complex “what,” “why,” and “what if” questions.
Q can tailor its analysis to specific business needs, offering insights like the financial impact of delayed orders and suggesting solutions like expedited shipping, aiding decision-making, and optimizing supply chain management.
How Much Does Amazon Q Cost?
Amazon offers various pricing plans for Q, starting at $20 per user per month for business insights and QuickSight access, with a $25 per user per month option for AWS development.
Some features offered for free during the preview of Q may become paid features later.
In addition, Amazon offers specialized plans for Q in QuickSight, Connect, and Supply Chain.
How Can I Try Amazon Q?
Amazon Q is currently in limited preview for customers in the US East (N. Virginia) and US West (Oregon) regions.
With its range of features, focusing on customization, security, and integration with AWS services, Amazon Q could be a compelling choice for businesses seeking to leverage AI for improved efficiency and innovation.
This launch signifies Amazon’s continued commitment to AI, promising a more efficient, insightful, and innovative business landscape.
The Accelerated Mobile Pages WordPress plugin, which has over 100,000 installations, patched a medium-severity vulnerability that could allow an attacker to inject malicious scripts that execute in site visitors’ browsers.
Cross-Site Scripting Via Shortcode
Cross-site scripting (XSS) is one of the most frequent kinds of vulnerabilities. In the context of WordPress plugins, XSS vulnerabilities happen when a plugin provides a way to input data that isn’t sufficiently secured by a process that validates or sanitizes user inputs.
Sanitization is a way to block unwanted kinds of input. For example, if a plugin allows users to add text through an input field, it should also filter out anything that doesn’t belong there, such as a script or a zip file.
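As a rough illustration of the escaping that prevents this class of bug (shown in Python rather than the plugin’s PHP, with hypothetical names):

```python
import html

def render_shortcode_title(user_title: str) -> str:
    """Render a hypothetical shortcode attribute into HTML.

    html.escape() converts characters like <, >, &, and " into HTML entities,
    so a value such as '<script>alert(1)</script>' is displayed as plain text
    instead of executing in the visitor's browser.
    """
    safe_title = html.escape(user_title, quote=True)
    return f'<div class="widget-title">{safe_title}</div>'

# Without escaping, this attribute would inject a live script tag into the page
print(render_shortcode_title('<script>alert(1)</script>'))
```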
A shortcode is a WordPress feature that allows users to insert a tag that looks like this [example] within posts and pages. Shortcodes embed functionalities or content that is provided by a plugin. This allows users to configure a plugin through an admin panel then copy and paste a shortcode into a post or page where they want the plugin functionality to appear.
A “cross-site scripting via shortcode” vulnerability is a security flaw that allows an attacker to inject malicious scripts into a website by exploiting the shortcode function of the plugin.
According to a report recently published by the Patchstack WordPress security company:
“This could allow a malicious actor to inject malicious scripts, such as redirects, advertisements, and other HTML payloads into your website which will be executed when guests visit your site.
This vulnerability has been fixed in version 1.0.89.”
Wordfence describes the vulnerability:
“Accelerated Mobile Pages plugin for WordPress is vulnerable to Stored Cross-Site Scripting via the plugin’s shortcode(s) in all versions up to, and including, 1.0.88.1 due to insufficient input sanitization and output escaping on user supplied attributes.”
Wordfence also clarifies that this is an authenticated vulnerability, which for this specific exploit means a hacker needs at least contributor-level permissions to take advantage of it.
This exploit is rated by Patchstack as a medium-severity vulnerability, scoring 6.5 out of 10 (with 10 being the most severe).
Users are advised to check that their installations are patched to at least version 1.0.89.
Google is updating its policy limiting personalized advertising to include more restrictions on ads related to consumer financial products and services.
Google’s personalized ads policy prohibits targeting users based on sensitive categories like race, religion, or sexual orientation.
Over the years, Google has continued updating the policy to introduce new limitations. The latest update to restrict consumer finance ads is part of Google’s ongoing efforts to refine its ad targeting practices.
What’s Changing?
Google will update its personalized ads policy in February 2024 to prevent advertisers from targeting audiences for credit and banking ads based on sensitive factors like gender, age, parental status, marital status, or zip code.
Google’s current policy prohibiting “Credit in personalized ads” will be renamed “Consumer finance in personalized ads” under the changes.
Google’s new policy will state:
“In the United States and Canada, the following sensitive interest categories cannot be targeted to audiences based on gender, age, parental status, marital status, or ZIP code.
Offers relating to credit or products or services related to credit lending, banking products and services, or certain financial planning and management services.”
Google provided examples, including “credit cards and loans including home loans, car loans, appliance loans, short-term loans,” as well as “banking and checking accounts” and “debt management products.”
When Does The New Policy Take Effect?
The updated limitations on personalized advertising will take effect on February 28, 2024, with full enforcement expected within six weeks.
Google said advertisers in violation will receive a warning at least seven days before any account suspension.
According to Google, the policy change aims to protect users’ privacy better and prevent discrimination in financial services advertising.
However, the company will still allow generalized ads for credit and banking products that do not use sensitive personal data for targeting.
What Do Advertisers Need To Do?
Google will begin enforcing the updated restrictions in late February 2024 but advises advertisers to review their campaigns for compliance issues sooner.
Advertisers should carefully check their ad targeting settings, remove improper personalization based on sensitive categories, and adhere to the revised policy requirements.
Failure to follow the rules could lead to account suspension after an initial warning. Google will work with advertisers to ensure a smooth transition during the six-week ramp-up period.
Microsoft Advertising has announced the launch of Monetize Insights, a new analytics dashboard aimed at helping publishers monitor and optimize their advertising revenue streams more efficiently.
The dashboard is now available globally within Microsoft’s supply-side platform, Monetize.
Monetize Insights Features
Monetize Insights provides publishers with visual graphs and comparison charts for a quick, holistic view of their performance.
The dashboard allows you to filter data and drill down into specifics around revenue drivers, inventory metrics, and significant changes over time. This lets you quickly identify issues, trends, and opportunities to maximize yield.
“User-friendly data analytics can help publishers identify issues sooner, diagnose issues faster, and ensure revenue is not left on the table,” said Christopher Walmsley, Senior Product Manager at Microsoft. “With Monetize Insights, publishers can easily monitor key monetization metrics and efficiently dive into the details of revenue drivers.”
Two key features of the new dashboard are the Total Revenue and Bid Rejection tabs. The Total Revenue tab simplifies business performance tracking across channels, buyers, and brands. You can view trends in revenue, impressions, ad requests, fill rates, and more.
The Bid Rejection tab provides transparency into blocks impacting revenue, such as ad quality settings, price floors, and demand issues. This helps you understand the monetary effect of your inventory settings and potentially adjust them to unblock significant revenue.
Monetize Insights Benefits
Microsoft designed Monetize Insights to streamline publishers’ workflows. The guided navigation and configurable metrics aim to save you time configuring dashboards or running manual reporting.
The analytics dashboard is now live for all Monetize platform users globally.
Publishers interested in leveraging the new tool can sign into their Monetize accounts and activate Monetize Insights.
How This Can Help You
As a publisher, having clear visibility into your advertising performance data is essential for making smart optimization decisions.
With the launch of Monetize Insights, Microsoft Advertising is aiming to provide you with an efficient analytics hub to monitor and understand what’s driving your revenue.
If you’re looking to identify issues faster, reveal opportunities, and save time analyzing granular reports, this new dashboard may be worth exploring.
LinkedIn rolled out a new content moderation framework that optimizes moderation queues, reducing the time to catch policy violations by 60%. This approach may represent the future of content moderation as the technology becomes more widely available.
How LinkedIn Moderates Content Violations
LinkedIn has content moderation teams that work on manually reviewing possible policy-violating content.
They use a combination of AI models, LinkedIn member reports, and human reviews to catch harmful content and remove it.
But the scale of the problem is immense because there are hundreds of thousands of items needing review every single week.
In the past, under the first in, first out (FIFO) process, every item needing review waited in the same queue, which meant genuinely offensive content could take a long time to be reviewed and removed.
Thus, the consequence of using FIFO is that users were exposed to harmful content.
LinkedIn described the drawbacks of the previously used FIFO system:
“…this approach has two notable drawbacks.
First, not all content that is reviewed by humans violates our policies – a sizable portion is evaluated as non-violative (i.e., cleared).
This takes valuable reviewer bandwidth away from reviewing content that is actually violative.
Second, when items are reviewed on a FIFO basis, violative content can take longer to detect if it is ingested after non-violative content.”
LinkedIn devised an automated framework that uses a machine learning model to prioritize content that is likely to violate content policies, moving those items to the front of the queue.
This new process helped to speed up the review process.
New Framework Uses XGBoost
The new framework uses an XGBoost machine learning model to predict which content item is likely to be a violation of policy.
XGBoost is short for Extreme Gradient Boosting, an open-source machine learning library that helps classify and rank items in a dataset.
This kind of machine learning model is trained to find specific patterns in a labeled dataset (one where each content item is marked as violating or not).
LinkedIn used that exact process to train their new framework:
“These models are trained on a representative sample of past human labeled data from the content review queue and tested on another out-of-time sample.”
Once trained, the model can identify content that, in this application of the technology, is likely in violation and needs a human review.
XGBoost is a cutting-edge technology that benchmarking tests have found highly successful for this kind of use, outperforming other kinds of algorithms in both accuracy and processing time.
LinkedIn described this new approach:
“With this framework, content entering review queues is scored by a set of AI models to calculate the probability that it likely violates our policies.
Content with a high probability of being non-violative is deprioritized, saving human reviewer bandwidth and content with a higher probability of being policy-violating is prioritized over others so it can be detected and removed quicker.”
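LinkedIn hasn’t published its implementation, but a minimal sketch of the idea in Python (toy data, and hypothetical features such as report counts or text-classifier scores) might look like this:

```python
# Hypothetical sketch of queue prioritization with XGBoost; not LinkedIn's code.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Toy feature matrix: e.g., member report count, author trust score, text score
X = np.random.rand(1000, 3)
y = np.random.randint(0, 2, 1000)  # 1 = human-labeled as policy-violating

X_train, X_queue, y_train, _ = train_test_split(X, y, test_size=0.2)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# Score items entering the review queue and surface likely violations first
violation_prob = model.predict_proba(X_queue)[:, 1]
review_order = np.argsort(-violation_prob)  # highest probability reviewed first
```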
Impact On Moderation
LinkedIn reported that the new framework is able to make automatic decisions on about 10% of the content queued for review, with what LinkedIn calls an “extremely high” level of precision. It’s so accurate that the AI model exceeds the performance of a human reviewer.
Remarkably, the new framework reduces the average time for catching policy-violating content by about 60%.
Where New AI Is Being Used
The new content review prioritization system is currently used for feed posts and comments. LinkedIn announced that it is working to expand the process to other parts of the platform.
Moderating harmful content is important because it improves the user experience by reducing the number of users exposed to harmful content.
It is also useful for the moderation team because it helps them scale up and handle the large volume of items needing review.
The technology has proven successful and may see broader adoption as it becomes more widely available.
Google announced today that it will start supporting structured data for discussion forums and profile pages in Google Search.
This change will allow Google to better identify and display information from web forums. These updates also impact how content from forums and personal profiles appears in search results.
Lastly, Google announced new reporting capabilities for the new structured data in Search Console.
Here are all the details.
Structured Data For Online Forums
Google is introducing this new structured data markup to enhance the visibility of first-hand content from online forums in search results. The announcement explains:
“This markup works with Google Search features that are designed to show first-person perspectives from social media platforms, forums, and other communities. Implementing this structured data will help ensure what Search shows in these features is as accurate and complete as possible.”
‘ProfilePage’ Markup
New ‘ProfilePage’ structured data will allow Google Search to better identify key information about content creators, including their name, social media handle, number of followers, profile photo, and content popularity.
The goal is to deliver accurate information about creators of first-person content in Google Search. The new markup gives Google extra data to display enhanced creator profiles and metrics related to their content.
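As a hedged illustration, a simplified ProfilePage object in JSON-LD (property names follow schema.org; the person and numbers are hypothetical, and Google’s documentation lists the full set of supported fields) could be generated like this:

```python
import json

# Simplified, hypothetical ProfilePage JSON-LD built from schema.org types.
profile_page = {
    "@context": "https://schema.org",
    "@type": "ProfilePage",
    "mainEntity": {
        "@type": "Person",
        "name": "Jane Doe",
        "alternateName": "janedoe",  # social media handle
        "image": "https://example.com/janedoe.jpg",
        "interactionStatistic": {
            "@type": "InteractionCounter",
            "interactionType": "https://schema.org/FollowAction",
            "userInteractionCount": 12000,  # follower count
        },
    },
}

# The output would be embedded in a <script type="application/ld+json"> tag
print(json.dumps(profile_page, indent=2))
```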
‘DiscussionForumPosting’ Markup
A new markup called DiscussionForumPosting is intended for use on forums where users share personal perspectives and experiences.
The markup will help Google Search better identify and surface online discussions from across the web. However, Google stated that this markup doesn’t guarantee inclusion in the “Perspectives” and “Discussions and Forums” sections.
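Similarly, a minimal, hypothetical DiscussionForumPosting object for a forum thread might look like the sketch below (again, Google’s documentation defines the required and recommended properties):

```python
import json

# Simplified, hypothetical DiscussionForumPosting JSON-LD for a forum thread.
forum_post = {
    "@context": "https://schema.org",
    "@type": "DiscussionForumPosting",
    "headline": "What's your favorite text editor?",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2023-11-28T12:00:00+00:00",
    "text": "I've used Vim for years, but I'm curious what everyone else prefers.",
    "comment": [
        {
            "@type": "Comment",
            "author": {"@type": "Person", "name": "John Roe"},
            "text": "Emacs, obviously.",
        },
    ],
}

print(json.dumps(forum_post, indent=2))
```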
Discussion Forums vs Q&As
Google provided the following guidance on when to use DiscussionForumPosting versus Q&A markup.
Q&A markup is recommended for forums structured around questions and answers, while DiscussionForumPosting is better suited for more open-ended forums that do not follow a strict question-answer format.
Search Console Support
Google is launching a new Search Console report to help monitor the use of the new structured data. The reports will provide details on errors, warnings, and valid items detected on pages with markup.
Google’s Rich Results Test tool also supports testing and validating profile page and discussion forum markup.
Understanding Structured Data
Structured data is standardized code that helps search engines better understand your website content. It provides context so search engines know what your content is about.
For example, if two known content creators have the same name, adding structured data would help Google distinguish them and display their information accurately.
Potential Implications of Google’s Announcement
With this new structured data for discussion forums and profile pages, Google is helping content creators better control how their content appears in search results.
These are some potential implications of the new structured data:
Improved Search Appearance: When implemented correctly, structured data can help your webpages stand out in Google, potentially increasing click-through rates.
More Visibility For Creators: Individual creators may have a better chance of getting highlighted in search results alongside large publishers. This could potentially democratize content visibility.
Audience Growth: Google’s new markup will allow it to better identify content creators and their information, including links to social profiles. This could be a boon to anyone who depends on their brand for visibility.
Remember that this is a voluntary feature. You’re not required to implement this structured data. However, those who do may enjoy improved visibility in Google’s search results.
Google’s John Mueller responded to a thread in Reddit about finding and fixing inbound broken links, offering a nuanced insight that some broken links are worth finding and fixing and others are not.
Reddit Question About Inbound Broken Links
Someone asked on Reddit if there’s a way to find broken links for free.
This is the question:
“Is it possible to locate broken links in a similar manner to identifying expired domain names?”
The person asking the question clarified that they were asking about inbound broken links from external sites.
John Mueller Explains How To Find 404 Errors To Fix
John Mueller responded:
“If you want to see which links to your website are broken & “relevant”, you can look at the analytics of your 404 page and check the referrers there, filtering out your domain.
This brings up those which actually get traffic, which is probably a good proxy.
If you have access to your server logs, you could get it in a bit more detail + see which ones search engine bots crawl.
It’s a bit of technical work, but no external tools needed, and likely a better estimation of what’s useful to fix/redirect.”
In his response, John Mueller explains how to find 404 responses caused by broken inbound links and how to identify which ones are “useful to fix” or “redirect.”
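For sites with raw access logs, a minimal sketch of the log-based approach Mueller describes (assuming a combined-format log named access.log and a placeholder domain) could look like this:

```python
# Hypothetical sketch: find inbound broken links by scanning a combined-format
# access log for 404 responses whose referrer is an external site.
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)
OWN_DOMAIN = "example.com"  # replace with your own domain

broken = Counter()
with open("access.log") as log:
    for line in log:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        referrer = match["referrer"]
        if match["status"] == "404" and referrer != "-" and OWN_DOMAIN not in referrer:
            broken[(match["path"], referrer)] += 1

# The most-hit broken URLs with external referrers are the best candidates to fix
for (path, referrer), hits in broken.most_common(20):
    print(f"{hits:5d}  {path}  <-  {referrer}")
```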
Mueller Advises On When Not To “Fix” 404 Pages
John Mueller next offered advice on when it doesn’t make sense to fix a 404 page.
Mueller explained:
“Keep in mind that you don’t have to fix 404 pages, having things go away is normal & fine.
The SEO ‘value’ of bringing a 404 back is probably less than the work you put into it.”
Some 404s Should Be Fixed And Some Don’t Need Fixing
John Mueller said that there are situations where a 404 error generated from an inbound link is easy to fix and suggested ways to find those errors and fix them.
Mueller also said that there are some cases where it’s basically a waste of time.
What wasn’t mentioned was what the difference was between the two and this may have caused some confusion.
Inbound Broken Links To Existing Webpages
There are times when another site links to your site but uses the wrong URL. Traffic from the broken link on the external site will generate a 404 response code on your site.
These kinds of links are easy to find and useful to fix.
There are other situations when an outside site will link to the correct webpage but the webpage URL changed and the 301 redirect is missing.
Those kinds of inbound broken links are also easy to find and useful to fix. If in doubt, read our guide on when to redirect URLs.
In both of those cases, the inbound broken links to existing webpages will generate a 404 response, which will show up in server logs, Google Search Console, and plugins like the Redirection WordPress plugin.
If the site is on WordPress and it’s using the Redirection plugin, identifying the problem is easy because the Redirection plugin offers a report of all 404 responses with all the necessary information for diagnosing and fixing the problem.
In cases where the Redirection plugin isn’t used, one can also hand-code an .htaccess rule to handle the redirect.
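For example, a single mod_alias rule in .htaccess such as `Redirect 301 /old-page/ /new-page/` (the paths here are hypothetical) sends visitors and link equity from the broken URL to the correct one.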
Lastly, one can contact the other website that’s generating the broken link and ask them to fix it. There’s always a small chance that the other site might decide to remove the link altogether. So it might be easier and faster to just fix it on your side.
Whichever approach is taken to fix the external inbound broken link, finding and fixing these issues is relatively simple.
Inbound Broken Links To Removed Pages
There are other situations where an old webpage was removed for a legitimate reason, such as when an event has passed or a service is no longer offered.
In that case, it makes sense to show a 404 response code, because that is precisely the situation a 404 is meant for. Showing a 404 response is not a bad thing.
Some people might want to get some value from the inbound link and create a new webpage to stand in for the missing page.
But that might not be useful, because the link points to something that no longer exists and is no longer relevant.
Even if you create a replacement page, any link equity that flows to it is of little use, because the topic of the inbound link is relevant to nothing but the expired reason for the original page.
Redirecting the missing page to the home page is a strategy that some people use to benefit from the link to a page that no longer exists. But Google treats those links as Soft 404s, which then passes no benefit.
These are the cases that John Mueller was probably referring to when he said:
“…you don’t have to fix 404 pages, having things go away is normal & fine.
The SEO ‘value’ of bringing a 404 back is probably less than the work you put into it.”
Mueller is right: some pages should be gone and totally removed from a website, and the proper server response for those pages is a 404 error.
Inflection AI, the creator of the Pi AI personal assistant, announced a powerful new large language model called Inflection-2 that outperforms Google’s PaLM 2 language model across a range of benchmarking datasets.
Pi Personal Assistant
Pi is a personal assistant that is available on the web and as an app for Android and Apple mobile devices.
It can also be added as a contact in WhatsApp and accessed via Facebook and Instagram direct message.
Pi is designed to be a chatbot assistant that can answer questions, research anything from products to science, and function as a discussion companion that dispenses advice.
The new LLM will be incorporated into Pi soon, after undergoing safety testing.
Inflection-2 Large Language Model
Inflection-2 is a large language model that outperforms Google’s PaLM 2 Large model, which is currently Google’s most sophisticated model.
Inflection-2 was tested across multiple benchmarks and compared against PaLM 2 and Meta’s LLaMA 2 and other large language models (LLMs).
For example, Google’s PaLM 2 barely edged past Inflection-2 on the Natural Questions corpus, a dataset of real-world questions.
PaLM 2 scored 37.5 and Inflection-2 scored 37.3, with both outperforming LLaMA 2, which scored 33.0.
MMLU – Massive Multitask Language Understanding
Inflection AI published the benchmarking scores on the MMLU dataset, which is designed to test LLMs in a way that’s similar to testing humans.
The test is on 57 subjects in STEM (Science, Technology, Engineering, and Math) and a wide range of other subjects like law.
The purpose of the dataset is to identify where the LLM is strongest and where it is weak.
According to the research paper for this benchmarking dataset:
“We propose a new test to measure a text model’s multitask accuracy.
The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability.
We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over random chance by almost 20 percentage points on average.
However, on every one of the 57 tasks, the best models still need substantial improvements before they can reach expert-level accuracy.
Models also have lopsided performance and frequently do not know when they are wrong.
Worse, they still have near-random accuracy on some socially important subjects such as morality and law.
By comprehensively evaluating the breadth and depth of a model’s academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.”
These are the MMLU benchmarking dataset scores in order of weakest to strongest:
LLaMA-2 70B: 68.9
GPT-3.5: 70.0
Grok-1: 73.0
PaLM-2 Large: 78.3
Claude-2 (CoT): 78.5
Inflection-2: 79.6
GPT-4: 86.4
As can be seen above, only GPT-4 scores higher than Inflection-2.
MBPP – Code and Math Reasoning Performance
Inflection AI ran a head-to-head comparison of GPT-4, PaLM 2, LLaMA, and Inflection-2 on math and code reasoning tests, and Inflection-2 did surprisingly well considering that it was not specifically trained to solve math problems.
The benchmarking dataset used is called MBPP (Mostly Basic Python Problems). It consists of nearly 1,000 crowd-sourced Python programming problems.
What makes the scores especially notable is that Inflection AI tested against PaLM-2S, which is a variant large language model that was specifically fine-tuned for coding.
MBPP Scores:
LLaMA-2 70B: 45.0
PaLM-2S: 50.0
Inflection-2: 53.0
Screenshot of Complete MBPP Scores
HumanEval Dataset Test
Inflection-2 also outperformed PaLM-2 on the HumanEval problem solving dataset that was developed and released by OpenAI.
“The HumanEval dataset released by OpenAI includes 164 programming problems with a function signature, docstring, body, and several unit tests.
They were handwritten to ensure not to be included in the training set of code generation models.
The programming problems are written in Python and contain English natural text in comments and docstrings.
The dataset was handcrafted by engineers and researchers at OpenAI.”
These are the scores:
LLaMA-2 70B: 29.9
PaLM-2S: 37.6
Inflection-2: 44.5
GPT-4: 67.0
As can be seen above, only GPT-4 scored higher than Inflection-2. Yet it should again be noted that Inflection-2 was not fine-tuned to solve these kinds of problems, which makes these scores an impressive achievement.
Screenshot of Complete HumanEval Scores
Inflection AI explains why these scores are significant:
“Results on math and coding benchmarks.
Whilst our primary goal for Inflection-2 was not to optimize for these coding abilities, we see strong performance on both from our pre-trained model.
It’s possible to further enhance our model’s coding capabilities by fine-tuning on a code-heavy dataset.”
An Even More Powerful LLM Is Coming
The Inflection AI announcement stated that Inflection-2 was trained on 5,000 NVIDIA H100 GPUs. The company plans to train an even larger model on a 22,000-GPU cluster, several times bigger than the one used for Inflection-2.
Google and OpenAI are facing strong competition from both closed and open source startups. Inflection AI joins the top ranks of startups with powerful AI under development.
The Pi personal assistant is a conversational AI platform built on state-of-the-art technology, with the potential to become even more powerful than competing platforms that charge for access.
Google announced that it is sunsetting the Search Console crawl rate limiter tool, which is scheduled to be removed on January 8, 2024, citing improvements to crawling that have essentially made it unnecessary.
Search Console Crawl Rate Limiter Tool
The crawl rate limiter tool was introduced to Search Console fifteen years ago, in 2008. Its purpose was to give publishers a way to control Googlebot crawling so that it didn’t overwhelm their servers.
There was a time when some publishers experienced so much crawling that their servers became unable to serve webpages to users.
Enough people complained that Google eventually released the tool within search console.
In practice, the tool was slow-acting: according to Google, requests to limit crawling typically took about a day to go into effect and remained in effect for 90 days.
Why Google Is Removing Rate Limiter Tool
The announcement stated that crawling algorithms have reached a state where Googlebot can automatically sense when a server is reaching capacity and take immediate action to slow down the crawl rate.
Furthermore, Google stated that the tool was rarely used and when it was used, the crawl rate was generally set to the lowest setting.
Moving forward, the minimum crawl rate will be set by default to a lower rate, similar to what publishers tended to request.
According to the announcement:
“With the deprecation of the crawl limiter tool, we’re also setting the minimum crawling speed to a lower rate, comparable to the old crawl rate limits.
This means that we effectively continue honoring the settings that some site owners have set in the past if the Search interest is low, and our crawlers don’t waste the site’s bandwidth.”
Making Search Console Less Complex
Removing the tool makes Search Console easier to use because it’s less cluttered with rarely used tools.
This, in turn, should improve the Search Console user experience.
Publishers who continue to have problems with Googlebot’s crawl rate can still use the Googlebot report form to send feedback to Google.