In the new Google DeepMind AI fact test, Gemini wins
Google DeepMind presents FACTS Grounding, a new benchmark that tests the ability of AI models to provide fact-based and detailed answers based on given texts. The test includes 1,719 carefully selected examples from various disciplines such as finance, technology, retail, medicine and law.
A special feature of the test is the evaluation method: three leading AI models – Gemini 1.5 Pro, GPT-4o and Claude 3.5 Sonnet – act as judges and rate the answers against two criteria. First, they check whether the request is answered adequately. Then they evaluate factual correctness and whether the answer is fully grounded in the provided document.
To prevent manipulation, Google DeepMind splits the benchmark in half: 860 public examples are available immediately, while 859 examples are withheld. The final rating is the average over both sets. The benchmark is to be developed continuously so that language models become more reliable and usable in more application scenarios.
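The two-phase judging and averaging described above can be sketched in a few lines. This is a minimal illustration, not DeepMind's actual code; the function and field names are assumptions.

```python
# Illustrative sketch of FACTS-style scoring: a response must first be
# judged an adequate answer, then its grounding verdicts are averaged
# across the judge models. All names here are assumptions.

def score_response(judgments):
    """judgments: one dict per judge model, with 'eligible'
    (request answered adequately, bool) and 'grounded'
    (answer fully supported by the source document, 0 or 1)."""
    # Phase 1: an inadequate answer scores zero regardless of grounding.
    if not all(j["eligible"] for j in judgments):
        return 0.0
    # Phase 2: average the grounding verdicts of the judges.
    return sum(j["grounded"] for j in judgments) / len(judgments)

def benchmark_score(public_scores, private_scores):
    """Final rating: average of the public (860) and held-out (859) sets."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(public_scores) + mean(private_scores)) / 2
```

Averaging the public and held-out halves separately, as the article describes, keeps a model from gaming the score by overfitting to the published examples alone.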
Nvidia's compact “Jetson Orin Nano Super” for generative AI
Nvidia is introducing the Jetson Orin Nano Super, a new compact “supercomputer” for generative AI. According to Nvidia, the developer kit, which costs just under $250, offers up to 70 percent more performance and 50 percent higher memory bandwidth than its predecessor – partly through software updates; the chip is still based on the Ampere architecture.
According to the company, the Jetson Orin Nano Super is suitable for commercial AI developers, hobbyists and students who want to build skills in generative AI, robotics or computer vision. The performance-boosting software updates are also available to owners of the old Jetson Orin Nano Developer Kit. Products like the Jetson Orin matter because they allow small AI models to run without a direct cloud connection – so-called “edge AI” applications, for example AI models running directly in production facilities or in robotics.
AI filters in the Nvidia app can slow down games
GeForce graphics cards apparently suffer a loss of gaming performance when the new Nvidia app is installed. According to user reports, performance can drop by up to 15 percent even when the app's features are not actively used. Uninstalling the app resolves the problem, but according to Nvidia this is not necessary. The manufacturer recommends turning off some of the app's AI features until a corresponding update is available. In November, the Nvidia app replaced the previous GeForce Experience; since then it has served as the central hub for hardware control, driver downloads and game optimization for GeForce graphics cards.
The AI filters thus cause a loss of performance even when they are not actually in use and the app simply runs in the background as Nvidia intended. The manufacturer has now confirmed this in the GeForce forum and describes a short-term workaround: players should turn off the game filters in the Nvidia app settings and then restart the game. This should return the graphics card's performance to its normal level. Nvidia has not yet said why the app's AI filters affect performance.
Germany should become a world leader with energy-saving AI
In a position paper, the Information Technology Society (ITG) stresses the importance of overcoming the energy challenges posed by large language models. Only in this way, it argues, can the benefits of the current drivers of artificial intelligence be maximized sustainably. “There is global talk of building more nuclear power plants to satisfy the hunger for energy,” explains Damian Dudek, ITG managing director and co-author of the paper. “We are talking about making data processing more efficient and reducing energy consumption so that AI can be used sustainably.”
In addition to shrinking feature sizes and increasing the number of transistors per chip area in line with Moore's Law, the authors see several new approaches to overcoming the limits of traditional chip design. Germany should “stay on the ball” in these areas, Dudek emphasizes. Only those who achieve a leading technological position through innovation have the opportunity to define a globally accepted framework for ethical requirements and data security.
YouTube gives creators control over AI training with their videos
The video platform YouTube is introducing new settings for AI training that allow content creators to determine whether AI companies such as OpenAI, Meta or Google can use their videos to train models. The new option will be available in YouTube Studio Settings under “Third Party Training.” Sharing requires consent from all rights holders and only applies to publicly available videos that comply with YouTube guidelines.
However, YouTube emphasizes that it has no influence on how the AI companies ultimately use the content. The platform sees the new function as a first step in enabling video creators to unlock new value from their content in the age of AI.
How intelligent is artificial intelligence? What consequences does generative AI have for our work, our leisure time and society? In Heise's “AI Update”, produced together with The Decoder, we bring you a rundown of the most important AI developments every weekday. On Fridays, we examine different aspects of the AI revolution with experts.
Salesforce extends Agentforce AI with pre-built workflows
Salesforce continues to develop its AI solution Agentforce and announces Agentforce 2.0 after just three months. The AI agents are designed to make decisions and carry out actions autonomously within certain limits in sales and customer service, such as managing orders. Because many customers struggle to create their own agents, Salesforce is now introducing a skill library for sales and marketing.
These skills can be searched in the Agent Builder and integrated as building blocks, with the system suggesting suitable skills for specific tasks. The skills also support integration with Tableau for data visualization and prediction, and Slack for automated channel summaries. The new version Agentforce 2.0 is scheduled for release in February 2025, with Slack integration planned for January. Sales features start at $2 per conversation.
OpenAI expands API with o1 model featuring function calling and more
OpenAI has released a new version of the o1 model for the API and ChatGPT. On average, it is said to require 60 percent fewer reasoning tokens than o1-preview, the version previously available via the API. It is also said to significantly outperform its predecessor in mathematics and programming tasks as well as in function calling.
OpenAI is also cutting prices for the Realtime API for voice assistants by 60 percent. With Preference Fine-Tuning, the company is additionally introducing a new method for adapting AI models to user and developer preferences by learning from pairs of preferred and non-preferred responses. According to OpenAI, the method is particularly suited to subjective tasks such as creative writing or summarization, where tone and style matter.
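To make the idea of "pairs of preferred and non-preferred responses" concrete, here is a sketch of what a single preference-pair training record could look like. The exact JSONL schema is an assumption based on OpenAI's published fine-tuning examples, not a guaranteed API contract; check the current documentation before use.

```python
import json

# Illustrative preference-pair record for Preference Fine-Tuning.
# Field names are an assumption modeled on OpenAI's fine-tuning docs.
record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize the article in two sentences."}
        ]
    },
    # The response whose tone and style the tuned model should favor ...
    "preferred_output": [
        {"role": "assistant", "content": "A concise, neutral two-sentence summary."}
    ],
    # ... and the response it should be steered away from.
    "non_preferred_output": [
        {"role": "assistant", "content": "A rambling, opinionated retelling."}
    ],
}

# Training files are JSON Lines: one such record per line.
line = json.dumps(record)
```

Because the model learns only the *relative* ranking between the two outputs, this format suits exactly the subjective tasks OpenAI names, where there is no single correct answer but clearly better and worse styles.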
The autonomous “SwagBot” is intended to guide cattle to pastures
An autonomous shepherd robot called SwagBot is intended to help with livestock and pasture management. It is being developed by the start-up Agerris, a spin-off of the University of Sydney that has been working on agricultural robots since 2005. In Australia's vast rangelands, the autonomous robot can relieve farmers of work by collecting data and keeping track of cattle and pastures.
The SwagBot has four wheeled legs and is equipped with various sensors and cameras. Using machine learning, it should be able to recognize the health status of cows or tell whether a pasture has already been grazed. “Once the cattle get used to the robot, they will follow it everywhere,” explains Salah Sukkarieh, professor of robotics at the University of Sydney and Agerris CEO. In addition to the SwagBot, the start-up is developing other robots, such as Rippa, which monitors fields and orchards, and another robot that can pull weeds. Research teams in Germany are also working on similar AI projects.
(igr)