New benchmark exposes limits of current AI tools

Jul 25, 2025, Hi-network.com

A new coding competition has exposed the limitations of current AI models, with the winner solving just 7.5% of the programming problems. The K Prize, launched by Databricks and Perplexity co-founder Andy Konwinski, aims to challenge smaller models using real-world GitHub issues in a contamination-free format.

Despite the low score, Eduardo Rocha de Andrade took home the $50,000 top prize. Konwinski says the intentionally tough benchmark helps avoid inflated results and encourages realistic assessments of AI capability.

Unlike the better-known SWE-Bench, which may allow models to train on test material, the K Prize uses only new issues submitted after a set deadline. Its design prevents exposure during training, making it a more reliable measure of generalisation.
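
The deadline-based selection described above can be illustrated with a minimal sketch. It is not the K Prize's actual tooling: the `Issue` record, the `select_uncontaminated` helper, the repository names, and the cutoff date are all hypothetical, chosen only to show the idea of keeping issues opened after models were submitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a GitHub issue considered for the test set."""
    repo: str
    number: int
    created_at: datetime


def select_uncontaminated(issues, submission_deadline):
    """Keep only issues opened strictly after the model submission deadline.

    A model frozen at the deadline cannot have seen these issues in training.
    """
    return [issue for issue in issues if issue.created_at > submission_deadline]


# Hypothetical cutoff; the real deadline is not stated in the article.
deadline = datetime(2025, 3, 1, tzinfo=timezone.utc)

candidates = [
    Issue("example/alpha", 101, datetime(2025, 2, 20, tzinfo=timezone.utc)),
    Issue("example/alpha", 117, datetime(2025, 3, 15, tzinfo=timezone.utc)),
    Issue("example/beta", 42, datetime(2025, 4, 2, tzinfo=timezone.utc)),
]

fresh = select_uncontaminated(candidates, deadline)
fresh_ids = [(issue.repo, issue.number) for issue in fresh]
print(fresh_ids)
```

In this sketch the issue opened before the cutoff is dropped, so only material created after submission is used for scoring.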

A $1 million prize remains for any open-source model that scores over 90%. The low results are being viewed as a necessary wake-up call in the race to build competent AI software engineers.
