A new coding competition has exposed the limitations of current AI models, with the winner solving just 7.5% of programming problems. The K Prize, launched by Andy Konwinski, co-founder of Databricks and Perplexity, aims to challenge smaller models using real-world GitHub issues in a contamination-free format.
Despite the low score, Eduardo Rocha de Andrade took home the $50,000 top prize. Konwinski says the intentionally tough benchmark helps avoid inflated results and encourages realistic assessments of AI capability.
Unlike the better-known SWE-Bench, which may allow models to train on test material, the K Prize uses only new issues submitted after a set deadline. Its design prevents exposure during training, making it a more reliable measure of generalisation.
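To make the contamination-control idea concrete, here is a minimal Python sketch of how a benchmark could collect only issues created after a fixed cutoff date. This is not the K Prize's actual harness; the repository name, cutoff date, and `fresh_issues` helper are illustrative placeholders, and only the public GitHub REST API is used.

```python
# Hypothetical sketch: build a "contamination-free" test set by keeping only
# GitHub issues created strictly AFTER a submission deadline, so no model
# trained before that date could have seen them. Names and dates are
# placeholders, not the K Prize's real configuration.
from datetime import datetime, timezone

import requests

CUTOFF = datetime(2025, 3, 12, tzinfo=timezone.utc)  # placeholder deadline
REPO = "example-org/example-repo"                    # placeholder repository


def fresh_issues(repo: str, cutoff: datetime) -> list[dict]:
    """Return issues created after `cutoff` (candidates for a clean eval set)."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        params={"state": "all", "since": cutoff.isoformat(), "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    issues = []
    for item in resp.json():
        # GitHub's `since` filters on *update* time, so re-check creation time.
        created = datetime.fromisoformat(item["created_at"].replace("Z", "+00:00"))
        # The issues endpoint also returns pull requests; skip those.
        if created > cutoff and "pull_request" not in item:
            issues.append(item)
    return issues


if __name__ == "__main__":
    for issue in fresh_issues(REPO, CUTOFF):
        print(issue["number"], issue["title"])
```

The key design point is that freshness, not secrecy, provides the guarantee: any issue in the set simply did not exist when participating models were trained.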
A $1 million prize remains on offer for any open-source model that scores over 90%. The low results are being viewed as a necessary wake-up call in the race to build competent AI software engineers.