Present AI benchmarks battle to maintain up with trendy fashions. Whereas helpful for measuring a mannequin’s efficiency on a specific process, it may be tough to know whether or not a mannequin skilled on web information is definitely fixing the issue or simply remembering a solution it is already seen. As a mannequin approaches 100% on a specific benchmark, it additionally turns into much less efficient at revealing significant efficiency variations. As we proceed to put money into new and more difficult benchmarks, the trail to basic intelligence requires us to proceed to search for new analysis strategies. Extra lately, the transfer to dynamic, human-judged exams has solved these memorization and saturation issues, however has launched new challenges as a result of inherent subjectivity of human preferences.
Whereas we proceed to evolve and pursue present AI benchmarks, we additionally all the time purpose to check new approaches to evaluating fashions. So immediately we are going to introduce: Kaggle Game Arena: A brand new public AI benchmarking platform the place AI fashions compete instantly in strategic video games and supply verifiable, dynamic measurements of their capabilities.

