AI Collapse: Leading Language Models Fail FrontierMath, the Hardest Math Test Yet
The latest artificial intelligence systems performed extremely poorly on advanced math problems written by elite mathematicians, solving no more than 2% of them.
The Limits of AI in Math
New Math Challenge
The research institute Epoch AI has developed FrontierMath, a new benchmark of problems that demand doctoral-level mathematical knowledge. Prominent mathematicians, including winners of the prestigious Fields Medal, contributed to creating the problems.
Assessing AI capabilities
On the traditional MMLU benchmark, which covers 57 areas of knowledge from mathematics to law, AI models have performed strongly, solving up to 98% of academic-level problems. The new FrontierMath tests, however, changed the picture dramatically.
Test results
The study evaluated the leading AI systems. Google's Gemini 1.5 Pro (002) and Anthropic's Claude 3.5 Sonnet came out on top, each solving 2% of the problems. OpenAI's systems — o1-preview, o1-mini, and GPT-4o — managed only 1%, while xAI's Grok-2 Beta failed to solve a single problem.
Assessment features
The researchers emphasize that even correct answers did not always reflect an understanding of the underlying mathematics: some solutions were obtained through simple numerical simulation rather than deep mathematical analysis.
Glossary
- Epoch AI – a research institute specializing in the study of artificial intelligence
- FrontierMath – a benchmark of highly complex mathematical problems for evaluating AI capabilities
- Fields Medal – a prestigious mathematics award, often regarded as the equivalent of a Nobel Prize
- Terence Tao – distinguished mathematician, winner of the 2006 Fields Medal
- MMLU (Massive Multitask Language Understanding) – a standardized benchmark for assessing AI capabilities
Links
- Live Science – popular science news portal
Discussion
The study found that even the most advanced AI models (Gemini, Claude, and GPT-4o) solved at most 2% of the doctoral-level math problems developed by the world's leading mathematicians, including Fields Medal winners.
Comments
Maximilian
Interesting that even the most advanced AIs stumbled on these problems. 2% is next to nothing! 🤔 Still, credit to Gemini and Claude — at least they solved something.
Sophie
And it seems normal to me. AI is still learning. I work with GPT-4 every day and it handles everyday tasks great. And the fact that it can't solve super-complicated mathematics is actually a good thing. It means human intelligence is still unrivaled 😊
Giuseppe
Sophie, I agree! But I was surprised that Grok-2 didn't solve anything at all. And Musk praised it so much 🤷‍♂️
Viktor
All this fuss over AI is a waste of time and money. Mathematicians managed just fine without neural networks before, and they still do. It's just more hype, nothing else. 😤
Amelie
Viktor, but AI already solves 98% of university-level problems! That's huge progress. Imagine how much this could help in education 📚
Giuseppe
Amelie is right! My son is at university and uses AI to check his solutions. It really saves time 👍
Sophie
Interesting that even when the AI gave the correct answer, it could simply have been a lucky guess rather than real understanding. Just like some students on exams 😅
Maximilian
I think in a couple of years this 2% will turn into 20%, and then reach 50%. Progress can't be stopped! 🚀