On the Use of Large Language Models at Solving Math Problems: A Comparison Between GPT-4, LlaMA-2 and Gemini
Author | |
Keywords | |
Abstract |
In November 2022, ChatGPT v3.5 was announced to the world. Since then, Generative Artificial Intelligence (GAI) has appeared in the news almost daily, showing impressive capabilities across many tasks that have surprised both the research community and the general public. Indeed, the range of tasks that ChatGPT and other Large Language Models (LLMs) can perform is vast, especially when dealing with natural-language text. Text generation, summarisation, translation, and transformation (into poems, songs, or other styles) are among their strengths. However, ChatGPT struggles when it comes to reasoning or mathematical calculations. In this work, we compare different flavors of ChatGPT (v3.5, v4, and Wolfram GPT) on 20 mathematical tasks drawn from high school and first-year engineering courses. We show that GPT-4 is far more powerful than ChatGPT-3.5, and further that the use of Wolfram GPT can slightly improve on the results obtained with GPT-4 on these mathematical tasks. |
Year of Publication |
In press
|
Journal |
International Journal of Interactive Multimedia and Artificial Intelligence
|
Volume |
In press
|
Start Page |
1
|
Issue |
In press
|
Number |
In press
|
Number of Pages |
1-11
|
Date Published |
03/2025
|
ISSN Number |
1989-1660
|
URL | |
DOI | |
Attachment |
ip2025_03_001.pdf (912.28 KB)
|