01701nas a2200241 4500000000100000000000100001008004100002260001200043653001200055653001800067653002600085653002400111100003300135700002000168700002900188700002900217245011300246856005800359300000900417490001300426520100600439022001401445 9998 d c03/202510aChatGPT10aGenerative AI10aMathematical Problems10aWolfram Mathematica1 aAlejandro L. García Navarro1 aNataliia Koneva1 aJosé Alberto Hernández1 aAlfonso Sánchez-Macián00aOn the Use of Large Language Models at Solving Math Problems: A Comparison Between GPT-4, LlaMA-2 and Gemini uhttps://www.ijimai.org/journal/bibcite/reference/3565 a1-110 vIn press3 aIn November 2022, ChatGPT v3.5 was announced to the world. Since then, Generative Artificial Intelligence (GAI) has appeared in the news almost daily, showing impressive capabilities at solving multiple tasks that have surprised the research community and the world in general. Indeed, the number of tasks that ChatGPT and other Large Language Models (LLMs) can do is unimaginable, especially when dealing with natural text. Text generation, summarisation, translation, and transformation (into poems, songs, or other styles) are some of its strengths. However, when it comes to reasoning or mathematical calculations, ChatGPT finds difficulties. In this work, we compare different flavors of ChatGPT (v3.5, v4, and Wolfram GPT) at solving 20 mathematical tasks, from high school and first-year engineering courses. We show that GPT-4 is far more powerful than ChatGPT-3.5, and further that the use of Wolfram GPT can even slightly improve the results obtained with GPT-4 at these mathematical tasks. a1989-1660