The Limits of AI in History: A New Benchmark Reveals Shortcomings

"AI struggles with history: New research reveals limitations of LLMs in answering complex historical questions."
Matilda
Artificial intelligence, particularly large language models (LLMs) like GPT-4 and Bard, has demonstrated remarkable capabilities across a wide range of tasks. These models can generate human-like text, translate languages, write many kinds of creative content, and answer questions informatively. However, a recent study presented at the NeurIPS conference has revealed a significant limitation: LLMs struggle to accurately answer complex historical questions.

The Hist-LLM Benchmark

To assess the historical knowledge of LLMs, researchers developed a benchmark called Hist-LLM. It draws on the Seshat Global History Databank, a comprehensive repository of historical information, to check the accuracy of LLM responses against established historical facts.

Testing the Limits: GPT-4, Llama, and Gemini

Three leading LLMs were put to the test: OpenAI's GPT-4, Meta's Llama, and Google's Gemini. The results were less than impressive. Even …