They presented five generative AI chatbots—Gemini (2.0, Google; version available December 2024), DeepSeek (V3, High-Flyer; version available December 2024), Meta AI (Llama 3.3, Meta; version available December 2024), ChatGPT (3.5, OpenAI; version available November 2022) and Grok (2, xAI; version available August 2024)—with a series of closed- and open-ended prompts across five misinformation-prone categories.
Couldn’t the researchers at least bother to use the latest models?
The study was done in Feb 2025 and they probably wrote the research proposal months before then, waited for approval / funding, etc. I don’t know the process of how academia works but I imagine it to be very slow and bureaucratic.
Well they didn’t even use the latest models in Feb 2025. They should’ve used DeepSeek R1 and OpenAI o3-mini which use additional test time compute to arrive at better answers. They used GPT 3.5 which was about 2½ years old at the time.
Couldn’t the researchers at least bother to use the latest models?
The study was done in Feb 2025 and they probably wrote the research proposal months before then, waited for approval / funding, etc. I don’t know the process of how academia works but I imagine it to be very slow and bureaucratic.
https://bmjopen.bmj.com/content/16/4/e112695
Well they didn’t even use the latest models in Feb 2025. They should’ve used DeepSeek R1 and OpenAI o3-mini which use additional test time compute to arrive at better answers. They used GPT 3.5 which was about 2½ years old at the time.