• pooterbroo@programming.dev
      11 days ago

      Well, they didn’t even use the latest models available in Feb 2025. They should’ve used DeepSeek R1 and OpenAI o3-mini, which use additional test-time compute to arrive at better answers. Instead they used GPT-3.5, which was about 2½ years old at the time.