• kingofras@lemmy.world
    link
    fedilink
    arrow-up
    1
    ·
    1 month ago

    it doesn’t matter. the principle is that if x is the length of your context window, then at 0.4x the chance of hallucinations start increasing exponentially. we’re now at token windows of 1M, and all it does is shift that hallucination window further away, so the model ‘feels’ stronger because it takes longer before it hallucinates, but eventually it always does.