Took a look cause, as frustrating as it’d be, it’d still be a step in the right direction. But no, they’re still adamant that it’s just a “quirk”.
Conclusions
We hope that the statistical lens in our paper clarifies the nature of hallucinations and pushes back on common misconceptions:
Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates. Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.
Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.
Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models. Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.
Claim: Hallucinations are a mysterious glitch in modern language models. Finding: We understand the statistical mechanisms through which hallucinations arise and are rewarded in evaluations.
Claim: To measure hallucinations, we just need a good hallucination eval. Finding: Hallucination evals have been published. However, a good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.
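To make the incentive argument concrete, here is a minimal Python sketch of the arithmetic. It assumes a grading scheme where a correct answer earns 1 point, “I don’t know” earns 0, and a wrong answer costs some penalty. The function names and the t / (1 − t) penalty are illustrative choices of mine, not something taken from the post. Under plain accuracy grading the penalty is 0, so a guess can never score worse than abstaining, which is exactly the “reward guessing” problem described above.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected points for answering: +1 if right, -wrong_penalty if wrong.
    Abstaining ("I don't know") is assumed to always score exactly 0."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct: float, wrong_penalty: float) -> str:
    # Answer only when the expected score of answering beats abstaining (0).
    return "answer" if expected_score(p_correct, wrong_penalty) > 0 else "abstain"


def penalty_for_threshold(t: float) -> float:
    # Hypothetical rule: with a wrong-answer penalty of t / (1 - t),
    # answering pays off exactly when p_correct > t.
    return t / (1.0 - t)


# Plain accuracy grading (no penalty): even a 10%-confident guess is "worth it".
print(best_action(0.1, 0.0))

# With a confidence threshold of 0.75 the penalty is 3 points.
penalty = penalty_for_threshold(0.75)
print(best_action(0.5, penalty))
print(best_action(0.9, penalty))
```

With t = 0.75, a 50%-confident guess has an expected score of 0.5 − 0.5 × 3 = −1, so the sketch abstains, while a 90%-confident answer (0.9 − 0.1 × 3 = 0.6) is still worth giving.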
Infuriating.
Maybe design the AI to be honest and admit that it is not sure or doesn’t know?
Edit: thank you for all your interesting and thorough answers.
There are some really good answers here already, but I want to key in on one part of your question in particular and try to convey why this idea just fundamentally doesn’t work.
The problem, put very simply, is that the AI never, ever “knows” anything. For it to be able to admit when it doesn’t know, it would first have to have the ability to know things, and to discern the difference between knowing and not knowing.
This is what I’ve been getting at with something I’ve been saying for a while now: LLMs don’t hallucinate some answers, they hallucinate every answer.
An LLM is basically a mathematical model whose job is to create convincing bullshit. When that bullshit happens to align with reality, we humans go “Wow, that’s amazing, how did it know that?” and when it happens not to align we go “Stupid machine hallucinated again.” But this is just our propensity for anthropomorphism at work.
In reality what’s happening is closer to how “psychics” do their shtick. I can say “I’m sensing that someone here recently lost a loved one” and it looks like I have supernatural powers, but really I’m just playing the odds. The only difference is that the psychic knows they’re bullshitting. The AI doesn’t, because it does not have a mind; it cannot think, so there is nothing there to perceive the concept of objective reality at all. It’s just a really, really large bingo ball tumbler spitting out balls.
It’s really hard to get your head around this, because LLMs fucking crush the Turing test; it really does feel like we’re talking, if not to a human, then at least to a machine that is capable of thought. Typing a question and getting a meaningful answer back makes it really hard to digest that we’re having a conversation with a machine that has no more capacity for thought than a deck of cards.
The problem is that an LLM is a language model, not an objective reality model, so the best it can do is estimate the probability of a particular sentence appearing in the language, but not the probability that the sentence represents a true statement according to our objective reality.
They seem to think that they can use these confidence measures to filter the output when it is not confident of being correct, but there are an infinite number of highly probable sentences in a language which are false in reality. An LLM has no way of distinguishing between unlikely and false, or between likely and true.
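The “likely is not the same as true” point can be seen even in a deliberately tiny model. The sketch below is a toy word-bigram model, far simpler than a real LLM, trained on a made-up two-sentence corpus in which every sentence is true. Because it only tracks which word tends to follow which, it scores the false sentence “paris is the capital of italy” exactly as high as the true one. The corpus and function names are hypothetical, chosen purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: every sentence in it happens to be true.
CORPUS = [
    "paris is the capital of france",
    "rome is the capital of italy",
]


def train_bigrams(sentences):
    """Count word-to-next-word transitions, using <s> and </s> as boundaries."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def sentence_probability(counts, sentence):
    """Product of maximum-likelihood bigram probabilities P(next | prev)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # never saw this word, so the model has no estimate
        prob *= counts[prev][nxt] / total
    return prob


counts = train_bigrams(CORPUS)
print(sentence_probability(counts, "paris is the capital of france"))  # true
print(sentence_probability(counts, "paris is the capital of italy"))   # false
print(sentence_probability(counts, "france is the capital of paris"))  # never starts this way
```

Both Paris sentences come out at 0.5 × 0.5 = 0.25, because the model only knows that “paris” opens half the sentences and “of” is followed by “france” or “italy” equally often. Nothing in the numbers distinguishes the true sentence from the false one.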
language model, not an objective reality model
Sort of. It’s a generic prediction model. Multi-modal models work the same way as text-only models in this sense.
So do organic brains.
Right now, you are “hallucinating” most of your visual field outside the fovea centralis.
This aspect of your conscious perceptual system is exactly the same kind of high-dimensional interpolation that ML neural networks do.