There are some really good answers here already, but I want to key in on one part of your question in particular to try to convey why this idea just fundamentally doesn’t work.
The problem, put very simply, is that the AI never, ever “knows” anything. For it to be able to admit when it doesn’t know, it would first have to have the ability to know things, and to discern the difference between knowing and not knowing.
This is what I’ve been getting at with something I’ve been saying for a while now: LLMs don’t hallucinate some answers, they hallucinate every answer.
An LLM is basically a mathematical model whose job is to create convincing bullshit. When that bullshit happens to align with reality, we humans go “Wow, that’s amazing, how did it know that?” and when it happens to not align we go “Stupid machine hallucinated again.” But this is just our propensity for anthropomorphism at work.
In reality what’s happening is closer to how “psychics” do their shtick. I can say “I’m sensing that someone here recently lost a loved one” and it looks like I have supernatural powers, but really I’m just playing the odds. The only difference is that the psychic knows they’re bullshitting. The AI doesn’t, because it does not have a mind, it cannot think, so there is nothing there to perceive the concept of objective reality at all. It’s just a really, really large bingo ball tumbler spitting out balls.
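If it helps, here’s a toy sketch of the bingo tumbler idea in Python. To be clear, this is not how any real model is implemented: the `NEXT_TOKEN_PROBS` table, its numbers, and the `sample_next_token` helper are all made up for illustration. The only point is that the right answer and the wrong answers come out of the exact same draw, and nothing in the procedure checks which is which.

```python
import random

# Hypothetical next-token probabilities, invented for this example.
# A real model computes numbers like these from its weights; note that
# nothing in the table marks any entry as true or false.
NEXT_TOKEN_PROBS = {
    "The capital of Australia is": {
        "Canberra": 0.60,
        "Sydney": 0.35,
        "Melbourne": 0.05,
    },
}


def sample_next_token(prompt, rng):
    """Draw one continuation at random, weighted by its probability."""
    probs = NEXT_TOKEN_PROBS[prompt]
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
prompt = "The capital of Australia is"
draws = [sample_next_token(prompt, rng) for _ in range(1000)]

# Tally how often each ball came out of the tumbler.
counts = {token: draws.count(token) for token in NEXT_TOKEN_PROBS[prompt]}
print(counts)
```

Every one of those draws came from the same line of code. When “Canberra” comes out we call it knowledge, and when “Sydney” comes out we call it a hallucination, but the tumbler did the same thing both times.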
It’s really hard to get your head around this, because LLMs fucking crush the Turing test; it really does feel like we’re talking, if not to a human, then at least to a machine that is capable of thought. Typing a question and getting a meaningful answer back makes it really hard to digest that we’re having a conversation with a machine that has no more capacity for thought than a deck of cards.