Chatbots Make Terrible Doctors, New Study Finds

XLE@piefed.social · edit-2 4 months ago

irate944@piefed.social · 4 months ago

I could’ve told you that for free, no need for a study

rudyharrelson@lemmy.radio · edit-2 4 months ago

People always say this on stories about “obvious” findings, but it’s important to have verifiable studies to cite in arguments for policy, law, etc. It’s kinda sad that it’s needed, but formal investigations are a big step up from just saying, “I’m pretty sure this technology is bullshit.”

I don’t need a formal study to tell me that drinking 12 cans of soda a day is bad for my health. But a study that’s been replicated by multiple independent groups makes it way easier to argue to a committee.

irate944@piefed.social · 4 months ago

Yeah you’re right, I was just making a joke.

But it does create some silly situations like you said

rudyharrelson@lemmy.radio · 4 months ago

I figured you were just being funny, but I’m feeling talkative today, lol

BillyClark@piefed.social · 4 months ago

it’s important to have verifiable studies to cite in arguments for policy, law, etc.

It’s also important to have for its own merit. Sometimes, people have strong intuitions about “obvious” things, and they’re completely wrong. Without science studying things, it’s “obvious” that the sun goes around the Earth, for example.

I don’t need a formal study to tell me that drinking 12 cans of soda a day is bad for my health.

Without those studies, you cannot know whether it’s bad for your health. You can assume it’s bad for your health. You can believe it’s bad for your health. But you cannot know. These aren’t bad assumptions or harmful beliefs, by the way. But the thing is, you simply cannot know without testing.

Slashme@lemmy.world · 4 months ago

Or how bad something is. “I don’t need a scientific study to tell me that looking at my phone before bed will make me sleep badly”, but the studies actually show that the effect is statistically robust but small.

In the same way, studies like this can make the distinction between different levels of advice and warning.

rumba@lemmy.zip · 4 months ago

Chatbots make terrible everything.

But an LLM properly trained on sufficient patient data, metrics, and outcomes, in the hands of a decent doctor, can cut through bias, catch things that might fall through the cracks, and pack thousands of doctors’ worth of updated CME into a thing that can look at a case and go, you know, you might want to check for X. The right model can be fucking clutch at pointing out nearly invisible abnormalities on an X-ray.

You can’t ask an LLM trained on general bullshit to help you diagnose anything. You’ll end up with 32,000 Reddit posts’ worth of incompetence.

BeigeAgenda@lemmy.ca · 4 months ago

Anyone who has knowledge about a specific subject says the same: LLMs are constantly incorrect and hallucinate.

Everyone else thinks it looks right.

IratePirate@feddit.org · edit-2 4 months ago

A talk on LLMs I was listening to recently put it this way:

If we hear the words of a five-year-old, we assume the knowledge of a five-year-old behind those words, and treat the content with due caution.

We’re not adapted to something with the “mind” of a five-year-old speaking to us in the words of a fifty-year-old, and thus are more likely to assume competence just based on language.

leftzero@lemmy.dbzer0.com · 4 months ago

LLMs don’t have the mind of a five year old, though.

They don’t have a mind at all.

They simply string words together according to statistical likelihood, without having any notion of what the words mean, or what words or meaning are; they don’t have any mechanism with which to have a notion.

They aren’t any more intelligent than old Markov chains (or than your average rock), they’re simply better at producing random text that looks like it could have been written by a human.
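For anyone who hasn’t seen one, a word-level Markov chain of the kind mentioned above fits in a few lines of Python. This is only a rough sketch of “stringing words together according to statistical likelihood”: the toy corpus and the function names `build_chain` and `generate` are made up for illustration, and it says nothing either way about how LLMs work internally.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=None):
    """Walk the chain, picking each next word at random from the observed followers."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            # Dead end: this word was never followed by anything in the corpus.
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy corpus, purely hypothetical.
corpus = "the patient has a headache and the patient has a fever"
chain = build_chain(corpus)
print(generate(chain, "the", 8, seed=0))
```

Followers that appear more often in the corpus are listed more often, so `rng.choice` picks them proportionally more often. That is all the “statistics” there is here.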

plyth@feddit.org · 4 months ago

They simply string words together according to statistical likelihood, without having any notion of what the words mean

What gives you the confidence that you don’t do the same?

Digit@lemmy.wtf · 4 months ago

human: je pense

llm: je ponce

zewm@lemmy.world · 4 months ago

It is insane to me how anyone can trust LLMs when their information is incorrect 90% of the time.

tyler@programming.dev · 4 months ago

That’s not what the study showed though. The LLMs were right over 98% of the time…when given the full situation by a “doctor”. The problem was ordinary people trying to self-diagnose without knowing which details were important.

Hence why studies are incredibly important. Even with the text of the study right in front of you, you assumed a conclusion the study did not reach.

Elting@piefed.social · edit-2 4 months ago

So in order to get decent medical advice from an LLM you just need to be a doctor and tell it what’s wrong with you.

tyler@programming.dev · 4 months ago

Yes, that was the conclusion.

softwarist@programming.dev · 4 months ago

As neither a chatbot nor a doctor, I have to assume that subarachnoid hemorrhage has something to do with bleeding a lot of spiders.

dandelion@lemmy.blahaj.zone · 4 months ago

https://en.wikipedia.org/wiki/Subarachnoid_hemorrhage

https://en.wikipedia.org/wiki/Arachnoid_mater

it is one of the protective membranes around the brain and spinal cord, and it is named after its resemblance to spider webs, so - close enough

End-Stage-Ligma@lemmy.world · 4 months ago

can confirm, this is where spiders live inside your body

also pee is stored in the balls

cub Gucci@lemmy.today · 4 months ago

I’m going to open it wide open to kill every spider in my body

theunknownmuncher@lemmy.world · 4 months ago

A statistical model of language isn’t the same as medical training???

scarabic@lemmy.world · edit-2 4 months ago

It’s actually interesting. They found the LLMs gave the correct diagnosis high-90-something percent of the time if they had access to the notes doctors wrote about their symptoms. But when thrust into the room, cold, with patients, the LLMs couldn’t gather that symptom info themselves.

Hacksaw@lemmy.ca · 4 months ago

LLM gives correct answer when doctor writes it down first… Wowoweewow very nice!

scarabic@lemmy.world · 4 months ago

If you think there’s no work between symptoms and diagnosis, you’re dumber than you think LLMs are.

tyler@programming.dev · 4 months ago

You have misunderstood what they said.

Hacksaw@lemmy.ca · 4 months ago

If you seriously think the doctor’s notes about the patient’s symptoms don’t include the doctor’s diagnostic instincts then I can’t help you.

The symptom questions ARE the diagnostic work. Your doctor doesn’t ask you every possible question. You show up and you say “my stomach hurts”. The doctor asks questions to rule things out until there is only one likely diagnosis, then they stop and prescribe you a solution if available. They don’t just ask a random set of questions. If you give the AI the notes JUST BEFORE the diagnosis and treatment, it’s completely trivial to diagnose because the diagnostic work is already complete.

God you AI people literally don’t even understand what skill, craft, trade, and art are and you think you can emulate them with a text predictor.

tyler@programming.dev · 4 months ago

Dude, I hate AI. I’m not an AI person. Don’t fucking classify me as that. You’re the one not reading the article and subsequently the study. It didn’t say it included the doctor’s diagnostic work. The study wasn’t about whether LLMs are accurate for doctors, that’s already been studied. The study this article talks about literally says that. Apparently LLMs are passing medical licensing exams almost 100% of the time, so it definitely has nothing to do with diagnostic notes. This study was about using LLMs to diagnose yourself. That’s it. That’s the study. Don’t spread bullshit. It’s tiring debunking stuff that is literally two sentences in.

https://www.nature.com/articles/s41591-025-04074-y

MrKoyun@lemmy.world · 4 months ago

Water is wet

gen/Eric Computers@lemmy.zip · 4 months ago

Um actually, water itself isn’t wet. What water touches is wet.

alzjim@lemmy.world · 4 months ago

Calling chatbots “terrible doctors” misses what actually makes a good GP — accessibility, consistency, pattern recognition, and prevention — not just physical exams. AI shines here — it’s available 24/7 🕒, never rushed or dismissive, asks structured follow-up questions, and reliably applies up-to-date guidelines without fatigue. It’s excellent at triage — spotting red flags early 🚩, monitoring symptoms over time, and knowing when to escalate to a human clinician — which is exactly where many real-world failures happen. AI shouldn’t replace hands-on care — and no serious advocate claims it should — but as a first-line GP focused on education, reassurance, and early detection, it can already reduce errors, widen access, and ease overloaded systems — which is a win for patients 💙 and doctors alike.

/s

XLE@piefed.social · 4 months ago

FelixCress@lemmy.world · 4 months ago

… You don’t say.

cub Gucci@lemmy.today · 4 months ago

“but have they tried Opus 4.6/ChatGPT 5.3? No? Then disregard the research, we’re on the exponential curve, nothing is relevant”

Sorry, I’ve opened reddit this week