ChatGPT generates cancer treatment plans that are full of errors: Study finds that ChatGPT provided false information when asked to design cancer treatment plans. Researchers at Brigham and Women’s Hospital found that cancer treatment plans generated by OpenAI’s revolutionary chatbot were full of errors.
I’m still confused that people don’t realize this. It’s not an oracle. It’s a program that generates sentences word by word based on statistical analysis, with no concept of fact checking. It’s even worse that someone actually did a study instead of simply acknowledging or realizing that ChatGPT is happy to just make stuff up.
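For anyone who hasn’t seen what “word by word based on statistical analysis” looks like, here is a toy sketch in Python. It is a bigram sampler, chosen purely for illustration: ChatGPT itself is a far larger neural network working on tokens, and the names `train_bigrams` and `generate` and the tiny corpus are all made up for this example. What it shares with the real thing is that nothing in the loop checks whether the output is true.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, max_words, rng):
    """Build a sentence by repeatedly sampling a word that followed the last one."""
    out = [start]
    # Stop at the length limit or when the last word has no known follower.
    while len(out) < max_words and followers.get(out[-1]):
        out.append(rng.choice(followers[out[-1]]))
    return " ".join(out)


# Hypothetical three-sentence "training set".
corpus = (
    "the drug treats the tumor . "
    "the drug harms the patient . "
    "the scan shows the tumor ."
)

model = train_bigrams(corpus)
sentence = generate(model, "the", 8, random.Random(0))
print(sentence)
```

Every pair of adjacent words it prints was seen in the corpus, so the result reads fluently, but the sampler can just as easily stitch together a claim the corpus never made. There is no step where it could notice.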
While I agree it has become more common knowledge that they’re unreliable, this study adds to the myriad examples that corporations, big organizations, and governments can use when deciding to abstain from them, or at least to be informed about these cases and their nuances so they know how to integrate them.
Why? Partly, I think, because many of these organizations are racing to adopt them for cost-cutting purposes or to chase the hype, or are too slow to regulate them, … and because there are (or could still be) very good uses that justify adopting them in the first place.
I don’t think it’s good enough to have a blanket conception to not trust them completely. I think we need multiple examples of the good, the bad and the questionable in different domains to inform the people in charge, the people using them, and the people who might be affected by their use.
Kinda like the recent event at DefCon where people tried to exploit LLMs: it’s not enough that we have some intuition about their harms. AFAIK the people at the event aimed to demonstrate the extremes of such harms. These efforts can help developers and researchers mitigate them, and can show anyone trying to adopt LLMs, concretely, how harmful they could be.
Regulators also need these examples in specific domains so they can be informed about how to create policies for them, sometimes by building on or modifying the existing policies of those domains.
This is true and well-stated. Mainly what I wish people would understand is that there are currently appropriate uses, like ‘rewrite my marketing email’, but generating information that could result in great harm if inaccurate is an inappropriate use. It’s all about the specific model, though: if you had a ChatGPT system trained extensively on medical information, it would be more accurate, but the information would still need expert human review before any decision was made. Mainly I wish the media had been more responsible and accurate in portraying these systems to the public.
I don’t think it’s good enough to have a blanket conception to not trust them completely.
On the other hand, I actually think we should, as a rule, not trust the output of an LLM.
They’re great for generative purposes, but I don’t think there’s a single valid case where the accuracy of their response should be outright trusted. Any information you get from an AI model should be validated outright.
There are many cases where a simple once-over from a human is good enough, but any time it tells you something you didn’t already know you should not trust it and, if you want to rely on that information, you should validate that it’s accurate.
I know university professors struggling with this concept. They are so convinced using an LLM is plagiarism.
It can lead to plagiarism if you use it poorly, which is why you control the information you feed it. Then proofread and edit.
Another related confusion in academia recently is the ‘AI detector’. These can easily be defeated with minor rewrites, if they are even accurate in the first place. My favorite misconception is the story of a professor who told students “I asked ChatGPT if it wrote this, and it said yes”, which is just really not how it works.
I can understand the plagiarism argument, though you have to extend the definition of it. If I am expected to write an essay, but I use ChatGPT instead, then I am fraudulently presenting the work as my own. Plagiarism might not be the right word, or maybe it’s a case where language is going to evolve so that plagiarism includes passing off AI-generated work as your own. Either way it’s cheating unless I was specifically allowed to use AI.
If the argument and the sources are incongruous, that isn’t the fault of the LLM/AI. That’s the author’s fault for not proofreading and editing.
You assume an inherent morality of LLMs but they are amoral constructs. They are tools, and you limit yourself by not learning them.
It’s even worse that someone actually did a study instead of simply acknowledging or realizing that ChatGPT is happy to just make stuff up.
Sure, the world should just trust preconceptions instead of doing science to check our beliefs. That worked great for tens of thousands of years of prehistory.
It’s not merely a preconception. It’s a rather obvious and well-known limitation of these systems. What I am decrying is that some people, from apparent ignorance, think things like “ChatGPT can give a reliable cancer treatment plan!” or “here, I’ll have it write a legal brief and not even check it for accuracy”. But sure, I agree with you, minus the needless sarcasm. It’s useful to prove or disprove even absurd hypotheses. And clearly people need to be told definitively that ChatGPT is not always factual, so hopefully this helps.
I’d say that a measurement always trumps arguments. At least now you know how accurate they are. A statement like this cannot be derived from reasoning alone:
The JAMA study found that 12.5% of ChatGPT’s responses were “hallucinated,” and that the chatbot was most likely to present incorrect information when asked about localized treatment for advanced diseases or immunotherapy.
“After an extensive three-year study, I have discovered that touching a hot element with one’s bare hand does, in fact, hurt.”
“That seems like it was unnecessary…”
“Do U even science bro?!”
Not everything automatically deserves a study. Were there any non-rando people out there claiming that ChatGPT could totally generate legit cancer treatment plans that people could then follow?
It’s not even a preconception, it’s willful ignorance. The website itself tells you multiple times that it is not accurate.
The bottom of every chat has this text: “Free Research Preview. ChatGPT may produce inaccurate information about people, places, or facts. ChatGPT August 3 Version”
And when you first use it, a modal pops up explaining the same thing.
ChatGPT isn’t some newly discovered sentient species.
It’s a machine designed and built by human engineers.
This is like suggesting that we study fortune cookies to see if they can accurately forecast the future. The manufacturer can simply tell you the limitation of their product… namely, that they cannot divine the future.
This is why, without some hitherto unknown or so far undeveloped capability, these sorts of LLMs will never actually be useful for performing any kind of mission-critical work. The catch-22 is this: you can’t trust the AI to produce correct work free of potentially dangerous, showstopping, or embarrassing errors. This isn’t a problem if you’re just, say, having it paint pictures, or maybe even having it help you twiddle the CSS on your web site. If there is a failure there, no one dies.
But what if your application is critical to life or safety? Like prescribing medical care, or designing a building that won’t fall down, or deciding which building the drone should bomb. Well, you have to get a trained or accredited professional in whatever field we’re talking about to check all of its work. And how much effort does that entail? As it turns out, pretty much exactly as much as having said trained or accredited professional do the work in the first place.
No duh - why would it have any ability to do that sort of task?
Part of the reason for studies like this is to debunk people’s expectations of AI’s capabilities. A lot of people are under the impression that ChatGPT can do ANYTHING and can think and reason, when in reality it is a bullshitter that does nothing more than mimic what it thinks a suitable answer looks like. Just like a parrot.
It doesn’t check the stuff it generates other than for grammatical and spelling errors. It isn’t intelligent and has no knowledge beyond how to create text. The text looks useful, but it doesn’t know what the text contains in the way something intelligent would.
Recent papers have shown that LLMs build internal world models, but for a topic as niche and complicated as cancer treatment, a chatbot based on GPT-3.5 would be woefully ill-equipped to do any kind of proper reasoning.
If you want an AI that can create cancer treatments, you need to train it on creating cancer treatments, and not just use one that is trained on general knowledge. Even if you train it on scientific publications, all it can reliably do is mimic a science journal, since it has not been trained on how to parse the knowledge in the journal itself.
Which is exactly the problem people think has been solved but isn’t anywhere near being solved. It cannot comprehend semantics. The meaning of things is completely beyond it and all other AIs.
Unfortunately, saying “I made a thing that creates vaguely human-looking speech with little content” isn’t astonishing to most people. Hence they go looking for something useful this breakthrough machine must be able to do, and when they don’t find anything, we get these articles.
“Hey, program that is basically just regurgitating information, how do we do this incredibly complex thing that even we don’t understand yet?”
“Here ya go.”
“Wow, this is wrong.”
“No shit.”
Why the fuck would anybody think a chat bot could create a cancer treatment plan?
Because it’s been hyped. They announced it could pass the medical licensing exam with good scores. The belief that it can replace a doctor has already been put forward.
On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Charles Babbage
Better tech, same stupid end users lmao
Scientist: Asks magic conch a question about cancer.
Conch: “Try shoving bees up your ass.”
Scientist: 😡