I think the problem is that it portrays them as weird exceptions, possibly even echoes from some kind of ghost in the machine, instead of a statistical inevitability when you’re asking for the next predicted token rather than meaningfully examining a model of reality.
“Hallucination” applies only to the times when the output is obviously bad, and hides the fact that the model is doing exactly the same thing when it incidentally produces a true statement.
I get the gist, but also it’s kinda hard to come up with a better alternative. A simple “being wrong” doesn’t exactly communicate it either. I don’t think “hallucination” is a perfect word for the phenomenon of “a statistically probable sequence of language tokens forming a factually incorrect claim” by any means, but in terms of the available options I find it pretty good.
I don’t think the issue here is the word; it’s just that a lot of people think the machines are smart when they’re not. The battle against anthropomorphizing machines was lost no later than when computer data storage was named “memory”, so I don’t think that’s really the issue here either.
As a side note, I’ve seen cases of people (admittedly, mostly critics of AI in the first place) call anything produced by an LLM a hallucination regardless of truthfulness.