There's a very important part of software tools that you're missing with this point: software can and will be treated as an unerring authority whenever it takes more than a second to recognize that it has produced an incorrect answer. People do not treat software tools like Doctor Jim, who sometimes has a long day, who sometimes misunderstands what you meant by "minor bleeding."

People treat software tools like calculators, and calculators are only allowed to be wrong if one of three conditions is met:
  • The user made a mistake on their end.
  • The software has identified that it has failed, or will fail, to produce a correct answer.
  • The software accurately calculates the odds of its answer being incorrect due to random variation inherent to the problem being solved.
Hallucinations fail all three tests: they are not caused by user error, they are not flagged by the software itself, and the odds of a response being a hallucination cannot be accurately calculated, because they stem not from uncertainties inherent to the user's query but from undocumented gaps and conflicts in the LLM's training data.
OpenAI's SimpleQA hallucination-rate benchmark cannot tell you the chance of the model hallucinating on a given prompt; it measures the percentage of queries in a sample pool that resulted in hallucinations. Using that benchmark as a per-prompt "chance of incorrect information" is a statistical overgeneralization.
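To make that concrete, here's a toy Python sketch (all numbers invented for illustration, not taken from SimpleQA) of why a pooled benchmark rate says nothing about the risk on your particular prompt:

```python
# Toy numbers, invented for illustration: a pooled benchmark rate hides
# how much the per-prompt hallucination risk varies by topic.
pool = {
    # topic: (share of benchmark queries, hallucination rate on that topic)
    "well-covered trivia": (0.80, 0.02),
    "sparsely covered topic": (0.20, 0.72),
}

pooled_rate = sum(share * rate for share, rate in pool.values())
print(f"benchmark 'hallucination rate': {pooled_rate:.0%}")  # 16%

# A user whose prompts all hit the sparsely covered topic faces a 72% risk,
# not 16% -- the pooled rate says nothing about their particular prompt.
```

The headline number is dominated by whatever the sample pool happens to contain, which is exactly why it can't be read as a per-query probability.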

No hallucination warning label is going to get someone to double-check a model's answer before they put it into practice. If a user were willing to do the work of double-checking an LLM before using its answer, they wouldn't be using an LLM in the first place.
Disagree. I use an LLM bot daily for coding. It usually works as intended and I move on, with no need to check. Or it doesn't work as intended, in which case I either refine my prompt or double-check against another source. The latter scenario is maybe 10% of my prompts, if that. In no circumstance would I stop using an LLM just because 10% of the time I need to do a little extra legwork, because the alternative is to do that legwork 100% of the time.
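To spell out that arithmetic (with effort costs I'm just assuming for illustration):

```python
# Back-of-the-envelope numbers, assumed for illustration only.
p_recheck = 0.10    # fraction of prompts that need extra legwork
cost_prompt = 1.0   # effort to write a prompt and skim the result
cost_manual = 10.0  # effort to do the task entirely by hand

llm_workflow = cost_prompt + p_recheck * cost_manual  # 1 + 0.1 * 10 = 2.0
manual_workflow = cost_manual                         # 10.0

# Even if every recheck costs as much as doing the task from scratch,
# paying that cost 10% of the time beats paying it 100% of the time.
print(f"expected effort -- LLM: {llm_workflow}, manual: {manual_workflow}")
```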
 