AI search engines give incorrect answers at an alarming 60% rate, study says

graylshaped

Ars Legatus Legionis
61,378
Subscriptor++
And AGI is right around the corner and these people love to claim that hallucination isn’t as big of a problem anymore >.>

Waiting for this article to be on hacker news for everyone to come out and defend and downplay this.

There is a huge amount of money and advertising effort in convincing the average user that these tools are reliable.
On that particular point:
Grok 3 demonstrated the highest error rate, at 94 percent.
It is beyond dispute that Grok doesn't grok. It follows the Muskian convention of explicitly NOT doing what his products' names describe, with perhaps the exception of Boring, though in that case the name is more an accurate adjective for its results than a verb of action and accomplishment.
 
Upvote
8 (8 / 0)

rbryanh

Ars Tribunus Militum
1,781
AI companies want to bypass copyright laws, but aren't willing to wait for laws to be updated... I am shocked.
Of course in this context, "updated" translates as "Congressional revision or repeal of existing law or defunding of its enforcement in response to perfectly legal corporate bribery."
 
Upvote
3 (3 / 0)

rbryanh

Ars Tribunus Militum
1,781
Speaking of artificial stupidity, never has it been clearer that every human endeavor is ultimately a metaphor for humanity itself. We can create nothing except in our own image.

Publishers, editors, peer-review, academic competition, and professional criticism were once a significant part of humanity's collective intelligence. The internet having eliminated all these and more, it's now anyone's guess which will kill us first: willful destruction of our own habitats or digitally enforced idiocy.
 
Upvote
6 (6 / 0)

Megahedron

Smack-Fu Master, in training
63
Is there any, any chance that this whole A.I. balloon is going to pop, the way the dot-com bubble popped in 2001? It's really starting to seem like it's never going to reliably perform at even 50% of the projections we've seen over the past three years.
It's still really, really uncertain, but if you're really into reading the tea leaves there are some possible signs:

1) Microsoft starting to pull back on data center build-out, suggesting that it thinks either A) its current compute capacity is sufficient for current and future demand, or B) that it's overbuilt. Even in the best-case scenario of "Deepseek made MS realize they can run LLMs much more efficiently," this still means they're jettisoning OpenAI, who has shown zero interest in adopting Deepseek's approach. This is more likely considering that...
2) OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI (borrowing to invest in OpenAI itself, and borrowing to invest in data centers for OpenAI to use).
3) Oh, and this is ignoring nVidia, who made a fortune selling shovels for the gold rush and may soon find itself getting rugpulled.
4) There's also a company called CoreWeave whose entire purpose is renting out GPU data centers for AI, and whose biggest customer is Microsoft. They were the second biggest owners of data centers, and were already unprofitable, and then Microsoft canceled a bunch of contracts (see 1).

Granted, it could all go the other way and shake out in the end, who knows? It's also unlikely to "pop" in the sense of Google, Microsoft, nVidia, Meta, etc. going out of business, or LLMs going away entirely, since they do have some use cases. But I simply don't see OpenAI, Anthropic, Perplexity, or other companies whose entire business is AI surviving long-term.
 
Upvote
17 (17 / 0)
OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI
And let's not forget that Masayoshi Son has backed some of the stupidest sucker bets to come down the VC pipe. Something something Wirecard...
 
Upvote
12 (12 / 0)

alienluvchild

Wise, Aged Ars Veteran
176
Subscriptor++
As a former developer and high-level manager at a development firm I’m so sick and tired and utterly disgusted by the attitudes of these companies. I fought like hell to keep people (most especially the CEO) from going down the path that these morally corrupt shitstains continue to follow.

It’s depressing that so many people are lazy and so completely lacking in common sense and critical thinking skills that they can’t see the totality of the sham that all of these companies and their boosters are foisting on the public.
 
Upvote
13 (13 / 0)

sjl

Ars Tribunus Militum
2,715
At least it lies less than Trump...
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
 
Upvote
6 (6 / 0)

Auie

Ars Scholae Palatinae
1,878
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
That is the joke.
 
Last edited:
Upvote
0 (1 / -1)

Rosyna

Ars Tribunus Angusticlavius
6,879
Is it just me or is the Google Search AI summary much worse than asking an AI chatbot directly?
Significantly. Google’s AI search can’t even give you a list of winged dinosaurs or tell you which reptiles are/were warm-blooded. Even worse, it’s confident in its wrongness.
 
Upvote
5 (5 / 0)

halfelven

Smack-Fu Master, in training
2
Subscriptor
Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried.

I may be having a Friday morning moment, but do the graphs not show the opposite?

Edit: I'm having a Friday morning moment. I read that as "correct information" rather than "incorrect information".
 
Upvote
1 (1 / 0)

Bigdoinks

Ars Scholae Palatinae
868
Only 60% wrong? That's actually better than the results I've seen when I accidentally get Bing or Google on some other work PC (I use Kagi). I would have put it closer to 75% wrong, but fully admit there may be some expectation bias on my part. Or maybe it's just because my use cases are very different - I'm asking for factual specifics, not just asking it to cite where this random thing came from.
I think you might have "whoooshed" yourself, mate. What is your use case where search is 75% wrong? Are you searching for a wendigo or bigfoot on Google?
 
Upvote
8 (9 / -1)

orangedan

Seniorius Lurkius
3
Subscriptor
Fake citations are still the easiest way to catch and prove that students in my class are writing research papers with AI. Like most AI crap, it passes a quick skim inspection, but it is still way easier to prove to student disciplinary committees than other "maybe" AI-generated text. That said, I worry that these "look, there are citations" AI outputs will fool more people into believing fake science info, because very few people go to check original citations.
 
Upvote
10 (10 / 0)
I think of dumb AI like malicious compliance from dumb people. The cheese will stick to a pizza if you add 1/8 cup of glue; the request was fulfilled and the solution will work. I expect the same uselessness from a chatbot or from a stoned undergrad.
This is excellent and useful information if you are a food photographer. That 1/8th cup of glue makes your pizza photo look so much better.
 
Upvote
4 (4 / 0)
It’s understandable that no one here appears to be very surprised, but it needs to be stressed that this is a horrific result. Wrong ninety percent of the time is shocking and unacceptable under any circumstances. There’s a huge problem right in front of us.
When you consider that a pure guess should be wrong only 50% of the time.
 
Upvote
-1 (3 / -4)

One off

Ars Scholae Palatinae
1,235
Yeah, shocking... The other day I asked o3-mini a question, it had no idea, but still answered some bullshit again and again. I saved the "reasoning" text when I called its bullshit out, because it surprised me how revealing it was:
I haven't used any of the reasoning models. I'm curious if your experience matches my assumption that the reasoning outputs are written to satisfy the request to show workings rather than reflecting any actual process followed.
 
Last edited:
Upvote
2 (2 / 0)

One off

Ars Scholae Palatinae
1,235
This doesn’t seem remotely informative? Traditional search is much better than GenAI for finding the origin of an exact piece of text. This seems like a study designed to find what it wants; that’s not even close to a real-world use case.
Often on LLM articles, I see people saying what amounts to "I can't see an immediate use for me, therefore they are useless to everyone". Your post seems to be more "I don't use LLMs this way so no one else does." When LLMs are being promoted as research tools, being able to identify sources and provide links is important.
 
Last edited:
Upvote
11 (11 / 0)

TVPaulD

Ars Tribunus Militum
1,697
However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools’ accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
I think this is unfair. I am plenty sceptical of the results, and I've found that pretty much 100% of the time when I have deduced that the response to a prompt is incorrect, I have been right about that, even when I did not already know the answer going in. And even on occasions where I couldn't immediately tell the answer was wrong, because I always follow the "citations" to verify the information, a good chunk of the time the LLM still got it wrong in at least some way (for example, giving battery replacement instructions for a different model than the one requested, because the manufacturer provides both models' instructions on a single page and the chatbot parsed it incorrectly).

But I don't really blame anyone else, especially less technical users, for trusting the tool they are provided. Should they? Of course not, but there's no earthly reason why everyone should be aware of the details of how these things work. Everyone has different priorities and interests. I do my best to impress upon those around me who are less informed and engaged with these things what the limitations and risks are, but I do not shame them for not already knowing. Looking at it from their point of view, they have been presented with a service that purports to provide information using a natural language interface.

And it is, let's be honest, superficially very impressive to observe what they do. People generally are not used to computers being able to do things like this in this specific way. It's easy to see how that would engender trust and appreciation from users, especially those less technically inclined who do not have the frame of reference to understand what's going on under the hood. And, again, there is no reason why they should either - it's fine and even good to be interested in how tech works, but it should not be obligatory in order to use it for its most basic purposes. We don't expect every driver to be able to explain in detail how an internal combustion engine works or be capable of constructing precise suspension geometry.

The shame should be directed instead at the purveyors of this technology who are overpromising and underdelivering, setting their users up to fail by misleading them into using and trusting their services for financial gain. Howard is blaming some of the victims. The blame, and the shame, rightly belongs to OpenAI, Perplexity, Anthropic, Microsoft, Google et al, along with their sycophants and boosters - particularly in the media.
 
Upvote
8 (8 / 0)

One off

Ars Scholae Palatinae
1,235
Huh, so they ARE actually acting more like humans.
lol. There is an interesting point there, though. First, training data: much of online commentary is authoritative-sounding bullshit, and good-faith people who don't know tend not to try to answer questions. Plus, no one publishes textbooks filled with "I don't know" as the answer to problems, or publishes articles without a viewpoint or narrative. Then there are the commercial drivers: you don't want your chatbot to appear less capable than the competition by treating 'IDK' as a good output. Finally, the tech itself seems to be exploiting the relationships between words and phrases, not the things or concepts (such as correct and incorrect answers) those words and phrases evoke in a human mind.

I'm sure there is room for LLMs to improve outputs, but any reporting on realistic avenues for experimentation seems to be drowned out by marketing guff. "It's a magic mind in a box, just make the box bigger!".
 
Upvote
6 (6 / 0)

One off

Ars Scholae Palatinae
1,235
The sad thing is that the people most likely to believe everything the AI said and not fact-check are also the people most likely to never read this article.
I'm cynical enough to think this isn't the tech's problem. I've met too many people who just don't care if information is true or not. Is it a simple answer? Does it keep teacher / boss off my back? Does it confirm my biases? Does it give me a feeling of security? Is it something I can leverage for advantage? etc.
 
Upvote
5 (5 / 0)

One off

Ars Scholae Palatinae
1,235
It's still really, really uncertain, but if you're really into reading the tea leaves there are some possible signs:

1) Microsoft starting to pull back on data center build-out, suggesting that it thinks either A) its current compute capacity is sufficient for current and future demand, or B) that it's overbuilt. Even in the best-case scenario of "Deepseek made MS realize they can run LLMs much more efficiently," this still means they're jettisoning OpenAI, who has shown zero interest in adopting Deepseek's approach. This is more likely considering that...
2) OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI (borrowing to invest in OpenAI itself, and borrowing to invest in data centers for OpenAI to use).
3) Oh, and this is ignoring nVidia, who made a fortune selling shovels for the gold rush and may soon find itself getting rugpulled.
4) There's also a company called CoreWeave whose entire purpose is renting out GPU data centers for AI, and whose biggest customer is Microsoft. They were the second biggest owners of data centers, and were already unprofitable, and then Microsoft canceled a bunch of contracts (see 1).

Granted, it could all go the other way and shake out in the end, who knows? It's also unlikely to "pop" in the sense of Google, Microsoft, nVidia, Meta, etc. going out of business, or LLMs going away entirely, since they do have some use cases. But I simply don't see OpenAI, Anthropic, Perplexity, or other companies whose entire business is AI surviving long-term.
In an article partly about absent links to sources, I feel compelled to append Ed Zitron's article to your post.
 
Upvote
6 (6 / 0)

PeterRNYC

Smack-Fu Master, in training
1
I know I'm setting myself up for some flaming here... but I think this headline is a bit misleading.

Yes the study highlights a critical flaw in AI search engines' ability to provide accurate source attribution. This has implications for news integrity and publisher control.

But I think that, for most people, the headline suggests that AI search engines cite incorrect or fabricated sources, when it's really about the fact that, given a piece of text, they can't identify the correct source.

In terms of the research itself, it's a bit disappointing that they didn't also test the underlying search engines. This would have been an interesting comparison. My guess is that the first source listed when you copy a piece of article text into a search engine is probably not the correct one, given how many times things are copied. Most likely the AI layer on top of those searches has not been set up to specifically try to find the ultimate source. But that's a less sexy headline.

As with most scenarios, it's probably good to consider an AI to be a well-read, articulate temp worker who will sometimes make things up because they must write at a fixed words-per-minute rate.
 
Upvote
-6 (0 / -6)

One off

Ars Scholae Palatinae
1,235
Fake citations are still the easiest way to catch and prove that students in my class are writing research papers with AI. Like most AI crap, it passes a quick skim inspection, but it is still way easier to prove to student disciplinary committees than other "maybe" AI-generated text. That said, I worry that these "look, there are citations" AI outputs will fool more people into believing fake science info, because very few people go to check original citations.
The spreaders of fake science have been around since long before LLMs. In my experience their links will lead somewhere, but it will be to a very low-quality source, or they will misrepresent the content of a trustworthy source. It's the ease of astroturfing enabled by LLMs that worries me. Drown the truth in shit.
 
Upvote
3 (3 / 0)

One off

Ars Scholae Palatinae
1,235
In terms of the research itself, it's a bit disappointing that they didn't also test the underlying search engines. This would have been an interesting comparison.
I sort of agree with you, in that the first paragraph of this article is a bit misleading; 'queries about news sources' could have been better worded. But the headline I'm seeing puts it better: 'AI search engines cite incorrect sources at an alarming 60% rate, study says'. I don't see the inaccuracy there that you do. In regard to your point above, they did: they only used extracts that returned the original source in the top three results of traditional search.
 
Last edited:
Upvote
0 (0 / 0)
Is there any, any chance that this whole A.I. balloon is going to pop, the way the dot-com bubble popped in 2001? It's really starting to seem like it's never going to reliably perform at even 50% of the projections we've seen over the past three years.

It feels a bit like the mid-to-late '90s again, but this time SNL is funny, I'm no longer a teenage virgin, and a presidential blowjob seems almost quaint.
 
Upvote
3 (3 / 0)
That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.
And even if it were true… then why release something that performs so horribly?

I know it's a fictionalized version of the events, but more tech bros should consider that memorable exchange in Blackberry (2023):

Mike Lazaridis: I will build a prototype, but I'll do it perfectly or I don't do it.
Jim Balsillie: Mike, are you familiar with the saying "Perfect is the enemy of good"?
Mike Lazaridis: Well, "good enough" is the enemy of humanity.
 
Upvote
2 (2 / 0)
That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.
On top of that, it doesn't take into account enshittification. Clearly, products can peak and get worse. Another good example: U.S.-made cars in the late 1970s and early 1980s. Badge-engineered shitboxes.
 
Upvote
3 (3 / 0)

hambone

Ars Praefectus
4,319
Subscriptor
The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.


Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies.

"bUt tRuSt uS wE CaN sELf ReGuLAtE!!!!!" - AI companies

:rolleyes:
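For anyone unfamiliar, the Robot Exclusion Protocol they're ignoring is just a plain-text robots.txt file at the site root that crawlers are expected to check before fetching anything. Here's a minimal sketch of that check in Python, assuming Perplexity's crawler identifies itself as "PerplexityBot"; the directives shown are illustrative, not the publisher's actual file:

```python
# Minimal sketch of how a compliant crawler honors the Robot Exclusion Protocol,
# using only Python's standard library. The directives and the "PerplexityBot"
# user-agent name are illustrative assumptions, not National Geographic's actual file.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: PerplexityBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler performs this check before fetching a page;
# skipping it is the behavior the CJR researchers describe.
print(parser.can_fetch("PerplexityBot", "https://www.nationalgeographic.com/some-article"))
# -> False, because the publisher has disallowed this crawler entirely
```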
 
Upvote
7 (7 / 0)

KingAZAZ

Ars Centurion
321
Subscriptor
"If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
Then don't label it as a search-specific tool as though the accuracy problem were anywhere near solved. You're not putting this out there only for savvy users, but for the general public, so take a modicum of responsibility.

Anyway, what's the difference between my asking for some clear facts about something that's easily discernible through traditional search and asking for any other accurate information? Apparently, when it can handle the task, it does so only a low percentage of the time, and when it can't, it spits out confabulatory garbage indistinguishable from an accurate response unless I do the legwork myself. So why in the world would I use such a tool in the context of search? These so-called "search" flavors of models should simply spit out, oh I dunno, a list of 10 links for me to peruse on my own, I guess.
 
Upvote
5 (5 / 0)

David651

Smack-Fu Master, in training
61
Can we go back to just the links, please? With AI at the helm of Internet searches, able to infer context from our inquiries, and knowing AI can be biased by its maintainers, this quote from Noam Chomsky comes to mind:

The smart way to keep people passive and obedient is to strictly limit the spectrum of acceptable opinion, but allow very lively debate within that spectrum - even encourage the more critical and dissident views. That gives people the sense that there's free thinking going on, while all the time the presuppositions of the system are being reinforced by the limits put on the range of the debate.
 
Upvote
3 (3 / 0)

marsilies

Ars Legatus Legionis
23,253
Subscriptor++
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
The thing is, LLMs don't know what they "know" and don't know, in terms of knowledge, because they're not designed that way, and that's not how they were trained. The only thing they "know" is the statistical relationships between tokens, where the tokens are words or phrases, and the output is usually just picking the next statistically likely token, given the input and/or previous output, with a bit of randomness thrown in to occasionally pick a less likely token, so the output isn't fully deterministic. The only reason it provides anything factual at all is that it was trained on a bunch of text that was largely factual, so the relationships between tokens that describe a fact are particularly strong. But lies can have strong statistical relationships between tokens too, and since it's always going to pick something, even some weak statistical relationships can get picked.

The best it could do is provide the statistical probability of the next token it picked, but that's not inherently an indication of factualness or "knowledge".
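For what it's worth, a toy sketch of that "pick the next statistically likely token, with a bit of randomness" step looks something like this. The vocabulary and probabilities are invented purely for illustration; a real model derives them from billions of parameters, but the selection step works roughly the same way:

```python
import random

# Hypothetical next-token distribution after a prompt like "The capital of France is".
# The numbers are made up; a real model computes them from its learned parameters.
next_token_probs = {
    "Paris": 0.92,   # strong token relationship learned from mostly-factual training text
    "Lyon": 0.05,    # weaker relationship, still possible
    "Narnia": 0.03,  # nonsense can still carry nonzero probability
}

def sample_next_token(probs, temperature=1.0):
    """Pick the next token. Higher temperature flattens the distribution,
    making less likely (possibly wrong) tokens more likely to be chosen."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    token = random.choices(list(probs.keys()), weights=weights, k=1)[0]
    return token, probs[token]  # the probability is all the model "knows" about its pick

token, prob = sample_next_token(next_token_probs, temperature=1.2)
print(f"picked {token!r} with model probability {prob:.2f}")
```

Nothing in that loop checks whether the picked token corresponds to anything true; it only ever reflects how strongly tokens co-occurred in training.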
 
Upvote
13 (13 / 0)
When you consider that a pure guess should be wrong only 50% of the time.
Worse than that. These generative AI systems gobble up Internet content irrespective of copyright.

Considering the rampant amount of disinformation prevalent on the Internet, I frankly wouldn’t trust anything AI responded with. In fact, the big tech companies, in not policing disinformation, have sown the seeds of uselessness in this potential tool.

Garbage in, garbage out.
 
Upvote
5 (5 / 0)