AI search engines give incorrect answers at an alarming 60% rate, study says

graylshaped

Ars Legatus Legionis
61,378
Subscriptor++
And AGI is right around the corner and these people love to claim that hallucination isn’t as big of a problem anymore >.>

Waiting for this article to be on hacker news for everyone to come out and defend and downplay this.

There is a huge amount of money and advertising effort in convincing the average user that these tools are reliable.
On that particular point:
Grok 3 demonstrated the highest error rate, at 94 percent.
It is beyond dispute that Grok doesn't grok. It follows the Muskian convention of explicitly NOT doing what his products' names describe, with perhaps the exception of Boring, though in that case the name is more an accurate adjective for its results than a verb of action and accomplishment.
 
Upvote
8 (8 / 0)

rbryanh

Ars Tribunus Militum
1,781
AI companies want to bypass copyright laws, but aren't willing to wait for laws to be updated... I am shocked.
Of course in this context, "updated" translates as "Congressional revision or repeal of existing law or defunding of its enforcement in response to perfectly legal corporate bribery."
 
Upvote
3 (3 / 0)

rbryanh

Ars Tribunus Militum
1,781
Speaking of artificial stupidity, never has it been clearer that every human endeavor is ultimately a metaphor for humanity itself. We can create nothing except in our own image.

Publishers, editors, peer-review, academic competition, and professional criticism were once a significant part of humanity's collective intelligence. The internet having eliminated all these and more, it's now anyone's guess which will kill us first: willful destruction of our own habitats or digitally enforced idiocy.
 
Upvote
6 (6 / 0)

Megahedron

Smack-Fu Master, in training
63
Is there any, any chance that this whole A.I. balloon is going to pop, the way the dot-com bubble popped in 2001? It's really starting to seem like it's never going to reliably perform at even 50% of the projections we've seen over the past three years.
It's still really, really uncertain, but if you're really into reading the tea leaves there are some possible signs:

1) Microsoft starting to pull back on data center build-out, suggesting that it thinks either A) its current compute capacity is sufficient for current and future demand, or B) that it's overbuilt. Even in the best-case scenario of "Deepseek made MS realize they can run LLMs much more efficiently," this still means they're jettisoning OpenAI, who has shown zero interest in adopting Deepseek's approach. This is more likely considering that...
2) OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI (borrowing to invest in OpenAI itself, and borrowing to invest in data centers for OpenAI to use).
3) Oh, and this is ignoring nVidia, who made a fortune selling shovels for the gold rush and may soon find itself getting rugpulled.
4) There's also a company called CoreWeave whose entire purpose is renting out GPU data centers for AI, and whose biggest customer is Microsoft. They were the second biggest owners of data centers, and were already unprofitable, and then Microsoft canceled a bunch of contracts (see 1).

Granted, it could all go the other way and shake out in the end, who knows? It's also unlikely to "pop" in the sense of Google, Microsoft, nVidia, Meta, etc. going out of business, or LLMs going away entirely, since they do have some use cases. But I simply don't see OpenAI, Anthropic, Perplexity, or other companies whose entire business is AI surviving long-term.
 
Upvote
17 (17 / 0)
OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI
And let's not forget that Masayoshi Son has backed some of the stupidest sucker bets to come down the VC pipe. Something something Wirecard...
 
Upvote
12 (12 / 0)

alienluvchild

Wise, Aged Ars Veteran
176
Subscriptor++
As a former developer and high-level manager at a development firm I’m so sick and tired and utterly disgusted by the attitudes of these companies. I fought like hell to keep people (most especially the CEO) from going down the path that these morally corrupt shitstains continue to follow.

It’s depressing that so many people are lazy and so completely lacking in common sense and critical thinking skills that they can’t see the totality of the sham that all of these companies and their boosters are foisting on the public.
 
Upvote
13 (13 / 0)

sjl

Ars Tribunus Militum
2,715
At least it lies less than Trump...
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
 
Upvote
6 (6 / 0)

Auie

Ars Scholae Palatinae
1,878
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
That is the joke.
 
Last edited:
Upvote
0 (1 / -1)

Rosyna

Ars Tribunus Angusticlavius
6,879
Is it just me or is the Google Search AI summary much worse than asking an AI chatbot directly?
Significantly. Google’s AI search can’t even give you a list of winged dinosaurs or tell you which reptiles are/were warm-blooded. Even worse, it’s confident in its wrongness.
 
Upvote
5 (5 / 0)

halfelven

Smack-Fu Master, in training
2
Subscriptor
Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried.

I may be having a Friday morning moment, but do the graphs not show the opposite?

Edit: I'm having a Friday morning moment. I read that as "correct information" rather than "incorrect information".
 
Upvote
1 (1 / 0)

Bigdoinks

Ars Scholae Palatinae
868
Only 60% wrong? That's actually better than the results I've seen when I accidentally get Bing or Google on some other work PC (I use Kagi). I would have put it closer to 75% wrong, but fully admit there may be some expectation bias on my part. Or maybe it's just because my use cases are very different - I'm asking for factual specifics, not just asking it to cite where this random thing came from.
I think you might have "whoooshed" yourself, mate. What is your use case where search is 75% wrong? Are you searching for a wendigo or bigfoot on Google?
 
Upvote
8 (9 / -1)

orangedan

Seniorius Lurkius
3
Subscriptor
Fake citations are still the easiest way to catch and prove that students in my class are writing research papers with AI. Like most AI crap, it passes a quick skim inspection, but it is still way easier to prove to student disciplinary committees than other "maybe" AI-generated text. That said, I worry that these "look, there are citations" AI outputs will fool more people into believing fake science info, because very few people go to check original citations.
 
Upvote
10 (10 / 0)
I think of dumb AI like malicious compliance from dumb people. The cheese will stick to a pizza if you add 1/8 cup of glue; the request was fulfilled and the solution will work. I expect the same uselessness from a chatbot or from a stoned undergrad.
This is excellent and useful information if you are a food photographer. That 1/8th cup of glue makes your pizza photo look so much better.
 
Upvote
4 (4 / 0)
It’s understandable that no one here appears to be very surprised, but it needs to be stressed that this is a horrific result. Wrong ninety percent of the time is shocking and unacceptable under any circumstances. There’s a huge problem right in front of us.
When you consider that a pure guess should be wrong only 50% of the time.
 
Upvote
-1 (3 / -4)

One off

Ars Scholae Palatinae
1,235
Yeah, shocking... The other day I asked o3-mini a question, it had no idea, but still answered some bullshit again and again. I saved the "reasoning" text when I called its bullshit out, because it surprised me how revealing it was:
I haven't used any of the reasoning models. I'm curious if your experience matches my assumption that the reasoning outputs are written to satisfy the request to show workings rather than reflecting any actual process followed.
 
Last edited:
Upvote
2 (2 / 0)

One off

Ars Scholae Palatinae
1,235
This doesn’t seem remotely informative? Traditional search is much better than GenAI for finding the origin of an exact piece of text. This seems like a study designed to find what it wants; that’s not even close to a real-world use case.
Often on LLM articles, I see people saying what amounts to "I can't see an immediate use for me, therefore they are useless to everyone". Your post seems to be more "I don't use LLMs this way so no one else does." When LLMs are being promoted as research tools, being able to identify sources and provide links is important.
 
Last edited:
Upvote
11 (11 / 0)

TVPaulD

Ars Tribunus Militum
1,697
However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools’ accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
I think this is unfair. I am plenty sceptical of the results, and I've found that pretty much 100% of the time when I have deduced that the response to a prompt is incorrect, I have been right about that, even when I did not already know the answer going in. And even on occasions where I couldn't immediately tell the answer was wrong, because I always follow the "citations" to verify the information, a good chunk of the time the LLM still got it wrong in at least some way (for example, giving battery replacement instructions for a different model than the one requested, because the manufacturer provides both models' instructions on a single page and the chatbot parsed it incorrectly).

But I don't really blame anyone else, especially less technical users, for trusting the tool they are provided. Should they? Of course not, but there's no earthly reason why everyone should be aware of the details of how these things work. Everyone has different priorities and interests. I do my best to impress upon those around me who are less informed and engaged with these things what the limitations and risks are, but I do not shame them for not already knowing. Looking at it from their point of view, they have been presented with a service that purports to provide information using a natural language interface.

And it is, let's be honest, superficially very impressive to observe what they do. People generally are not used to computers being able to do things like this in this specific way. It's easy to see how that would engender trust and appreciation from users, especially those less technically inclined who do not have the frame of reference to understand what's going on under the hood. And, again, there is no reason why they should either - it's fine and even good to be interested in how tech works, but it should not be obligatory in order to use it for its most basic purposes. We don't expect every driver to be able to explain in detail how an internal combustion engine works or be capable of constructing precise suspension geometry.

The shame should be directed instead at the purveyors of this technology who are overpromising and underdelivering, setting their users up to fail by misleading them into using and trusting their services for financial gain. Howard is blaming some of the victims. The blame, and the shame, rightly belongs to OpenAI, Perplexity, Anthropic, Microsoft, Google et al, along with their sycophants and boosters - particularly in the media.
 
Upvote
8 (8 / 0)

One off

Ars Scholae Palatinae
1,235
Huh, so they ARE actually acting more like humans.
lol. There is an interesting point there, though. First, training data: much of online commentary is authoritative-sounding bullshit, and good-faith people who don't know tend not to try to answer questions. Plus, no one publishes textbooks filled with "I don't know" as the answer to problems, or publishes articles without a viewpoint or narrative. Then there are the commercial drivers: you don't want your chatbot to appear less capable than the competition by treating 'IDK' as a good output. Finally, the tech itself seems to be exploiting the relationships between words and phrases, not the things or concepts (such as correct and incorrect answers) those words and phrases evoke in a human mind.

I'm sure there is room for LLMs to improve outputs, but any reporting on realistic avenues for experimentation seems to be drowned out by marketing guff. "It's a magic mind in a box, just make the box bigger!".
 
Upvote
6 (6 / 0)

One off

Ars Scholae Palatinae
1,235
The sad thing is that the people most likely to believe everything the AI said and not fact-check are also the people most likely to never read this article.
I'm cynical enough to think this isn't the tech's problem. I've met too many people who just don't care if information is true or not. Is it a simple answer? Does it keep teacher / boss off my back? Does it confirm my biases? Does it give me a feeling of security? Is it something I can leverage for advantage? etc.
 
Upvote
5 (5 / 0)

One off

Ars Scholae Palatinae
1,235
It's still really, really uncertain, but if you're really into reading the tea leaves there are some possible signs:

1) Microsoft starting to pull back on data center build-out, suggesting that it thinks either A) its current compute capacity is sufficient for current and future demand, or B) that it's overbuilt. Even in the best-case scenario of "Deepseek made MS realize they can run LLMs much more efficiently," this still means they're jettisoning OpenAI, who has shown zero interest in adopting Deepseek's approach. This is more likely considering that...
2) OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI (borrowing to invest in OpenAI itself, and borrowing to invest in data centers for OpenAI to use).
3) Oh, and this is ignoring nVidia, who made a fortune selling shovels for the gold rush and may soon find itself getting rugpulled.
4) There's also a company called CoreWeave whose entire purpose is renting out GPU data centers for AI, and whose biggest customer is Microsoft. They were the second biggest owners of data centers, and were already unprofitable, and then Microsoft canceled a bunch of contracts (see 1).

Granted, it could all go the other way and shake out in the end, who knows? It's also unlikely to "pop" in the sense of Google, Microsoft, nVidia, Meta, etc. going out of business, or LLMs going away entirely, since they do have some use cases. But I simply don't see OpenAI, Anthropic, Perplexity, or other companies whose entire business is AI surviving long-term.
In an article partly about absent links to sources, I feel compelled to append Ed Zitron's article to your post.
 
Upvote
6 (6 / 0)

PeterRNYC

Smack-Fu Master, in training
1
I know I'm setting myself up for some flaming here... but I think this headline is a bit misleading.

Yes the study highlights a critical flaw in AI search engines' ability to provide accurate source attribution. This has implications for news integrity and publisher control.

But I think that, for most people, the headline suggests that AI search engines cite incorrect or fabricated sources, when it's really about the fact that, given a piece of text, they can't identify the correct source.

In terms of the research itself, it's a bit disappointing that they didn't also test the underlying search engines. This would have been an interesting comparison. My guess is that the first source listed when you copy a piece of article text into a search engine is probably not the correct one, given how many times things are copied. Most likely the AI layer on top of those searches has not been set up to specifically try to find the ultimate source. But that's a less sexy headline.

As with most scenarios, it's probably good to consider an AI to be a well-read, articulate temp worker who will sometimes make things up because they must write at a fixed words-per-minute rate.
 
Upvote
-6 (0 / -6)

One off

Ars Scholae Palatinae
1,235
Fake citations are still the easiest way to catch and prove that students in my class are writing research papers with AI. Like most AI crap, it passes a quick skim inspection, but it is still way easier to prove to student disciplinary committees than other "maybe" AI-generated text. That said, I worry that these "look, there are citations" AI outputs will fool more people into believing fake science info, because very few people go to check original citations.
The spreaders of fake science have been around since long before LLMs. In my experience their links will lead somewhere, but it will be to a very low-quality source, or they will misrepresent the content of a trustworthy source. It's the ease of astroturfing enabled by LLMs that worries me. Drown the truth in shit.
 
Upvote
3 (3 / 0)

One off

Ars Scholae Palatinae
1,235
In terms of the research itself, it's a bit disappointing that they didn't also test the underlying search engines. This would have been an interesting comparison.
I sort of agree with you, in that the first paragraph of this article is a bit misleading; 'queries about news sources' could have been better worded. But the headline I'm seeing puts it better: 'AI search engines cite incorrect sources at an alarming 60% rate, study says'. I don't see the inaccuracy there that you do. In regard to your point above, they did: they only used extracts that returned the original source in the top three results of traditional search.
 
Last edited:
Upvote
0 (0 / 0)
Is there any, any chance that this whole A.I. balloon is going to pop, the way the dot-com bubble popped in 2001? It's really starting to seem like it's never going to reliably perform at even 50% of the projections we've seen over the past three years.

It feels a bit like the mid-to-late '90s again, but this time SNL is funny, I'm no longer a teenage virgin, and a presidential blowjob seems almost quaint.
 
Upvote
3 (3 / 0)
That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.
And even if it were true… then why release something that performs so horribly?

I know it's a fictionalized version of the events, but more tech bros should consider that memorable exchange in Blackberry (2023):

Mike Lazaridis: I will build a prototype, but I'll do it perfectly or I don't do it.
Jim Balsillie: Mike, are you familiar with the saying "Perfect is the enemy of good"?
Mike Lazaridis: Well, "good enough" is the enemy of humanity.
 
Upvote
2 (2 / 0)
That could be said about literally any technology at any point in history. It’s not even remotely an excuse. It’s semantically null.

He should be fired for saying something that stupid.
On top of that, it doesn't take into account enshittification. Clearly, products can peak and get worse. Another good example: U.S.-made cars in the late 1970s and early 1980s. Badge-engineered shitboxes.
 
Upvote
3 (3 / 0)

hambone

Ars Praefectus
4,319
Subscriptor
The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.


Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies.

"bUt tRuSt uS wE CaN sELf ReGuLAtE!!!!!" - AI companies

:rolleyes:
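For anyone unfamiliar, the Robot Exclusion Protocol they're ignoring is just a plain-text robots.txt file at the site root that crawlers are expected to check before fetching anything. Here's a minimal sketch of that check in Python, assuming Perplexity's crawler identifies itself as "PerplexityBot"; the directives shown are illustrative, not the publisher's actual file:

```python
# Minimal sketch of how a compliant crawler honors the Robot Exclusion Protocol,
# using only Python's standard library. The directives and the "PerplexityBot"
# user-agent name are illustrative assumptions, not National Geographic's actual file.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: PerplexityBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler performs this check before fetching a page;
# skipping it is the behavior the CJR researchers describe.
print(parser.can_fetch("PerplexityBot", "https://www.nationalgeographic.com/some-article"))
# -> False, because the publisher has disallowed this crawler entirely
```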
 
Upvote
7 (7 / 0)

KingAZAZ

Ars Centurion
321
Subscriptor
"If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
Then don't label it as a search-specific tool as though the accuracy problem were anywhere near solved. You're not putting this out there only for savvy users, but for the general public, so take a modicum of responsibility.

Anyway, what's the difference between my asking for some clear facts about something that's easily discernible through traditional search and asking for any other accurate information? Apparently, when it can handle the task, it does so only a low percentage of the time, and when it can't, it spits out confabulatory garbage indistinguishable from an accurate response unless I do the legwork myself. So why in the world would I use such a tool in the context of search? These so-called "search" flavors of models should simply spit out, oh I dunno, a list of 10 links for me to peruse on my own, I guess.
 
Upvote
5 (5 / 0)

David651

Smack-Fu Master, in training
61
Can we go back to just the links, please? With AI at the helm of Internet searches, able to infer context from our inquiries, and knowing AI can be biased by its maintainers, this quote from Noam Chomsky comes to mind:

The smart way to keep people passive and obedient is to strictly limit the spectrum of acceptable opinion, but allow very lively debate within that spectrum - even encourage the more critical and dissident views. That gives people the sense that there's free thinking going on, while all the time the presuppositions of the system are being reinforced by the limits put on the range of the debate.
 
Upvote
3 (3 / 0)

marsilies

Ars Legatus Legionis
23,253
Subscriptor++
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
The thing is, LLMs don't know what they "know" and don't know, in terms of knowledge, because they're not designed that way, and that's not how they were trained. The only thing they "know" is the statistical relationships between tokens, where the tokens are words or phrases, and the output is usually just picking the next statistically likely token, given the input and/or previous output, with a bit of randomness thrown in to occasionally pick a less likely token, so the output isn't fully deterministic. The only reason it provides anything factual at all is that it was trained on a bunch of text that was largely factual, so the relationships between tokens that describe a fact are particularly strong. But lies can have strong statistical relationships between tokens too, and since it's always going to pick something, even some weak statistical relationships can get picked.

The best it could do is provide the statistical probability of the next token it picked, but that's not inherently an indication of factualness or "knowledge".
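For what it's worth, a toy sketch of that "pick the next statistically likely token, with a bit of randomness" step looks something like this. The vocabulary and probabilities are invented purely for illustration; a real model derives them from billions of parameters, but the selection step works roughly the same way:

```python
import random

# Hypothetical next-token distribution after a prompt like "The capital of France is".
# The numbers are made up; a real model computes them from its learned parameters.
next_token_probs = {
    "Paris": 0.92,   # strong token relationship learned from mostly-factual training text
    "Lyon": 0.05,    # weaker relationship, still possible
    "Narnia": 0.03,  # nonsense can still carry nonzero probability
}

def sample_next_token(probs, temperature=1.0):
    """Pick the next token. Higher temperature flattens the distribution,
    making less likely (possibly wrong) tokens more likely to be chosen."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    token = random.choices(list(probs.keys()), weights=weights, k=1)[0]
    return token, probs[token]  # the probability is all the model "knows" about its pick

token, prob = sample_next_token(next_token_probs, temperature=1.2)
print(f"picked {token!r} with model probability {prob:.2f}")
```

Nothing in that loop checks whether the picked token corresponds to anything true; it only ever reflects how strongly tokens co-occurred in training.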
 
Upvote
13 (13 / 0)
When you consider that a pure guess should be wrong only 50% of the time.
Worse than that. These generative AI systems gobble up Internet content irrespective of copyright.

Considering the rampant amount of disinformation prevalent on the Internet, I frankly wouldn’t trust anything AI responded with. In fact, the big tech companies, in not policing disinformation, have sown the seeds of uselessness in this potential tool.

Garbage in, garbage out.
 
Upvote
5 (5 / 0)