CJR study shows AI search services misinform users and ignore publisher exclusion requests.
On that particular point:

> And AGI is right around the corner and these people love to claim that hallucination isn't as big of a problem anymore >.>
Waiting for this article to be on Hacker News for everyone to come out and defend and downplay this.
There is a huge amount of money and advertising effort in convincing the average user that these tools are reliable.
It is beyond dispute to assert that Grok doesn't. It follows the Muskian convention of explicitly NOT doing what his products' names describe, with perhaps the exception of Boring, though in that case it is more an accurate adjective for its results than a verb of action and accomplishment.

> Grok 3 demonstrated the highest error rate, at 94 percent.
Of course in this context, "updated" translates as "Congressional revision or repeal of existing law, or defunding of its enforcement, in response to perfectly legal corporate bribery."

> AI companies want to bypass copyright laws, but aren't willing to wait for laws to be updated... I am shocked.
It's still really, really uncertain, but if you're really into reading the tea leaves there are some possible signs:

> Is there any, any chance that this whole A.I. balloon is going to pop, the way the dot-com bubble popped in 2001? It's really starting to seem like it's never going to reliably perform at even 50% of the projections we've seen over the past three years.
And let's not forget that Masayoshi Son has backed some of the stupidest sucker bets to come down the VC pipe. Something something Wirecard...

> OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI
Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).

> At least it lies less than Trump...
That is the joke.

> Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
Significantly. Google's AI search can't even give you a list of winged dinosaurs or tell you which reptiles are/were warm-blooded. Even worse, it's confident in its wrongness.

> Is it just me or is the Google Search AI summary much worse than asking an AI chatbot directly?
Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried.
If I was evil scum I'd totally be marketing MAGAI® Search to the deplorables. Guaranteed to return doubleplusgood truthiness.
I think you might have "whoooshed" yourself, mate. What is your use case where search is 75% wrong? Are you searching for a wendigo or bigfoot on Google?

> Only 60% wrong? That's actually better than the results I've seen when I accidentally get Bing or Google on some other work PC (I use Kagi). I would have put it closer to 75% wrong, but fully admit there may be some expectation bias on my part. Or maybe it's just because my use cases are very different - I'm asking for factual specifics, not just asking it to cite where this random thing came from.
This is excellent and useful information if you are a food photographer. That 1/8th cup of glue makes your pizza photo look so much better.

> I think of dumb AI like malicious compliance from dumb people. The cheese will stick to a pizza if you add 1/8 cup of glue; the request was fulfilled and the solution will work. I expect the same uselessness from a chatbot or from a stoned undergrad.
When you consider that a pure guess should be wrong only 50% of the time.

> It's understandable that no one here appears to be very surprised, but it needs to be stressed that this is a horrific result. Wrong ninety percent of the time is shocking and unacceptable under any circumstances. There's a huge problem right in front of us.
I've not used any of the reasoning models. I'm curious if your experience matches my assumption that the reasoning outputs are written to satisfy the request to show workings rather than reflecting any actual process followed.

> Yeah, shocking... The other day I asked o3-mini a question, it had no idea, but still answered some bullshit again and again. I saved the "reasoning" text when I called its bullshit out, because it surprised me how revealing it was:
Often on LLM articles, I see people saying what amounts to "I can't see an immediate use for me, therefore they are useless to everyone". Your post seems to be more "I don't use LLMs this way, so no one else does." When LLMs are being promoted as research tools, being able to identify sources and provide links is important.

> This doesn't seem remotely informative? Traditional search is much better than GenAI for finding the origin of an exact piece of text. This seems like a study designed to find what it wants, that's not even close to a real world use case.
I think this is unfair. I am plenty sceptical of the results, and I've found that pretty much 100% of the time when I have deduced that the response to a prompt is incorrect, even when I did not already know the answer going in, I have been right about that. And even on occasions where I couldn't immediately tell the answer was wrong, as I always follow the "citations" to verify the information, a good chunk of the time the LLM still got it wrong at least in some way (such as, for example, giving battery replacement instructions for a different model than the one requested, because the manufacturer provides both models' instructions on a single page and the chatbot has incorrectly parsed it).

> However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools' accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
lol. There is an interesting point though. First, training data. Much of online commentary is authoritative-sounding bullshit; good-faith people who don't know tend not to try to answer questions. Plus, no one publishes textbooks filled with "I don't know" as the answer to problems, or publishes articles without a viewpoint or narrative. Then there are the commercial drivers: you don't want your chatbot to appear less capable than the competition by treating "IDK" as a good output. Finally, the tech itself seems to be exploiting the relationships between words and phrases, not the things or concepts (such as correct and incorrect answers) those words and phrases evoke in a human mind.

> Huh, so they ARE actually acting more like humans.
I'm cynical enough not to think this is the tech's problem. I've met too many people who just don't care if information is true or not. Is it a simple answer? Does it keep teacher / boss off my back? Does it confirm my biases? Does it give me a feeling of security? Is it something I can leverage for advantage? etc.

> The sad thing is that the people most likely to believe everything the AI said and not fact-check are also the people most likely to never read this article.
In an article partly about absent links to sources, I feel compelled to append Ed Zitron's article to your post.

> It's still really, really uncertain, but if you're really into reading the tea leaves there are some possible signs:
> 1) Microsoft starting to pull back on data center build-out, suggesting that it thinks either A) its current compute capacity is sufficient for current and future demand, or B) it's overbuilt. Even in the best-case scenario of "Deepseek made MS realize they can run LLMs much more efficiently," this still means they're jettisoning OpenAI, who has shown zero interest in adopting Deepseek's approach. This is more likely considering that...
> 2) OpenAI has turned to Softbank for continuing investment, and Softbank is borrowing like a gambling addict trying to claw back from the penny slots to the high-roller table in order to fund its investment in OpenAI (borrowing to invest in OpenAI itself, and borrowing to invest in data centers for OpenAI to use).
> 3) Oh, and this is ignoring nVidia, who made a fortune selling shovels for the gold rush and may soon find itself getting rugpulled.
> 4) There's also a company called CoreWeave whose entire purpose is renting out GPU data centers for AI, and whose biggest customer is Microsoft. They were the second-biggest owners of data centers, and were already unprofitable, and then Microsoft canceled a bunch of contracts (see 1).
> Granted, it could all go the other way and shake out in the end, who knows? It's also unlikely to "pop" in the sense of Google, Microsoft, nVidia, Meta, etc. going out of business, or LLMs going away entirely, since they do have some use cases. But I simply don't see OpenAI, Anthropic, Perplexity, or other companies whose entire business is AI surviving long-term.
The spreaders of fake science have been around since long before LLMs; in my experience their links will lead somewhere, but it will be to a very low-quality source, or they will misrepresent the content of a trustworthy source. It's the ease of astroturfing enabled by LLMs that worries me. Drown the truth in shit.

> Fake citations are still the easiest way to catch and prove students are writing research papers in my class with AI. Like most AI crap, it passes a quick skim inspection, but is still way easier to prove to student disciplinary committees than other "maybe" AI-generated text. That said, I worry that these "look, there are citations" AI outputs will fool more people into believing fake science info, because very few people go to check original citations.
I sort of agree with you, in that the first paragraph of this article is a bit misleading; 'queries about news sources' could have been better worded. But the headline I'm seeing puts it better: 'AI search engines cite incorrect sources at an alarming 60% rate, study says'. I don't see the inaccuracy there that you do. In regard to your point above, they did: they only used extracts that returned the original source in the top three results of traditional search.

> In terms of the research itself, it's a bit disappointing that they didn't also test the underlying search engines. This would have been an interesting comparison.
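To make that selection criterion concrete, here is a minimal sketch of the kind of filter described, assuming a hypothetical web_search helper; the names and stubbed results are illustrative, not the researchers' actual code.

```python
# Hypothetical sketch of the excerpt-selection filter described above: an excerpt
# only qualifies if a traditional web search for it returns the original article
# among the top three results. `web_search` is a placeholder for whatever search
# API one would actually call; it is not part of the CJR study's code.
def returns_original_in_top_three(excerpt: str, original_url: str, web_search) -> bool:
    top_results = web_search(excerpt)[:3]  # URLs of the first three hits
    return original_url in top_results

# Usage with a stubbed-out search function, purely to show the shape of the check.
def fake_search(query: str) -> list[str]:
    return [
        "https://example.com/original-story",
        "https://news.example.org/syndicated-copy",
        "https://another.example.net/unrelated",
    ]

print(returns_original_in_top_three(
    "an exact sentence lifted from the article",
    "https://example.com/original-story",
    fake_search,
))  # True under this stub
```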
Is there any, any chance that this whole A.I. balloon is going to pop, the way the dot-com bubble popped in 2001? It's really starting to seem like it's never going to reliably perform at even 50% of the projections we've seen over the past three years.
And even if it were true… then why release something that performs so horribly?

> That could be said about literally any technology at any point in history. It's not even remotely an excuse. It's semantically null. He should be fired for saying something that stupid.
On top of that, it doesn't take into account enshittification. Clearly, products can peak and get worse. Another good example: U.S.-made cars in the late 1970s and early 1980s. Badge-engineered shitboxes.

> That could be said about literally any technology at any point in history. It's not even remotely an excuse. It's semantically null. He should be fired for saying something that stupid.
The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.
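For context on the Robot Exclusion Protocol mentioned above: it is just a plain-text robots.txt file that asks crawlers to stay out, and honoring it is entirely voluntary. Below is a minimal sketch of how a compliant crawler would check it, using Python's standard-library parser; the robots.txt contents, the URL, and the second crawler name are invented for illustration (PerplexityBot is the user-agent token Perplexity documents for its crawler).

```python
# Sketch of a Robot Exclusion Protocol check. The robots.txt below is a
# hypothetical publisher policy, not National Geographic's actual file.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/articles/some-paywalled-story"
for agent in ("PerplexityBot", "SomeOtherBot"):
    # can_fetch() only reports what the publisher asked for; nothing technically
    # stops a crawler from ignoring the answer, which is the behavior CJR flagged.
    print(agent, "allowed:", parser.can_fetch(agent, url))
```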
Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies.
Then don't label it as a search-specific tool as though the accuracy problem were anywhere near solved. You're not putting this out there only for savvy users, but for the general public, so take a modicum of responsibility.

> "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
Should I start at the Stone Age or the Iron Age with the list? Clearly, you're not that in touch with how technology has evolved over the millennia.

> Did I say product? Name a TECHNOLOGY that went downhill.
Can you afford $10,000,000 a year for a battalion of lawyers?

> If AI companies are allowed to bypass copyright laws, so should I.
The thing is, LLMs don't know what they "know" and don't know, in terms of knowledge, because they're not designed that way, and that's not how they were trained. The only thing they "know" are statistical relationships between tokens, where the tokens are words or phrases, and the output is usually just picking the next statistically likely token, given the input and/or previous output, with a bit of randomness thrown in to occasionally pick a less likely token, so the output isn't fully deterministic. The only reason it provides anything factual at all is that it was trained on a bunch of text that was largely factual, so the relationships between tokens that describe a fact are particularly strong. But lies can have strong statistical relationships between tokens too, and since it's always going to pick something, even some weak statistical relationships could get picked.

> Burying the bar there. We should be aiming for something where if it doesn't "know", it says that it doesn't "know", instead of making stuff up out of whole cloth. Not accepting something just because it lies less than a known liar who lies as naturally as he breathes (if not more so).
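As a toy illustration of the next-token sampling loop described in that comment: the vocabulary and probabilities below are invented for the example, and a real model derives its token statistics from training text rather than from any notion of truth, which is exactly how a fluent wrong answer can win.

```python
# Toy sketch of picking "the next statistically likely token, with a bit of
# randomness thrown in". The probability table is made up; a real LLM computes
# one over its entire vocabulary at every step.
import random

def sample_next_token(probs: dict[str, float], temperature: float = 0.8) -> str:
    # Re-weight the distribution: low temperature sharpens it toward the most
    # likely token, higher temperature lets weaker continuations through.
    weights = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(weights.values())
    return random.choices(list(weights), weights=[w / total for w in weights.values()])[0]

# Hypothetical distribution after the prompt "The capital of Australia is"
next_token_probs = {"Canberra": 0.55, "Sydney": 0.30, "Melbourne": 0.10, "Auckland": 0.05}
print([sample_next_token(next_token_probs) for _ in range(5)])
# e.g. ['Canberra', 'Canberra', 'Sydney', 'Canberra', 'Melbourne'] -- plausible-but-wrong picks happen
```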
Worse than that. Artificial General Intelligence systems gobble up Internet content irrespective of copyright.

> When you consider that a pure guess should be wrong only 50% of the time.