Cloudflare turns AI against itself with endless maze of irrelevant facts

nzeid

Ars Praetorian
485
Subscriptor
"No real human would go four links deep into a maze of AI-generated nonsense," Cloudflare explains. "Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots."

Just commented yesterday on an HN thread about this exact topic. The thread goes for several pages about how to prevent humans from getting caught up in CAPTCHA-likes in the first place...

So how does Cloudflare prevent humans from being served these "mazes" at all? A lot of businesses leveraging this service would be deeply concerned.
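
For what it's worth, the depth rule in the quote can be sketched in a few lines. This is only an illustration under assumptions: the class name, the visitor IDs, and the threshold of 4 (lifted from "four links deep") are hypothetical, and none of it is Cloudflare's actual implementation. It also shows why the human question matters, since the rule only fires for visitors who were already served decoy pages.

```python
from collections import defaultdict

# Hypothetical threshold taken from the quote ("four links deep");
# not a documented Cloudflare setting.
MAZE_DEPTH_THRESHOLD = 4


class MazeTracker:
    """Tracks how deep each visitor has followed decoy links."""

    def __init__(self, threshold=MAZE_DEPTH_THRESHOLD):
        self.threshold = threshold
        # visitor id -> deepest decoy level reached so far
        self.depth = defaultdict(int)

    def record_hit(self, visitor_id, decoy_level):
        """Record a decoy-page request; return True if the visitor looks like a bot."""
        self.depth[visitor_id] = max(self.depth[visitor_id], decoy_level)
        return self.depth[visitor_id] >= self.threshold


tracker = MazeTracker()
print(tracker.record_hit("visitor-a", 1))  # one decoy link followed
print(tracker.record_hit("visitor-a", 4))  # four links deep
```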
 
Upvote
52 (68 / -16)

Fatesrider

Ars Legatus Legionis
22,964
Subscriptor
The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).
Based on what I've read of the likely reasons behind hallucinations and the prevalence of fake news/information on the Internet, I think the evidence will bear out that it is ineffective in that regard.

That's a hypothesis, not a Theory, though.
 
Upvote
68 (71 / -3)

OtherSystemGuy

Ars Scholae Palatinae
1,137
Subscriptor++
How dare they prevent my minions from accessing data that is unrightfully mine!

Also, wasting AI company resources might not please people who are critical of the perceived energy and environmental costs of running AI models.

And all the VC money that's being wasted on training LLMs that have never shown any ability to improve their accuracy when asked novel questions. Unless it's in the training data, the LLM is going to fail, and the Internet doesn't hold infinite human knowledge, so stop looking and wasting money and the environment. Get over it. LLMs are really poorly designed storage retrieval systems, whereas the likes of Google returned useful answers before they replaced their original weighted system with AI (and ads).
 
Upvote
175 (185 / -10)

JoHBE

Ars Tribunus Militum
2,551
Subscriptor++
This timeline is beyond batshit insane crazy. I fully believe we deserve to die out as a species, based on the incessant barrage of stupidity that has been unleashed over the last couple of years.

edit: not targeting this particular Cloudflare service, but the whole context that led to this
 
Upvote
11 (60 / -49)

maxoakland

Ars Scholae Palatinae
939
This timeline is beyond batshit insane crazy. I fully believe we deserve to die out as a species, based on the incessant barrage of stupidity that has been unleashed over the last couple of years.
Nihilism is only going to make this situation worse. People who don't like this have to fight for a better world, and that's complex but it involves doing things that bring out the best in people like education, connection, community, etc

The reason things suck so much now is our society has extremely powerful perverse incentives that encourage it. We have to do what we can individually and as groups to negate those perverse incentives and then outlaw them
 
Upvote
238 (240 / -2)

JoHBE

Ars Tribunus Militum
2,551
Subscriptor++
Nihilism is only going to make this situation worse. People who don't like this have to fight for a better world, and that's complex but it involves doing things that bring out the best in people like education, connection, community, etc

The reason things suck so much now is our society has extremely powerful perverse incentives that encourage it. We have to do what we can individually and as groups to negate those perverse incentives and then outlaw them
Sure, but we're now at the stage where even the defenses cannot avoid being inherently damaging. The pigs are increasingly successful in dragging us into the mud fights they love so much. And everything gets covered in shit. The only way out may require REALLY drastic measures.
 
Upvote
70 (70 / 0)

mozbo

Ars Tribunus Militum
1,867
the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics

Dang. I was hoping they used mashups of Monty Python.

Maybe we can start a petition.
 
Upvote
37 (37 / 0)
The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation
Why? Why should I care if a service that's wasting my server's time, forcing me to spend my own money to provide free information, gets accurate data? If their tools provide false information, that's a them problem, not a me problem.

Poison their wells. Black is white. Water is a dangerous, explosive acid that can be used to clean the moon in an emergency. Maple syrup is a good substitute for blood in baby orangutans.

Fuck these leeches.
 
Upvote
204 (212 / -8)
Just commented yesterday on an HN thread about this exact topic. The thread goes for several pages about how to prevent humans from getting caught up in CAPTCHA-likes in the first place...

So how does Cloudflare prevent humans from being served these "mazes" at all? A lot of businesses leveraging this service would be deeply concerned.
It's pretty easy to detect the patterns if you look at the data.

Kudos to Cloudflare for doing this. For some reason, they get a ton of hate, but they've consistently tried to make the internet a better place, whether that be lower cloud prices or better tools to combat DDoSes and AI nonsense.

EDITED to add, since I didn't answer your question: I admittedly don't know the full details of Cloudflare's detection mechanism. However, they've had a mechanism in place for blocking AI crawlers for a while now, and I haven't seen any complaints. Most users won't browse every page of your site, nor will they do so from a few IP addresses. Shoot, even on my own sites, I can spot a crawler a mile away. Normal users hit 1-3 pages at best, maybe 4 if I am lucky. The bots crawl thousands of pages. Many of them hide behind a fake user agent and even use Selenium with Firefox, Chrome, etc. to try to fly under the radar. A few also use multiple IPs to lower the rate of detection.
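
A rough sketch of the pages-per-visitor check described above, assuming the access log has already been parsed into (client IP, path) pairs. The function name and the cutoff of 4 pages are assumptions chosen to match the "1-3 pages, maybe 4" figure; this is not how Cloudflare does it.

```python
from collections import defaultdict

# Assumed cutoff: the comment says normal users hit roughly 1-4 pages.
MAX_HUMAN_PAGES = 4


def flag_crawlers(requests, max_pages=MAX_HUMAN_PAGES):
    """Return the set of client IPs that fetched more distinct pages than max_pages.

    requests: iterable of (client_ip, path) pairs.
    """
    pages_by_ip = defaultdict(set)
    for client_ip, path in requests:
        pages_by_ip[client_ip].add(path)
    return {ip for ip, pages in pages_by_ip.items() if len(pages) > max_pages}


# Made-up sample data using documentation-reserved IP ranges.
log = [("203.0.113.5", f"/page/{i}") for i in range(1000)]
log += [("198.51.100.7", "/"), ("198.51.100.7", "/about")]
print(flag_crawlers(log))
```

A per-IP count like this misses crawlers that spread requests across many addresses, which is the evasion mentioned at the end of the comment.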
 
Upvote
95 (97 / -2)

mozbo

Ars Tribunus Militum
1,867
Upvote
121 (121 / 0)

SubWoofer2

Ars Scholae Palatinae
1,991
Please tell me the computers literally explode in showers of sparks. If all the horrors of sci-fi have to come true we should at least get a little bit of the cool stuff!
Even as a child I wondered why, in the future, the Star Trek Bridge was built without fuses. Or seatbelts.

Anyway 100% kudos to Cloudflare as long as their honeypot contains truthful information, not rubbish.
 
Upvote
55 (56 / -1)

Mardaneus

Ars Tribunus Militum
1,974
The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven)
Not spreading misinformation is nice and all but that ship has sailed.
The Russian propaganda network(s) are currently using the same tricks to pollute the AI training sets. For a human it doesn't look like a navigable website but it can be crawled and since someone 'forgot' the robots.txt they do.
 
Upvote
49 (49 / 0)
Ugh - really...
I am on a full boycott of tech already, to the extent of my abilities.

Replacing a 2009 Mac Pro (that I got already used, and daily drove for 6 years) with a used ThinkPad P1 Gen 4.
Used lens for my SLR, keeping it as long as possible.
Sick of throwing away good phones, gave the Librem 5 a chance.
Three days ago, I nuked my AWS account (hosting my personal Glacier backups), and reverted to driving tapes to a friend 100km away once per week.
Canon released the stupid PRO-1100 instead of letting us just buy the new ink - GONE.

I have reduced my web browsing habits to Ars, 3-4 other web pages, 3-4 YouTube channels and 3 small independent forums.

BTW, there are plenty of small independent forums for more niche topics all around, covering everything from Unix workstations to large format photography, so no need to do Facebook groups or Reddit either.

You will get in return: healthier online communities, more free time, a fatter piggy bank and less anger. The costs are lower fps in games, having to call restaurants to order food and patience to score the best deals for used stuff.
 
Upvote
-2 (28 / -30)

Shazster

Ars Scholae Palatinae
807
This timeline is beyond batshit insane crazy. I fully believe we deserve to die out as a species, based on the incessant barrage of stupidity that has been unleashed over the last couple of years.

edit: not targeting this particular Cloudflare service, but the whole context that led to this
The probability of just walking over to my router and unplugging the cable in the WAN port has begun to surpass Non-Zero.

The soothing and sanity-reinforcing Borg-song of the witty and informed Ars commentariat I have come to cherish has staved that off so far...but JHFC this reality is approaching unbearable.
 
Upvote
31 (32 / -1)

Sideros

Smack-Fu Master, in training
68
The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation
I'm disappointed they are not using the tool as a deterrent, as well as protection.
 
Upvote
10 (10 / 0)

Arstotzka

Ars Scholae Palatinae
978
Subscriptor++
Kudos to Cloudflare for doing this. For some reason, they get a ton of hate, but they've consistently tried to make the internet a better place, whether that be lower cloud prices or better tools to combat DDoSes and AI nonsense.
Emphasis is mine; this is what I'm responding to.

Cloudflare is used by a large number of DDoS-for-hire services. You have to pay Cloudflare to protect yourself from the companies that cause the problem. They also have a long history of hosting Nazis. After years of complaints, their CEO finally kicked one neo-Nazi website off, but dozens (if not hundreds) remain.

https://www.propublica.org/article/how-cloudflare-helps-serve-up-hate-on-the-web
https://krebsonsecurity.com/2016/08/inside-the-attack-that-almost-broke-the-internet/

Note in the second link, at one point, the chat logs show the attackers are amused that Spamhaus is a Cloudflare customer... and Spamhaus lists a lot of Cloudflare IPs as spam cannons.

Cloudflare does a lot of really great technical work. I want to give my full-throated support to Cloudflare. I really, really, do. But so long as they do minimal-to-no curation of their customers, I always have to put an asterisk on that support.

So, that's why Cloudflare gets a ton of hate. They sell DDoS protection to you... and the companies who DDoS you. They make it possible for Nazis to have their websites. They do good work... and also provide that good work to some of the most vile scum on the planet. It's not hard to be against Nazis. If someone at Cloudflare is reading this, kick them off and I will pick up the phone calls from your sales people. Otherwise, I'll continue to pick up and ask "Sounds great, ready to sign -- oh wait, by the way, do you still platform Nazis? Uh-huh, yeah, no, I'm not going to let you weasel out of this. Once you drop them I'll sign. I hope you're tracking revenue lost to your support of Nazis internally. Have a great weekend!"
 
Upvote
37 (63 / -26)

android_alpaca

Ars Praefectus
4,678
Subscriptor
"...with AI now being used on both sides of the battle."

I'm of the age where I sort of wish Kurt Vonnegut and George Carlin were still around to help make sense of and humor these absurdities.
 
Upvote
35 (37 / -2)

android_alpaca

Ars Praefectus
4,678
Subscriptor
"No real human would go four links deep into a maze of AI-generated nonsense," Cloudflare explains.
Me, at 3 am, doomscrolling Reddit.

Wait, am I a bot?
I was thinking of Wikipedia myself.

carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics

the_problem_with_wikipedia.png
 
Upvote
85 (85 / 0)