Cloudflare turns AI against itself with endless maze of irrelevant facts

dalef

Seniorius Lurkius
29
Honestly the fact that Cloudflare doesn't attempt to play judge on what sites they protect seems like a strength to me. No one likes neonazis and such but I'd rather not have a company arbitrarily decide whether my website deserves to stay online or not.

Sure, my opinions may align with those of the company today but there's no guarantee that it'll still be that way tomorrow, be it because of a change in leadership or a presidential order. If Cloudflare were to aggressively reject nazis then they could do the same with e.g. good information about vaccines so the less precedent there is for Cloudflare refusing customers and playing judge the better imo.


Yes, you get it. Businesses should not be in the business of judging their customers, but businesses should provide the goods and/or services that customers pay to receive. Internet companies should be like McDonald's: Regardless of race, ethnicity, sex, gender identification, religion, political beliefs, what sports teams you support, etc., if you pay for a Big Mac, you get a Big Mac.

I can certainly understand being against Neo-Nazis and other race supremacists, but once you deny one group something, it becomes easier to deny other groups something as well. For example, it can easily be imagined that Cloudflare could be put under pressure to censor any content related to DEI. Free expression should not be subject to whoever is in power or the current cultural zeitgeist.
 
Upvote
-2 (11 / -13)

mozbo

Ars Tribunus Militum
1,839
Yes, you get it. Businesses should not be in the business of judging their customers, but businesses should provide the goods and/or services that customers pay to receive. Internet companies should be like McDonald's: Regardless of race, ethnicity, sex, gender identification, religion, political beliefs, what sports teams you support, etc., if you pay for a Big Mac, you get a Big Mac.

I can certainly understand being against Neo-Nazis and other race supremacists, but once you deny one group something, it becomes easier to deny other groups something as well. For example, it can easily be imagined that Cloudflare could be put under pressure to censor any content related to DEI. Free expression should not be subject to whoever is in power or the current cultural zeitgeist.
And when the product is access to consumers' eyeballs?

It's no coincidence that fascism spreads easily through social media, where the user is the product.
 
Upvote
12 (13 / -1)

aiken_d

Ars Tribunus Militum
2,019
Yes, you get it. Businesses should not be in the business of judging their customers, but businesses should provide the goods and/or services that customers pay to receive. Internet companies should be like McDonald's: Regardless of race, ethnicity, sex, gender identification, religion, political beliefs, what sports teams you support, etc., if you pay for a Big Mac, you get a Big Mac.

I can certainly understand being against Neo-Nazis and other race supremacists, but once you deny one group something, it becomes easier to deny other groups something as well. For example, it can easily be imagined that Cloudflare could be put under pressure to censor any content related to DEI. Free expression should not be subject to whoever is in power or the current cultural zeitgeist.
This is a take only a straight white Christian cis male could have.

Many businesses value their employees. The idea of telling a black, or female, or gay employee that they must help / support a group calling for their marginalization or imprisonment is just abhorrent. Businesses can and should exercise editorial judgment on who they do business with.

You’re advocating for “just following orders” as not just acceptable but mandatory. No.
 
Upvote
12 (20 / -8)

Gunman

Ars Scholae Palatinae
1,112
Subscriptor
Nope, it is not a slippery slope to deny service to nazis. There is literally nothing that forces you to deny service to other groups after you start doing that to nazis. Nazis shouldn't be welcome, and indeed many businesses refuse them service for very, very good reasons. The biggest one being fucking human decency, but also that they are subversive by nature and don't play by the rules, so accepting them is in fact the actual, real slippery slope.
Linking to the famous nazi bar Twitter thread screenshot: https://pbs.twimg.com/media/FKdZP_rXMAMczzq?format=jpg&name=medium
 
Upvote
24 (24 / 0)

faffod

Ars Praetorian
471
Subscriptor
we reported on "Nepenthes," software that similarly lures AI crawlers into mazes
I don't think that it is fair to say that the web site is luring the crawler - the crawler found the site, the crawler disregarded the site's "no crawlers" sign, the crawler spammed the site to suck its content for the crawler's own profit with no remuneration to the site for their creations.
A predator lures its prey. This is just judo flipping the assailant.
 
Upvote
28 (28 / 0)

jabra86

Smack-Fu Master, in training
9
So how I'm seeing it is, we are now creating an alterweb adjacent to the one we use, designed only for bots. Looking at the sorry state of the current internet makes me wonder if this is also a fake internet, one designed for capitalistic purposes and manipulated by algorithms to a finite end. So then are we in a parallel internet already? And what is the real internet?
 
Upvote
1 (1 / 0)

kaleberg

Ars Scholae Palatinae
1,125
Subscriptor
So, there is a secret robot internet!

https://www.smbc-comics.com/comics/20130605.png
 
Upvote
12 (13 / -1)
I have been running Nepenthes since January 26th, even going so far as to set up an Apache load-balancer to allow 4x Nepenthes processes to handle the load. I am up to 57,019,966 hits in that time, and crack a grin every time I check the stats. That's 167 GiB of pseudo-gibberish returned. The only way it could be sweeter is if I used an LLM to churn the data.

The author is working on a no-SQL option for Nepenthes, which I highly recommend. It is around 50x more CPU efficient compared to the SQL version, and only takes 5x more RAM (43MB to 240MB) for my 496,467 word corpus.
Got a link to a good how-to? I can serve 100G a month without worries!
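Not a how-to for Nepenthes itself, but the general idea of serving "pseudo-gibberish" from a word corpus can be sketched in a few lines. This is a minimal illustration that assumes a simple first-order Markov chain over words; it is not Nepenthes's actual code, and the names `build_chain` and `babble` are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def babble(chain: dict, length: int, seed=None) -> str:
    """Generate `length` words of plausible-looking nonsense from the chain."""
    if length < 1:
        return ""
    if not chain:
        raise ValueError("corpus must contain at least two words")
    rng = random.Random(seed)   # a seed makes the output reproducible
    starts = sorted(chain)      # sorted so a given seed always picks the same word
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (a word that only appears last in the corpus): restart.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    demo = build_chain("the cat sat on the mat and the dog sat on the rug")
    print(babble(demo, 12, seed=1))
```

A real tarpit would also generate links to more generated pages and throttle its responses; this sketch only covers the text generation step.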
 
Upvote
3 (3 / 0)
Like it or not, you're never going to be able to allow the Google crawler access to your content while also blocking AI crawlers, simply because Google is training AI as well. We're going to have to fight these plagiarism tools in court. If you are concerned with blocking access short-term, then my advice to you is to login-gate content and aggressively IP ban.
 
Upvote
2 (4 / -2)
So, question. These automated scanners are accessing remote servers to which they don't have permission, correct? Hence the whole point of robots.txt - which has always been an 'optional' file to let search engines and other scrapers know that hey, you don't want to be scanned.

If the bot still scans your site, aren't they now running afoul of the Computer Misuse Act of 1990 (UK) or the Computer Fraud and Abuse Act (USA)?

Example: how The Realm members Electron and Phoenix were taken down in 1990 for finding an open telnet connection, which was publicly available, at NASA?

Reference: https://en.wikipedia.org/wiki/In_the_Realm_of_the_Hackers
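For anyone unfamiliar with how the "optional" robots.txt check works on the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, and the `is_allowed` helper are made up for illustration. A well-behaved crawler runs a check like this before fetching; nothing technically forces a crawler to do so, which is the whole problem.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is banned outright,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("SomeBrowser", "https://example.com/articles/1"),
        ("SomeBrowser", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(agent, url))
```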
 
Upvote
8 (8 / 0)

mcswell

Ars Scholae Palatinae
659
Related to this: maybe 6 or 7 years ago Stanford computer engineering developed a ‘search engine’ that worked in the background, constantly stringing three random words together as search phrases, to confuse and fill up any personal data mining files used for targeting advertising with literal junk. Anyone remember what it’s called?
I don't know about that one, but there is a Chomskybot that creates long texts strongly resembling Chomsky's linguistic writing, but which are gibberish. Perhaps this could be modified to resemble most any writer, and used to lead AI dredgers (I use that word intentionally) astray.

Note 1: Before someone says Chomsky himself wrote gibberish, I'm a linguist, and have read much of Chomsky's writing on linguistics (not on politics). It is not always easy to follow, but it is definitely not gibberish.
Note 2: Chomsky recently suffered a stroke, and apparently can no longer communicate.
 
Upvote
10 (10 / 0)
article said:
"No real human would go four links deep into a maze of AI-generated nonsense," Cloudflare explains. "Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots."
Cloudflare has obviously never witnessed my father in law trying to surf the web...
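The "four links deep" heuristic from the quoted article can be sketched roughly as follows. This is only an illustration of the idea, not Cloudflare's implementation: the `/maze/` prefix, the `MazeTracker` class, and counting decoy page hits as a stand-in for link depth are all assumptions.

```python
from collections import defaultdict

MAZE_PREFIX = "/maze/"   # hypothetical URL prefix for generated decoy pages
DEPTH_THRESHOLD = 4      # "four links deep", per the quoted claim


class MazeTracker:
    """Count how many decoy pages each visitor has requested."""

    def __init__(self, threshold: int = DEPTH_THRESHOLD):
        self.threshold = threshold
        self.depth = defaultdict(int)

    def record(self, visitor_id: str, path: str) -> bool:
        """Record one request; return True once the visitor looks like a bot."""
        if path.startswith(MAZE_PREFIX):
            self.depth[visitor_id] += 1
        return self.depth[visitor_id] >= self.threshold


if __name__ == "__main__":
    tracker = MazeTracker()
    for page in ["/maze/a", "/maze/b", "/maze/c", "/maze/d"]:
        flagged = tracker.record("203.0.113.7", page)
    print("flagged:", flagged)
```

As the father-in-law example suggests, a flag like this is better treated as one signal among several than as proof.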
 
Upvote
2 (2 / 0)

The Geeman

Wise, Aged Ars Veteran
125
Why? Why should I care if a service that's wasting my server's time, forcing me to spend my own money to provide free information, gets accurate data? If their tools provide false information, that's a them problem, not a me problem.
...
Fuck these leeches.
As we know half the AI output is tainted by shit, we also know the public at large will think it's the truth. We can ignore it and they won't know that they should.
It's an all or nothing situation in which I try not to come anywhere close to AI but that will get harder and harder to ensure.
And don't get me started on wholly AI generated and narrated YouTube videos, FFS I'll never get those minutes back!
 
Upvote
3 (3 / 0)

Ralf The Dog

Ars Praefectus
4,400
Subscriptor++
Dang. I was hoping they used mashups of Monty Python.

Maybe we can start a petition.
I did that. There was a site with a Moscow IP that was spamming one of my web forms with injection attacks. They were hitting me with about five per second. When I detected them, I wrote a quick filter that sent them the Monty Python Spam Song, 65863 verses of it. They quickly went away.
 
Upvote
7 (7 / 0)
I don't know about that one, but there is a Chomskybot that creates long texts strongly resembling Chomsky's linguistic writing, but which are gibberish. Perhaps this could be modified to resemble most any writer, and used to lead AI dredgers (I use that word intentionally) astray.

Note 1: Before someone says Chomsky himself wrote gibberish, I'm a linguist, and have read much of Chomsky's writing on linguistics (not on politics). It is not always easy to follow, but it is definitely not gibberish.
Note 2: Chomsky recently suffered a stroke, and apparently can no longer communicate.
Chomskybot has the capacity to produce irrelevant strings for about 10% of the time until the heat death of the universe. That should be sufficient to screw the AI developers.
 
Upvote
3 (3 / 0)

Carewolf

Ars Tribunus Angusticlavius
9,655
Emphasis is mine; this is what I'm responding to.

Cloudflare is used by a large number of DDoS-for-hire services. You have to pay Cloudflare to protect yourself from the companies that cause the problem. They also have a long history of hosting Nazis. After years of complaints, their CEO finally kicked one neo-Nazi website off, but dozens (if not hundreds) remain.

https://www.propublica.org/article/how-cloudflare-helps-serve-up-hate-on-the-web
https://krebsonsecurity.com/2016/08/inside-the-attack-that-almost-broke-the-internet/

Note in the second link, at one point, the chat logs show the attackers are amused that Spamhaus is a Cloudflare customer... and Spamhaus lists a lot of Cloudflare IPs as spam cannons.

Cloudflare does a lot of really great technical work. I want to give my full-throated support to Cloudflare. I really, really, do. But so long as they do minimal-to-no curation of their customers, I always have to put an asterisk on that support.

So, that's why Cloudflare gets a ton of hate. They sell DDoS protection to you... and the companies who DDoS you. They make it possible for Nazis to have their websites. They do good work... and also provide that good work to some of the most vile scum on the planet. It's not hard to be against Nazis. If someone at Cloudflare is reading this, kick them off and I will pick up the phone calls from your sales people. Otherwise, I'll continue to pick up and ask "Sounds great, ready to sign -- oh wait, by the way, do you still platform Nazis? Uh-huh, yeah, no, I'm not going to let you weasel out of this. Once you drop them I'll sign. I hope you're tracking revenue lost to your support of Nazis internally. Have a great weekend!"
Cloudflare also blocks smaller browsers, like privacy-specialized forks and the like.
 
Upvote
0 (0 / 0)

Carewolf

Ars Tribunus Angusticlavius
9,655
Like it or not, you're never going to be able to allow the Google crawler access to your content while also blocking AI crawlers, simply because Google is training AI as well. We're going to have to fight these plagiarism tools in court. If you are concerned with blocking access short-term, then my advice to you is to login-gate content and aggressively IP ban.
The Google crawler respects the robots.txt files. This is against crawlers that ignore it. So not indexing content explicitly marked not to be indexed is fine...
 
Upvote
2 (2 / 0)

graylshaped

Ars Legatus Legionis
61,378
Subscriptor++
The Google crawler respects the robots.txt files. This is against crawlers that ignore it. So not indexing content explicitly marked not to be indexed is fine...
I’m sure Google has a crawler that respects a boundary or two, that they can point to and say “See?”

I have no confidence they do not also have other crawlers.
 
Upvote
3 (4 / -1)
the irony here is that we're building increasingly elaborate mazes that only humans will eventually get lost in. AI systems will adapt, they always do.

what happens when the complexity of these defensive systems exceeds what an average person can navigate? we could end up with a web where sophisticated AI can pass as human but many humans can't pass as "human enough."

it's an arms race, but with a twist: the bots will learn to solve the maze by simplifying their approach. sometimes lower "intelligence" is more effective than higher intelligence for specific tasks.

the entire history of captchas suggests this outcome. first we built tests only humans could pass, now we build tests that frustrate humans while bots breeze through.

(strange thought: maybe the ultimate defense isn't complexity but unpredictability; systems that make no sense at all but still function. humans excel at navigating absurdity in ways AI still struggles with.)
 
Upvote
-10 (1 / -11)

scrimbul

Ars Tribunus Militum
2,429
So how I'm seeing it is, we are now creating an alterweb adjacent to the one we use, designed only for bots. Looking at the sorry state of the current internet makes me wonder if this is also a fake internet, one designed for capitalistic purposes and manipulated by algorithms to a finite end. So then are we in a parallel internet already? And what is the real internet?
https://en.wikipedia.org/wiki/Dead_Internet_theory

The main issue with this theory was largely the timing: it was ridiculous to be proposing its end culmination in 2016 or 2017 unless you were on 4chan. Nowadays, however, with things like Cloudflare and Nepenthes necessary and the cat-and-mouse game to get crawlers and scrapers working, the only thing that changes about this theory is the details, and the logic is enough to argue for the immediate cessation of publicly accessible LLMs/generative AI models and their associated crawlers.

Making the conspiracy theory more 'fun' is the fact that localized generative AI means Pandora's Box is open and even if state actors don't use it themselves or regulate it, private actors of various sizes will all be ready and willing to 'flood the zone with shit'.

As any person who thinks about these things for two seconds can see, there is an endpoint where 'flood the zone with shit' is another version of the https://en.wikipedia.org/wiki/Tragedy_of_the_commons and because of that fact there is no 'win condition' or way to not harm everyone at once. The only logical endpoint for all of it is another spin on https://en.wikipedia.org/wiki/Accelerationism

The logical, ethical and moral counter to Accelerationism, of course, is varying beneficial forms of https://en.wikipedia.org/wiki/Utilitarianism which requires no further debate if brought up as a counter to Accelerationism and the Dead Internet Theory, e.g. why regulation and/or an external force may be necessary to break those cycles.
 
Upvote
0 (0 / 0)

MMarsh

Ars Praefectus
4,318
Subscriptor
This is the site that 'checks my browser' for 5 seconds every time I connect to a site hosted / protected by it, right?
Yes, the very same. That relatively minor inconvenience (to a human, on occasion) is great for throwing a spanner in the works of someone trying to do a DDoS attack.

Cloudflare is usually seamless. When you notice it, it's because it's trying to keep a particular site alive in the face of something that would ordinarily bring a top tier web server to its knees.
 
Upvote
4 (4 / 0)

maxoakland

Ars Scholae Palatinae
923
Sure, but we're now at the stage where even the defenses cannot avoid being inherently damaging. The pigs are increasingly successful in dragging us into the mud fights they love so much. And everything gets covered in shit. The only way out may require REALLY drastic measures.
Drastic measures are needed. Drastic things like completely ending the use of fossil fuels immediately, for example
 
Upvote
3 (3 / 0)

maxoakland

Ars Scholae Palatinae
923
We live in a time when there's not enough money for healthcare or housing, but UNLIMITED FUNDS to throw at AI...
That’s because the rich and powerful believe AI will be able to free them from paying workers anything at all

Right now, we don’t collectively realize it, but they need us more than we need them. We have the power, even though we aren’t using it

AI promises to change that and make workers completely powerless. And we have to recognize our power before that comes to fruition
 
Upvote
2 (2 / 0)

DarthSlack

Ars Legatus Legionis
20,657
Subscriptor++
We're just touching surface of what AI can do for humanity and we're already trying to poison it. Sad.

No, the goal is to shut down the shitgibbons who seem to think that they can scrape any and all content they might ever possibly reach, for free.

If said shitgibbons were to train their models only on public access information or with information they actually have permission to use, none of this would be needed.
 
Upvote
9 (9 / 0)

Arstotzka

Ars Scholae Palatinae
969
Subscriptor++
Honestly the fact that Cloudflare doesn't attempt to play judge on what sites they protect seems like a strength to me. No one likes neonazis and such but I'd rather not have a company arbitrarily decide whether my website deserves to stay online or not.

Sure, my opinions may align with those of the company today but there's no guarantee that it'll still be that way tomorrow, be it because of a change in leadership or a presidential order. If Cloudflare were to aggressively reject nazis then they could do the same with e.g. good information about vaccines so the less precedent there is for Cloudflare refusing customers and playing judge the better imo.
Cloudflare can play judge tomorrow if they like. You are always at the whim of a private entity; so long as they aren’t discriminatory (in the legal sense of the word) they don’t have to do business with you.

And your argument is a very American one. A lot of other countries (e.g., Germany) have laws about content and don’t seem to be having the issue you are concerned about. While I appreciate the protections of the 1st Amendment, is it also possible for the Paradox of Tolerance to be at play?
 
Upvote
4 (4 / 0)

graylshaped

Ars Legatus Legionis
61,378
Subscriptor++
Cloudflare can play judge tomorrow if they like. You are always at the whim of a private entity; so long as they aren’t discriminatory (in the legal sense of the word) they don’t have to do business with you.

And your argument is a very American one. A lot of other countries (e.g., Germany) have laws about content and don’t seem to be having the issue you are concerned about. While I appreciate the protections of the 1st Amendment, is it also possible for the Paradox of Tolerance to be at play?
Suggesting it is an American ethos plays into the commonly-held misperception that the First Amendment is relevant to business relationships between non-governmental entities. It seems more rooted in mythical constructs of sanctified “neutral grounds” or “honor among thieves,” which would have been imports from older cultures.

A more apt analogy might be the “neutrality” of Swiss banking. In contrast to that approach, even in states where marijuana is legal, federal banking regulations hamstring the industry’s ability in the US to operate in what would be considered a reputable, aboveboard manner.

Without defending or even commenting on Cloudflare, there is a legitimate argument that providers of certain functions essential to the smooth functioning of modern society and its economy, functions that malicious actors (explicitly including nation-states through sovereign action) may try to interfere with, should have leeway to provide such services IF they choose to do so, as long as doing so falls outside of reasonably defined aiding and abetting. As with Swiss banking, there is ample room for reasonable regulation of such services.

Those who choose not to do business with providers because of their other customers, similarly, should not face sanction for their exercise of free association.

Personally, my DNS requests are resolved through a bunch of 1s rather than a bunch of 8s because of my personal evaluation of the business ethics of their respective providers.
 
Upvote
5 (5 / 0)