Claudebot - Thousands of Hits

alexm · Apr 2, 2024

Just in case you didn't see this, in recent weeks suddenly...

Added to robots.txt:

Code:

User-agent: anthropic-ai
Disallow: /

It seems to obey robots.txt for the moment, so that is one solution.

Regards,

Alex

philmckrackon · Apr 3, 2024

Also, if it stops obeying robots.txt, add this to the .htaccess file.

Code:

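# Flag any request whose User-Agent contains "anthropic-ai" (case-insensitive)
BrowserMatchNoCase "anthropic-ai" bad_bot
# Apache 2.2-style access control: allow by default, deny flagged requests
Order Deny,Allow
Deny from env=bad_bot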

RandallC · May 4, 2024

Not going to lie, they have been tearing my site up and I didn't know who it was until I installed the KnownBots addon. I even started to block Amazon AWS IP addresses.

duderuud · May 4, 2024

Wow, just blocked that useragent in Cloudflare and it already blocked 300 requests in a few minutes.
Looks like it is a bot that gathers data for AI.

I (as a website owner) have an opinion about that...

Wildcat Media · May 20, 2024

duderuud said:
Wow, just blocked that useragent in Cloudflare and it already blocked 300 requests in a few minutes.
Looks like it is a bot that gathers data for AI.

I (as a website owner) have an opinion about that...

Yeah, as an "oldtimer" in this Internet stuff, having "AI" shoved in my face every time I turn around is getting annoying.

Wildcat Media · May 20, 2024

duderuud said:
Wow, just blocked that useragent in Cloudflare and it already blocked 300 requests in a few minutes.

Just found this at Reddit...

User Agent: compatible; "ClaudeBot/1.0; +claudebot@anthropic.com"

Before April 19, it was just: "claudebot"

What should we be blocking as a user-agent? Under our domain in Cloudflare, I can add a user agent blocking rule under Security/WAF with an expression like (lower(http.user_agent) contains "claude") or (lower(http.user_agent) contains "anthropic"), which should catch them, I think? (I'm thinking "contains" might be enough of a wildcard on both terms to catch both variants...?)

Sim · May 20, 2024

Wildcat Media said:
What should we be blocking as a user-agent?

Yes, a simple match on claudebot should do the trick.

I capture user agents from sites using my KnownBots addon so I can analyse them to identify bots - I currently have 59,540 user agents in the database. This is useful because it allows me to search the database of user agents to see whether a simple match will be sufficient or whether it will lead to false-positives.

In this case, the only user agents matching claudebot are:

Mozilla/5.0 AppleWebKit/0.0 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +support@anthropic.com)
Mozilla/5.0 AppleWebKit/0.0 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
Mozilla/5.0 AppleWebKit/0.0 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com),gzip(gfe)
ClaudeBot

(note that I strip certain variables like version numbers from certain strings to minimise duplicates - hence the AppleWebKit/0.0)
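For anyone who wants to sanity-check a match term before putting it in a firewall rule, here is a minimal Python sketch of the same idea. It only imitates a case-insensitive "contains" test against the four strings listed above. The names USER_AGENTS, is_blocked and count_blocked are made up for illustration and are not part of KnownBots or Cloudflare.

```python
# User agent strings copied from the list above (versions normalised to 0.0).
USER_AGENTS = [
    "Mozilla/5.0 AppleWebKit/0.0 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +support@anthropic.com)",
    "Mozilla/5.0 AppleWebKit/0.0 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
    "Mozilla/5.0 AppleWebKit/0.0 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com),gzip(gfe)",
    "ClaudeBot",
]


def is_blocked(user_agent, needles=("claudebot",)):
    """Return True if any needle occurs in the lower-cased user agent.

    Hypothetical helper: mimics a rule such as
    lower(http.user_agent) contains "claudebot".
    Needles are expected to be lower-case already.
    """
    ua = user_agent.lower()
    return any(needle in ua for needle in needles)


def count_blocked(needles):
    """Count how many of the sample user agents a set of needles would catch."""
    return sum(is_blocked(ua, needles) for ua in USER_AGENTS)


if __name__ == "__main__":
    candidates = (("claudebot",), ("claude", "anthropic"), ("anthropic",), ("anthropic-ai",))
    for needles in candidates:
        print(needles, "->", count_blocked(needles), "of", len(USER_AGENTS))
```

Against this small sample, "claudebot" on its own catches all four strings, "anthropic" on its own misses the bare "ClaudeBot" one, and "anthropic-ai" does not appear in any of them. Whether a term causes false positives on real traffic still needs checking against a much larger set of user agents.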

I've had to block claudebot at the Cloudflare level on several of my sites because of this bad behaviour.

Wildcat Media · May 20, 2024

Wow, thanks for that. And I should probably get that addon one of these days. I had one of my low-traffic sites get slammed a couple of weeks ago and I'm betting that was the culprit, as I've blocked most other bots using Cloudflare.
