How to block Ahrefs, Semrush, Serpstat, and Majestic SEO with .htaccess or any method other than robots.txt

If your site uses Cloudflare (the site in your signature does), you can block the request at the network edge, before it even reaches your server.

Cloudflare allows you to block requests based on user agent.

1654827882221.png

1654827906675.png

You can also manage Cloudflare firewall rules without leaving the XenForo admin (if you want):

 
You can accomplish this in various ways, depending on your web server configuration:

Using Apache/NGINX Ultimate Bad Bots Blocker (not compatible with shared hosting)
We've been using this tool on a few of our servers to keep a current, up-to-date list of bad bots that are blocked directly at the NGINX level, before requests reach PHP-FPM. You can configure it for either Apache or NGINX, but you will likely need elevated access to your server to install it, as described on the GitHub pages below:

Block Bots With Rules (case-insensitive)

The code block below can be used with NGINX in the server block for your website. It is important that this directive comes before any of your XenForo routing. I like to return 418 I'm a Teapot to robots that I block (for a laugh), but 403 Forbidden is generally the better response code.
NGINX:
# "majestic" alone may not match: Majestic's crawler identifies itself as MJ12bot.
if ($http_user_agent ~* (ahrefs|semrush|serpstat|majestic|mj12bot)) {
    return 418;
}
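
Before deploying a pattern like this, it can help to check what it actually catches. The Python sketch below is only an illustration: it compares the four vendor names from the thread title with a variant that adds `mj12bot`, using sample user-agent strings that are assumptions on my part (check your own logs for the exact values). Like NGINX's `~*`, `re.search` with `re.IGNORECASE` does an unanchored, case-insensitive match.

```python
import re

# The four vendor names from the thread title, matched case-insensitively.
VENDOR_PATTERN = re.compile(r"ahrefs|semrush|serpstat|majestic", re.IGNORECASE)
# Hypothetical variant that also names the MJ12bot crawler token.
EXTENDED_PATTERN = re.compile(
    r"ahrefs|semrush|serpstat|majestic|mj12bot", re.IGNORECASE
)

# Illustrative user-agent strings (assumed, not taken from the thread).
SAMPLE_AGENTS = {
    "ahrefs": "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "semrush": "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
    "majestic": "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
}


def is_blocked(pattern, user_agent):
    """Return True if the pattern appears anywhere in the user agent."""
    return pattern.search(user_agent) is not None


def report(pattern):
    """Map each sample name to whether the pattern would block it."""
    return {name: is_blocked(pattern, ua) for name, ua in SAMPLE_AGENTS.items()}


if __name__ == "__main__":
    print("vendor names only:", report(VENDOR_PATTERN))
    print("with mj12bot:     ", report(EXTENDED_PATTERN))
```

With these sample strings, the vendor-name pattern misses the MJ12bot entry because that string never contains the word "majestic", which is why it is worth testing against the user agents you actually see.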

Place the configuration below in your .htaccess file (Apache) or .conf file (LiteSpeed).
Apache config:
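RewriteEngine on
# Match on the User-Agent header (not the Referer), case-insensitively.
RewriteCond %{HTTP_USER_AGENT} (ahrefs|semrush|serpstat|majestic|mj12bot) [NC]
# "^" matches every request path, including the empty path for the site root.
RewriteRule ^ - [R=418,L]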


You could also block with iptables, but this is generally considered a bad idea because your firewall will then need to inspect every packet, adding significant overhead to each connection. If you want to block at the network level, please use a service like Cloudflare instead, as @digitalpoint pointed out above 💪
 

Hello digitalpoint. Do you perhaps have a list of bots that should be blocked? And is there a guide on how to do this quickly and effectively via Cloudflare?

Best regards,
Chris
 
All sites are different and have different needs and different bots spidering them. So no, I don't have a list of bots that necessarily should be blocked. You need to figure out what you want to block, and why, for your specific site.
 
cfstats.webp

How can I see which bots/IPs are accessing my site from Singapore, and what is the easiest way to block them completely?
I suspect "Bytedance/Bytespider".
 
Hello digitalpoint. I appreciate you and your work. But...if you don't really want to help, then it's best not to answer at all.
 

This addon was very useful for identifying loads of spiders that I ended up blocking. Bytespider and its variants do not seem to respect robots.txt, but at least they send a proper user agent, so blocking them is fairly straightforward. A few require blocking by IP range. It's a pretty big mess, to be honest: it's hard to keep an eye on raw logs and block bots that don't use a proper user agent. This week I noticed someone pretending to be Googlebot accessing the server from the AWS network.
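
For the fake-Googlebot case, one common check is forward-confirmed reverse DNS: look up the hostname for the IP, confirm it ends in a Google crawler domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below is a minimal illustration of that idea and not something anyone in this thread posted. The function name and the injectable lookup parameters are hypothetical (they are there so the logic can be tried without network access), and the default forward lookup is IPv4-only.

```python
import socket

# Domains Google's crawler hostnames are expected to end with (an assumption
# to confirm against Google's own verification guidance).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    # Default to real DNS lookups; callers may inject fakes for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (
        lambda host: socket.gethostbyname_ex(host)[2]
    )
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # PTR record does not point at a Google domain
        # The hostname must resolve back to the original IP, otherwise the
        # PTR record could simply be forged by whoever controls the IP block.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        return False


if __name__ == "__main__":
    # Stubbed lookups standing in for DNS; the hostnames are made up.
    fake_ptr = {"192.0.2.10": "ec2-192-0-2-10.compute-1.amazonaws.com"}
    print(is_real_googlebot("192.0.2.10", fake_ptr.__getitem__, lambda h: []))
```

A request that claims to be Googlebot but fails this check can be treated like any other unidentified bot.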
 
I don't have access to your web logs, so I can't help beyond telling you what you should be doing. But fair enough, I'll keep out of this thread. 🤷🏻‍♂️
 
You know Cloudflare very well. It would therefore make sense for you to simply tell me briefly how I should proceed, for example, to block Bytespider using my CF account. That would have helped me a lot.
 
That wasn't your question... your question was, "How can I see which Bots/IPs are accessing from Singapore".

...and looking at your web logs is the correct answer.
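
As a rough idea of what "looking at your web logs" can involve, here is a small Python sketch that tallies user agents and client IPs from access-log lines. It assumes the common "combined" log format; the sample lines, IPs, and user agents are made up for illustration. It does not resolve countries, so answering the Singapore question would additionally need a GeoIP lookup or Cloudflare's own analytics.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines; in practice you would iterate over your log file.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Jun/2022:02:00:01 +0000] "GET /threads/1/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.7 - - [10/Jun/2022:02:00:02 +0000] "GET /threads/2/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.4 - - [10/Jun/2022:02:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
    "this line is not in combined format and is skipped",
]


def tally(lines):
    """Count requests per user agent and per client IP."""
    agents, ips = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # ignore lines that do not parse
        agents[match["agent"]] += 1
        ips[match["ip"]] += 1
    return agents, ips


if __name__ == "__main__":
    agents, ips = tally(SAMPLE_LINES)
    for agent, count in agents.most_common(5):
        print(count, agent)
    for ip, count in ips.most_common(5):
        print(count, ip)
```

The most frequent user agents and IPs are usually the first candidates to investigate before deciding what to block.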
 
Actually, I already asked you that in #10. You didn't even address it.
But let's leave it at that. I will read up on the topic myself and, if necessary, ask other competent and helpful people for advice.
I wish you a nice evening.
 
For blocking Ahrefs, I just applied this tonight, so I'll see how it does. You can keep adding more as needed, but the two biggest offenders, in my opinion, are Ahrefs and Semrush.

Select your domain, then Security, then WAF. Add a custom rule as shown in the attached screenshot. Cloudflare can be a bit tricky sometimes, so for people like the guy above, there you go.
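
The screenshot itself is not reproduced in the text, so here is a hedged sketch of the kind of expression such a custom rule might use. The Python helper below (a hypothetical name, purely for illustration) builds a string for Cloudflare's expression editor from a list of bot name fragments. It assumes the `http.user_agent` field and the `lower()` function from Cloudflare's rules language, so verify the result in the dashboard before relying on it.

```python
def build_block_expression(tokens):
    """Build a Cloudflare-style expression matching any token in the user agent."""
    clauses = []
    for token in tokens:
        if '"' in token or "\\" in token:
            # Keep the sketch simple: refuse tokens that would need escaping.
            raise ValueError(f"unsupported character in token: {token!r}")
        # lower() on the field plus a lowercased token gives a
        # case-insensitive substring match.
        clauses.append(f'(lower(http.user_agent) contains "{token.lower()}")')
    return " or ".join(clauses)


if __name__ == "__main__":
    # The two crawlers named in the post above.
    print(build_block_expression(["ahrefs", "semrush"]))
```

The printed expression would be pasted into the rule's expression editor with the action set to Block.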
 

Attachments

  • block-seo-crawlers-using-cloudflare-868x1024.webp