MagpieRSS robot/spider is pulling 7GB a month?


If I check AWStats, I see MagpieRSS is generating 7 GB of traffic a month!
Google only uses 500 MB/month and Bing 1 GB/month.

I can't find what they are requesting in the logs. It seems like they are not using the magpie-crawler user agent, so I'm not sure how AWStats can detect it.
Also, I don't see the magpie crawler on the forum robots page.
Anyway, I think it's using way too much traffic.
Looking for ways to investigate this more before I block anything.

Does anyone see the same happening?
Any tips to investigate this?

Thanks :)


Based on the name (MagpieRSS), could it be RSS feeds which are being accessed?


Ok, I found it. It was in another log file from a domain that 301 redirects to my domain.

Code: - - [05/Dec/2016:18:13:19 +0100] "GET /forum-2606-Bucharest.html HTTP/1.1" 301 178 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +"
So it's the Brandwatch crawler, and I can see that robot on the robots page.
I never signed up for brandwatch, maybe it's from the old owner.
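For anyone chasing a similar case, here is a rough Python sketch of how the log files could be checked, including the ones for domains that only redirect. It assumes the logs are in combined log format like the line above. The names `bytes_per_agent` and `top_agents` are my own, and the log paths you pass in are whatever your server uses.

```python
import re
from collections import defaultdict

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>.*)"$'
)


def bytes_per_agent(lines):
    """Sum the response bytes per user-agent string for an iterable of log lines."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        size = match.group("bytes")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return dict(totals)


def top_agents(paths, limit=10):
    """Combine several log files and return the heaviest user agents first."""
    totals = defaultdict(int)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for agent, size in bytes_per_agent(handle).items():
                totals[agent] += size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Calling `top_agents` with the log of every domain on the server, not just the main one, should show which user agent is responsible for most of the traffic.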

I think I will block this crawler.

User-agent: magpie-crawler
Disallow: /
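To sanity-check that these two lines say what is intended, something like the following could be used. It is only a sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` is made up for illustration. Keep in mind that robots.txt is advisory, so this only shows what a well-behaved crawler should do, not what a given crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above.
RULES = """User-agent: magpie-crawler
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The crawler named in the rules should be refused, other bots are unaffected.
    print(is_allowed(RULES, "magpie-crawler/1.1", "/forum-2606-Bucharest.html"))
    print(is_allowed(RULES, "Googlebot", "/forum-2606-Bucharest.html"))
```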
There are loads of services that analyze forums to see how brands are performing. They suck up a lot of bandwidth unless you block them.
On vBulletin we used to block such bots. Look into @DragonByte Tech security. They are looking into such a function to stop malicious bots & scrapers.
hmmm this is getting interesting...
No, it's not honouring that. So when I tried to block the IP it said

deny failed: is in the allow file /etc/csf/csf.allow
It is an IP from Cloudflare, and they are allowed to connect to my server.
I'm thinking about contacting Cloudflare about this.
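When the site sits behind a proxy like this, the address in the log is often the proxy's and not the crawler's, so it helps to check before trying to deny it. Below is a small Python sketch of that check. The two ranges are only examples of ranges Cloudflare has published and the list changes over time, so the current list should be loaded from Cloudflare instead. `is_proxy_ip` and `EXAMPLE_PROXY_RANGES` are names I made up.

```python
import ipaddress

# Assumption: two example ranges only, not the complete or current list.
EXAMPLE_PROXY_RANGES = ["173.245.48.0/20", "104.16.0.0/13"]


def is_proxy_ip(ip, ranges=EXAMPLE_PROXY_RANGES):
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)
```

If the address turns out to belong to the proxy, blocking it in the firewall would cut off normal visitors too, so blocking on the user agent is the safer route.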

DragonByte Security is on the list. I just need to know whether it's going to be upgraded to XF2 and whether I need to pay extra for that or buy a new add-on.
DragonByte could not confirm that earlier, so I'll wait.
Ideally, as long as I have an active licence, I would get the XF2 update without paying extra.


I haven't had an email back from Cloudflare so far, but it looks like it has stopped. Not sure if that's because I blocked it in robots.txt or because Cloudflare did something.