Not a bug Not all bots are being detected

ivp

Active member
Affected version
2.3.3
XF\Data\Robot::getRobotUserAgents() detects 35 bots only.

This function should be replaced with CrawlerDetect or a similar library, which detects thousands of bots/spiders/crawlers.

Here is an example for one thread:
  • Internal XenForo counter reports 519 views.
  • Server logs show 903 requests. After manually removing bots from the list, 355 views remain.
  • Google Analytics reports 167 views.
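To put those figures side by side, here is a quick calculation. It is only a sketch that uses the numbers quoted above; the variable names and the `pct` helper are illustrative, not part of XenForo.

```python
# Figures quoted in the post for a single thread.
xf_views = 519          # XenForo's internal view counter
log_requests = 903      # raw requests seen in the server logs
log_human_views = 355   # requests left after manually removing bots
ga_views = 167          # Google Analytics

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

bot_requests = log_requests - log_human_views
xf_overcount = xf_views - log_human_views

summary = {
    "bot_requests": bot_requests,
    "bot_share_of_requests_pct": pct(bot_requests, log_requests),
    "xf_overcount_vs_logs": xf_overcount,
    "xf_overcount_pct": pct(xf_overcount, log_human_views),
    "ga_vs_logs_pct": pct(ga_views, log_human_views),
}
print(summary)
```

By this reading, roughly 61% of the logged requests were bots, and the internal counter sits about 46% above the manually filtered figure.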
 
We've touched on this before, but our bot detection is not intended to be comprehensive. The add-on above is a more robust implementation.
 
We’ve introduced controls to disable view counts as a factor as well as the ability to disable guest view count contributions.
 
View count is one of the crucial factors for trending content and should not be disabled.

Furthermore, incorrect view count values are displayed in the forum view.

It's puzzling why you don't want to address this, given that it's an integral part of the core.

Detecting more bots can be efficiently managed using the well-maintained library I mentioned or any other suitable alternative.
 
User agent sniffing alone can only identify some bots, whereas comprehensive identification is an entire industry unto itself (Cloudflare, etc.). Even the add-on above is part of a service with continually updated heuristics. The 35 bots we detect are among the most popular we've measured, but even here we see a lot of traffic from headless browsers and server farms which initially skewed our trending content statistics. We disabled view counting for guests and have found the trending system to still be reasonably accurate and useful.
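As a rough illustration of that limitation, here is a minimal sketch of substring-based user agent matching. It is written in Python for brevity and is not XenForo's implementation: the substring list, the `is_known_bot` function, and the sample user agent strings are assumptions chosen for the example.

```python
# Hypothetical substring list; NOT the actual 35 entries XenForo ships.
KNOWN_BOT_SUBSTRINGS = ("googlebot", "bingbot", "ahrefsbot")

def is_known_bot(user_agent: str) -> bool:
    """Return True if the user agent contains any known bot substring."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_SUBSTRINGS)

# A crawler that announces itself in its user agent.
declared_bot = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

# Assumed example: an automated headless browser sending a stock
# desktop Chrome user agent, indistinguishable by string alone.
headless_browser = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(is_known_bot(declared_bot))      # True
print(is_known_bot(headless_browser))  # False
```

A longer list catches more self-declared crawlers, but the second case passes regardless of list size, since nothing in the string marks it as automated.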
 
I understand, but that's a completely different topic.

The point is that there's no justification for detecting just 35 bots when there's an open-source library capable of identifying thousands.
 
My point is that it's unlikely to address the issue well enough to make guest view counts a reliable metric for the trending system, so the value of investing the time to integrate a third-party library or service to identify e.g. RuxitSynthetic as a bot seems questionable.

You're still welcome to post a suggestion, but this won't be actioned as a bug.
 