Known Bots

Known Bots 6.0.3

No permission to download
Compatible XF 2.x versions
  1. 2.0
  2. 2.1
  3. 2.2
Additional requirements
PHP v7.0+
License
MIT
Visible branding
No
This XenForo 2.x addon adds additional definitions for bot detection in sessions.

This allows more accurate indication of the number of users vs robots online at any given time.

New bot definitions are pulled automatically via API and potential new bots detected are sent back to the addon author for analysis for inclusion in the bot definitions.

As of October 2023, the addon can detect over 1,400 different bots and the API database now contains more than 3,200 user agent strings that have been identified as bots.

Statistics from the KnownBots API server management system, custom built to track and identify bots and serve the updates API:

1697747343953.png


Requirements

This addon requires PHP 7.0.0 or higher and works on XenForo 2.x

Usage

When you look at Current Visitors, you'll see additional robots identified - also look at the "Robots" list on that page http://www.example.com/community/online/?type=robot

1569992317809.png



We also add the current robot count to the Members online widget and the Online statistics widget (from the current visitor page). This can be disabled via the options.

1574032520691.png


1574032528995.png


There is a tool in the admin area for displaying a complete list of currently known bots:

1695069691124.png



... and another tool which lets you test a user agent string to see if it would be detected as a bot or not:

1589696060644.png


... and a third tool which shows a list of the last 100 bots detected, along with any unidentified user agents waiting to be sent back for analysis.

1697747428661.png



New in Version 6

Verion 6 is a complete rewrite from previous versions - with bot detection now up to 15x faster than the previous method. Bot detection is also more sophisticated - with a secondary regex-based detection system to help identify complex bot strings that can't be matched using the simple text string matches used by default.

If no bots are detected, the addon will check the user agent against a list of valid browser regex strings - and if not considered to be a valid user-driven browser, it will store the information in the database temporarily before sending the information back to the addon author via email for further analysis.

Previously, the system only sent back user agents which matched key words: bot|crawl|spider - this new detection method is significantly more comprehensive and allows much greater accuracy for detecting new bots that may not identify themselves using the traditional keywords.

Also new in this version is the deprecation of the email sending facility - new user agents are now sent directly via API, with authentication facilitated by the XenForo license validation system and very easy to configure.

Options

Show robot statistics


1695070173593.png


Enable to show robot statistics in the sidebar widgets

Fetch new bots

1695070187734.png


Enable to automatically fetch new bot identifiers from the KnownBots API maintained by the addon author. If you disable this option, you must set up your own system to update the known bots defnitions - see the Command Line Interface section for options to assist here

Store user agents in database and purge after

1695070239128.png


Enable to store unknown and bot user agents in the database for further analysis. This should be used in conjunction with the "Send user agents via API" and "Email user agents" option to send unknown user agents back to the KnownBots management system for further analysis and identification of new bots. You may also manually send user agent information to the addon author via the addon discussion thread - use the "List detected bots" tool to show unknown user agents.

With this option enabled, you may also choose how long to retain user agent records before they are automatically purged from the database. This uses a "last seen" mechanism to maintain the list of recently seen bots - only bots not recently seen will be purged. For busy sites, consider lowering this value from the default 90 days, to reduce the size of data stored. Set days to zero to never purge user agent data (not recommended).

Send user agents via API
1697747610246.png


New in version 6, enabling this option will send unidentified user agents back to the addon author for further analysis and identification of new bots.

To configure the API, enter the License validation token for your site, found in the XenForo customer interrface. The validation token will be sent to the XenForo customer validation API and if valid, an KnownBots API token will be generated and returned back for subsequent authentication purposes.

With a validated license, the authentication process is automatic. API tokens are regenerated every 28 days and are re-authenticated automatically. Customer details are automatically purged from the KnownBots database after 30 days of inactivity (see privacy details below). Regenerating your license validation token will automatically cause API revalidation to fail and customer details to be purged - unless you re-configure the addon options with the new license validation token.

Email user agents

1697747723873.png


Note: emailing user agents to the addon author is now deprecated

The email interface will remain operational for a short period to allow time for addon users to upgrade to version 6, but will soon be deactivated at which point emails sent to the knownbots@hampel.io address will start bouncing back as undeliverable.

This option will remain available to allow users to periodically send emails to an address of their choosing for monitoring newly detected user agents. This option only has effect if the "Store user agents in database" option is also enabled. You may specify multiple email addresses separated by commas.

Logging

Install the Monolog Logging Service addon to log information about emails sent and API queries made.

Command Line Interface

There are CLI tools provided which may be helpful for certain use-cases.

Fetch Bots

Fetch new bots via the API. Use this to run your own cron - does the same task that the supplied XenForo Cron job does.

Bash:
$ php cmd.php known-bots:fetch

Use the -f option to force updates, bypassing the "last updated" flag and re-fetching the latest bot definitions.

If using this command via your own cron - be sure to disable the supplied XenForo Cron job titled "Known Bots: Fetch New Bots from API".

Load Bots

Load bots data from the knownbots.json file already on the filesystem.

Bash:
$ php cmd.php known-bots:load

The addon looks for the file internal_data/knownbots.json - if you download it manually and place the updated version in this location, you may then execute the above command to load the data without calling the API.

Reprocess User Agents

Cycle through user agents stored in the database and update the definitions based on the latest bot detection data.

Bash:
$ php cmd.php known-bots:reprocess

By default, only "unknown" user agents are reprocessed. Use the -a option to reprocess all known and unknown user agents - useful if a bot was misidentified but has been updated in the latest API data.

Test Bots

Test the supplied user agent string and identify whether it is detected as a bot.

Bash:
$ php cmd.php known-bots:test {user-agent}

Don't forget to quote user agent strings that contain spaces. For example:

Bash:
$ php cmd.php known-bots:test "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
Found robot: [ahrefs]
Title: AhrefsBot

Use the -s option to save your user agent strings to the database for further processing.

Check API Token

(New in v6) Test if the KnownBots API Token is valid.
Bash:
$ php cmd.php known-bots:check-token
Use the -r or --revalidate option to automatically attempt to revalidate and generate a new API token if athentication fails.

Send Agents

(New in v6) Send newly detected user agents to the KnownBots API. Does the same thing the cron job does - but can be run externally via a system cron or manually for testing purposes.

Bash:
[/SIZE]$ php cmd.php known-bots:send[SIZE=4]

Parse Log File

Reads web server log file information and lists all detected bots.

Bash:
[/SIZE][/SIZE]$ php cmd.php known-bots:parse {file}[SIZE=4][SIZE=4]
Use the -a or --agents option to simply list all user agents found in the supplied log file - no bot detection occurs

Command will also accept data from stdin if you specify a hyphen as the file name.

Note that this tool assumes you are using standard "Apache" log file format, which is used by most web servers.

Usage examples:

List all user agents. Invalid (or no) data will be returned if you are not using standard Apache log file format.

Bash:
[/SIZE][/SIZE]$ php cmd.php known-bots:parse --agents /var/log/nginx/xenforo/xenforo.access.log
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36
Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1
Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Mobile Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.37
...[SIZE=4][SIZE=4]
List bots:
Bash:
[/SIZE][/SIZE]$ php cmd.php known-bots:parse /var/log/nginx/xenforo/xenforo.access.log
[postman                  ] PostmanRuntime/7.28.4
[guzzlehttp               ] GuzzleHttp/7
[postman                  ] PostmanRuntime/7.29.0
[curl                     ] curl/7.68.0
...[SIZE=4][SIZE=4]
Examples of reading data from stdin:
Bash:
[/SIZE][/SIZE]$ cat /var/log/nginx/xenforo/xenforo.access.log | php cmd.php known-bots:parse -
...

$ php cmd.php known-bots:parse - < /var/log/nginx/xenforo/xenforo.access.log
...

$ tail -f /var/log/nginx/xenforo/xenforo.access.log | php cmd.php known-bots:parse -[SIZE=4][SIZE=4]
Privacy Statement

Privacy Summary

Depending on the options enabled, there is some information collected about sites using this addon by the addon author - but it is only ever used for anonymous analytics and for troubleshooting purposes and never disclosed to third parties, nor is it ever used for marketing or any purposes other than for the operation of this addon.

Options are provided to disable or bypass certain functionality if you remain uncomfortable about using the systems as designed.

Details

With the "Fetch new bots" option enabled, the addon will automatically, send a request to the KnownBots API, downloading an updated list of bot definitions. You may query that API directly at any time to see what is contained in the data returned.

API calls by the addon are made using the standard "untrusted" HTTP client built into XenForo, which means they are forwarded through a proxy server if you have one configured. Standard web server log files on the API server will contain information about requests made, including the IP address of your server (or proxy if used), and your forum name as supplied in the user agent of the XenForo HTTP client. For example:

Code:
2400:8907:e001:xx::xxx - - [15/Aug/2023:00:00:07 +0000] "GET /api/bots?since=1691729327 HTTP/1.1" 200 119647 "-" "XenForo/2.x (https://www.example.com)"

HTTP server log information is used solely for analytics and troubleshooting purposes and is never made available to third parties.

Enabling the "Send user agents via API" setting requires the submission of a XenForo license validation token, retrieved from your customer account on the XenForo website. This token is intended to be supplied to 3rd party addon authors for the purposes of validating XenForo licenses. In this case, it is being utilised as a simple authentication mechanism. You may regenerate your license validation token at any time via the XenForo customer interface, which will automatically cause authentication against the KnownBots API to fail once the existing API token expires.

API tokens expire every 28 days and are automatically regenerated, provided the XenForo license validation token remains valid. Customer details retrieved from the XenForo license validation API are automatically purged from the database after 30 days of inactivity.

When validating licenses, only the license validation token and forum URL are sent to the KnownBots API. Once validation is complete, only the generated API token and a list of new user agents are sent to the server. Web sever logs similar to those described above are generated for all API calls.

As of version 6, no information is sent to the addon author via email.
Related resources
Monolog Logging Service - optional integration for logging info about detected bot emails being sent
Author
Sim
Downloads
665
Views
14,743
First release
Last update

Ratings

4.44 star(s) 9 ratings

More resources from Sim

Latest updates

  1. 6.0.3 bugfix - handle malformed UTF-8 in user agent strings

    This update contains an important bugfix to handle malformed UTF-8 in user agent strings. This...
  2. 6.0.2 minor bug fix

    Minor bug fix - no need to update unless you are experiencing problems sending bot updates...
  3. v6 major rewrite

    v6 is another major rewrite of the core functionality of the addon, aimed at improving the...

Latest reviews

Nice and easy to install (v5.x) simple to use and identifies tons of bots that you can then control.
Adds the 'robots online' to the side widget, but not anywhere on the Members page. Pretty useless really.
Sim
Sim
The whole point of the addon is to identify a lot more Robots than the core product identifies - and these are shown on the Members page in the "Everyone" view and in the "Robots" view.
Really nice addon that removes "fake traffic" from showing up in the members online stats. Also well-made in the sense that it also finds new bots and can automatically report them back to the dev :) :+1:
This add-on will help me to know which bot has just visited my forum, its always interesting to know which bot is visiting.
This is an extremely useful addon for sorting out the bots you want crawling your site versus those that are just nuisance bots taking up bandwidth and giving nothing useful back. It's especially useful as a starting point if your server is being overrun by Chinese and Russian bot to the point where it is intermittently bringing down the server in a DDoS-like scenario (which happened with one forum I was brought in to repair).
A useful extension to XenForo's bot classification system to handle more bots that are seen in the wild
Great effort and I'm sure it's a well made addon, but the bots visiting my website are still coming up as guests after correct addon installation. It would be good if there was a way to easily add entries. For example, (ecs-114-119-162-196.compute.hwclouds-dns.com) is not showing up as a bot, but as a guest, so I assume the addon doesn't have this listed. I have 60 "guests" and 95% of them are bots.
Sim
Sim
This addon cannot and will not identify bots by IP address or hostname - it only uses the user agent. If you have identified bots which are masking their user agent, you should block them at the web server level.
A very simple plugin that does what it says on the tin. I recommend if you love having statistics, like me!
I think this is a very underrated addon. Every board has a lot of bots lurking in the background and this useful addon helps you to separate them with your normal visitors. Because it then shows the real amount of guests you have right now and not the fake number which comes from the bots. Thank you.
Top Bottom