Known Bots

Known Bots 6.1.0

@Sim: A question and two possible ideas for feature requests/additions:

Question: What exactly makes bots appear in the list of recently visiting bots? Why I'm asking: I go through the list of recently visiting bots regularly and adjust my robots.txt based on it. While that works for some bots, it does not for others, even though some of them claim to respect robots.txt. Some that had been visiting continuously were absent for quite a while after I added them to robots.txt, but showed up again months later. So I am wondering what a bot has to do to show up in the list.

For example, in my list of recently visiting bots, GPTBot showed up again after having been absent for literally months. It has been blocked by my robots.txt file for quite a while already. I checked the logs of my web server and it seems the only URL it visited within the last two weeks was robots.txt (a somewhat annoying 120 times). Maybe I did something wrong while checking, but if it only checks robots.txt and nothing else, that's OK for me.


Possible feature requests:

1. Traffic-light-style tagging for bots

If you go through the list regularly and check the bots that visited your forum, there are a bunch of bots that you have checked before and already considered either OK, tolerated or unwanted. However, I cannot remember the name of each bot I checked in the past (even less so if I have already put them into my robots.txt), so I end up checking some of them again. If it were possible to add a green/yellow/red flag to the bots that visited, this would come in handy and speed up the process of identifying unwanted bots. One could, e.g., use red for bots that have already been blocked via entries in robots.txt and this way, if they visit again, consider further action.

2. Connecting the entries in the list with the IP addresses the bots used when visiting the forum

As the user agent is not reliable (it is self-reported by the bots) and some bots do not respect robots.txt anyway, it would be helpful to have an easy way to see the IP addresses a certain bot has been using. For one, this would make it possible to check whether a bot is what it claims to be (some companies publish the IP ranges their bots come from), but even more, it would give a quick way to block badly behaving bots via .htaccess or other mechanisms.
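To illustrate the kind of check meant here, a minimal Python sketch follows. It is not part of the add-on; the function name `ip_in_published_ranges` is made up, and the networks are documentation-only placeholder ranges, not any real crawler's published ranges.

```python
import ipaddress

# Hypothetical "published" ranges for a bot operator. These are
# documentation-only networks, NOT the ranges of any real crawler.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_in_published_ranges(ip, ranges=PUBLISHED_RANGES):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so a mixed list of networks is safe to test against.
    return any(addr in net for net in ranges)


print(ip_in_published_ranges("192.0.2.17"))    # inside the first range
print(ip_in_published_ranges("198.51.100.5"))  # outside both ranges
```

In practice the list of networks would come from whatever the bot operator actually publishes, and an address outside those ranges would be a candidate for blocking.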

What do you think?
 
Each time we identify that a bot has visited, we update the last_updated date for that bot stored in the database.

The recently visiting bots list is simply listing those bots by date visited.

All they have to do to be identified by the forum as a bot is to visit any forum page. Every session created by XenForo parses the user agent to match against the known bots list.
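As a rough illustration of user-agent matching in general, here is a minimal Python sketch. It is not XenForo's or the add-on's actual code: the `KNOWN_BOTS` mapping and the `identify_bot` function are hypothetical, and simple substring matching is an assumption.

```python
# Hypothetical subset of a known-bots list: user-agent substring -> bot name.
KNOWN_BOTS = {
    "gptbot": "GPTBot",
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
}


def identify_bot(user_agent, known_bots=KNOWN_BOTS):
    """Return the bot name if a known substring occurs in the user agent."""
    ua = user_agent.lower()
    for needle, name in known_bots.items():
        if needle in ua:
            return name
    return None  # no match: treated as a regular visitor


print(identify_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(identify_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Because the user agent is supplied by the client, a match only shows what the visitor claims to be.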

A lot of malicious bots ignore robots.txt, and so will any poorly written bot. Any bot that does simple things, such as checking that the domain resolves or extracting oEmbed information for embedding URLs (like the forum does when you paste a link from an external website), will likely ignore robots.txt too, because it isn't crawling the entire site, just responding to user actions.

GPTBot fetching only robots.txt means it is operating as designed: well-behaved bots will always check robots.txt before crawling. Even if a bot has previously been blocked, it will check robots.txt again to see whether that has changed.

Given it's a static text file and not particularly large, it won't matter how frequently a bot checks robots.txt; it won't have a significant impact on your site.
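For reference, the decision a well-behaved crawler makes after fetching robots.txt can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `forum.example` URLs and the `allowed` helper are made-up examples, not taken from anyone's site.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("GPTBot", "https://forum.example/threads/1"))
print(allowed("SomeOtherBot", "https://forum.example/threads/1"))
```

A crawler has to re-fetch the file to notice that a rule has changed, which is why repeated requests for robots.txt alone are expected.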
 
Absolutely - but I am wondering why it shows up in "known bots" then, as it seems to have visited only robots.txt.
 
The "Recently seen bots" list simply shows the 100 most recently detected bots. There is no date cutoff, so a bot on that list may not have been seen for quite some time, especially if you're not automatically sending bot updates back to my server.
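A minimal Python sketch of that behaviour, assuming only what is described in this thread (a last-seen date per bot, newest first, capped at 100, no date cutoff). The function name and data are hypothetical and this is not the add-on's code.

```python
from datetime import datetime


def recently_seen(bots, limit=100):
    """Return up to `limit` bot names, newest last-seen date first.

    There is deliberately no date cutoff: an old entry stays on the
    list until enough newer bots push it past the limit.
    """
    ordered = sorted(bots.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:limit]]


# Hypothetical data: bot name -> last time it was detected.
bots = {
    "GPTBot": datetime(2024, 6, 1),
    "Googlebot": datetime(2025, 2, 10),
    "Bingbot": datetime(2025, 2, 11),
}

print(recently_seen(bots))
```

With fewer than 100 detected bots, even one last seen many months ago still appears.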

But it may also have hit a page on your site in addition to accessing robots.txt.

It would depend on the way the bot is built and what it is intended to do.
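One way to check which URLs a given bot actually requested is to count paths per user agent in the web server's access log. The Python sketch below assumes the common "combined" log format; the sample lines, the regular expression and the `paths_for_bot` function are illustrative assumptions and may need adjusting for a particular server.

```python
import re
from collections import Counter

# Request path and user agent from a combined-format access log line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def paths_for_bot(lines, bot_token):
    """Count requested paths for lines whose user agent contains bot_token."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and bot_token.lower() in match.group("ua").lower():
            counts[match.group("path")] += 1
    return counts


# Made-up sample lines; real input would be the server's access log.
SAMPLE_LINES = [
    '203.0.113.9 - - [10/Feb/2025:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Feb/2025:16:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Feb/2025:16:05:00 +0000] "GET /threads/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

print(paths_for_bot(SAMPLE_LINES, "GPTBot"))
```

If the only path counted for a bot is /robots.txt, it has not requested any forum page in the period the log covers.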
 
Hi, Sim

Could you please help? I get the same error from time to time with your plugin. Can you tell me what it doesn't like?

Code:
Hampel\KnownBots\Exception\CustomerException: Customer error fetching bots: 403 Forbidden src/addons/Hampel/KnownBots/Api/KnownBots.php:94
Generated by: Unknown account Feb 11, 2025 at 9:54 PM

Stack trace:

Code:
#0 src/addons/Hampel/KnownBots/SubContainer/Api.php(68): Hampel\KnownBots\Api\KnownBots->fetch(1732910518, false)
#1 src/addons/Hampel/KnownBots/Cron/FetchBots.php(23): Hampel\KnownBots\SubContainer\Api->fetchBots()
#2 src/XF/Job/Cron.php(37): Hampel\KnownBots\Cron\FetchBots::fetchBots(Object(XF\Entity\CronEntry))
#3 src/XF/Job/Manager.php(260): XF\Job\Cron->run(8)
#4 src/XF/Job/Manager.php(202): XF\Job\Manager->runJobInternal(Array, 8)
#5 src/XF/Job/Manager.php(86): XF\Job\Manager->runJobEntry(Array, 8)
#6 job.php(43): XF\Job\Manager->runQueue(false, 8)
#7 {main}

Request state:

Code:
array(4) {
  ["url"] => string(8) "/job.php"
  ["referrer"] => string(43) "https://******/mods/565/updates?page=2"
  ["_GET"] => array(0) {
  }
  ["_POST"] => array(0) {
  }
}
  • XF 2.2.12
  • Known Bots 6.1.0
 
To me, it looks like you have a problem with permissions.

Error: 403 Forbidden

Status code 403 indicates that the server understood the request but refused to carry it out.
In short: the client calling the URL is not authorized to access it.
 
Can my robots.txt settings affect this? The error doesn't bother me; I just want to understand whether I should be worried.
 
No - this has nothing to do with your web server. The error is coming from my end, but something at your end is triggering it. I just haven't yet tracked down what that is - I need to hunt through the log files to find the error details.
 
I disabled the cron entry "Known Bots: Fetch New Bots from API" and the problem went away. So, you were right.
Will disabling this cron entry have any effect on the plugin?
 
Can you let me know exact times you've seen the error in your Server Error logs? It will help me track down the issue.
I manually enabled the plugin's cron entry, ran it and immediately caught the error. The screenshot is attached below:
bot-error.webp
 
Can you let me know the name and URL of your forum? You can DM me if you like.
Yes, it's not a problem.

The question is resolved. The cause is that all traffic from Russia is blocked. Sad, but nothing can be done. Anyone who encounters the same issue can just disable the cron entry: the plugin will keep working, but it will not be able to update its bot database.
Sim, thank you for checking. :)
 
Yes, I block all traffic from Russia and a few other countries at the Cloudflare level across all of my websites (not just my forums) - mainly because of malicious bot traffic and hacking attempts I've detected on my sites.
 