Custom 404 Page by Siropu 1.2.0
It could be a bot/proxy that doesn't send any IP information.

How could that be? I'd have assumed it is unavoidable to send the source IP, given how TCP/IP works. Apart from that, the IPs are logged in the server's web.log when you grep it for the URL noted in the 404 add-on's log. So they are there.

Bots can manipulate the headers used to get the user IP so they are not reliable.

I'm using XF's getIp() method with $allowProxied set to true to get the IP, so unless convertIpStringToBinary (which is used to get the value stored for the IP) fails, what you see in the 404 logs is what you get.
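A minimal Python sketch (not XF's actual code) may illustrate why a proxy-aware IP lookup is unreliable: anything a client writes into X-Forwarded-For ends up as "the IP". The function name `client_ip` and its arguments are hypothetical, and the validation step only roughly mirrors what a check like convertIpStringToBinary would do:

```python
import ipaddress

def client_ip(remote_addr, forwarded_for=None, allow_proxied=True):
    """Illustrative analogue of a proxy-aware IP lookup like XF's getIp():
    prefer the X-Forwarded-For header when allowed, falling back to the
    socket address. A bot can put any string into that header, which is
    why the logged IP cannot be trusted."""
    if allow_proxied and forwarded_for:
        # Left-most entry is conventionally the original client
        candidate = forwarded_for.split(",")[0].strip()
        try:
            # Reject garbage, roughly what an IP-to-binary conversion does
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            pass  # spoofed or garbled header, fall through to socket address
    return remote_addr

# A bot can claim to be anyone:
print(client_ip("203.0.113.7", forwarded_for="10.0.0.1"))   # -> 10.0.0.1
print(client_ip("203.0.113.7", forwarded_for="not-an-ip"))  # -> 203.0.113.7
```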


Had a further look into it, and apart from malicious bots, the Bing bot is also affected. E.g., I had this call in the log that ended in a 404:

(Screenshot attachment: Bildschirmfoto 2025-04-08 um 17.58.05.webp; the two log lines are reproduced below)
upper one:
msnbot-40-77-167-149.search.msn.com - - [08/Apr/2025:15:15:17 +0200] "GET /tags/start/ HTTP/1.1" 404 9472 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36" 315 10202

The URL "/tags/start/" exists (there is a tag "start"). It seems to be accessible only when you are logged in (which bingbot is not). If I call the URL when not logged in, I do indeed get a 404; a bit strange, I would have expected a 401 here. Clearly not your fault, rather a bug in XF that may negatively impact SEO ranking (as Bing will this way collect hundreds of 404s on my forum).


second one:
msnbot-40-77-167-76.search.msn.com - - [08/Apr/2025:14:26:41 +0200] "GET /threads/Threadurl.1338/Picture-Name.jpg HTTP/1.1" 404 9505 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36" 796 14614

There are hundreds of those. The requested picture does exist (it is embedded in the thread), but as full-size pictures are only accessible to logged-in users, the bot won't get the full-resolution picture. It would however have a different URL than the one the bot requested anyway; no idea where it got that from.

Anyway: This makes the log pretty useless, as it is cluttered with hundreds of entries caused by Bing that can only be identified manually by grepping through the web.log.
 
This makes the log pretty useless, as it is cluttered with hundreds of entries caused by Bing that can only be identified manually by grepping through the web.log.
Do you have any suggestions on how to improve it? Maybe identify legit bots and do not log those?
 
Do you have any suggestions on how to improve it? Maybe identify legit bots and do not log those?
Could be an idea. A low-hanging fruit would be to add the user agent to the log, possibly abbreviated, and in a next step to offer the option to filter one or a number of them in or out.

The reason is that one potentially wants to see whether indexing bots get a 404, as this might be important for SEO ranking, and maybe some of those 404s are fixable.

In a perfect world one would have clusters like bot/no known bot and, within the bots cluster, a classification into groups like wanted, unwanted/ignorable, or perhaps something like "search engine indexing bot". If one could create and name such clusters individually and add user agents to them manually, that would be even better.
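A rough Python sketch of that clustering idea, with entirely made-up cluster names and substring patterns (a real implementation would need a maintained bot list), could look like this:

```python
# Hypothetical user-defined clusters: names and patterns are illustrative only.
BOT_CLUSTERS = {
    "search-indexing": ["bingbot", "googlebot", "duckduckbot"],
    "ignorable":       ["ahrefsbot", "semrushbot"],
}

def classify(user_agent: str) -> str:
    """Assign a user agent to the first cluster whose pattern it contains,
    or to 'no-known-bot' if nothing matches."""
    ua = user_agent.lower()
    for cluster, needles in BOT_CLUSTERS.items():
        if any(needle in ua for needle in needles):
            return cluster
    return "no-known-bot"

ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76")
print(classify(ua))  # -> search-indexing
```

The 404 log could then show (or filter on) the cluster name next to each entry instead of the full user agent string.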

Could be complicated, but on the other hand it might be an interesting idea to somehow interact with "Known Bots" by @Sim

I think the most important thing is to know on the spot what source a request resulting in a 404 has, in order to identify whether and how one wants to deal with that 404. Everything on top of that is a comfort feature that makes life easier and the tool more useful.
 
Could be complicated, but on the other hand it might be an interesting idea to somehow interact with "Known Bots" by @Sim

The good news is that my KnownBots add-on simply extends the core functionality, which already flags user sessions when it detects they are a bot. So if you want to indicate that somehow, the information is already there in the user session.
 
This URL appears as a 404 in the records, but it is not listed on Google. How does that happen? Does this add-on also count bots? I don't understand.
It counts XF's 404 (page not found) responses, so it clearly counts bots as well. You can easily cross-check with your web server log and will almost certainly find the call there (including the IP and user agent it came from).
 