404 when a user (not logged in) clicks on a tag

smallwheels

Affected version: XF 2.2.17

I went through my server's web log hunting for 404s and found a lot of entries caused by the Bing indexing bot. Among them was this one:

msnbot-40-77-167-149.search.msn.com - - [08/Apr/2025:15:15:17 +0200] "GET /tags/start/ HTTP/1.1" 404 9472 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36" 315 10202
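For anyone who wants to repeat this kind of log hunt, here is a minimal Python sketch. It assumes the server writes the Apache/nginx "combined" log format (the line above matches it, with two extra trailing fields that the pattern simply ignores). The function name `bot_404_paths` and the default marker `"bingbot"` are illustrative choices, not part of XenForo or any existing tool.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status size "referer" "agent".
# Any extra fields after the user agent (like "315 10202" above) are ignored.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_404_paths(lines, bot_marker="bingbot"):
    """Count request paths that returned 404 to a user agent containing bot_marker."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match["status"] == "404" and bot_marker in match["agent"]:
            counts[match["path"]] += 1
    return counts
```

Usage would be something like opening the log file (the path is whatever your server uses) and printing `bot_404_paths(log_file).most_common(20)` to see which URLs the bot hits most often with a 404.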

The URL "/tags/start/" exists (there is a tag "start") and I can open it without any issue. It seems to be accessible only when you are logged in (which bingbot is not). If I call the URL while not logged in, I do indeed get a 404. That is a bit strange; I would have expected a 401 here.

I am a little worried that Bing is collecting hundreds of 404s on my site from tags alone, which might negatively affect the SEO ranking. Technically these should not be 404s, as the URL exists; a user who is not logged in simply does not have enough privileges to access it.
 
A 401 would give you an HTTP authorization login dialog (an ".htaccess login"); you surely would not want that.

It is unknown, and difficult (maybe even impossible) to compute, whether there is currently at least one user account that can view at least one piece of content with that tag.

Examples
Only one thread has this tag, and that thread is in a forum that currently can't be accessed by anyone (not even an admin account).
Would it make sense to show a login screen for the tag URL or would that be rather bad UX?

Only one piece of content has this tag, and its canView() method only allows it to be accessed on Christmas Eve.
Would it make sense to show a login screen for the tag URL or would that be rather bad UX?

So for all practical purposes I'd say 404 is the way to go: there is no such content (for a guest).
 
Thanks! That sounds like a plausible explanation. However, when I open the RM (Resource Manager) in my forum while not logged in, I end up with the (expected) dialog including the notice "you have to be logged in to see that":

[Screenshot 2025-05-24 at 08.33.07]

The web.log shows a 403:

Code:
"GET /resources HTTP/1.1" 403

I would have expected the same behavior with tags. Instead, when not logged in, I get a 404:

[Screenshot 2025-05-24 at 08.32.34]

Correspondingly, in the web.log:

Code:
"GET /tags/start/ HTTP/1.1" 404

I would have expected exactly the same behaviour as with the RM. As I said, it's not the end of the world and not a drama, just (in my eyes) an inconsistency that may or may not have side effects. Since this behaviour affects not only search engines but also visitors who are not logged in or are not members yet, a login dialog may encourage registrations, whereas a 404 causes irritation and gives the impression of a badly maintained, broken site.

It turned out that the tag "start" had only been used once: for a picture in the MG (Media Gallery) that users who are not logged in cannot see. If I use a more common tag that has been used for various threads, even a user who is not logged in sees results; however, the pictures (which a logged-in user additionally sees in the results list) are missing for someone who is not logged in.

So indeed you are right about the 401. A 403 seems more adequate, though a 404 is still not very desirable in my eyes. Given that a user who is not logged in can see tagged threads but not, e.g., tagged media items from the MG, the 404 here seems like a bit of an edge case, and a plausible one: the only tagged result is a media item, and a user who is not logged in cannot access the MG or its items at all, so they get a 404.

Thanks for setting me on the correct route to understand the issue better! :)

Possibly the best way to improve the user experience here would be to display a notice like "you may see more results when you are logged in" instead of a 404 whenever a user who is not logged in searches for or clicks on a tag, whether or not there are results. However, this seems a bit over the top and not really necessary (though no doubt nice).

So I think this is indeed not a bug but a misunderstanding on my part, due to insufficient research at the beginning.
 