XF 2.3 3k+ 'Blocked due to other 4xx issue' in Google Search Console

Anatoliy

They appeared after I upgraded to 2.3.2 about a month ago.
All of them are of these types:
domain/misc/style-variation?reset=1&t=1724763460%2Cad52e9ce6de9c6b628dd7feaac79357a
or
domain/misc/style-variation?variation=default&t=1724762134%2C9ac39b17ff1a324c9d0afeefa19a8472
or
domain/misc/style-variation?variation=alternate&t=1724762133%2Cea0e99e67f692418bd180af747492fed

Where are those links coming from? The pages are not indexed, so Google doesn't show the referring pages.
Why are those links there if they lead to closed doors?
How do I fix that?
 
I'm not too bothered about guests being able to change variations, so I have removed the variation switch from the page container altogether and moved it into the user account area (the account_visitor_menu template), which I think is a better place anyway.
The XF team could just add a user group option 'can use style variations'. 🤷‍♂️
 
To 'fix' the fact that pages we don't want indexed appear in a list of pages that are blocked from being indexed? Unlikely.
How about properly handling the pages you don't want indexed, e.g. by adding a noindex meta tag to the template as advised by XenForo (Mike), or by simply removing them from the sitemap?
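For reference, the noindex directive in question is just the standard robots meta tag. Exactly which template it belongs in depends on your style, so treat this as a generic illustration rather than XF-specific guidance:

HTML:
<!-- generic example; goes inside the <head> of the page you want kept out of the index -->
<meta name="robots" content="noindex">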

 
These pages aren't in the sitemap, so I'm not sure what the relevance is. Pages which have noindex also appear in the same list, because it's a list of pages which aren't indexed. My entire point is that this list is informational and you should not treat every entry in it as a problem to be solved.
 
It can take a while for changes to be reflected in Search Console. It might also be worth using a test tool to ensure the pages are blocked as you expect. And if you're using Cloudflare, be sure the robots.txt page has not been cached with a long TTL as it may prevent updates from getting picked up.
 
Yeah, I did all that. And I understand that it takes time for Google to revisit pages it has already marked as trash.

Jeremy, look, your point is: don't worry about those 10k 'style-variation*' URLs that Google discovered and marked as 4xx. They are not real pages like threads, so who cares...

The real situation is this: Google moved 4k of my thread pages from 'indexed' to 'visited but not indexed'. And I agree with them: pages that contain links returning 4xx should not be suggested to people.

I understand that **** happens, and when you implement new features the results can bring something unexpected.

But I guess we should try to fix the damage if it's discovered, not just behave as if nothing happened and everything is OK.

That's my personal subjective opinion. )
 
I provided the solution in my very first reply, and noted that we already ask Google not to crawl or index these pages out of the box. I'm not sure what more you'd like. I'm also pretty skeptical of the claim that Google deindexed any thread pages because of this. This very site has 136k entries under 'Blocked due to other 4xx issue', but we haven't observed any corresponding impact to thread index rates or overall search performance.
 
I agree that you provided the solution. Plus you have personally helped me many times in other threads, even when you didn't have to participate there.
And I appreciate that.

And you are also right that I can't prove the damage was caused by the upgrade. Yeah, the graphs clearly show that everything was going up and then went down. But who knows why.

I'm not demanding anything. I'm asking for help. I'm not a programmer. I used to write simple PHP scripts 20 years ago.

I removed the style-variation switch for guests so Google will not discover new 4xx URLs. I disallowed /misc/style-variation* in robots.txt and am waiting (hoping) that it will help with those that were already discovered by Google. And I could say "I did everything I could".
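For anyone who wants to do the same, the robots.txt rule looks roughly like this (the path assumes the forum runs at the domain root with friendly URLs; adjust it if yours runs in a subdirectory or uses index.php?-style routes):

Code:
User-agent: *
Disallow: /misc/style-variation*

The trailing * isn't strictly required, since robots.txt rules already match by prefix, but Google accepts it either way.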

But I think I could do more. Without going into technical details (because I don't yet know exactly how to do it): would it be better if Google, when revisiting each of those 4xx links, got a 410 and removed them from its database (the referring pages no longer contain those links for guests)? I think that would be way better than disallowing it from revisiting those links (and we know that a 'disallow' doesn't remove already-discovered URLs from the index).

So I'm thinking about how to do that. I'm thinking of an add-on with a code event listener, something like this:

PHP:
<?php

namespace AV\Gone;

use XF;
use XF\Http\Response;

class Listener
{
    // listener for the "app_complete" code event, fired after the response has been built
    public static function app_complete(XF\App $app, Response $response)
    {
        // only touch 400 responses served to guests for misc/style-variation URLs
        $isGuest = XF::visitor()->user_id === 0;
        $isStyleVariationUri = strpos($app->request()->getRequestUri(), 'misc/style-variation') !== false;

        if ($response->httpCode() == 400 && $isGuest && $isStyleVariationUri)
        {
            // send 410 Gone instead of 400
            $response->httpCode(410);
        }
    }
}
 
But I think I could do more. Without going into technical details (because I don't yet know exactly how to do it): would it be better if Google, when revisiting each of those 4xx links, got a 410 and removed them from its database (the referring pages no longer contain those links for guests)? I think that would be way better than disallowing it from revisiting those links (and we know that a 'disallow' doesn't remove already-discovered URLs from the index).
You can create a class extension for \XF\Pub\Controller\MiscController to replace the 400 error code with a 410 error code:

PHP:
<?php

namespace AddOn\XF\Pub\Controller;

use XF\Mvc\Reply\AbstractReply;
use XF\Mvc\Reply\Error;
use XF\Mvc\Reply\Exception;

class MiscController extends XFCP_MiscController
{
    public function actionStyleVariation(): AbstractReply
    {
        try
        {
            $reply = parent::actionStyleVariation();
        }
        catch (Exception $e)
        {
            $reply = $e->getReply();
            if ($reply instanceof Error && $reply->getResponseCode() === 400)
            {
                $reply->setResponseCode(410);
            }

            throw $e;
        }

        return $reply;
    }
}

However, this will not prevent Google from trying to crawl them in the first place, so it can still impact crawl budget. That's why I think robots.txt is a better solution: it will not only help with bad URLs that were already discovered, but also prevent new bad URLs from being discovered going forward, without impacting functionality or accessibility for guests.
 
Thank you!!!
I don't understand 99% of the code, but I'll try it tomorrow. )
It's 1am here.
However this will not prevent Google from trying to crawl them
I do want Google to revisit them, get a 410 response, and remove them from its database.
but also prevent new bad URLs from being discovered
The style-variation switch is removed for guests, so there will be no new 4xx URLs.
without impacting functionality and accessibility for guests.
I don't care about guests' ability to switch modes.

Thank you one more time!
 
It works! Thank you, Jeremy!!!

I removed the disallow rule for /misc/ from robots.txt and requested Google to revalidate those 4xx URLs.
Sitting tight with fingers crossed. )
 