XF 2.2: Sitemap error and sitemap not split

mgeuss

New member
Any updates on this? I have the same issue here. After migrating from vBulletin to XenForo 2.2 last week, I submitted supernature-forum.de/sitemap.php to Google Webmaster Tools.
Google says it successfully read the sitemap index and found 0 URLs.
When I look into the details, I see the same message as above: www.supernature-forum.de/sitemap-1.xml could not be retrieved, and the same for -2 and -3.
 

Souss

Member
Same for me: it also happened after a migration from vBulletin to XenForo.
 

mgeuss

New member
This is interesting, but I find it hard to imagine that it has anything to do with the import.
Besides that, does your sitemap.php not work at all?
I tried to submit the sitemap.xml but that didn't make a difference.
 

mgeuss

New member
@Souss I see your sitemap.php is working now, but it seems to contain all URLs directly instead of referring to sitemap-1.xml (and so on).
Is it working now? Does Google read it? And how did you do that?
 

Souss

Member
No, same problem. Even after changing my hosting provider, Google Webmaster Tools doesn't accept my sitemap.

I don't know why the sitemap takes a few seconds to load, or why it isn't split into separate files under an index.

It's just one big file of 2.5MB.

When I
 

Souss

Member
[Screenshot, 2020-11-19 at 11.42.23]
Sitemap could not be read

An error occurred while trying to access your sitemap. Please make sure it follows our guidelines and is accessible at the location you provided, then resubmit it.

HTTP Error
 

mgeuss

New member
I don't see an error 404 here, Google just says that it couldn't retrieve the sitemap file.
Because we both come from vBulletin, I thought it might be an issue with the AddOn "XenForo Redirects for vBulletin", but disabling it doesn't make a difference.
Everything looks OK to me: the sitemap.php is correct, the sitemap files are in the data directory and I can open them in the browser, but somehow Google can't access them. I have no special .htaccess rules or anything else; everything is pretty standard, so I have no idea...
 

Mike

XenForo developer
Staff member
Just to reiterate something we mentioned in the ticket: a 2.5MB sitemap is not an issue. Google's limit is 50MB, so this is nowhere near the limit. (That's what you suggested was the issue initially.) The sitemap is only split if needed.

If Google is reporting a 404 error but that URL works for you, then something else is causing the issue, and this is very likely to be outside of XenForo itself. You may need to investigate the raw web server logs to confirm that Google is requesting the sitemap as you expect, for example. (Perhaps there's a DNS issue which means the request isn't going where you expect.)
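One way to follow that advice is to filter the raw access log for Googlebot requests to sitemap URLs. Below is a minimal Python sketch. It assumes the server writes the common Apache/nginx "combined" log format, which is the format of the log line quoted later in this thread. The regex, the `Hit` type and the `googlebot_sitemap_hits` function are illustrative names, not part of XenForo. It only checks the user-agent string, which can be spoofed. Confirming that a request really came from Google needs a reverse DNS lookup on the IP.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Apache/nginx "combined" format (assumed):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    time: str
    path: str
    status: int
    size: int  # bytes sent; 0 when the log shows "-"


def googlebot_sitemap_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield requests claiming to be Googlebot whose path starts with /sitemap."""
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # not in combined format; skip
        if "Googlebot" not in m["agent"]:
            continue  # only the user-agent is checked, which can be spoofed
        if not m["path"].startswith("/sitemap"):
            continue  # matches /sitemap.php, /sitemap.xml, /sitemap-1.xml, ...
        size = 0 if m["size"] == "-" else int(m["size"])
        yield Hit(m["time"], m["path"], int(m["status"]), size)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in googlebot_sitemap_hits(log):
            print(hit.time, hit.status, hit.size, hit.path)
```

A 200 status with the full file size means the server delivered the file to the crawler. Any other status points at the server or network side rather than the sitemap content.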
 

mgeuss

New member
I checked my server logs and now I am even more confused. I found multiple entries like that:

66.xxx.xx.xxx - - [18/Nov/2020:21:33:55 +0100] "GET /sitemap-1.xml HTTP/1.1" 200 1436302 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Everything looks just fine here. In my case the problem must be inside the sitemap file itself which prevents Google from successfully processing it.
 

mgeuss

New member
@Mike I entered this thread because I have a similar issue - Google can't read my sitemap. But it seems that it's a bit different.
While testing, I opened sitemap-1.xml from xenforo.com in my browser. It first appears as plain text, but once it has loaded completely it shows the parsed XML.
If I open sitemap-1.xml from https://www.supernature-forum.de/ in the browser, it never gets parsed (even though the file itself seems to be OK; I checked that by downloading it).
Google never had issues with the sitemap from vBulletin, and I also have WordPress running on that server with a sitemap, so it's hard for me to believe there are issues in the server configuration.
Do you have any further idea?
 

Mike

XenForo developer
Staff member
Just to clarify, sitemap.php (or sitemap.xml) is the URL you should be submitting and what we automatically submit. It's split as necessary automatically and Google will follow the individual ones.
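To see which child sitemaps an index points to, you can parse the index with the Python standard library. These are the files Google is expected to follow. This is a minimal sketch that assumes the index follows the sitemaps.org schema, with a `<sitemapindex>` root containing `<sitemap><loc>` entries. The function name `child_sitemaps` is hypothetical. Fetching the document (for example from sitemap.php) is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import List

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # sitemaps.org namespace


def child_sitemaps(index_xml: bytes) -> List[str]:
    """Return the <loc> URLs listed in a sitemap index.

    Returns an empty list if the document is a plain <urlset>,
    i.e. the sitemap was not split.
    """
    root = ET.fromstring(index_xml)
    if root.tag != SM + "sitemapindex":
        return []
    locs = root.findall(f"{SM}sitemap/{SM}loc")
    return [loc.text.strip() for loc in locs if loc.text]
```

An empty result means sitemap.php is serving a single `<urlset>`. Since the sitemap is only split when needed, this is expected for smaller forums.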

Quoting mgeuss: "If I open sitemap-1.xml from https://www.supernature-forum.de/ in the browser, it never gets parsed (even though the file itself seems to be OK; I checked that by downloading it)."
I just tried this and it formatted as XML for me, which at least indicates that it's syntactically valid, though there really isn't any reason it shouldn't be. Indeed putting it through a sitemap validator immediately comes back with no issues.
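You can run a rough local version of that check: confirm the file is well-formed XML and compare it against the protocol limits mentioned in this thread (50,000 URLs and 50MB uncompressed per file). This is a minimal Python sketch. The name `check_sitemap` and its returned dictionary are illustrative. It does not validate against the sitemaps.org XSD schema, so it is weaker than a full validator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000             # per-file URL limit
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed


def check_sitemap(data: bytes) -> dict:
    """Parse one sitemap file and report basic problems (not a full schema check)."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return {"urls": 0, "problems": [f"not well-formed XML: {exc}"]}
    problems = []
    if root.tag != SITEMAP_NS + "urlset":
        problems.append(f"unexpected root element {root.tag}")
    urls = root.findall(SITEMAP_NS + "url")
    for i, url in enumerate(urls):
        loc = url.find(SITEMAP_NS + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {i} has no <loc>")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds {MAX_URLS}")
    if len(data) > MAX_BYTES:
        problems.append(f"{len(data)} bytes exceeds {MAX_BYTES}")
    return {"urls": len(urls), "problems": problems}
```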

Unfortunately, I don't really have any particular recommendations, especially if there's evidence of Google reading the sitemap via the web server logs, unless they can give a more specific idea of what the error is (the 404 at least indicates something, for example). I've just checked ours here and it has been submitted and processed successfully (and it's split into 6 parts so it's even larger).
 

mgeuss

New member
I've spent the last two hours doing some research on this. It's completely weird.
I took the automatically generated sitemap-1.xml, edited it manually, deleted the last 5000 URLs, saved it under a different name and submitted it: it failed. I went on, cut out the next 5000 URLs, saved it under a different name and submitted it: it failed.
I repeated this until I had a file with only the first 5000 URLs from the original file. I submitted that and finally: success!
But that seemed too odd to be true; I couldn't believe that Google is only able to process a maximum of 5000 URLs.
So I took URLs 5001-10000 from the original file, submitted this as a separate sitemap - and it failed again.
It seemed clear to me now that there must be a problem with the content of the sitemap itself.
I took the file with the 5000 URLs that failed and cut it down in steps of 500 entries as before.
To shorten this: I ended up with a failed file containing only 15 URLs.
I manually inspected the URLs, and they all work. I then discovered that there was a URL containing the word "sex".
I removed it - and the file was processed.
Is it possible that Google rejects a sitemap file based on a "bad word filter"?
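The manual procedure above (cut the file down, resubmit, repeat) is a search for the smallest failing subset. Halving the list at each step finds a single culprit in about log2(n) rounds. The Python sketch below assumes exactly one offending URL. It also assumes a hypothetical predicate `fails(urls)` that reports whether a sitemap built from those URLs is rejected. In practice that predicate would be a manual submission in Search Console. `find_bad_url` is an illustrative name.

```python
from typing import Callable, Sequence


def find_bad_url(urls: Sequence[str], fails: Callable[[Sequence[str]], bool]) -> str:
    """Binary search for a single URL that makes `fails` return True.

    Assumes `fails(urls)` is True for the full list and that exactly one
    URL is responsible, so exactly one half fails at each step.
    """
    if not fails(urls):
        raise ValueError("the full list does not fail")
    lo, hi = 0, len(urls)  # the culprit lies in urls[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(urls[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return urls[lo]
```

With 50,000 URLs this needs roughly 16 submissions. The steps of 5000 and then 500 used above need more.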
 

Mike

XenForo developer
Staff member
Just to clarify, there is no 5000 entry limit in sitemaps. The limit is explicitly 50000 entries, which is how we split the sitemaps. The sitemap for this forum has a number of files with 50000 URLs and they're submitted successfully.
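The splitting described here amounts to chunking the URL list at the 50,000-entry limit and listing each chunk in a sitemap index. The sketch below illustrates that idea in Python. It is not XenForo's actual implementation. The `max_urls` parameter corresponds to the adjustable per-file count requested later in this thread. The `sitemap-N.xml` naming, `base_url` and `build_sitemaps` are assumptions for illustration.

```python
from typing import Dict, List, Sequence
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def build_sitemaps(urls: Sequence[str], base_url: str, max_urls: int = 50_000) -> Dict[str, str]:
    """Return {file name: XML text}; split into an index only if needed."""
    if max_urls < 1:
        raise ValueError("max_urls must be positive")
    files: Dict[str, str] = {}
    names: List[str] = []
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        chunk = urls[start:start + max_urls]
        body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        name = f"sitemap-{n}.xml"  # hypothetical naming
        files[name] = f'{HEADER}<urlset xmlns="{NS}">{body}</urlset>'
        names.append(name)
    if len(names) == 1:
        # Not split: the single urlset is served as the sitemap itself
        files["sitemap.xml"] = files.pop(names[0])
        return files
    entries = "".join(f"<sitemap><loc>{escape(base_url + n)}</loc></sitemap>" for n in names)
    files["sitemap.xml"] = f'{HEADER}<sitemapindex xmlns="{NS}">{entries}</sitemapindex>'
    return files
```

Escaping matters because a raw `&` in a URL would make the file malformed XML.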

Unfortunately, if Google ends up having some sort of arbitrary or undocumented limit based on URL contents or specific URLs, that's unlikely to be something that we can really account for.
 

mgeuss

New member
I don't think it's about the number of URLs. I tried various online XML sitemap validation tools. Some worked, some failed with different errors.
My guess is that it is a question of server performance/response time: if the sitemap file can't be retrieved quickly enough, the error occurs.
I don't know how hard it is to implement, but maybe it would help some customers (like me :D) if the number of URLs per sitemap file was adjustable.
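To test the response-time theory, you could time a complete download of each sitemap file. This is a minimal sketch using only the Python standard library. `timed_fetch` is a hypothetical helper, and the 30-second timeout is an arbitrary assumption. A fetch from your own machine is only a rough proxy for what Googlebot experiences.

```python
import sys
import time
import urllib.request
from typing import Tuple


def timed_fetch(url: str, timeout: float = 30.0) -> Tuple[int, float]:
    """Download `url` completely; return (bytes received, seconds elapsed)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        size = len(resp.read())  # read the whole body, as a crawler would
    return size, time.perf_counter() - start


if __name__ == "__main__":
    # Usage: python timed_fetch.py https://www.supernature-forum.de/sitemap-1.xml
    for u in sys.argv[1:]:
        n, secs = timed_fetch(u)
        print(f"{u}: {n} bytes in {secs:.2f}s")
```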
 

Ozzy47

Well-known member
Quoting mgeuss: "I don't know how hard it is to implement, but maybe it would help some customers (like me :D) if the number of URLs per sitemap file was adjustable."

There is an open suggestion for that,
 