Partial fix Xenforo 2 Making Sitemaps Larger Than Google's Max Limit: Website Not Being Spidered/Indexed

Affected version
2.0.5

Chris D

XenForo developer
Staff member
@Chris D

The indexing has stopped and we have 7 errors now



Our SEO has tanked harder than it ever has in 25 years, after moving from vBulletin to XenForo.

This has been the worst financial blow to our company ever

The developers who did our migration destroyed our website and SEO, please help.
None of the reported errors there currently seem to be valid. All of the links in those errors validate in various sitemap validators. The ones which were reported to have over 50,000 URLs all have 40,000 or fewer.

Clearly you've applied the suggested fix above now, but when was that actually done? Are the errors still there? Are the sitemap URLs exhibiting the errors now different?
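
For anyone who wants to check the numbers locally rather than rely on the Google panel, the well-formedness and URL count of a single sitemap file can be verified with a few lines of Python. This is only a sketch: it assumes the file uses the standard sitemaps.org namespace, and the function names are illustrative rather than part of XenForo or any Google tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (assumed for the files in question).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-file maximum quoted in the Google error message


def count_sitemap_urls(xml_text):
    """Return the number of <url> entries in one sitemap document.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(f"{SITEMAP_NS}url"))


def check_sitemap(xml_text):
    """Return (url_count, within_limit) for one sitemap document."""
    count = count_sitemap_urls(xml_text)
    return count, count <= MAX_URLS
```

Passing the downloaded contents of a file such as sitemap-28.xml to check_sitemap would show whether that file really exceeds the limit or whether the reported error is stale.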
 

420

Active member
We made the changes on June 9 when you advised us. Yes, the errors are still there.

Not sure how to answer your last question, can you please be more specific?

We did what you said and waited almost 3 weeks but now we have more errors.

What am I missing here?
 

420

Active member
Do I need to delete the sitemaps in the Google panel and resubmit them and lose 3 weeks of indexing?
 

Chris D

XenForo developer
Staff member
To simplify it, could you just copy and paste the errors from the Google panel again so I can review them.

Also, you may be misunderstanding how sitemaps work. Arguably they’re not actually important at all, but either way if you had to delete the sitemap, that doesn’t “lose 3 weeks of indexing”. You will still be indexed, it won’t remove you from the Google index if you delete the sitemap or do not have one.

The sitemap is simply a nudge in the right direction for Google. My comment about it not being important is mostly down to the fact that they will absolutely index your site whether you have a sitemap or not.

Anyway, if you can provide the current list of errors from the Google panel I’ll take another look.
 

420

Active member
@Chris D why did you edit them out of my original post?

I just went into my Google panel to check and the sitemaps and errors are all gone. There is nothing there now, and my systems admin said he didn't delete them. When is this nightmare going to end?

What do I do now?
 

420

Active member
Wow, that's crazy. I refreshed the page and now the sitemaps and errors are back. Go figure.

Here they are again @Chris D

1. Errors: Parsing error
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.
Issues count: 1
Sitemap: www.xyz.com/community/sitemap-39.xml, 10907, Jun 26, 2018

2. Errors: Too many URLs
Your Sitemap contains too many URLs. Please create multiple Sitemaps with up to 50000 URLs each and submit all Sitemaps.
Issues count: 6
Sitemap: www.xyz.com/community/sitemap-28.xml, Tag: urlset, 59024, Jun 27, 2018
Sitemap: www.xyz.com/community/sitemap-23.xml, Tag: urlset, 59182, Jun 26, 2018
Sitemap: www.xyz.com/community/sitemap-26.xml, Tag: urlset, 61502, Jun 26, 2018
 

420

Active member
So what are your thoughts, @Chris D? Do you think we should delete the existing sitemaps from the Google panel and resubmit them again?

Maybe the whole process is hanging because of those previous sitemaps with the wrong amount of entries?

Maybe this would start the process again with the correct sitemaps with the correct number of entries?

Thank you for trying to help us figure this out, we are truly grateful.
 

Chris D

XenForo developer
Staff member
They are still here, I was confused why you asked for them again.
I asked for them again, quite simply, because some time has passed since the last time you posted them and, as I suspected, the sitemap URLs and errors are different.

sitemap-23.xml and sitemap-26.xml contain fewer than 40,000 entries (which shows the code change took effect), so those errors are wrong (on Google's side).

sitemap-39.xml doesn't contain any errors that I can see and several online validators confirm it's a valid XML file. It should be safe to ignore this.

However, sitemap-28.xml does indeed contain more than 50,000 URLs which doesn't really make sense. How can it be over-running by so much? Clearly the 40,000 limit is taking effect in some cases, but not all.

Bear with me. I'm going to need to do a full audit of the code involved here. There's got to be something wrong, I just need to identify what, otherwise you'll just keep getting the same problem.
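
For context, the per-file cap being discussed (40,000 entries after the code change, against Google's 50,000 maximum) amounts to splitting the full URL list into fixed-size chunks, one chunk per sitemap file. The sketch below shows that idea in Python. It is not XenForo's actual code (XenForo is written in PHP), and the function name and example URLs are hypothetical.

```python
from typing import Iterator, List


def chunk_urls(urls: List[str], max_per_file: int = 40_000) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `max_per_file` items.

    Each yielded slice corresponds to the contents of one sitemap file.
    """
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    for start in range(0, len(urls), max_per_file):
        yield urls[start:start + max_per_file]


# Hypothetical example: 100,001 thread URLs split across sitemap files.
example_urls = [f"https://www.xyz.com/community/threads/{i}/" for i in range(100_001)]
file_sizes = [len(chunk) for chunk in chunk_urls(example_urls)]
```

If the cap is applied like this, no file can exceed it, which is why a file containing more than 50,000 entries points to something else going wrong during generation.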
 

420

Active member
Of course man, please do your thing and I'll be patient.

Now that I know you're dedicated to it, I can breathe a little. :)

Thank you so very much, we are truly grateful.
 

420

Active member
Looks like it changed again

1. Errors: Parsing error
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.
Issues count: 1
Sitemap: www.xyz.com/community/sitemap-39.xml, 10907, Jun 26, 2018

2. Errors: Too many URLs
Your Sitemap contains too many URLs. Please create multiple Sitemaps with up to 50000 URLs each and submit all Sitemaps.
Issues count: 6
Sitemap: www.xyz.com/community/sitemap-7.xml, Tag: urlset, 50152, Jun 27, 2018
Sitemap: www.xyz.com/community/sitemap-28.xml, Tag: urlset, 59024, Jun 27, 2018
Sitemap: www.xyz.com/community/sitemap-23.xml, Tag: urlset, 59182, Jun 26, 2018
 

420

Active member
@Chris D we have yet another SEO issue: a hundred thousand member profiles cannot be seen or indexed by Google.

How do we stop people from making profiles private and destroying our SEO?

Most of them are doing it without even realizing it, then getting confused why nobody is following them or something on their threads/posts.

We are a free community and do not charge for access, what we get in return is the content displayed for our readers and indexed by Google.

Can you please help on this topic as well?

Thank you.
 

Chris D

XenForo developer
Staff member
Nothing we can do here and I’m sure it’s not, as you say, “destroying your SEO”. I’m not even sure how you’d be drawing that conclusion.

Your members have a right to privacy and to prevent information about themselves from being viewed if that’s their wish.

It’s fairly difficult to do without realising it. It’s a fairly explicit action.

You mention their threads/posts. Preventing their profile being visible to guests or other members doesn’t prevent their content being visible or anything like that so that’s a confusing statement. Their content would still be indexed.
 

420

Active member
So much crazy stuff going on with this nightmare migration even 7 months into it. Maybe I'm confusing the member profile privacy with the member galleries being made private. So many members are so confused, so many are upset, most have left for other sites already. So many bug reports and support tickets I can't even keep track anymore.

Let me be clear on the SEO issues related and get us on the same page, sorry for any confusion. This whole rabbit hole is a mess.

1. Gallery privacy: members are making their galleries private and wondering why nobody is following, commenting etc. Can we assume these galleries are also not being indexed by Google, if they cannot be seen by members? Almost a hundred thousand Google errors, it's been a challenge getting through them all. We do have the fix for this now, by moving the galleries to another folder then turning off that option. Still waiting on our systems admin to create the new dev site to test there first.

2. Member profile privacy: With regard to our members having the right to privacy on profiles so they cannot be indexed by Google, I'd like to believe that is incorrect. We are a private company and website and have our own set of guidelines and protocols. Our site content is a subject matter of federal illegality, therefore we cannot have members making private profiles and committing federal crimes on our website; we need to be able to see everything. How do we get 150,000 profiles indexed by Google, after being indexed by them when we were on vBulletin? This is a major hit to our SEO, can you help figure out a solution?
 

420

Active member
By the way @Chris D looks like this changed again, they are coming off after I resubmitted.

Hopefully this is a good thing, I'm thinking optimistically.

Aren't these the two that had the issue to begin with?

1. Errors: Too many URLs
Your Sitemap contains too many URLs. Please create multiple Sitemaps with up to 50000 URLs each and submit all Sitemaps.
Issues count: 2
Sitemap: www.xyz.com/community/sitemap-7.xml, Tag: urlset, 50152, Jun 30, 2018
Sitemap: www.xyz.com/community/sitemap-28.xml, Tag: urlset, 59024, Jun 29, 2018
 

Chris D

XenForo developer
Staff member
It's definitely a good thing that Google is no longer seeing files which are apparently corrupted. I suspect this may have been a bug in their parser which has now been resolved.

We still obviously have the sitemap files overrunning the (now) 40,000 limit. I've been through the code several times. At this point, I don't even have any best guesses as to where this is going wrong.

I think we need to add some debugging code and monitor the process more closely. To do this, can you please submit a ticket from the customer area and provide an Admin login, FTP login and access to the database, e.g. through phpMyAdmin? Hopefully we'll be able to track this down.
 

XF Bug Bot

XenForo bug fixer bot
Staff member
Thank you for reporting this issue. The issue is now resolved and we are aiming to include that in the next XF release (2.0.11).

Change log:
Incrementally update the job state of the Sitemap job so that a fatal error shouldn't disrupt the process and introduce corrupted/duplicate items.
We've spent some time looking into this and discovered that the server involved is experiencing some sort of unlogged fatal error during the process. This is almost certainly a server issue, but we have implemented some changes that can reduce the scope of error when the job resumes from its interrupted state.
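
To illustrate the general idea behind that change log entry (and only the idea, since this is not the actual XenForo implementation), here is a minimal Python sketch of a batch job that records its progress after every batch. All names are hypothetical, and the `state` dictionary stands in for whatever persisted job state the real system uses.

```python
def run_sitemap_job(content_ids, state, written, batch_size=2, fail_after_batches=None):
    """Append content IDs to `written` in batches, checkpointing after each batch.

    `state` is a dict standing in for persisted job state (assumption).
    `fail_after_batches` simulates a fatal error at a batch boundary.
    """
    last_id = state.get("last_id", 0)
    # Only process items that come after the last checkpoint.
    pending = [i for i in sorted(content_ids) if i > last_id]
    for batch_no, start in enumerate(range(0, len(pending), batch_size)):
        if fail_after_batches is not None and batch_no >= fail_after_batches:
            raise RuntimeError("simulated fatal error")
        batch = pending[start:start + batch_size]
        written.extend(batch)
        # Checkpoint immediately, so a later crash resumes from here
        # instead of re-processing items that were already written.
        state["last_id"] = batch[-1]


# First run dies after one batch; second run resumes from the checkpoint.
state, written = {}, []
try:
    run_sitemap_job(range(1, 7), state, written, fail_after_batches=1)
except RuntimeError:
    pass
run_sitemap_job(range(1, 7), state, written)
```

If the checkpoint were only saved at the end of the run, the resumed job would start again from the beginning and write the first batch twice. Even with per-batch checkpoints, a crash between the write and the checkpoint could still repeat one batch, so this narrows the damage rather than ruling it out, which matches the "reduce the scope of error" wording above.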
 

420

Active member
Thank you Chris, I'll work closely with Steven to resolve anything else on our end and get back to you soon.
 