Partial fix: XenForo 2 Making Sitemaps Larger Than Google's Max Limit: Website Not Being Spidered/Indexed

420

Affected version: 2.0.5
We have 2 sitemaps that Google is not spidering/indexing. This is from Webmaster Tools:

Errors
Too many URLs
Your Sitemap contains too many URLs. Please create multiple Sitemaps with up to 50000 URLs each and submit all Sitemaps.

I can send screenshots, but would rather do so privately, as they reveal our website.

Thank you.
 
The sitemap URL will usually end in something like /community/sitemap-1.xml.

I know your site URL so if you just confirm the suffix of the sitemap URL, I'll take a look.
 
The code is fairly explicit here; there's definitely logic to close off a file when it reaches 50,000 entries, but it's clearly overrunning in some cases.

I think the quickest solution for now is to reduce the limit from 50,000 to 40,000, so that even if it continues to overrun, a file still shouldn't exceed 50,000.

This can only be changed by editing the code. The necessary file is src/XF/Sitemap/Builder.php. Find:
PHP:
const MAX_FILE_ENTRIES = 50000;

Replace with:
PHP:
const MAX_FILE_ENTRIES = 40000;
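
After regenerating, one way to confirm that no file exceeds the limit is to count the <url> entries in each generated sitemap file. The following is only a minimal sketch in Python and is not part of XenForo: the function names are made up for illustration, the sample XML uses placeholder example.com URLs, and it assumes the files follow the standard sitemaps.org format.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-format files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Per-file limit quoted in the Webmaster Tools error message.
GOOGLE_MAX_URLS = 50_000


def count_sitemap_urls(source):
    """Count <url> entries in one sitemap (hypothetical helper).

    `source` may be a file path or a binary file-like object.
    The file is streamed, so large sitemaps are not loaded whole.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == SITEMAP_NS + "url":
            count += 1
            elem.clear()  # drop the entry's children to keep memory low
    return count


def is_within_limit(source, limit=GOOGLE_MAX_URLS):
    """Return True if the sitemap has at most `limit` URL entries."""
    return count_sitemap_urls(source) <= limit


# Tiny in-memory sample with placeholder URLs.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/threads/1/</loc></url>
  <url><loc>https://example.com/threads/2/</loc></url>
</urlset>"""

print(count_sitemap_urls(io.BytesIO(sample)))  # prints 2
```

In practice you would pass the path of each downloaded sitemap file instead of the in-memory sample.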
 
We have already started re-indexing our 3 million plus pages, so Steven wants to wait until the crawling stops before regenerating.

We will do this as soon as that happens; he says probably in a few days.

Thank you Chris.
 
@Chris D We have 2 million forum pages and 1.5 million photos in the gallery, yet only 2 million show as submitted for indexing.

Does the gallery have a sitemap?
 
Yes it does.

The number of URLs indexed isn’t a signal of how many URLs your sitemap contains. There may be reasons that some URLs aren’t indexed, for example because the crawler had already indexed them.

The numbers may also increase over time, as there may be limits on Google’s side as to how many URLs it will index at a time.
 
I've found that website speed has a big impact on the rate at which Google will index a site.

If you look carefully at the results in the Google Webmaster Tools (aka Google Search Console), the Google web crawler tends to be fairly consistent in the amount of time it allocates each day to indexing a particular site. If you manage to speed up your site so that it can index more in the same time, the number of pages indexed will increase.

I observed this behaviour when experimenting with Cloudflare's Argo Smart Routing service. The significantly lower latency (especially noticeable for sites in South East Asia and Australia, which tend to suffer from higher latency from Googlebot's perspective) really made a difference to the rate at which Googlebot indexed our site, with a pretty much immediate and noticeable jump in the crawler stats.

I had to turn Argo off for one of our sites because it was costing us too much - but I still run it on another site (mostly because that site makes enough money to justify the cost - but also the traffic on that site is much more localised which doesn't seem to impact Argo costs as much for some reason).
 
@Chris D We are not concerned with the indexing amount at this point; our concern is the amount being shown as submitted.

Where is the gallery sitemap located specifically?

Thank you.
 
Looking at your forum / gallery statistics, why exactly is 2 million not enough?

1.5 million images + 250k threads + 145k members = just under 2 million which sounds about right. Where do you get "2 million forum pages" from?
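
The arithmetic above can be spelled out, along with the number of sitemap files it implies at a given per-file cap. This is only a rough sketch in Python: it assumes the three figures quoted in this post are the only content types listed, which may not be the whole picture, and the helper name is made up for illustration.

```python
# Figures quoted in the post above.
images = 1_500_000
threads = 250_000
members = 145_000

total_entries = images + threads + members


def files_needed(entries, per_file):
    """Number of sitemap files needed at `per_file` entries each (ceiling division)."""
    return -(-entries // per_file)


print(total_entries)                        # prints 1895000
print(files_needed(total_entries, 50_000))  # prints 38
print(files_needed(total_entries, 40_000))  # prints 48
```

So roughly 1.9 million entries would be spread over a few dozen files, whichever cap is in effect.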

The gallery sitemap is part of the same sitemap, so it's under sitemap.xml, somewhere amongst all of the index files.
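
To see which child files sit behind the top-level sitemap, you can list the entries of the sitemap index. The sketch below is in Python and assumes the top-level URL returns a standard sitemaps.org index document; the function name and the example.com URLs are placeholders, not anything taken from XenForo.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace prefix mapping for sitemaps.org-format documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_child_sitemaps(index_xml: bytes) -> List[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index (hypothetical helper)."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text
    ]


# Placeholder index document for illustration only.
sample_index = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/community/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/community/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""

for url in list_child_sitemaps(sample_index):
    print(url)
```

Each listed file can then be opened and searched for gallery URLs.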

Is this the same for XenForo 1.5 (index every page that’s visible to guests)?
Yes.
 
@Chris D If the 1.5 million images each have their own page, then we are only looking at 500,000 other pages, which doesn't appear correct with 3.5 million posts.

We're trying to figure out why we have:

2 million submitted pages
120,000 indexed

1.6 million submitted images
2,600 indexed

The company that did our migration completely destroyed our SEO over the last 4 months and we are working diligently to get it back to normal.

Thank you for any help you can offer as we get through this unfortunate process.
 
The number of posts is irrelevant. We don’t create an entry in the sitemap for each individual post because posts don’t have their own page; they appear on the thread pages, which we do create entries for.

So if you look back at my previous post, which matches your forum and gallery statistics, you do only have around 2 million pages to list in your sitemap.
 
Ok, thank you very much for your feedback and insight.

This stuff is not that easy to understand; we've been in SEO hell ever since moving to XenForo.
 
Ah, then I guess I have found something. I have some tags included in the sitemap, and those tags have content that is not visible to guests, so the tag page says "No results found." to guests, and Google Search Console is marking those tag pages as Soft 404.
 
@Chris D

The indexing has stopped and we have 7 errors now:

1. Unsupported file format (1 error)
Your Sitemap does not appear to be in a supported format. Please ensure it meets our Sitemap guidelines and resubmit.
Sitemap: www.xyz.com/community/sitemap-38.xml | - | Jun 24, 2018

2. Parsing error (1 error)
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.
Sitemap: www.xyz.com/community/sitemap-39.xml | 10907 | Jun 23, 2018

3. Too many URLs (5 errors)
Your Sitemap contains too many URLs. Please create multiple Sitemaps with up to 50000 URLs each and submit all Sitemaps.
Sitemap: www.xyz.com/community/sitemap-12.xml | Tag: urlset | 55999 | Jun 25, 2018
Sitemap: www.xyz.com/community/sitemap-47.xml | Tag: urlset | 56039 | Jun 24, 2018
Sitemap: www.xyz.com/community/sitemap-17.xml | Tag: urlset | 52889 | Jun 24, 2018
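
For the "Parsing error" and "Unsupported file format" entries above, a quick local check can at least show whether a downloaded file is well-formed XML with the expected root element. This is only a rough stand-in written in Python, not a reproduction of Google's own validation: the function name is hypothetical, and it assumes the files are meant to be standard sitemaps.org documents.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Root elements defined by the sitemaps.org protocol.
EXPECTED_ROOTS = {SITEMAP_NS + "urlset", SITEMAP_NS + "sitemapindex"}


def check_sitemap(xml_bytes: bytes) -> str:
    """Return 'ok' or a short description of the first problem found (hypothetical helper)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return f"parse error at line {line}, column {column}"
    if root.tag not in EXPECTED_ROOTS:
        return f"unexpected root element: {root.tag}"
    return "ok"


good = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
broken = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url></urlset>'
not_a_sitemap = b"<html></html>"

print(check_sitemap(good))           # prints ok
print(check_sitemap(broken))         # reports a parse error with its position
print(check_sitemap(not_a_sitemap))  # reports an unexpected root element
```

A reported line number gives a place to start looking in the affected file.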

Our SEO has tanked harder than it ever has in 25 years after moving from vBulletin to XenForo.

This has been the worst financial blow to our company ever.

The developers who did our migration destroyed our website and SEO. Please help.
 
This may or may not be correct, but shouldn't the sitemap submitted to Google be sitemap.php and not a specific XML file?

And aren't the sitemaps in the site's sitemap folder?