XF 2.1 XenForo Creating Two Sitemaps, Possibly Causing Sitemap Collision

Just saw in Google Search Console a page that was not indexed, so I inspected it, and it shows Google is trying to index it from two sitemaps:

https://www.xyz.com/community/sitemap.php
https://www.xyz.com/community/sitemap.xml

In our Yoast SEO support ticket today, they said this can create a problem called "sitemap collision."

"When we say sitemap collision, we simply mean that you avoid using sitemaps that might include the same URLs. In your case, we understood that your sitemaps are different and do not contain same links, therefore, no further actions needed. Having similar sitemaps might confuse Google crawling each time the duplicate links for indexing purposes."

How do we get down to one sitemap?

Thank you.
 
There should be no sitemap.xml file for XenForo, just sitemap.php, unless you submitted sitemap.xml manually at some point. (There's a quick way to check what your server is actually serving, sketched after the steps below.)

1. Check your installation via FTP or cPanel file manager. If there is an old sitemap.xml file in the forum root, delete it.

2. If you don't see sitemap.xml there, check your robots.txt file. If you see a line that says Sitemap: https://yourdomain.com/sitemap.xml, change it to Sitemap: https://yourdomain.com/sitemap.php.

3. If you don't see anything referencing sitemap.xml in either of those locations, check whether you have an add-on that may be creating sitemap.xml, and either uninstall it or disable its sitemap option. XF 2 generates a sitemap automatically, so no add-on is needed for this.

4. If none of the above apply,
  • go to Google Search Console
  • click on Sitemaps
  • click on sitemap.xml if you see it there
  • on the next page click on the three dots upper right
  • select remove sitemap
The following screenshot is from a non-XenForo site. The extraneous file for your forum would be titled sitemap.xml, probably with an older date for "submitted".
[Screenshot: the "remove sitemap" option in Google Search Console]
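
As a quick sanity check alongside those steps, you can confirm which of the two endpoints your server actually answers and what content type it returns. Here's a small sketch, assuming you have Python handy; example.com is a placeholder for your own domain:

import urllib.error
import urllib.request

# Placeholder domain; adjust the paths to match your forum root.
for path in ("/community/sitemap.php", "/community/sitemap.xml"):
    url = "https://www.example.com" + path
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            print(url, resp.status, resp.headers.get("Content-Type"))
    except urllib.error.HTTPError as err:
        print(url, "returned HTTP", err.code)

If both come back 200 with an XML content type, both sitemaps are live and Google can see both of them. (Some servers refuse HEAD requests; if so, you'll just see the error code printed instead.)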
 
There were some issues originally with XenForo sitemaps on our huge site when we first migrated from vBulletin, so XenForo techs went into our servers and wrote some custom code to get the sitemaps working properly so Google could read them. After that, XF updated their software so the sitemaps handle 50,000 URLs per file. I asked if there was any reason to remove the code they installed, to be sure there were no conflicts, and they said no. Are you suggesting that maybe we do need to clean out that custom code after all, so we have just one sitemap?

I'll ask my programmer to read this and get us on the same page.

Thank you so much for your feedback, we are truly grateful.
 
No. I stand corrected.

From a conversation with @M@rc

@M@rc wrote:
Are you sure about that? XF 2 added sitemap.xml


It's not just .php anymore.

I'm not referring to a physical sitemap.xml file you'll see if you FTP to the directory.

I think the thread you posted in is referring to the live link:


As you can see, they're both publicly available. XML (I think new in XF 2) might be from the PHP sitemap file.

And my reply:
That seems to be true. There is no sitemap.xml file, but domain.com/sitemap.xml will show a bunch of individual sitemaps generated by sitemap.php.
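
For anyone following along, that live index uses the standard sitemaps.org format, so it looks something like this (the file names and date here are invented for illustration):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/community/sitemap-1.xml</loc>
    <lastmod>2023-01-15T04:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/community/sitemap-2.xml</loc>
    <lastmod>2023-01-15T04:00:00+00:00</lastmod>
  </sitemap>
</sitemapindex>

Each <loc> entry points at one child sitemap of up to 50,000 URLs, which is why a big board shows "a bunch of individual sitemaps" behind the one index.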
 
So let's return to your original concern:
In our Yoast SEO support ticket today, they said this can create a problem called "sitemap collision."

"When we say sitemap collision, we simply mean that you avoid using sitemaps that might include the same URLs. In your case, we understood that your sitemaps are different and do not contain same links, therefore, no further actions needed. Having similar sitemaps might confuse Google crawling each time the duplicate links for indexing purposes."
I tried searching for information on more than one sitemap for a site and whether that causes a problem. It doesn't appear that it does, despite the quote from Yoast.

See:

which leads me to


where John Mueller says:

“Sitemap files are a great way to make your content known to Google and to other search engines. However, they’re limited to 50,000 URLs per file.

What do you do if you have more URLs? You can generate more than one sitemap file per website. You can either submit these individually, for example, through a Search Console. Or you can create a sitemap index file.

A sitemap index file is like a sitemap file for sitemaps. You can list multiple sitemap files in it. If you use a sitemap index file, you can just submit that file for your website in Search Console.

Even if you have fewer than 50,000 URLs, you can submit multiple sitemap files. For example, you might want to do that to keep track of different sections of your website, or just in general to make maintenance of your sitemap files a little bit easier.

When it comes to creating sitemap files, we strongly recommend that you have these made automatically through your server directly. That’s the best way to make sure that your new and updated content is highlighted to search engines as quickly as possible.

Most modern content management systems will take care of this for you. Often, it is just a matter of flipping a switch in your control panel to turn these on.”
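
Tying that back to the robots.txt tip earlier in the thread: whichever single index you settle on, make sure robots.txt advertises only that one. One line (placeholder domain; your URL will differ) is all a crawler needs:

Sitemap: https://www.example.com/community/sitemap.php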
 
Thank you DJ,

Do you think it matters that these articles are a year and a half old, a couple of major Google algorithm updates ago?

Still digesting everything with Marc, thank you so much.
 
Both pages list 8 sitemaps of 50,000 URLs each, so that's 16 sitemaps and 800,000 URLs, every one of them duplicated, for Google to sift through.

We are wondering if that is why 3/4 of our pages are not indexed: possible sitemap collision, with Google taking forever to get through the process without timing out.
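
Rather than keep guessing, we may ask our programmer to measure the overlap directly with something like this rough sketch (not XenForo code; example.com stands in for our domain, and it downloads all 16 files, so it's a one-off diagnostic):

import gzip
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def fetch(url):
    # Download one sitemap file, decompressing it if it's gzipped.
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = resp.read()
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return ET.fromstring(data)

def page_urls(url):
    # Collect every page URL behind a sitemap, following index files recursively.
    root = fetch(url)
    if root.tag == NS + "sitemapindex":
        found = set()
        for loc in root.iter(NS + "loc"):
            found |= page_urls(loc.text.strip())
        return found
    # A plain <urlset>: each <loc> is one page URL.
    return {loc.text.strip() for loc in root.iter(NS + "loc")}

from_php = page_urls("https://www.example.com/community/sitemap.php")
from_xml = page_urls("https://www.example.com/community/sitemap.xml")
print("sitemap.php pages:", len(from_php))
print("sitemap.xml pages:", len(from_xml))
print("pages in both:    ", len(from_php & from_xml))

If "pages in both" is large, the two entry points really are feeding Google the same URLs twice; if it's zero, collision isn't our problem.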
 
And to add to what @ozzy47 said, you will never see all of your forum pages indexed. A lot, in fact the majority, of forum content is what Google calls "thin content": not of interest to most searchers and not original.

See
 