XF 2.2 Sitemap not readable by Google

klamm

New member
We successfully upgraded from VB 3.8.x to XF 2.2. :cool:

The only problem we still cannot solve:
The generated sitemap, located at https://www.klamm.de/forum/sitemap.xml, is "not readable" by Google.
At least, that's what Search Console says.

The sitemap is delivered correctly and is accessible.
Do you have any idea what might be causing the problem?
 

djbaxter

Well-known member
@klamm did you set up the sitemap in the options? (/admin.php?options/groups/sitemap/)

The sitemap for Google Search Console should be https://www.klamm.de/forum/sitemap.php
That's a matter of preference. You can also use https://www.klamm.de/forum/sitemap.xml; in fact, that's the recommended XenForo method. See:


1. Did you set up a Google Search Console account and submit your sitemap there?

2. Have you specified the correct location of your sitemap in a robots.txt file? Use Sitemap: https://www.klamm.de/forum/sitemap.xml as the last line.
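As a sketch of point 2, Python's standard urllib.robotparser can list the Sitemap: lines that a robots.txt declares, which is a quick way to confirm the directive is actually being picked up. The robots.txt body below is hypothetical (only the Sitemap URL comes from this thread); in practice you would download the live file first.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for illustration; only the Sitemap URL is from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/admin.php
Allow: /

Sitemap: https://www.klamm.de/forum/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> List[str]:
    """Return every Sitemap: URL declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))
# ['https://www.klamm.de/forum/sitemap.xml']
```

Sitemap lines are independent of User-agent groups, so the parser collects them wherever they appear in the file.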
 

klamm

New member
1. Did you set up a Google Search Console account and submit your sitemap there?

Yes, that's what I'm talking about. :)
Google reads the index but does not read the individual sitemaps, even though they are available,
e.g. https://www.klamm.de/forum/sitemap-1.xml
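To see exactly which child sitemaps the index advertises (and therefore which files Search Console will try to fetch), one could parse the index with Python's xml.etree.ElementTree. The sample index below is illustrative: sitemap-2.xml is a hypothetical entry, and the real content would be downloaded from https://www.klamm.de/forum/sitemap.xml.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative index; sitemap-2.xml is a hypothetical second entry.
SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.klamm.de/forum/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://www.klamm.de/forum/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""


def child_sitemaps(index_xml: bytes) -> List[str]:
    """Return the <loc> URL of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    if root.tag != "{http://www.sitemaps.org/schemas/sitemap/0.9}sitemapindex":
        raise ValueError(f"not a sitemap index: root element is {root.tag}")
    locs = root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
    # Strip whitespace; an empty <loc> yields an empty string rather than None.
    return [(loc.text or "").strip() for loc in locs]


print(child_sitemaps(SAMPLE_INDEX))
# ['https://www.klamm.de/forum/sitemap-1.xml', 'https://www.klamm.de/forum/sitemap-2.xml']
```

Each URL returned is a file Google will request separately, so each one is worth testing on its own.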

2. Have you specified the correct location of your sitemap in a robots.txt file? Use Sitemap: https://www.klamm.de/forum/sitemap.xml as the last line.

Yes, I've done that.

I use many other sitemaps on my site, too, and this is the first time something like this has happened.
That's why I thought it had something to do with XF or with how the sitemaps are delivered.
But the headers etc. are exactly the same as, e.g., here: https://xenforo.com/community/sitemap.xml

Maybe I just have to wait another couple of days?
 

Attachments

  • 1.JPG

djbaxter

Well-known member
Google doesn't read the daily sitemaps because it has no need to.

If your site has internal links to new content, and at least one link to the sitemap has been submitted to Search Console, the job is passed over to Googlebot, which will crawl your site, including new content, at a frequency determined by its algorithms and crawl budget. Most of that crawling is done by visiting the site and recursively following links from the home page.

Out of the box, XenForo's internal linking is quite good, although it does produce some duplicate content. That is not a major problem, despite the myth that Google punishes duplicate content on the same site (it doesn't; it just ignores the duplicates), but it's worth blocking some of it in robots.txt to make better use of the crawl budget.

Model your robots.txt on the one XenForo.com uses. Make sure it sits in the root folder and does not itself block Googlebot. If the forum is on a subdomain instead of in a subfolder, add it to the root of the subdomain as well (since the subdomain is also a folder on the server, it doesn't hurt to have it in both places).

Code:
User-agent: *
Disallow: /community/whats-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/search/
Disallow: /community/admin.php
Allow: /

Sitemap: https://xenforo.com/community/sitemap.xml
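One way to confirm that rules like these don't accidentally cover the sitemap files is to run them through Python's urllib.robotparser. Caveat: Python's parser applies the first matching rule in file order, whereas Google uses the most specific (longest) match. For this particular file both give the same answers, but for files that mix Allow and Disallow more freely the results can differ, so treat this as a rough check rather than a substitute for Google's own tester. The sitemap-1.xml URL below is an example.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# The xenforo.com robots.txt quoted above, copied verbatim.
XF_ROBOTS = """\
User-agent: *
Disallow: /community/whats-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/search/
Disallow: /community/admin.php
Allow: /

Sitemap: https://xenforo.com/community/sitemap.xml
"""


def blocked_urls(robots_txt: str, urls: Iterable[str], agent: str = "Googlebot") -> List[str]:
    """Return the URLs that robots_txt forbids for the given crawler name."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # RobotFileParser uses first-match in file order, unlike Google's longest-match.
    return [url for url in urls if not parser.can_fetch(agent, url)]


print(blocked_urls(XF_ROBOTS, [
    "https://xenforo.com/community/sitemap.xml",
    "https://xenforo.com/community/sitemap-1.xml",  # example child sitemap URL
    "https://xenforo.com/community/search/",
]))
# ['https://xenforo.com/community/search/']
```

An empty result for the sitemap URLs means these rules, at least as Python reads them, leave the sitemaps crawlable.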
 

Chromaniac

Well-known member
The screenshot clearly shows that Google couldn't read the individual XML files.

I would suggest using Google's robots.txt Tester to check whether Google can access the XML files.

Test your robots.txt with the robots.txt Tester - Search Console Help

Checking the sitemap files with a sitemap tester wouldn't hurt either, though if no third-party code is modifying the sitemap files, they should be fine.
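A rough sitemap check along those lines might look like the sketch below. It tests only a handful of sitemaps.org protocol rules (well-formed XML, a <urlset> root in the sitemaps.org namespace, a <loc> in every <url>, absolute http(s) URLs on the sitemap's own host, at most 50,000 URLs per file). It is not a full validator: it does not check the file-size limit or the XML schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol


def sitemap_problems(sitemap_url: str, xml_bytes: bytes) -> List[str]:
    """Return readable problems found in a <urlset> sitemap; an empty list means none found."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != NS + "urlset":
        return [f"unexpected root element {root.tag}"]

    problems = []
    urls = root.findall(NS + "url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")

    expected_host = urlparse(sitemap_url).netloc
    for i, url in enumerate(urls, start=1):
        loc = url.find(NS + "loc")
        text = (loc.text or "").strip() if loc is not None else ""
        if not text:
            problems.append(f"<url> #{i} has no <loc>")
            continue
        parts = urlparse(text)
        if parts.scheme not in ("http", "https"):
            problems.append(f"<url> #{i}: {text!r} is not an absolute http(s) URL")
        elif parts.netloc != expected_host:
            problems.append(f"<url> #{i}: host {parts.netloc} differs from sitemap host {expected_host}")
    return problems
```

To use it, download https://www.klamm.de/forum/sitemap-1.xml as bytes and pass it in together with that URL; an empty list means none of these particular rules were violated.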

One of the testers I tried gave me the following error:

Your server is reporting that the specific page is not accessible to us (Code 400). Please check that the page URL you have entered is correct, is published, and that there are no problems preventing it's display.
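A 400 that only some clients receive may point to the server, a firewall, or a bot filter treating certain requests differently. Below is a minimal sketch for comparing the status code returned to different User-Agent strings, using only urllib from the standard library. The function names are hypothetical. Spoofing Googlebot's User-Agent does not reproduce Google's fetch exactly, since some servers also verify that Googlebot requests come from Google's own addresses; Search Console's URL Inspection tool remains the more reliable check.

```python
import urllib.error
import urllib.request

# Googlebot's classic desktop User-Agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a plain GET request (what a crawler uses for sitemaps) with the given User-Agent."""
    return urllib.request.Request(url, method="GET", headers={"User-Agent": user_agent})


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url when requested with user_agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the status code is what we want.
        return err.code
```

For example, calling status_for("https://www.klamm.de/forum/sitemap-1.xml", GOOGLEBOT_UA) and then again with an ordinary browser User-Agent shows whether the server answers the two differently.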
 