XF 2.2 Confused by sitemaps...

RobParker

Well-known member
Could someone explain how XF creates sitemaps as I think I'm missing something.

The admin page says:
If this option is enabled, the sitemap will be rebuilt automatically periodically. If this option is disabled, the sitemap will only be updated when it is rebuilt manually through Tools > Rebuild caches. The current sitemap can be accessed via sitemap.php.

That link to sitemap.php is ourdomain.co.uk/sitemap.php but that contains the following:

<sitemapindex>
<sitemap>
<loc>https://www.ourdomain.co.uk/sitemap-1.xml</loc>
<lastmod>2021-08-13T22:38:22+00:00</lastmod>
</sitemap>
<sitemap>
<loc>https://www.ourdomain.co.uk/sitemap-2.xml</loc>
<lastmod>2021-08-13T22:38:22+00:00</lastmod>
</sitemap>
</sitemapindex>

Those 2 xml files don't exist (404 errors).

Do I have something configured incorrectly?
 
Our site is installed in the root.

As I said, ourdomain.co.uk/sitemap.php exists and contains what I pasted above.

Neither sitemap-1.xml nor sitemap-2.xml exists though (404).
 
Does it try to write a real file to that sitemap-1.xml location? Which user does it do that as? Could it be a filesystem permission issue? I'd assume an error would show somewhere if it couldn't write the file?
 
To cut a rather long story short, you will need full friendly URLs enabled for the sitemap URLs that end in .xml to actually work.

Once the sitemap grows beyond 50,000 URLs it has to be split into separate .xml files, as in your case. Below the 50,000 URL limit, sitemap.php will work, but in practice we've seen Google ignore sitemaps unless they have a .xml extension.

Once you have full friendly URLs enabled, you can actually just use the URL sitemap.xml to access the index page and then sitemap-1.xml and so on will work.
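
As a rough illustration of that split (a sketch only, not XF's actual code; the function name and the 72,000 figure are made up for the example), the number of part files and the URLs an index would list can be worked out like this:

```python
from math import ceil

URL_LIMIT = 50_000  # per-file URL limit mentioned above


def sitemap_index(base_url: str, total_urls: int, limit: int = URL_LIMIT) -> list[str]:
    """Return the part URLs an index would list for total_urls entries."""
    parts = ceil(total_urls / limit)
    return [f"{base_url}/sitemap-{n}.xml" for n in range(1, parts + 1)]


# Hypothetical site with 72,000 URLs: two parts, like the index in the first post.
print(sitemap_index("https://www.ourdomain.co.uk", 72_000))
```

With 50,000 URLs or fewer this yields a single part, which is the case where no index is needed.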
 
Solution
Hmm now I'm more confused. I don't think that's the issue but here's everything I can figure out:

We don't have full friendly URLs enabled.

I've manually rebuilt the sitemap a few times as a test.

The sitemap is generated, and in internal_data there are gzipped files containing the XML:

/public_html/internal_data/sitemaps
-rw-rw-rw- 1 apache apache 1.4M Aug 14 00:31 sitemap-1628897450-1.xml.gz
-rw-rw-rw- 1 apache apache 561K Aug 14 00:31 sitemap-1628897450-2.xml.gz

The contents look right from what I can tell.
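
For anyone wanting to check one of those files the same way, a minimal Python sketch (not part of XF; `count_sitemap_urls` is a made-up helper name) that counts the `<url>` entries in a gzipped sitemap might look like this:

```python
import gzip
import xml.etree.ElementTree as ET


def count_sitemap_urls(path: str) -> int:
    """Count <url> entries in a gzipped sitemap file, ignoring XML namespaces."""
    with gzip.open(path, "rb") as fh:
        root = ET.parse(fh).getroot()
    # Tags look like "{namespace}url", so compare only the local name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "url")
```

Calling it on, say, internal_data/sitemaps/sitemap-1628897450-1.xml.gz should return a number no higher than 50,000 if the file is a normal sitemap part.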

The files aren't being created in the root directory (public_html) though.

What should the permissions on public_html be?

They were (and have been for years) this:

drwx--xr-x. 21 admin apache 4.0K Aug 14 00:20 public_html

(edit: even though the group is called "apache" we now run nginx)

admin is the user login for the server/ftp. I assume the webserver runs as a different user though.

I tried this and it didn't help:
drwxrwx--x. 21 admin apache 4.0K Aug 14 00:20 public_html
 
No. The files don’t need to be copied to the root. If friendly URLs are enabled then sitemap.xml, sitemap-1.xml and so on hit XF rather than just being a straight up 404 served by your web server. XF routes the request and internally serves the correct content for the URL requested.
 
Maybe I'm explaining myself really badly here. I get that for the sitemap to actually be useful I need to enable friendly URLs, I'm not arguing that.

My point/question is that a sitemap XML is being created in internal_data but it's failing to be copied to the root. That's surely another issue somewhere with our permissions (and probably one we want to resolve, no?). If our site was smaller and only had 10,000 links, a physical .xml file would be created in the forum root and that'd be usable, right?



Also, regarding friendly URLs, it needs:
location /xf/

Is that a full or relative path?
e.g. is it /home/sites/oursite.co.uk/public_html/ or is it just /?

Edit: a post elsewhere explains it'd just be /
 
“My point/question is that a sitemap XML is being created in internal_data but it's failing to be copied to the root.”
It doesn’t need to be copied to the root.

To give another example outside of sitemaps to make it clearer.

This thread has a URL of threads/confused-by-sitemaps.197291. That is not a file that exists on this server. The threads directory doesn’t even exist on the server. Instead we route that URL in such a way that the software knows to return the data from a specific thread with a specific ID.

Sitemaps work in a similar way. sitemap.xml and sitemap-1.xml etc. do not exist anywhere. They do not need to exist as files in the root. Instead we route that URL in such a way that the software knows to return the specific sitemap data.

All you need to do is enable friendly URLs and the URLs will just work.
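
To make the routing idea concrete, here is a small Python sketch. It is an illustration only, not how XF's router is written, and the names `SITEMAP_PATH` and `sitemap_part` are invented for it. It maps a requested path to "the index", "part n", or "not a sitemap" without any file of that name existing:

```python
import re
from typing import Optional

# Matches "sitemap.xml" (the index) or "sitemap-<n>.xml" (one part).
SITEMAP_PATH = re.compile(r"^/?sitemap(?:-(\d+))?\.xml$")


def sitemap_part(path: str) -> Optional[int]:
    """Return 0 for the index, n for part n, or None if not a sitemap URL."""
    match = SITEMAP_PATH.match(path)
    if match is None:
        return None
    return int(match.group(1) or 0)


for path in ("/sitemap.xml", "/sitemap-2.xml", "/threads/confused-by-sitemaps.197291"):
    print(path, "->", sitemap_part(path))
```

Without friendly URLs the web server looks for a real file at that path first, which is why those URLs return a 404 before the software ever sees them.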
 
I understand all of that but I think you’re not replying to my question…

This is correct, right?
Yes, the .xml files are physical files.

Physical files are created and are normally copied to the root (but sitemap.php avoids the need for that).

I’ll sort the friendly URLs and go down that route, but what I’m trying to ask now is whether this failure to create the files is a symptom of a permission issue we have somewhere. Should files have been copied (not “do they need to be”) to the root? If that should have happened but failed, it seems a bad idea to ignore it, doesn’t it?
 
Well, the physical files are what you have found under internal_data/sitemaps but, no, as I’ve said, they are not copied anywhere. That’s where XF reads the data from when we route the sitemap.xml, sitemap-1.xml etc. URLs through the software to display the data.

These files are not copied to the root. They do not need to be copied to the root. They should not exist in the root. They are not copied from anywhere to anywhere.
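
As a sketch of what "serving from internal_data" could look like, assuming the `sitemap-<timestamp>-<part>.xml.gz` naming seen in the listing earlier in the thread: the thread does not show how XF actually picks the active build, so choosing the highest timestamp is an assumption of this example, and `latest_part` is a made-up name.

```python
import re
from typing import Optional

# File naming pattern taken from the directory listing in this thread.
PART_FILE = re.compile(r"^sitemap-(\d+)-(\d+)\.xml\.gz$")


def latest_part(filenames: list[str], part: int) -> Optional[str]:
    """Pick the file for the given part from the newest build (highest timestamp)."""
    candidates = []
    for name in filenames:
        m = PART_FILE.match(name)
        if m and int(m.group(2)) == part:
            candidates.append((int(m.group(1)), name))
    return max(candidates)[1] if candidates else None


files = ["sitemap-1628897450-1.xml.gz", "sitemap-1628897450-2.xml.gz"]
print(latest_part(files, 2))  # prints sitemap-1628897450-2.xml.gz
```

The point is only that the request for sitemap-2.xml is answered from a file with a different name in a different directory, so nothing needs to be written to the root.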
 
Thanks for clearing that up.

It looks like the confusion is from when I asked
“Does it try to write a real file to that sitemap-1.xml location?” and was told yes.
 
into sitemap.php

<?php

$dir = __DIR__;
require ($dir . '/src/XF.php');

XF::start($dir);
$app = XF::setupApp('XF\Pub\App');

/** @var \XF\Sitemap\Renderer $renderer */
$renderer = $app['sitemap.renderer'];
$request = $app->request();
$response = $app->response();
$counter = $request->filter('c', 'uint');

$response = $renderer->outputSitemap($response, $counter);
$response->send($request);
<sitemapindex>
<sitemap>
<loc>https://www.ourdomain.co.uk/sitemap-1.xml</loc>
<lastmod>2021-08-13T22:38:22+00:00</lastmod>
</sitemap>
<sitemap>
<loc>https://www.ourdomain.co.uk/sitemap-2.xml</loc>
<lastmod>2021-08-13T22:38:22+00:00</lastmod>
</sitemap>
</sitemapindex>

Those 2 xml files don't exist (404 errors).
Will it be like this inside the sitemap.php file?
 