XF 1.4 Sitemap Full of Strange Characters

mjda

I built the sitemap on my test install, downloaded it, and opened it, and it's just a bunch of strange characters and symbols. When I download and view the sitemap here on XenForo, I see plain text. I'm assuming this has something to do with the way my server is configured (gzip, maybe?), but I have no idea how to fix it.

Anyone have any ideas?
 
The sitemap is generally gzipped.

Well, yeah, I'm talking about the XML that is left after it's extracted. It's unreadable. I could send you a copy of it if you give me an email address, or I can PM you a screenshot of it open in Notepad++ if that will work.
 
Sounds like it didn't extract properly to me. Ideally, send me the URL to the sitemap script itself.

I downloaded it a few times and, like I said, the one from xenforo.com looks fine. I used the same program to extract them both. In any case, I just sent you a PM with a link to my sitemap URL.
 
Just for reference, two people have sent me links, and I'm 99.9% sure it's just down to the program opening it. The file is a gzip file (not a zip file). I think WinZip may be the problematic program here.

(And for completeness' sake, the sitemap index file is not run through gzip as it stands, so it's not a fair comparison.)
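
To make the gzip-versus-zip distinction concrete, here is a minimal Python sketch that identifies a downloaded file by its leading bytes rather than by its extension. The function name `sniff_format` is made up for illustration; the magic numbers are the standard ones for gzip (`1f 8b`) and for a non-empty zip archive (`PK\x03\x04`).

```python
import gzip

def sniff_format(data: bytes) -> str:
    """Guess the container format from the first few bytes of a file."""
    if data[:2] == b"\x1f\x8b":        # gzip magic number
        return "gzip"
    if data[:4] == b"PK\x03\x04":      # local file header of a non-empty zip
        return "zip"
    if data.lstrip()[:5] == b"<?xml":  # already plain XML, no compression
        return "xml"
    return "unknown"

if __name__ == "__main__":
    sample_xml = b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
    print(sniff_format(gzip.compress(sample_xml)))
    print(sniff_format(sample_xml))
```

If a file that reports as gzip will not open in a given archive tool, the tool (or a missing .gz extension) is the more likely suspect than the file.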
 
I just realized something new concerning this. If I download the xml.gz file directly via FTP, I can extract it and read it just fine. That tells me it's not the program at all; something in a server setting is altering the sitemap file when it's sent over HTTP. To take it a step further, I downloaded the file over HTTP, then uploaded it to my server again so I could extract it using gunzip. Even then, the file was unreadable.
 
Well, I've downloaded the sitemap from both sites and extracted it without issue. (I also checked the first one with wget and gunzip from our server.) You can try manually submitting it to Google via Webmaster Tools -- that will show you the status of Google processing it.

That said, are you able to identify what the differences are between the 2 files you downloaded? It's worth noting that sitemap.php literally just spits out some headers and calls readfile() to output it. Any variation in content may imply something on the server modifying the content.
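
One way to pin down such differences is to compare the two downloads byte for byte. The following is only a sketch; `first_difference` and `compare_downloads` are hypothetical helper names, and the inputs are the raw bytes of the two files (for example, one fetched over FTP and one over HTTP).

```python
import hashlib
from typing import Optional

def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One file is a strict prefix of the other.
        return min(len(a), len(b))
    return None

def compare_downloads(a: bytes, b: bytes) -> dict:
    """Summarise how two copies of the same file differ."""
    return {
        "sizes": (len(a), len(b)),
        "sha256": (hashlib.sha256(a).hexdigest(), hashlib.sha256(b).hexdigest()),
        "identical": a == b,
        "first_diff": first_difference(a, b),
    }
```

A difference at offset 0 together with a size change suggests the whole body was re-encoded in transit, rather than a few bytes being damaged.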
 
I had to rename the file and add the .gz extension before unzipping.

[Screenshot attached, 2014-08-09 21:15:26]

Works perfectly.

[Screenshot attached, 2014-08-09 21:19:34]

I think WinRAR/WinZip cannot recognize the file without the extension.
 
That said, are you able to identify what the differences are between the 2 files you downloaded?

I was actually able to find one difference. The sitemap on xenforo.com was compressed to 20%. The one I downloaded from my site didn't appear to be compressed at all. That got me thinking, so I started looking at the file more closely. I had a file called sitemap-1.xml.gz (downloaded from my test server). I extracted that to sitemap-1.xml. I then renamed sitemap-1.xml to sitemap-1-2.xml.gz. I was able to extract it again and view it just fine. So, basically, I had to extract the same file twice to get it to work.
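
Having to extract the same file twice is what you would see if the data had been gzipped more than once. As a rough check, this Python sketch keeps decompressing while the data still begins with the gzip magic number and reports how many layers it removed. The name `unwrap_gzip` and the `max_layers` safety limit are assumptions made for this example.

```python
import gzip
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"

def unwrap_gzip(data: bytes, max_layers: int = 5) -> Tuple[bytes, int]:
    """Strip nested gzip layers; return the payload and the layer count."""
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers

if __name__ == "__main__":
    xml = b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
    doubled = gzip.compress(gzip.compress(xml))
    payload, layers = unwrap_gzip(doubled)
    print(layers, payload == xml)
```

A layer count of 2 for a file saved from the HTTP download, against 1 for the copy fetched over FTP, would point at something between the script and the browser compressing it a second time.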

I went and checked my server settings and realized I had server gzip compression disabled (don't ask me why). I enabled it and restarted Apache. Now things are working as expected: I can download the file from sitemap.php, extract it, and read it just fine.
 