
XF 1.5: internal_data/sitemaps grew to over 650 files

dvsDave

Well-known member
#1
So, I got a disk space warning on my server and started looking into which folders were the biggest offenders. To my surprise, the sitemaps folder was HUGE (like 22 gigs). There were a bunch of old .gz files, so I deleted those. I then wiped out the whole folder, went to Tools -> Rebuild Caches, and rebuilt the sitemap.

Four hours later... I had over 650 5 MB XML files and it was still rebuilding the tags portion of the sitemap. We used to have a system that auto-generated tags, so we have a LOT of tags across 35k discussions. When I woke up this morning, I had a 3 gig sitemap folder and the rebuild tool had timed out at some point.

So, I wiped the sitemap folder again, changed the XML Sitemap Generation option to exclude tags, then rebuilt the sitemap with the Rebuild Caches tool. This time it took only a minute and my sitemap folder was only about 3 megs.

I guess I want to know: is this normal? Should I reinstate the tags and just not worry about the size, or should I keep excluding them? Do the tags make any appreciable SEO difference, or does Google ding me for having an insane sitemap folder?
 

Brogan

XenForo moderator
Staff member
#2
That doesn't sound right as there should only be a single entry in the sitemap for each tag.
How many tags do you have?
 

dvsDave

Well-known member
#6
It's actually only 270 tags.

So, just to clarify, I'm not talking about the /sitemap folder, but the /internal_data/sitemaps/ folder where I was seeing these crazy figures.
 

Chris D

XenForo developer
Staff member
#8
Oh hang on a minute... there's a lot of repetition in there.

Halloween appears 187 times.

I'm going to look at this in more detail, from the code. First:

Can you confirm the result of this query:
Code:
SELECT COUNT(*) FROM xf_tag WHERE use_count > 0
Also, can you confirm whether any add-ons are involved? Were the tags imported from another add-on? Are the tags themselves part of an add-on? For example, are some of these repeated tags pointing to content that belongs to an add-on? It shouldn't matter, but it's worth checking.
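Chris spotted the repetition by eye; on a folder this size a script is quicker. Below is a minimal Python sketch (not a XenForo tool) that counts how often each `<loc>` URL appears across the sitemap files and keeps only the repeats. It assumes the files are standard sitemaps.org `<urlset>` documents, either plain `.xml` or gzipped `.xml.gz`; the actual file naming and format inside internal_data/sitemaps is an assumption, as are the helper names `count_locs` and `duplicates_in_dir`.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_locs(xml_bytes: bytes) -> Counter:
    """Count how often each <loc> URL appears in one sitemap document."""
    root = ET.fromstring(xml_bytes)
    return Counter(
        loc.text.strip()
        for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    )


def duplicates_in_dir(directory: str) -> Counter:
    """Aggregate <loc> counts over every .xml / .xml.gz file in a directory
    and keep only URLs that appear more than once."""
    totals: Counter = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.name.endswith(".xml.gz"):
            data = gzip.decompress(path.read_bytes())
        elif path.suffix == ".xml":
            data = path.read_bytes()
        else:
            continue  # skip anything that is not a sitemap file
        totals.update(count_locs(data))
    return Counter({url: n for url, n in totals.items() if n > 1})
```

For example, `duplicates_in_dir("internal_data/sitemaps").most_common(10)` would list the ten most-repeated URLs, which should make an entry like the Halloween tag stand out. Files are parsed one at a time, so memory use is roughly one file plus the table of distinct URLs.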
 

dvsDave

Well-known member
#9
Sorry all, my 16-month-old just pushed my laptop off the table and now the keyboard won't work (typing this on my cell). This has happened before and it's a known issue on my ASUS laptop; I just have to open it up and reseat the keyboard connection. I'll get the results of the query as soon as I can find the bizarre-size Torx driver I keep for just these occasions.
 

dvsDave

Well-known member
#10
Back up and running! The query returned 267.

The biggest suspected culprit is VaultWiki. I just noticed a server error message:

Server Error Log
Error Info
ErrorException: Undefined index: user - vault/core/controller/ui/integrate/tag/xf.php:35
Generated By: Unknown Account, Today at 10:06 AM
Stack Trace
#0 /home/control/public_html/vault/core/controller/ui/integrate/tag/xf.php(35): XenForo_Application::handlePhpError(8, 'Undefined index...', '/home/control/p...', 35, Array)
#1 /home/control/public_html/vault/core/controller/ui/integrate/vw.php(242): vw_UI_Integrate_Tag_Controller_XF->get_stack(false)
#2 /home/control/public_html/vault/core/controller/ui/integrate/vw.php(66): vw_UI_Integrate_Controller->get_stack(false)
#3 /home/control/public_html/vault/core/controller/ui/integrate/vw.php(42): vw_UI_Integrate_Controller->setup()
#4 /home/control/public_html/library/vw/XenForo/CodeEventListener/Public.php(234): vw_UI_Integrate_Controller->integrate('<!DOCTYPE html>...')
#5 [internal function]: vw_XenForo_CodeEventListener_Public::front_controller_post_view(Object(XenForo_FrontController), '<!DOCTYPE html>...')
#6 /home/control/public_html/library/XenForo/CodeEvent.php(90): call_user_func_array(Array, Array)
#7 /home/control/public_html/library/XenForo/FrontController.php(183): XenForo_CodeEvent::fire('front_controlle...', Array)
#8 /home/control/public_html/index.php(13): XenForo_FrontController->run()
#9 {main}
Request State
array(3) {
["url"] => string(38) "http://www.controlbooth.com/tags/xl16/"
["_GET"] => array(0) {
}
["_POST"] => array(0) {
}
}
I've submitted a bug report to the VaultWiki developer: https://www.vaultwiki.org/issues/4443/

I also used to use cemzoo's sitemap generator (linked to the discussion, since the resource has been pulled), but I disabled it after I upgraded to 1.5 (I forgot to turn it off when I went to 1.4).
 

Chris D

XenForo developer
Staff member
#11
I'm intrigued what would happen if you did the following:
  • Disable all add-ons
  • Disable all sitemap content types except Tags
  • Rebuild the sitemap manually from the "Rebuild Caches" page
Theoretically, 267 tags should be processed in mere seconds. If the rebuild completes quickly with no add-ons enabled, that would suggest an add-on is responsible. You could then repeat the rebuild with different add-ons enabled to confirm which one is doing it.
 

dvsDave

Well-known member
#12
So, I disabled VaultWiki and ran the rebuild sitemap tool with only tags enabled and that took 2 seconds.

I then re-enabled VaultWiki, still with only Tags enabled, and the rebuild just started churning through data.


I then disabled Tags in the sitemap generation settings and rebuilt again.

Took about 30 seconds this time.
 

pegasus

Well-known member
#15
Patch instructions here: https://www.vaultwiki.org/issues/4444/#note24299
Update: Patch 4.0.7 PL 1 released.
Fixes:
  • Tag Duplication Vulnerability
  • Template Expansion Vulnerability
  • Template Usage Vulnerability
  • Node Overload Vulnerability
All of these were denial-of-service issues (for the tag duplication reported in this thread, no attacker even had to be involved).
Official disclosures should be forthcoming by week's end.
 

dvsDave

Well-known member
#16
That patch and one other patch from Vaultwiki did the trick. :)

Apparently this is what was happening:
Eventually tracked it down to VaultWiki breaking sitemap generation during the tags step. By the time it took down the server, it had created 43 GB of temporary sitemap files, when that's normally < 10 MB. It would have kept generating temp files if it hadn't run out of space. The sitemap log shows it created 8,202 files tracking 410+ million URLs on a site that has nowhere near that.

I just now implemented the fix for the 'Undefined Index: user' issue, hoping that would fix this as well but it didn't...

When I run the sitemap build with VaultWiki disabled, it works perfectly. But as soon as VW is enabled, it just perpetually churns out temp files.
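The figures in the quoted log hang together. As a rough back-of-the-envelope check (assuming decimal gigabytes and treating "410+ million" as exactly 410 million, so the per-file URL count is a lower bound):

```python
# Figures as reported in the thread; all are approximate.
total_bytes = 43 * 10**9   # "43GB", assumed to be decimal gigabytes
file_count = 8_202         # "8,202 files"
url_count = 410 * 10**6    # "410+ million urls" (a lower bound)

urls_per_file = url_count / file_count
bytes_per_file = total_bytes / file_count
bytes_per_url = total_bytes / url_count

print(f"URLs per file: {urls_per_file:,.0f}")          # 49,988
print(f"MB per file:   {bytes_per_file / 10**6:.2f}")  # 5.24
print(f"bytes per URL: {bytes_per_url:.0f}")           # 105
```

About 50,000 URLs per file is the per-file URL limit in the sitemaps.org protocol, and about 5 MB per file matches the ~5 MB files reported in the first post. That is consistent with the builder filling each file to capacity and starting a new one, over and over, rather than writing a handful of oversized files.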
 

pegasus

Well-known member
#17
A typo in the array keys: 'tag_id' vs. $tag_id.
Usually something like that just issues an E_NOTICE and doesn't bring whole servers down.
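The offending code isn't shown in the thread, so what follows is a purely hypothetical Python illustration, not VaultWiki's code. The tag data and the names `fetch_batch` and `build_tag_urls` are invented. It shows how a key mix-up like 'tag_id' vs. $tag_id could produce the symptoms described here: a batch cursor is saved under one key and read back from another, so every read silently falls back to the start and the same tags are emitted again and again. In PHP 5/7, reading an undefined array index evaluates to null and raises only an E_NOTICE, which is why a bug like this can keep running instead of failing fast.

```python
# HYPOTHETICAL illustration only; not taken from VaultWiki or XenForo.
TAGS = [(1, "halloween"), (2, "xl16"), (3, "led"), (4, "dmx")]  # made-up data
BATCH_SIZE = 2


def fetch_batch(after_id: int, limit: int) -> list:
    """Return up to `limit` (id, name) tags whose id is greater than `after_id`."""
    return [t for t in TAGS if t[0] > after_id][:limit]


def build_tag_urls(buggy: bool, max_batches: int = 10) -> list:
    state: dict = {}
    urls: list = []
    for _ in range(max_batches):  # safety cap so this demo always terminates
        # A missing key falls back to 0, much like PHP yielding null
        # (plus an E_NOTICE) for an undefined array index.
        after_id = state.get("tag_id", 0)
        batch = fetch_batch(after_id, BATCH_SIZE)
        if not batch:
            break  # no more tags: the correct version stops here
        for _tag_id, name in batch:
            urls.append(f"/tags/{name}/")
        last_id = batch[-1][0]
        if buggy:
            state[last_id] = last_id   # typo: variable key, never read back
        else:
            state["tag_id"] = last_id  # intended: the key the reader uses
    return urls
```

With `buggy=False` the loop emits each tag once and stops; with `buggy=True` it re-emits the first batch until the demo's `max_batches` cap kicks in. A real rebuild loop without such a cap would keep going until something else, such as disk space, ran out.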

Might I suggest a change to the sitemap builder so that it stops building the sitemap if it exceeds certain size limits? Even if a site really had 2B tags, I doubt they would want a 45 GB sitemap. Sometimes the admin won't notice the problem (and uncheck, e.g., tags in the sitemap options) until it's too late.

I believe there are already protections like this in place for attachment storage (once the site has X GB of attachments, XenForo stops letting users upload them). EDIT: Actually, I can't find such an option in XenForo at a quick glance. I found it on my vBulletin test board, though. If it doesn't exist, might I suggest this as well?
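The size-limit suggestion could look something like the following hypothetical Python sketch. `CappedSitemapWriter`, `SitemapLimitExceeded`, and the `max_urls`/`max_bytes` limits are invented for illustration; they are not existing XenForo classes or options. The writer checks both caps before each entry is written and raises instead of quietly continuing, so a runaway rebuild stops with an error rather than filling the disk.

```python
from typing import Callable
from xml.sax.saxutils import escape


class SitemapLimitExceeded(Exception):
    """Raised instead of letting a runaway rebuild keep writing."""


class CappedSitemapWriter:
    """Writes <url> entries via `write` but refuses to exceed fixed caps.

    Hypothetical sketch: max_urls / max_bytes are illustrative settings,
    not existing XenForo options.
    """

    def __init__(self, write: Callable[[str], None], max_urls: int, max_bytes: int):
        self.write = write            # e.g. an open file's .write method
        self.max_urls = max_urls
        self.max_bytes = max_bytes
        self.urls_written = 0         # entries written so far
        self.bytes_written = 0        # UTF-8 bytes written so far

    def add(self, loc: str) -> None:
        entry = f"<url><loc>{escape(loc)}</loc></url>\n"
        size = len(entry.encode("utf-8"))
        # Check both caps *before* writing, so output never exceeds them.
        if self.urls_written + 1 > self.max_urls:
            raise SitemapLimitExceeded(f"URL cap of {self.max_urls:,} reached")
        if self.bytes_written + size > self.max_bytes:
            raise SitemapLimitExceeded(f"size cap of {self.max_bytes:,} bytes reached")
        self.write(entry)
        self.urls_written += 1
        self.bytes_written += size
```

Raising rather than silently truncating is a deliberate choice here: a truncated sitemap would look healthy, whereas an aborted rebuild tells the admin something is wrong, which addresses the "won't notice until it's too late" concern.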
 