XF 1.5 internal_data/sitemaps was over 650 files

Discussion in 'XenForo Questions and Support' started by dvsDave, Nov 10, 2015.

  1. dvsDave

    dvsDave Well-Known Member

    So, I got a disk space warning on my server and started looking into which folders were the biggest offenders. To my surprise, the sitemaps folder was HUGE (like 22 gigs). There were a bunch of old .gz files, so I deleted those. I then wiped out the whole folder, went to Tools->Rebuild Caches, and started rebuilding the sitemap.

    Four hours later... I had over 650 5 MB XML files and it was still rebuilding the tags for the sitemap. We used to have a system that auto-generated tags and, as a result, we have a LOT of tags across 35k discussions. When I woke up this morning, I had a 3 gig sitemap folder and the rebuild tool had timed out at some point.

    So, I wiped the sitemap folder again, changed the XML Sitemap Generation option to exclude tags, then rebuilt the sitemap with the Rebuild Caches tool. This time it took only a minute and my sitemap folder was only about 3 megs.

    I guess I want to know if this is normal? Should I reinstate the tags and just not worry about the size, or should I keep excluding the tags? Do the tags make any appreciable SEO difference or does Google ding me for having an insane sitemap folder?
  2. Brogan

    Brogan XenForo Moderator Staff Member

    That doesn't sound right as there should only be a single entry in the sitemap for each tag.
    How many tags do you have?
  3. dvsDave

    dvsDave Well-Known Member

    Not sure, how can I find out? I also pulled a random xml file before I deleted them and zipped it up so I could upload it here.

    Attached Files:

  4. Steve F

    Steve F Well-Known Member

  5. Brogan

    Brogan XenForo Moderator Staff Member

    Navigate to admin.php?tags/
    That will give you a count.

    Based on that file, it's going to be a big number.
  6. dvsDave

    dvsDave Well-Known Member

    It's actually only 270 tags.

    So, just to clarify, I'm not talking about the /sitemap folder, but the /internal_data/sitemaps/ folder where I was seeing these crazy figures.
  7. Chris D

    Chris D XenForo Developer Staff Member

    There's 50,000 tags in that one sitemap file, though.
  8. Chris D

    Chris D XenForo Developer Staff Member

    Oh hang on a minute... there's a lot of repetition in there.

    Halloween appears 187 times.
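
One way to check a sitemap file for this kind of repetition is to count how often each `<loc>` value occurs. The following is a minimal sketch, assuming the file follows the standard sitemaps.org XML layout; the function name `repeated_locs` and the sample URLs are made up for illustration and are not taken from the attached file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace defined by the sitemaps.org protocol; assumed to apply to the file being checked.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def repeated_locs(xml_text):
    """Return {url: count} for every <loc> that occurs more than once."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter(
        loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text
    )
    return {url: n for url, n in counts.items() if n > 1}


# Invented sample data for illustration only.
sample = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/tags/halloween/</loc></url>
  <url><loc>https://example.com/tags/halloween/</loc></url>
  <url><loc>https://example.com/tags/props/</loc></url>
  <url><loc>https://example.com/tags/halloween/</loc></url>
</urlset>
"""

print(repeated_locs(sample))
```

For a real file, the contents could be read from disk and passed to the same function.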

    I'm going to look at this in more detail, from the code. First:

    Can you confirm the result of this query:
    SELECT COUNT(*) FROM xf_tag WHERE use_count > 0
    Also, can you confirm whether any add-ons are involved? Were the tags imported from another add-on? Or are the tags themselves part of an add-on, e.g. are some of these repeated tags pointing to content that belongs to an add-on? It shouldn't matter, but it's worth checking.
  9. dvsDave

    dvsDave Well-Known Member

    Sorry all, my 16-month-old just pushed my laptop off the table and now the keyboard won't work (typing this on my cell). This has happened before and it's a known issue on my Asus laptop; I just have to open it up and reseat the keyboard connection. Will get the results of the query as soon as I can find the bizarre-size Torx driver I keep for just these occasions.
  10. dvsDave

    dvsDave Well-Known Member

    Back up and running! The query returned 267

    The biggest suspected culprit is VaultWiki. I just noticed a server error message:

    I've submitted a bug report to him: https://www.vaultwiki.org/issues/4443/

    I also used to use cemzoo's sitemap generator (linked to the discussion, since the resource has been pulled), but I disabled that after I upgraded to 1.5 (I forgot to turn it off when I went to 1.4).
  11. Chris D

    Chris D XenForo Developer Staff Member

    I'm intrigued what would happen if you did the following:
    • Disable all add-ons
    • Disable all sitemap content types except Tags
    • Rebuild the sitemap manually from "Rebuild Caches" page
    Theoretically, 267 tags would be completed in mere seconds. If it is completed quickly, without any add-ons enabled, then it might confirm that an add-on is responsible. Then you could keep trying it again with different add-ons enabled to confirm which is doing it.
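
The one-add-on-at-a-time check described above can be sketched as a simple loop. This is only an illustration of the procedure: in practice the rebuild is run by hand from the admin panel, so `rebuild_seconds` is a hypothetical callback, and the add-on names, timings, and threshold below are invented.

```python
def find_slow_addons(addons, rebuild_seconds, threshold=10.0):
    """Return the add-ons whose tags-only rebuild takes longer than threshold seconds.

    rebuild_seconds is a hypothetical callback: given the set of enabled
    add-ons, it runs the rebuild and returns the elapsed time in seconds.
    """
    slow = []
    for addon in addons:
        elapsed = rebuild_seconds({addon})  # only this add-on enabled
        if elapsed > threshold:
            slow.append(addon)
    return slow


# Stand-in for a real rebuild; the timings are invented for illustration.
def fake_rebuild(enabled):
    return 3600.0 if "VaultWiki" in enabled else 2.0


suspects = find_slow_addons(["VaultWiki", "SomeOtherAddon"], fake_rebuild)
print(suspects)
```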
  12. dvsDave

    dvsDave Well-Known Member

    So, I disabled VaultWiki and ran the rebuild sitemap tool with only tags enabled and that took 2 seconds.

    I then re-enabled VaultWiki with only Tags enabled and that just started churning through data.

    I then disabled Tags in the sitemap generation settings and rebuilt again.

    Took about 30 seconds this time.
  13. Chris D

    Chris D XenForo Developer Staff Member

    Certainly seems like VaultWiki is to blame then, unfortunately.
  14. dvsDave

    dvsDave Well-Known Member

    I'll work with them to get the issue resolved and report back here when we are done.
  15. pegasus

    pegasus Well-Known Member

    Patch instructions here: https://www.vaultwiki.org/issues/4444/#note24299
    Update: Patch 4.0.7 PL 1 released.
    • Tag Duplication Vulnerability
    • Template Expansion Vulnerability
    • Template Usage Vulnerability
    • Node Overload Vulnerability
    All of these were related to Denial of Service (with the tag-duplication reported in this thread, an attacker didn't have to be involved).
    Official disclosures should be forthcoming by week's end.
    Last edited: Nov 11, 2015
  16. dvsDave

    dvsDave Well-Known Member

    That patch and one other patch from VaultWiki did the trick. :)

    Apparently this is what was happening:
  17. pegasus

    pegasus Well-Known Member

    A typo in array keys: 'tag_id' vs $tag_id.
    Usually something like that just issues an E_NOTICE and doesn't bring whole servers down.
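
To illustrate how a literal key used in place of a variable key can defeat de-duplication, here is a hedged Python analogue. It is not VaultWiki's actual code, which this thread does not show; the function names and rows are hypothetical, and Python raises no equivalent of PHP's E_NOTICE here.

```python
def tags_for_sitemap_buggy(rows):
    """Intended to list each tag once, but the 'seen' bookkeeping uses the wrong key."""
    seen = {}
    result = []
    for row in rows:
        tag_id = row["tag_id"]
        if tag_id not in seen:
            result.append(row["tag"])
        seen["tag_id"] = True  # typo: literal key, so no real tag_id is ever recorded
    return result


def tags_for_sitemap_fixed(rows):
    """Same loop with the variable used as the key, so each tag is listed once."""
    seen = {}
    result = []
    for row in rows:
        tag_id = row["tag_id"]
        if tag_id not in seen:
            result.append(row["tag"])
        seen[tag_id] = True  # correct: keyed by the tag's id
    return result


# Invented rows: one row per use of a tag on some piece of content.
rows = [
    {"tag_id": 1, "tag": "halloween"},
    {"tag_id": 1, "tag": "halloween"},
    {"tag_id": 2, "tag": "props"},
]

print(tags_for_sitemap_buggy(rows))
print(tags_for_sitemap_fixed(rows))
```

In this sketch the buggy version emits one entry per use of a tag rather than one per tag, which is the kind of blow-up that turns a few hundred tags into tens of thousands of sitemap entries.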

    Might I suggest a change to the Sitemap builder so that it stops building the sitemap if it exceeds certain size limits? Even if a site really had 2B tags, I doubt they would want a 45 GB sitemap file. Sometimes the admin won't notice (and uncheck, e.g., tags in the sitemap options) until it's too late.
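
A size guard of the kind suggested here could look roughly like the sketch below. It is an illustration only, not XenForo's sitemap builder: `build_entries`, `SitemapTooLarge`, and the byte budget are hypothetical names and values.

```python
from xml.sax.saxutils import escape


class SitemapTooLarge(Exception):
    """Raised when the sitemap would exceed the configured byte budget."""


def build_entries(urls, max_bytes):
    """Build <url> entries, aborting as soon as the total size would pass max_bytes."""
    entries = []
    total = 0
    for url in urls:
        entry = "<url><loc>" + escape(url) + "</loc></url>\n"
        size = len(entry.encode("utf-8"))
        if total + size > max_bytes:
            raise SitemapTooLarge(
                f"stopped after {len(entries)} entries ({total} bytes)"
            )
        entries.append(entry)
        total += size
    return entries


# Invented URLs for illustration.
urls = ["https://example.com/tags/a/"] * 3

print(len(build_entries(urls, 10_000)))  # budget is large enough for all entries

try:
    build_entries(urls, 60)  # budget too small for three entries
except SitemapTooLarge as exc:
    print("aborted:", exc)
```

Raising an exception rather than silently truncating means the admin sees that the build stopped, instead of discovering a partial sitemap later.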

    I believe there are already protections like this in place for attachment storage (once the site has X GB of attachments, XenForo stops letting users upload them). EDIT: Actually, I can't find such an option in XenForo at a quick glance. I found it on my vBulletin test board though. If it doesn't exist, might I suggest this as well?
    Last edited: Nov 11, 2015
