XF 1.2 Over 30,000 Google 404 crawl errors?

Discussion in 'Troubleshooting and Problems' started by william1872, Feb 28, 2014.

  1. william1872

    william1872 Member

    I was checking Google Webmaster Tools today and noticed an excessive number of crawl errors: over 30,000, in fact.

    From what I can see, a URL is being appended to the end of the actual URL, and I have no idea what's going on.

    Here's an example.

    Code:
    http://www.jvfocus.com/forums/internet-marketing-discussion-forum.11/forums/internet-marketing-discussion-forum.11/?order=title
    
    http://www.jvfocus.com/threads/a-powerful-recipe-to-help-you-become-more-productive.670/members/william-murray.1/
    Any help is appreciated. I've attached a screenshot of the errors, and here's our robots.txt file too.

    Code:
    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /admin.php
    Disallow: /jvzoosub/
    Disallow: /cometchat/
    Disallow: /kibana/
    Disallow: /thankyou/
    Disallow: /online/
    Disallow: /search/
    Disallow: /wmimages/
    Disallow: /banners/
    Disallow: /pages/confirm-subscription/
    Allow: /
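
A quick way to see why Googlebot keeps crawling the malformed URLs is to test them against the rules above. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `/posts/123/` URL is a hypothetical example added for contrast, and Python's matching rules are simpler than Google's, so treat the result as indicative only.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules exactly as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Disallow: /jvzoosub/
Disallow: /cometchat/
Disallow: /kibana/
Disallow: /thankyou/
Disallow: /online/
Disallow: /search/
Disallow: /wmimages/
Disallow: /banners/
Disallow: /pages/confirm-subscription/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# One of the malformed URLs reported in the crawl errors.
doubled_url = (
    "http://www.jvfocus.com/threads/"
    "a-powerful-recipe-to-help-you-become-more-productive.670/"
    "members/william-murray.1/"
)
# Hypothetical URL under a disallowed prefix, for contrast only.
posts_url = "http://www.jvfocus.com/posts/123/"

doubled_allowed = parser.can_fetch("*", doubled_url)
posts_allowed = parser.can_fetch("*", posts_url)

print("doubled URL crawlable:", doubled_allowed)
print("/posts/ URL crawlable:", posts_allowed)
```

Because the doubled URLs begin with `/threads/` or `/forums/`, none of the `Disallow` prefixes apply to them, so robots.txt does not stop crawlers from requesting them.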
     

  2. Liam W

    Liam W Well-Known Member

    What addons do you have installed? Could one of them be causing the issue?

    Are those links visible when you visit the page(s)?

    Liam
     
  3. william1872

    william1872 Member

    They're invalid URLs, and I've attached a list of add-ons, Liam.
     

  4. Liam W

    Liam W Well-Known Member

    The top two look like they could be the offenders to me, judging by their names...

    If you open the Chrome dev console on one of the threads and search the source (Ctrl+F with the console in focus), where do the invalid URLs appear?

    Liam
     
  5. william1872

    william1872 Member

    I'm not seeing anything in the dev console that resembles the errors Googlebot is reporting. I'm totally baffled by this one.
     
  6. william1872

    william1872 Member

    Here's another example URL. You can see the extra portion appended to the end of the actual URL:

    Code:
    http://www.jvfocus.com/threads/jay-abraham-the-worlds-preeminent-business-growth-expert.197/threads/when-long-copy-doesnt-work.751/
    The appended portion is actually the path of another thread, without the domain:

    Code:
    /threads/when-long-copy-doesnt-work.751/
     
  7. Daniel Hood

    Daniel Hood Well-Known Member

    Do you have any template modifications? I think the issue may be a link that uses a relative path instead of the full URL.
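
The effect of a relative path can be reproduced with Python's standard `urllib.parse.urljoin`, which follows the usual URL resolution rules. This is a minimal sketch that assumes the link in the template is written without a leading slash or domain (`threads/when-long-copy-doesnt-work.751/`) and that no `<base>` tag is in effect, so the link resolves against the URL of the page it sits on.

```python
from urllib.parse import urljoin

# The thread page that contains the link.
page_url = (
    "http://www.jvfocus.com/threads/"
    "jay-abraham-the-worlds-preeminent-business-growth-expert.197/"
)

# Assumed form of the link in the markup: relative, no leading slash.
relative_href = "threads/when-long-copy-doesnt-work.751/"
# The same link written as a full URL.
absolute_href = "http://www.jvfocus.com/threads/when-long-copy-doesnt-work.751/"

# A relative href is appended to the directory of the current page.
resolved_relative = urljoin(page_url, relative_href)
# A full URL ignores the current page entirely.
resolved_absolute = urljoin(page_url, absolute_href)

print(resolved_relative)
print(resolved_absolute)
```

The relative form reproduces the doubled URL from the crawl errors, while the full URL resolves to the intended thread.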
     
    william1872 likes this.
  8. william1872

    william1872 Member

    There have been some template modifications and I can get the developer to look into that, thanks Daniel : )
     
  9. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    You have some problems in your <head>, and the "base href" appears to be a casualty of that, which is likely what's causing these "double" links.

    Send me a PM with a URL and admin login to your forum. I can examine your templates and try to fix this.
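
The role of the "base href" can be sketched as follows. The helper `resolve` is hypothetical and only mimics how a browser or crawler picks the base for relative links; the base value `http://www.jvfocus.com/` is an assumption about what the site's `<base>` tag should contain.

```python
from urllib.parse import urljoin


def resolve(page_url, href, base_href=None):
    """Resolve href against <base href> when one is present, else against the page URL."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


PAGE = "http://www.jvfocus.com/forums/internet-marketing-discussion-forum.11/"
HREF = "forums/internet-marketing-discussion-forum.11/?order=title"

# Working <head>: the base tag points at the forum root (assumed value).
with_base = resolve(PAGE, HREF, base_href="http://www.jvfocus.com/")
# Broken <head>: the base tag is missing or unreadable, so the page URL is used.
without_base = resolve(PAGE, HREF)

print(with_base)
print(without_base)
```

With the base in place the sort link resolves normally; without it, the result matches the first doubled URL from the original post.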
     
    hellreturn and TheBigK like this.
  10. william1872

    william1872 Member

    Thanks Jake :)
     
  11. visulet

    visulet Member

  12. Mike

    Mike XenForo Developer Staff Member

    It looks as if a <base> tag was broken at some point, but unfortunately, without knowing the page where Google found those URLs, it's difficult to suggest anything in particular.
     
  13. visulet

    visulet Member

    Check: http://webcache.googleusercontent.c...the-secret-is-in-the-seasoning.24018/&strip=1
    Google doesn't see <base href="http://forum.dontpayfull.com/" /> in the text-only version.
    All links are broken.


    In the full version: http://webcache.googleusercontent.c...the-secret-is-in-the-seasoning.24018/&strip=0
    Everything looks OK, but there are two base hrefs:
    <base href="http://forum.dontpayfull.com/threads/the-secret-is-in-the-seasoning.24018/"> - this one is inserted by Google.

    <base href="http://forum.dontpayfull.com/" /> - this one is from the dontpayfull source code.

    I checked other XenForo forums, and their links are OK in the text-only version.

    With Fetch as Google, the base href is there:
    2015-01-28_0035.png

    Any suggestions?

    Thank you.
     
  14. Mike

    Mike XenForo Developer Staff Member

    There might actually be a change on Google's side that differs from what I've seen in the past, and the behavior differs between the link you provided and a link from XenForo.com. Google adds its own <base> tag to cached content pages, and we have some JS in there to help correct that (it's why the full version works for you). Since they strip out the JS in the text-only version, the correction doesn't happen there. The change is that, on the XF.com result, Google is actually using the base tag from the original HTML, not just the page URL. It certainly didn't do that before, so it could even be a recent fix.

    I'm not sure that's necessarily the page generating the issue though -- it seems strange for Google to crawl that (let alone get a ton of links off it).
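
Why two `<base>` tags matter can be sketched in a few lines. Under the HTML specification only the first `<base>` element with an `href` is used, so a tag injected ahead of the forum's own takes precedence unless a script corrects it. The sketch below uses Python's standard `html.parser`; the cache URL `http://cache.example/cached-page` and the member link `members/example.1/` are hypothetical placeholders, and this is not XenForo's actual correction script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseCollector(HTMLParser):
    """Collect the href of every <base> tag in document order."""

    def __init__(self):
        super().__init__()
        self.base_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "base":
            href = dict(attrs).get("href")
            if href:
                self.base_hrefs.append(href)


def effective_base(html, document_url):
    """Return the base URL in effect: the first <base href>, else the document URL."""
    collector = BaseCollector()
    collector.feed(html)
    if collector.base_hrefs:
        return urljoin(document_url, collector.base_hrefs[0])
    return document_url


# Head of the cached copy as described above: Google's tag first, the forum's second.
CACHED_HEAD = """
<head>
<base href="http://forum.dontpayfull.com/threads/the-secret-is-in-the-seasoning.24018/">
<base href="http://forum.dontpayfull.com/" />
</head>
"""

base = effective_base(CACHED_HEAD, "http://cache.example/cached-page")
member_link = urljoin(base, "members/example.1/")

print(base)
print(member_link)
```

With scripts stripped, as in the text-only view, nothing overrides the first tag, so relative member links end up nested under the thread URL.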
     
  15. TheBigK

    TheBigK Well-Known Member

    @visulet informed me about the problem, and I'm a bit confused here.

    1. Our GWT does not show any 404s for member links.
    2. If I go to http://webcache.googleusercontent.c...ngineers.com/threads/drdo-exam.78513/&strip=1 -> click on the 'Text Only' version -> click on a member link, I get a BAD link! However, the same links work absolutely fine on the live site.
    3. This behavior is random. Most of the pages I checked are all right, but I did find a few pages that have this problem.

    What's going on? Totally clueless!
     
  16. visulet

    visulet Member

    Traffic dropped by 30%.
    Over 30% of the requests in our logs are 404 errors. We see them from Googlebot, Bingbot, and some users.

    2015-01-28_1523.png
     
  17. visulet

    visulet Member

    We have implemented a quick fix that changes the way URLs are formed by prepending a '/' where applicable. I have attached a patch against XenForo 1.4. It would be wonderful if a XenForo developer could take a look at this for possible issues with this method.
    Thanks.
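
The idea behind the quick fix can be illustrated in Python. This is not the attached patch, only a sketch of the same approach: the helper `root_relative` and the member link `members/example.1/` are hypothetical. A link that starts with '/' resolves against the site root, so it no longer depends on which `<base>` tag (if any) a crawler honors.

```python
from urllib.parse import urljoin, urlsplit


def root_relative(href):
    """Prepend '/' to plain relative hrefs; leave every other kind of link untouched."""
    parts = urlsplit(href)
    if not href or parts.scheme or parts.netloc or href.startswith(("/", "#", "?")):
        return href
    return "/" + href


# Worst case from the thread: links resolved against the thread URL itself.
page_url = "http://forum.dontpayfull.com/threads/the-secret-is-in-the-seasoning.24018/"

before = urljoin(page_url, "members/example.1/")
after = urljoin(page_url, root_relative("members/example.1/"))

print(before)
print(after)
```

The guard clause keeps full URLs, protocol-relative URLs, and in-page anchors as they are, which is presumably what "where applicable" refers to.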
     

  18. Mike

    Mike XenForo Developer Staff Member

    It only works if your install is at the root. It should suffice for your purposes, though I would note that we haven't had other reports of this (certainly not happening here) so I'm not sure what's triggering it in your case.
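
The root-install caveat is easy to demonstrate. In this sketch the forum is assumed to live at the hypothetical address `http://example.com/community/`, and `prefix_with_install_path` is an illustrative helper, not part of XenForo. A bare leading '/' drops the install directory, whereas prefixing the board's own path keeps it.

```python
from urllib.parse import urljoin, urlsplit


def prefix_with_install_path(board_url, href):
    """Build a root-relative link that keeps the board's install directory."""
    install_path = urlsplit(board_url).path or "/"
    if not install_path.endswith("/"):
        install_path += "/"
    return install_path + href


# Hypothetical forum installed in a subdirectory.
BOARD_URL = "http://example.com/community/"
page_url = "http://example.com/community/threads/one.1/"
href = "threads/two.2/"

# Quick fix: a bare leading slash points outside the install directory.
slash_only = urljoin(page_url, "/" + href)
# Variant that prefixes the install path taken from the board URL.
with_install_path = urljoin(page_url, prefix_with_install_path(BOARD_URL, href))

print(slash_only)
print(with_install_path)
```

For a forum installed at the root the two approaches give the same result, which is why the quick fix is sufficient in that case.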
     
  19. Sankisan

    Sankisan New Member

    We have the same problem as visulet: 25,000 crawl errors right now :(

    Trying his fix now :D
     
