XF 1.2 Over 30,000 Google 404 crawl errors?

william1872

Member
I was just checking Google Webmaster Tools today and noticed excessive crawl errors, over 30k in fact.

From what I can see, a URL is being appended to the end of the actual URL, and I've no idea what's going on.

Here's an example.

Code:
http://www.jvfocus.com/forums/internet-marketing-discussion-forum.11/forums/internet-marketing-discussion-forum.11/?order=title

http://www.jvfocus.com/threads/a-powerful-recipe-to-help-you-become-more-productive.670/members/william-murray.1/

Any help appreciated. I've attached a screenshot of the errors, and here's our robots.txt file too.

Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Disallow: /jvzoosub/
Disallow: /cometchat/
Disallow: /kibana/
Disallow: /thankyou/
Disallow: /online/
Disallow: /search/
Disallow: /wmimages/
Disallow: /banners/
Disallow: /pages/confirm-subscription/
Allow: /
 

Attachments

  • Google Crawl Errors.webp
The top two look like they could be the offenders to me, judging by their names...

If you open the Chrome dev console on one of the threads and search the source (Ctrl+F with the console in focus), where do the invalid URLs appear?

Liam
 
Here's another example URL. You can see the extra part being appended to the end of the actual URL.

Code:
 http://www.jvfocus.com/threads/jay-abraham-the-worlds-preeminent-business-growth-expert.197/threads/when-long-copy-doesnt-work.751/

The portion that's been appended is actually the path of another thread, without the domain:

Code:
/threads/when-long-copy-doesnt-work.751/
 
You have some <head> problems, and the "base href" appears to be a casualty of that, which is likely what's causing these "doubled" links.

Send me a PM with a URL and admin login to your forum. I can examine your templates and try to fix this.
 
It looks as if a <base> tag was broken at some point, but unfortunately, without knowing the page where Google found those URLs, it's difficult to suggest anything in particular.
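
To illustrate why a missing or ignored <base> tag would produce exactly these doubled paths, here is a minimal Python sketch using the standard library's urljoin. It assumes the forum emits relative links with no leading slash (e.g. threads/when-long-copy-doesnt-work.751/) and that the intended base href is the site root. Both are guesses consistent with the URLs reported above, not something confirmed from the templates.

```python
from urllib.parse import urljoin

# Page on which the link appears (taken from the example in this thread).
page_url = (
    "http://www.jvfocus.com/threads/"
    "jay-abraham-the-worlds-preeminent-business-growth-expert.197/"
)

# Assumption: the <base href> the forum intends to emit is the site root.
base_href = "http://www.jvfocus.com/"

# Assumption: links in the page are relative, with no leading slash.
relative_link = "threads/when-long-copy-doesnt-work.751/"

# With a working <base> tag, the link resolves against the site root.
with_base = urljoin(base_href, relative_link)

# If the <base> tag is missing or ignored, the link resolves against the
# page's own URL, which yields the doubled path.
without_base = urljoin(page_url, relative_link)

print(with_base)
print(without_base)
```

The second result is the same URL as the crawl error quoted earlier in the thread, which is why a broken <base> tag is the first suspect.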
 
Check: http://webcache.googleusercontent.c...the-secret-is-in-the-seasoning.24018/&strip=1
Google doesn't see <base href="http://forum.dontpayfull.com/" /> in the text-only version.
All links are broken.


In the full version: http://webcache.googleusercontent.c...the-secret-is-in-the-seasoning.24018/&strip=0
Everything looks OK, but there are two base href tags:
<base href="http://forum.dontpayfull.com/threads/the-secret-is-in-the-seasoning.24018/"> This one is added by Google.

<base href="http://forum.dontpayfull.com/" /> This one is from the dontpayfull source code.

I checked other XenForo forums and their links are OK in the text-only version.

With Fetch as Google, the base href is there:
2015-01-28_0035.webp

Any suggestion?

Thank you.
 
There might actually be a Google change here, since the behaviour differs from what I've seen in the past, and it also differs between the link you provided and a cached page from XenForo.com. Google adds its own <base> tag to cached pages, and we have some JS in there to help correct that (it's why the full version works for you). Since the JS is stripped out in the text-only version, the correction doesn't happen there. The change is that on the XF.com result, Google is actually using the base tag from the original HTML, not just the page URL; it certainly didn't do that before, so it could even be a recent fix.

I'm not sure that's necessarily the page generating the issue though -- it seems strange for Google to crawl that (let alone get a ton of links off it).
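
For anyone wanting to reproduce the two-<base> situation offline, here is a rough Python sketch. It follows the HTML rule that only the first <base> element with an href is used, and it does not model the JS correction mentioned above, so it corresponds to the text-only case. The member link members/example.1/ is made up for illustration, and nothing here claims to reflect how Google's crawler actually behaves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseAndLinks(HTMLParser):
    """Collects the first <base href> and every <a href> in a document."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            self.base_href = attrs["href"]  # later <base> tags are ignored
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def resolve_links(html, page_url):
    """Resolve every link in html the way a base-aware client would."""
    parser = BaseAndLinks()
    parser.feed(html)
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.hrefs]


page = "http://forum.dontpayfull.com/threads/the-secret-is-in-the-seasoning.24018/"
site_base = '<base href="http://forum.dontpayfull.com/" />'
link = '<a href="members/example.1/">member</a>'  # hypothetical link

# Original page: only the forum's own <base> tag is present.
original = resolve_links(site_base + link, page)

# Cached copy: a <base> pointing at the page URL is inserted first.
cached = resolve_links('<base href="' + page + '">' + site_base + link, page)

print(original)
print(cached)
```

In the cached copy the inserted tag wins, so the member link resolves underneath the thread URL rather than at the site root.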
 
@visulet informed me about the problem and I'm a bit confused here.

1. Our GWT does not show any 404s for member links.
2. If I go to http://webcache.googleusercontent.c...ngineers.com/threads/drdo-exam.78513/&strip=1 -> click on the 'Text Only' version -> click on a member link, I get a bad link! However, the same links work absolutely fine on the live site.
3. This behavior is random: most of the pages I checked are all right, but I did find a few pages that have this problem.

What's going on? Totally clueless!
 
We have implemented a quick fix that changes the way URLs are formed by prepending a '/' where applicable. I have attached a patch against XenForo 1.4. It would be wonderful if a XenForo developer could take a look at this for possible issues with this method.
Thanks.
 

Attachments

It only works if your install is at the root. It should suffice for your purposes, though I would note that we haven't had other reports of this (certainly not happening here) so I'm not sure what's triggering it in your case.
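
A quick Python sketch of why the leading-slash approach depends on a root install. The hostnames and the /community/ subdirectory are hypothetical, and this is not the attached patch, just the URL resolution rule it relies on.

```python
from urllib.parse import urljoin

# A root-relative link, as produced by prepending '/' to the path.
root_relative = "/threads/some-thread.2/"

# Hypothetical forum installed at the site root.
root_page = "http://www.example.com/threads/another-thread.1/"

# Hypothetical forum installed in a /community/ subdirectory.
subdir_page = "http://www.example.com/community/threads/another-thread.1/"

# A leading slash makes the link ignore the current path entirely, so a
# missing <base> tag no longer matters for a root install.
root_result = urljoin(root_page, root_relative)

# For a subdirectory install, the same link drops the /community/ prefix
# and points outside the forum.
subdir_result = urljoin(subdir_page, root_relative)

print(root_result)
print(subdir_result)
```

For a subdirectory install, the prefix would have to be part of the generated path for this approach to hold.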
 