Fixed: All your <base> are belong to us - Broken links on Google/Bing Cached Views

inph

Pages served by Google's or Bing's "Cached" view (and possibly other search engines') contain broken links. The cache page injects its own <base> tag at the very top of the page, and XenForo generates relative links, which then resolve against the wrong base.

Browsers:
Chrome 11, Firefox 4 and IE7 (and later versions) all ignore any <base> tag that appears after the first one, so XenForo's own <base> is never applied on the cached view.

Opera 11 and IE6 do process the additional <base> tag, so on the cached view the links resolve and the CSS and JavaScript are also loaded correctly.
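To make the difference concrete, here is a small sketch that resolves one relative XenForo link under the two behaviours described above. It uses the standard WHATWG `URL` constructor (available in browsers and Node.js). The two base URLs are copied from the cached page sources below; the function names `firstBaseWins` and `lastBaseWins` are illustrative labels, not real browser settings.

```javascript
// Base hrefs in document order on a Google cached page:
// Google's injected tag first, then XenForo's own tag.
const baseHrefs = [
  "http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/",
  "http://xenforo.com/community/",
];

// Chrome 11, Firefox 4, IE7+: only the first <base> counts
// (this matches the HTML specification's rule).
function firstBaseWins(hrefs) {
  return hrefs[0];
}

// Opera 11, IE6 (as observed in this report): the later <base> takes over.
function lastBaseWins(hrefs) {
  return hrefs[hrefs.length - 1];
}

// A relative link as XenForo generates it.
const link = "forums/add-on-releases.32/";

const broken = new URL(link, firstBaseWins(baseHrefs)).href;
const working = new URL(link, lastBaseWins(baseHrefs)).href;

console.log(broken);
// http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/forums/add-on-releases.32/
console.log(working);
// http://xenforo.com/community/forums/add-on-releases.32/
```

The first result is exactly the broken forum link reported in this thread.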

Edit: In Opera, the "Text Only Version" of the cached page also produces broken links, because that version drops XenForo's own <base> tag (compare the page sources below).

One solution would be to switch from relative links to root-relative links (links that start with /), for example changing

<a href="threads/redirection-scripts-for-vbulletin-3-x.5030/page-2">
to
<a href="/community/threads/redirection-scripts-for-vbulletin-3-x.5030/page-2">
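The effect of this change can be checked with the same URL resolution the browser performs. This sketch resolves both href values against Google's injected base (the one Chrome, Firefox and IE7+ actually use on the cached page) and against XenForo's own base. It is only an illustration of the resolution rules, not how XenForo builds its links.

```javascript
// Google's injected <base> (honoured on the cached page) and XenForo's own <base>.
const googleBase =
  "http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/";
const xenforoBase = "http://xenforo.com/community/";

const relative = "threads/redirection-scripts-for-vbulletin-3-x.5030/page-2";
const rootRelative = "/community/threads/redirection-scripts-for-vbulletin-3-x.5030/page-2";

// Resolve an href against a base URL, as the browser would.
const resolve = (href, base) => new URL(href, base).href;

// Relative link: correct under XenForo's base, doubled path under Google's.
console.log(resolve(relative, xenforoBase));
console.log(resolve(relative, googleBase));

// Root-relative link: the leading "/" replaces the whole base path, so
// both bases (same host) give the same, correct URL.
console.log(resolve(rootRelative, xenforoBase));
console.log(resolve(rootRelative, googleBase));
```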

Google search: http://www.google.co.uk/search?q=site:xenforo.com google cache links
[Screenshot: xf-google-cache_1]

Google cached view of the thread: http://webcache.googleusercontent.com/search?q=cache:mA4H1uMotIkJ:xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/ site:xenforo.com google cache links
[Screenshot: xf-google-cache_2]

Broken forum link: http://xenforo.com/community/thread...vbulletin-3-x.5030/forums/add-on-releases.32/
[Screenshot: xf-google-cache_3]

Broken attachment link: http://xenforo.com/community/thread...-3-x.5030/attachments/import-301-v2-zip.7288/
[Screenshot: xf-google-cache_4]

Same cached view in Opera 11: http://webcache.googleusercontent.com/search?q=cache:mA4H1uMotIkJ:xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/ site:xenforo.com google cache links
[Screenshot: xf-google-cache_5]

Google Cached Page Source
HTML:
<!DOCTYPE html><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<base href="http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/">

[snip google generated header html]

<!DOCTYPE html>
<html id="XenForo" lang="en-US" class="Public LoggedOut" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>

<meta charset="utf-8" />
<base href="http://xenforo.com/community/" />

<title>Redirection Scripts for vBulletin 3.x | XenForo Community</title>

Bing Cached Page Source
HTML:
<base href="http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/" /><meta http-equiv="content-type" content="text/html; charset=utf-8" /><!-- Banner:Start -->

[snip bing generated header html]

<!-- Banner:End --><div style="position:relative"><!DOCTYPE html>

<html id="XenForo" lang="en-US" class="Public LoggedOut" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>

<meta charset="utf-8" />
<base href="http://xenforo.com/community/" />

Google Cached Text Only Version Page Source
HTML:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<base href="http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/">

[snip google generated header html]

<html id="XenForo" lang="en-US" class="Public LoggedOut" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>

    <meta charset="utf-8" />

    <title>Redirection Scripts for vBulletin 3.x | XenForo Community</title>

    <meta name="description" content="If you have imported your vBulletin 3.x database into XenForo, you can automatically redirect all traffic destined for your vBulletin content to its new..." />
 
Brogan, the specific issue is a bit different, though conceptually the same.

On the whole, though, this looks like a Google issue. Adding a second base tag when the page already has one seems like very strange behavior. Other people have run into it as well:
http://www.google.vu/support/forum/p/Webmasters/thread?tid=190a40c928b55355&hl=en

Generating absolute URLs isn't really a good option, and it's rather wasteful in general (repetition and worse for caching). I don't think a change is feasible here.
 
Whilst I agree that Google and Bing should respect the page's own <base href /> tag, they don't.

Absolute URL: http://xenforo.com/community/thread...oken-links-on-google-bing-cached-views.15459/
Relative URL: threads/all-your-base-are-belong-to-us-broken-links-on-google-bing-cached-views.15459/
Root-Relative URL: /community/threads/all-your-base-are-belong-to-us-broken-links-on-google-bing-cached-views.15459/

I can understand why generating absolute URLs is not a good option. How about root-relative URLs?
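For a rough sense of the per-link cost, this sketch builds all three forms from the relative path above and counts the extra characters each adds. The origin and the /community/ prefix are taken from the URLs above; the numbers are uncompressed per-link overheads only.

```javascript
// Path as XenForo currently emits it (relative to <base href="http://xenforo.com/community/">).
const relativePath =
  "threads/all-your-base-are-belong-to-us-broken-links-on-google-bing-cached-views.15459/";

// The prefixes the alternative forms would add to every link.
const origin = "http://xenforo.com";
const basePath = "/community/";

const forms = {
  relative: relativePath,
  rootRelative: basePath + relativePath,
  absolute: origin + basePath + relativePath,
};

// Extra characters per link compared with the current relative form.
for (const [name, url] of Object.entries(forms)) {
  console.log(`${name}: +${url.length - relativePath.length} chars`);
}
// relative: +0 chars
// rootRelative: +11 chars
// absolute: +29 chars
```

A root-relative link costs only the length of the forum's base path, while an absolute link also repeats the scheme and host on every link.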

Another option, which keeps the current structure intact, would be to extend Router.php. In the same way that XF checks for the presence of page-x, it could additionally check for a second forums/, members/, threads/, attachments/ (etc.) segment and respond with a 301 redirect that strips the unnecessary part of the URL (the leading threads/redirection-scripts-for-vbulletin-3-x.5030/ in the broken URLs below).

The .js files on xenforo.com already appear to be referenced via absolute URLs (although I'm fairly sure they're relative on my installation). css.php would also need some extra work, although I wouldn't be too bothered by the stylesheet or JavaScript not working correctly as long as the links to the content work.

Broken URLs
http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/members/kier.2/
http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/forums/add-on-releases.32/
http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/attachments/import-301-v2-zip.7288/
http://xenforo.com/community/threads/redirection-scripts-for-vbulletin-3-x.5030/threads/redirection-scripts-for-vbulletin-3-x.5030/page-3
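The Router.php idea could look roughly like the following. XenForo's actual router is PHP and its internals aren't shown here, so this is only a sketch of the path logic in JavaScript. The function name `collapseDoubledRoute`, the `/community/` base path and the list of route prefixes are assumptions for illustration. It returns the 301 target for a doubled path, or null when the path looks normal.

```javascript
// Hypothetical sketch: detect a second route prefix inside a path that was
// resolved against the wrong <base>, and compute the 301 redirect target.
const BASE_PATH = "/community/"; // assumed forum root
// Assumed route prefixes; a real implementation would take these from the router.
const ROUTE_PREFIXES = new Set(["forums", "members", "threads", "attachments"]);

function collapseDoubledRoute(pathname) {
  if (!pathname.startsWith(BASE_PATH)) return null;

  const segments = pathname.slice(BASE_PATH.length).split("/");
  // Only act when the path already starts with a route prefix...
  if (!ROUTE_PREFIXES.has(segments[0])) return null;

  // ...and another route prefix appears later on.
  // Assumes no legitimate URL has a route prefix as a later segment.
  const second = segments.findIndex((seg, i) => i > 0 && ROUTE_PREFIXES.has(seg));
  if (second === -1) return null;

  // Drop everything before the second prefix (the part inherited from Google's <base>).
  return BASE_PATH + segments.slice(second).join("/");
}

const brokenPaths = [
  "/community/threads/redirection-scripts-for-vbulletin-3-x.5030/members/kier.2/",
  "/community/threads/redirection-scripts-for-vbulletin-3-x.5030/forums/add-on-releases.32/",
  "/community/threads/redirection-scripts-for-vbulletin-3-x.5030/attachments/import-301-v2-zip.7288/",
  "/community/threads/redirection-scripts-for-vbulletin-3-x.5030/threads/redirection-scripts-for-vbulletin-3-x.5030/page-3",
];

for (const path of brokenPaths) {
  console.log(`301 ${path} -> ${collapseDoubledRoute(path)}`);
}
```

For example, the first broken URL above would be redirected to /community/members/kier.2/, while a normal thread URL such as /community/threads/redirection-scripts-for-vbulletin-3-x.5030/page-2 is left alone.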
 
I believe I have a workaround for this issue as part of another fix. It worked with Google's HTML prepended to the page, though I'll have to confirm once Google re-caches some of our pages.
 
Code:
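<script type="text/javascript">
// Point the first <base> (the one injected by the cache viewer) back at the forum root.
var _b = document.getElementsByTagName('base')[0], _bH = "http://xenforo.com/community/";
// _b.href returns the resolved absolute URL, so a plain string comparison works.
if (_b && _b.href !== _bH) { _b.href = _bH; }
</script>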
It seems to work fine on the cached pages I checked.
I'm also assuming the broken images are due to referrer checking and rewrite rules.
 