Googlebot found an extremely high number of URLs on your site

TheBigK

Well-known member
Looks like our affair with Googlebot isn't ending. I checked Google Webmaster Tools and it's reporting:


Googlebot found an extremely high number of URLs on your site
Googlebot encountered problems while crawling your site http://www.crazyengineers.com
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.


Here's a list of sample URLs with potential problems. However, this list may not include all problematic URLs on your site.

I have absolutely no clue what's going wrong. In the last ~11 months of running XenForo, I've never had this issue, and I didn't make any change to the website that could have caused it.

Can someone inspect our site and see if anything needs to be fixed?
 
You can add a few more to that robots.txt:
Code:
Disallow: /forums/-/
Disallow: /help/
Disallow: /recent-activity/
Disallow: /login/
Disallow: /lost-password/
Disallow: /misc/contact/
Disallow: /online/
Disallow: /register/
Disallow: /search/

Not sure just how many that would block out though. My entire robots.txt only blocks 920 URLs, but mine is a very small board with relatively few posts. Do you also get a lot of duplicate content warnings? If you have a high number of duplicate title tags etc., it is usually an indication that Google is not ignoring something it should be. That would be a good place to start.
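To make the duplicate-title check concrete, here is a minimal sketch that groups URLs by title and reports titles used more than once. The function name `duplicate_titles` and the sample URLs are hypothetical; in practice the (URL, title) pairs would come from a crawl or from the duplicate title report in Webmaster Tools.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Group URLs by normalized title and keep titles used more than once.

    `pages` is an iterable of (url, title) pairs. The data source is an
    assumption: it could be a crawl export or a copied report.
    """
    by_title = defaultdict(list)
    for url, title in pages:
        # Normalize case and surrounding whitespace so trivial
        # differences do not hide a duplicate.
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical sample data, not taken from the site in question.
sample = [
    ("/threads/example.1/", "Example thread"),
    ("/threads/example.1/page-2", "Example thread"),
    ("/threads/other.2/", "Other thread"),
]
for title, urls in duplicate_titles(sample).items():
    print(title, urls)
```

Titles that show up against several URLs point at the URL patterns worth blocking or canonicalizing.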
 
Yup, we've seen a similar indexing dip too. Our graph looks very similar to CrazyE's. ;)
Pretty similar for us too... (not the "extremely high number of URLs", just the dropping crawl stats)...

..although traffic is on a steady climb week on week. Back to pre-conversion figures finally (converted in April).
 
Is this somehow related to Google not liking you deleting your WordPress tags?
Traffic has been climbing slightly over the last few days, but GWT is now reporting this new error. I'm convinced that an error-free website would make Google send the love again.

I'm wondering why Googlebot is indexing the URLs that I've blocked through robots.txt. Can someone check if my robots.txt is correct?
 
Yes, I do have duplicate content reported (about 1000 URLs), but I can't do anything about it, as it clearly seems to be an error on Google's side. There are URLs that don't exist on our site that are marked as 'duplicate'.
 
Ahh, are those pages 404 pages? Do they return 404, or 200? Because if they don't return 404, Google will mark them as duplicates.
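A quick way to answer the 404-or-200 question is to request one of the non-existent URLs and look at the status code. The sketch below uses only Python's standard library; the helper names `http_status` and `is_soft_404` are made up for illustration, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, including error codes.

    Redirects are followed by urlopen, so the code reported is the one
    for the final URL. A HEAD request is used to avoid downloading the
    body; some servers answer HEAD differently from GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want.
        return err.code


def is_soft_404(url, opener=urllib.request.urlopen):
    """True if a URL that should not exist answers 200 instead of 404."""
    return http_status(url, opener) == 200
```

Calling `http_status` on one of the URLs flagged as duplicate shows whether the server treats it as missing. A 200 for a page that does not exist would explain the duplicate reports.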
 
I've already begun the process of removing the duplicate content. Just found out that a few of the new members have created duplicate threads in multiple forums to attract attention and responses. :(

I'm not sure why Google is indexing 'Find-New'. Is my robots.txt correct?
 
Do you see the same robots.txt in WMT? You can also test URLs in that tool. You can also request a folder removal of /find-new/, but for that it has to be disallowed in robots.txt. Your robots.txt seems right to me.
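Besides the tester in WMT, rules can be checked locally. Here is a rough sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line and the `/find-new/` rule are assumptions, since the full robots.txt isn't shown in this thread, and `is_blocked` is just an illustrative helper name. Note that this parser does plain prefix matching, so it will not reproduce Google's wildcard handling.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: the thread does not show the full robots.txt, so the
# "User-agent: *" line and the /find-new/ entry are guesses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /find-new/
Disallow: /search/
Disallow: /login/
"""


def is_blocked(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may not fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(is_blocked("/find-new/123/posts"))   # True
print(is_blocked("/threads/example.1/"))   # False
```

If a URL that Google reports comes back as not blocked here, the rule prefix probably doesn't match the actual URL path.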
 