Googlebot found an extremely high number of URLs on your site

TheBigK

Well-known member
Looks like our affair with the Google Bot isn't ending. I checked Google Webmaster Tools and it's reporting -


Googlebot found an extremely high number of URLs on your site
Googlebot encountered problems while crawling your site http://www.crazyengineers.com
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.


Here's a list of sample URLs with potential problems. However, this list may not include all problematic URLs on your site.

I've absolutely no clue what's going wrong, and in the last ~11 months of running XenForo I've never had this issue. I didn't make any changes to the website that could have caused it.

Can someone inspect our site and see if anything needs to be fixed?
 
Do you see the same robots.txt in WMT? You can also test URLs in that tool. You can also do a folder removal of /find-new/, but then it has to be in robots.txt. It seems right to me.
Yes, I do. I checked with the URL checker and found that Google Bot is indeed banned from visiting those URLs / Pages -

http://www.crazyengineers.com/community/find-new/3733339/threads?page=2
Blocked by line 2: Disallow: /community/find-new/

It's very strange that Google Bot's visiting that page and reporting it.
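
For anyone who wants to repeat that check outside WMT, here is a minimal sketch using Python's standard `urllib.robotparser`. Only the `Disallow` rule is quoted above; the `User-agent: *` line and the sample thread URL are assumptions for illustration, and this parser is not guaranteed to match Googlebot's own matching in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: only the Disallow rule appears in the thread,
# the "User-agent: *" line is a guess at what line 1 contains.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/find-new/
"""


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the assumed robots.txt rules disallow this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    reported = "http://www.crazyengineers.com/community/find-new/3733339/threads?page=2"
    # Hypothetical thread URL, used only to show an allowed path.
    thread = "http://www.crazyengineers.com/community/threads/example.1/"
    print(reported, "blocked:", is_blocked(reported))
    print(thread, "blocked:", is_blocked(thread))
```

If the reported URL comes back as blocked here as well, the rule itself is probably fine and the WMT report is more likely stale data.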
 
WMT data isn't always up to date, sometimes weeks old. Those errors could take months to go away in WMT. If the robots.txt checker says you're good I wouldn't worry. The actual Googlebot sees your changes almost instantly.

WMT is in need of a serious overhaul IMO... the site speed checker on it hasn't updated since August on mine.
 
Thanks. Yeah, it may take months before it's gone. But the issue is that WMT reported those URLs a few days after I blocked them via robots.txt. The problem is that the large number of 404s and these 'large number of URLs' errors are causing me a traffic drop. I need to do something to get out of this mess.
 
You can request removal of the folder for the find-new URLs: just remove /community/find-new/. It will take a while before the request is processed, but if you have it blocked via robots.txt, you should be good.
 
One of my forums is also having the same issue, but I believe it's due to the Google Panda or Penguin updates.
 
Are you sure it's because of Panda or Penguin? GWT shows 'large number of URLs' on the pages I've blocked through robots.txt. That is strange. Plus, I have not done anything that makes us vulnerable to Panda or Penguin.

Our site is currently suffering from a large number of 404 errors (96k of them!) and this new 'large number of URLs' warning. I'm totally clueless about how to go about fixing this.
 
I started with 120K+ 404 errors 10 months ago, and I am still at 11K 404s on one of my websites because I moved my forums from the /forum/ directory to the root directory.

After one of the Google updates my e-commerce website got hit and I am trying to fix the issues. I noticed resolving duplicate title content helps.

The one thing I would suggest is to block the "Baidu" bots and increase the Google crawl frequency; blocking Baidu would compensate for the extra usage on your VPS or dedicated server. That worked well for my friend, whom I mentioned in another topic of yours. He recovered within 3 months, and I think he is back to stable as far as organic traffic is concerned.
 
Well, for my site, the 404s resulted from deleting a large number of tags from the WordPress installation. I've redirected all the deleted tags to the homepage and see that the errors went down from 99k to 95k in 4 days. I hope the recovery will be faster.

Did you notice a traffic drop because of the page-not-found errors? I found that 2 other webmasters also reported that when the 404 errors suddenly surged, the traffic tanked. I'll consider blocking Baidu and increasing the Google crawl frequency. That might help. Google's showing duplicate content arising out of XenForo pagination. I can't do much about it, I guess.
 
Yes, I noticed a traffic drop. My main links, i.e. the topics that were redirected using 301s, didn't have any traffic change, but overall, yes, there was a loss in traffic. You will recover fast, don't worry.
 
TBH if Google is dropping things like member profile pages, profile posts, etc. then I'm not too fussed. I'd rather have my main thread content indexed as a preference over the ancillary fluff. (y)
 
Here's the latest discovery: I found that, out of the top 1000 "404" URLs reported in GWT, the majority are regular URLs with a strange number string attached.

Here are a few sample URLs from my WordPress installation:

http://www.domain.com/correct-url-ends-here/1345601488000/1345993967000
http://www.domain.com/another-correct-url-ends-here/1346198904000/1346893566000

The number string at the end is appended, and GWT even reports several versions of each URL (each with its own number string).
My doubts:

1. Is it because of the 'disqus' comments system? Or something else? I've switched over to the LiveFyre system for the time being. I can disable it totally, if required.

2. What's the proper way to fix this?
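
One way to see which real page each of those 404s belongs to (for example, to build a list of 301 redirect targets) is to strip the trailing number segments. This is only a sketch under assumptions: that the junk segments are always 13 digits (they look like millisecond-style Unix timestamps, but that is a guess), and that the canonical WordPress URL ends with a trailing slash. The function name is made up for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: the stray segments are always 13-digit numbers at the end of the path.
TRAILING_NUMERIC = re.compile(r"(?:/\d{13})+/?$")


def strip_numeric_suffix(url: str) -> str:
    """Drop trailing 13-digit path segments, leaving the rest of the URL intact."""
    parts = urlsplit(url)
    # Assumption: the canonical permalink ends with a single trailing slash.
    clean_path = TRAILING_NUMERIC.sub("/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


if __name__ == "__main__":
    samples = [
        "http://www.domain.com/correct-url-ends-here/1345601488000/1345993967000",
        "http://www.domain.com/another-correct-url-ends-here/1346198904000/1346893566000",
    ]
    for url in samples:
        print(url, "->", strip_numeric_suffix(url))
```

URLs without such a suffix pass through unchanged, so the same function can be run over the whole exported 404 list.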
 
Hmm, I also noticed a very sharp drop in my crawl stats around the end of Sept. When was it that Google changed its algorithm?

Double checked my Analytics and my Google traffic is still on an upward trend though.
Yup, we've seen a similar indexing dip too. Our graph looks very similar to CrazyE's. ;)
Have you experienced a traffic drop as well?
Pretty similar for us too... (not the "extremely high number of URLs", just the dropping crawl stats)...
View attachment 35656

..although traffic is on a steady climb week on week. Back to pre-conversion figures finally (converted in April).
Well it looks like the other shoe dropped because I just lost 60-70% of my Google traffic. However my crawl stats are somewhat back to normal now.
 
I'm sorry to hear that. Did you lose it over a short period of time? What does your GWT account look like?

Questions:

1. Did you see a rise in the error count? If yes, what's the type of error?
2. Did you receive any message from GWT?
3. Your crawl stats are back. Did the crawl stats drop around the same time your traffic suffered?
 
Nope, no new errors.
No message from GWT.
No, the crawl stats had dropped in late Sept/early Oct, but were back up to normal before this drop in traffic happened. The impressions and click traffic dropped in one day late last week and haven't changed since.
 