Traffic Down Since VB to XF Migration

Alfuzzy

Active member
Got the command:

grep "Googlebot" logfile.txt | grep -v " 200 " | grep -v " 301 " | grep -v " 302 "

That greps for Googlebot and excludes lines with a 200, 301, or 302 status.
Got greatly reduced results after using the above statement:

Everything is either a 303, 304, 403, or 404.

Thanks
 

arn

Well-known member
Got the greatly reduced results after using the above statement:

Everything is either a 303, 304, 403, or 404.

Thanks

grep "Googlebot" logfile.txt | wc -l
grep "Googlebot" logfile.txt | grep " 403 " | wc -l
grep "Googlebot" logfile.txt | grep " 404 " | wc -l

Run those commands separately. It will spit out how many log entries total, and how many for 403 and 404.
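For a fuller picture, a single pass can bucket every status code at once instead of one grep per code. This is only a sketch assuming the standard combined log format, where the status code is the 9th whitespace-separated field; the sample log below is fabricated so the pipeline runs as-is (point it at your real logfile.txt instead):

```shell
# Fabricated sample log in combined log format, just so this runs as-is.
# Replace logfile.txt with your real access log.
cat > logfile.txt <<'EOF'
66.249.66.1 - - [27/Jan/2021:10:00:00 +0000] "GET /threads/a.1/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
66.249.66.1 - - [27/Jan/2021:10:00:01 +0000] "GET /goto/post?id=2 HTTP/1.1" 403 220 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
66.249.66.1 - - [27/Jan/2021:10:00:02 +0000] "GET /old-page/ HTTP/1.1" 404 196 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
66.249.66.1 - - [27/Jan/2021:10:00:03 +0000] "GET /threads/b.2/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
1.2.3.4 - - [27/Jan/2021:10:00:04 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF

# One pass: count every status code Googlebot received.
# $9 is the status field in the combined log format.
grep "Googlebot" logfile.txt | awk '{print $9}' | sort | uniq -c | sort -rn
```

That prints a count next to each status code, highest count first, so any 4xx spike stands out without a separate grep per code.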
 

Alfuzzy

Active member
grep "Googlebot" logfile.txt | wc -l
grep "Googlebot" logfile.txt | grep " 403 " | wc -l
grep "Googlebot" logfile.txt | grep " 404 " | wc -l

Run those commands separately. It will spit out how many log entries total, and how many for 403 and 404.

Thanks Arn...will run those separately.

By the way...someone from my host is helping me run these. I think the server log file he's running these commands on covers the last 10-12 hours. To do longer time periods...I think he needs to access additional (older) log files. Is 10-12 hours enough?

Also...with the last run using:

grep "Googlebot" logfile.txt | grep -v " 200 " | grep -v " 301 " | grep -v " 302 "

On a 10-12 hour log...that command narrowed things down from about 1875 hits to just over 100.

Is the last 10-12 hours of server log enough...or should it be longer?

Thanks
 

arn

Well-known member
10-12 hours is ok in that it's showing what's happening right now. But you might need to keep an eye on it over the next week.
 

Alfuzzy

Active member
Awesome...sounds good.

To double check...what grep commands should we be running regularly/semi-regularly over the next week...to find what we want to find?

If Google Search Console -> Settings -> Crawl Stats -> By Response...is saying "Other Client Error (4xx)" = 46%...should we be running different grep commands (other than 403's & 404's)?

Just asking...to confirm I'm doing the right thing (I'm no expert on this).

Thanks:)
 

arn

Well-known member
The question is how much of google's crawl is being spent on pages it shouldn't be, and is there some weird misconfiguration issue.

If you do those counts, you can see what percentage of the crawl is going to pages that are ok vs. spent on something else. Google's console suggests it's 40-some percent on bad pages. This is double-checking and seeing what Google is actually hitting.

So, I'd run those numbers once or twice a day, see how it looks, and see if there's an underlying issue.
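To make those once-or-twice-a-day checks easy to compare, a small wrapper could append each run to a CSV. This is only a sketch: logfile.txt is a stand-in for the real access log (a fabricated sample here so the script runs as-is), and googlebot-trend.csv is a made-up output name.

```shell
# Fabricated sample log so the script below runs as-is;
# point LOG at the real access log instead.
cat > logfile.txt <<'EOF'
66.249.66.1 - - [27/Jan/2021:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"
66.249.66.1 - - [27/Jan/2021:10:00:01 +0000] "GET /b HTTP/1.1" 403 220 "-" "Googlebot/2.1"
66.249.66.1 - - [27/Jan/2021:10:00:02 +0000] "GET /c HTTP/1.1" 404 196 "-" "Googlebot/2.1"
EOF

LOG=logfile.txt
OUT=googlebot-trend.csv

# The same three counts arn suggested, captured in variables.
total=$(grep -c "Googlebot" "$LOG")
e403=$(grep "Googlebot" "$LOG" | grep -c " 403 ")
e404=$(grep "Googlebot" "$LOG" | grep -c " 404 ")

# Append one dated row per run: date,total,403s,404s.
[ -f "$OUT" ] || echo "date,total,403,404" > "$OUT"
echo "$(date +%F),$total,$e403,$e404" >> "$OUT"
cat "$OUT"
```

Run it by hand or from cron; after a week the CSV shows at a glance whether the 4xx share is trending up or down.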
 

Alfuzzy

Active member
Hello Arn,

We may not need to wait for a full week of data. Been working with the folks at my host...and they were able to run the 3 grep commands you advised for "Googlebot"...going as far back as December 31st. They did a search on the server logs from 12/31/20 thru 1/27/21 (approx 26-27 days)...and here are the results:

1. grep "Googlebot" logfile.txt | wc -l

316711

2. grep "Googlebot" logfile.txt | grep " 403 " | wc -l

3936 (about 1.2%)

3. grep "Googlebot" logfile.txt | grep " 404 " | wc -l

9999 (about 3.2%)

Looks like nowhere near the 46% number from Google Search Console.
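For reference, those percentages fall straight out of the counts; here is the arithmetic as a quick awk one-liner using the numbers above:

```shell
# Share of Googlebot hits that returned 403/404, from the counts above.
awk 'BEGIN {
    total = 316711; e403 = 3936; e404 = 9999
    printf "403: %.2f%%\n", 100 * e403 / total        # 1.24%
    printf "404: %.2f%%\n", 100 * e404 / total        # 3.16%
    printf "combined: %.2f%%\n", 100 * (e403 + e404) / total
}'
```

Either way, the directly measured 4xx share is a few percent, well below the 46% figure in GSC.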

Thanks
 

motowebmaster

Active member
Years ago when I migrated from vB3 to XF1, my site's revenue dropped. I learned that while I had many good advertisers, the pool wasn't large enough back then to withstand a change. Since then, revenue has dropped every time I make a significant change.

For me the revenue does return, and I have since learned what not to do during major changes. I've also reduced costs and learned to diversify the revenue/advertising sources. The site makes more money today than it did in the past, even with current events.

I think the suggestions being offered are worthwhile, but for many advertisers any degree of change will disrupt confidence in your site's ability to deliver on their advertising investment.
 

Alfuzzy

Active member
Years ago when I migrated from vB3 to XF1, my site's revenue dropped. I learned that while I had many good advertisers, the pool wasn't large enough back then to withstand a change. Since then, revenue has dropped every time I make a significant change.

For me the revenue does return, and I have since learned what not to do during major changes. I've also reduced costs and learned to diversify the revenue/advertising sources. The site makes more money today than it did in the past, even with current events.

I think the suggestions being offered are worthwhile, but for many advertisers any degree of change will disrupt confidence in your site's ability to deliver on their advertising investment.

Thanks for the info. :)

My main goal at the moment is figuring out why site traffic has dropped a bunch since migrating to XF from vB. Once I've figured that out...then I can do a better revenue assessment.

Migrating from vB to XF shouldn't result in a major traffic decrease. If the issue is something setup incorrectly on my end...I would love to figure it out!

I've shared a ton of info in this thread...if anyone has any ideas...please post...I would love to investigate! :)

Thanks
 

Silmarillion

Active member
@briansol
These entries are not a good idea. Users should set their privacy themselves in their settings, and excluding postings (goto) is not good either.

Since I've removed these things from robots.txt, the number of hits has increased again, Google indexes more again, and my "excluded" content in the GSC has gone down.

Unfortunately, I only found out all of this by working with the AMP add-on from @mazzly ...

That surprises me, @Masetrix. The current robots.txt from XF.com looks like this:

Code:
User-agent: *
Disallow: /community/whats-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /

Sitemap: https://xenforo.com/community/sitemap.xml

What does your current robots.txt look like? Would you share this with us?
 

Silmarillion

Active member
Hey Silmarillion. Who were you asking this question? I'm the thread OP...wasn't sure if you were asking me or someone else.

Thanks

Hi Alfuzzy, my question was actually addressed to Masetrix. But of course you are also very welcome to share your file if you like.
 

Masetrix

Well-known member
That surprises me, @Masetrix. The current robots.txt from XF.com looks like this:

Code:
User-agent: *
Disallow: /community/whats-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /

Sitemap: https://xenforo.com/community/sitemap.xml

What does your current robots.txt look like? Would you share this with us?

If "goto" references have ever been used and you block them in robots.txt later, you will find many new errors in the GSC after a short time. It is better to use "nofollow" or "noindex" tags here.

It also makes a difference whether the forum was created with XF from the start; in that case you can use this robots.txt. If a forum has been migrated to XF, it is better to leave those entries out, so you don't generate tons of error messages in the GSC. In my experience, a lot of errors also means less traffic.
 

Alfuzzy

Active member
Hi Alfuzzy, my question was actually addressed to Masetrix. But of course you are also very welcome to share your file if you like.

Here's my robots.txt. If anyone has suggestions (what to add to it or what to delete from it)...please post. Thanks :)

User-agent: *
Crawl-delay: 5
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /forums/members/
Disallow: /members/
Disallow: /forums/member.php
Disallow: /member.php
Disallow: /forums/calendar.php
Disallow: /calendar.php

Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /login/
Disallow: /members/
Disallow: /admin.php
Allow: /
 

djbaxter

Well-known member
Two things: you seem to be mixing up WordPress and XenForo in many of those lines. Also, crawl-delay is a bad idea: better to let Google and Bing determine optimal crawl rates themselves, and you can't increase crawl rates that way, only decrease them.

For WordPress, you really don't need to block anything usually, since most of those areas are already inaccessible, and you normally don't want to block content these days.

For XenForo, some of what you're blocking doesn't exist at the locations you're blocking.

Change to:

Code:
User-agent: *
Disallow: /forums/whats-new/
Disallow: /forums/posts/
Disallow: /forums/tags/
Disallow: /forums/members/
Disallow: /forums/member.php
Disallow: /forums/calendar.php
Disallow: /forums/account/
Disallow: /forums/attachments/
Disallow: /forums/goto/
Disallow: /login/
Disallow: /forums/admin.php
Allow: /

User-agent: Mediapartners-Google*
Disallow:

Sitemap: http://{yourdomain.com}/forums/sitemap.xml

You can eliminate Mediapartners-Google* if you don't use AdSense.

The reason for disallowing /posts/ and /whats-new/ is duplicate content; /threads/ is not blocked.

Optionally, if you actually use tags, you can remove that line, but it won't help much with indexing.
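Once a robots.txt like that is in place, the same access log can confirm whether Googlebot is still requesting the disallowed paths. A rough sketch, assuming the combined log format where the request path is field 7 (the sample lines are fabricated so it runs as-is; swap in the real log, and adjust the path list to match your actual robots.txt):

```shell
# Fabricated sample; replace logfile.txt with the real access log.
cat > logfile.txt <<'EOF'
66.249.66.1 - - [31/Mar/2021:11:00:00 +0000] "GET /threads/topic.1/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
66.249.66.1 - - [31/Mar/2021:11:00:01 +0000] "GET /goto/post?id=9 HTTP/1.1" 403 220 "-" "Googlebot/2.1"
66.249.66.1 - - [31/Mar/2021:11:00:02 +0000] "GET /whats-new/ HTTP/1.1" 200 2048 "-" "Googlebot/2.1"
EOF

# Field 7 is the request path in the combined log format. Count
# Googlebot hits on disallowed paths (root-install paths shown here).
grep "Googlebot" logfile.txt \
  | awk '{print $7}' \
  | grep -E '^/(whats-new|posts|tags|account|attachments|goto|login|members)/' \
  | sort | uniq -c | sort -rn
```

Crawlers can take days to pick up a robots.txt change, so expect these counts to taper off rather than drop to zero immediately.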
 

Alfuzzy

Active member
Hello djbaxter...thanks for the help.:)

Yes, there are some lines in the robots.txt for both XF and WP. The WP lines are probably legacy stuff from years ago. As far as the crawl delay...I think I read somewhere that crawlers (at least Google's crawler) ignore any crawl delay in the robots file. But if it makes sense to remove the crawl-delay line and simplify things...it can definitely be removed.

In the robots example posted above...I'm assuming the file structure on this website may be different than my site due to the "forums" directory included in the file paths.


Do I need to include the "forums" path in my robots.txt? (My site doesn't have a specific "forums" sub-directory.)

Thanks:)
 

djbaxter

Well-known member
Hello djbaxter...thanks for the help.:)

Yes, there are some lines in the robots.txt for both XF and WP. The WP lines are probably legacy stuff from years ago. As far as the crawl delay...I think I read somewhere that crawlers (at least Google's crawler) ignore any crawl delay in the robots file. But if it makes sense to remove the crawl-delay line and simplify things...it can definitely be removed.

In the robots example posted above...I'm assuming the file structure on this website may be different than my site due to the "forums" directory included in the file paths.


Do I need to include the "forums" path in my robots.txt? (My site doesn't have a specific "forums" sub-directory.)

Thanks:)

Oh, so the forum now resides in the root directory of your domain? If so, yes: delete the /forums part from all those entries so it looks like this:

Code:
User-agent: *
Disallow: /whats-new/
Disallow: /posts/
Disallow: /tags/
Disallow: /members/
Disallow: /member.php
Disallow: /calendar.php
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /login/
Disallow: /admin.php
Allow: /

User-agent: Mediapartners-Google*
Disallow:

Sitemap: http://{yourdomain.com}/sitemap.xml
 