Hi - I don't know what your robots.txt file looks like, but my sites have suffered from a low page indexation rate ever since I moved to XenForo from another CMS in 2019. I had a hunch that Google was crawling a whole lot of unnecessary pages, and I think I was right. Here are two articles for you to refer to:
Indexable URLs: The Mother of All SEO Indicators (Annabelle, 18th December 2014)
www.botify.com
Crawl Budget Optimization for Classified Websites (Frank Vitovich, 1st August 2019)
www.botify.com
Apparently, crawl budget matters a lot these days, even on smaller websites.
Until November 2021, I didn't block much in my robots.txt file. On November 9, 2021, I decided to give some blocking a try. I blocked the /members/ and /attachments/ pages. Those are very important to block. They used to return 403 errors (because I don't allow guests to see them), but now they aren't crawled at all. Other very important directories to block are /whats-new/, /search/, /goto/, and /posts/. I also blocked the /threads/*/post and /threads/*/latest URLs.
This is what I'm seeing so far. It's only been a month and some change.
Crawl rate is beginning to pick up. These are the pages that returned a 200 (OK) response.
View attachment 262379
Crawled - Currently Not Indexed pages are going down, which is good.
View attachment 262380
And most importantly, valid pages are increasing.
View attachment 262381
The same thing is happening across eight of my forums. I made the same robots.txt changes on all of them.
Check this one out. This is a different site that's been languishing for years. Take a look at the valid pages jump.
View attachment 262382
If you're interested, this is what I blocked in my robots.txt files.
User-agent: *
Disallow: /forum/account/
Disallow: /forum/admin.php
Disallow: /forum/attachments/
Disallow: /forum/conversations/
Disallow: /forum/find-threads/
Disallow: /forum/forums/*/create-thread
Disallow: /forum/forums/*/post-thread
Disallow: /forum/goto/
Disallow: /forum/job.php
Disallow: /forum/login/
Disallow: /forum/logout/
Disallow: /forum/lost-password/
Disallow: /forum/members/
Disallow: /forum/misc/
Disallow: /forum/online/
Disallow: /forum/posts/
Disallow: /forum/profile-posts/
Disallow: /forum/register/
Disallow: /forum/search/
Disallow: /forum/search-forums/location
Disallow: /forum/threads/*/add-reply
Disallow: /forum/threads/*/approve
Disallow: /forum/threads/*/draft
Disallow: /forum/threads/*/latest
Disallow: /forum/threads/*/post
Disallow: /forum/threads/*/reply
Disallow: /forum/threads/*/unread
Disallow: /forum/whats-new/
My goal was to not allow Googlebot to crawl any page that I didn't want in the index.
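If you want to sanity-check rules like these before deploying them, here is a rough Python sketch of how the wildcard patterns could be matched against URL paths. It is only an approximation of how a crawler reads the file: it handles `*` and a trailing `$`, ignores Allow rules and rule precedence, and the sample paths are made up for illustration (they are not from my sites). I wrote a small matcher by hand because wildcard support varies between robots.txt parsers.

```python
import re

# A subset of the Disallow rules from the list above.
DISALLOW = [
    "/forum/members/",
    "/forum/attachments/",
    "/forum/whats-new/",
    "/forum/search/",
    "/forum/goto/",
    "/forum/posts/",
    "/forum/threads/*/post",
    "/forum/threads/*/latest",
]


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the path. Everything else is matched literally as a prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


COMPILED = [rule_to_regex(rule) for rule in DISALLOW]


def is_blocked(path):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(rx.match(path) for rx in COMPILED)


# Hypothetical paths, for illustration only.
samples = [
    "/forum/threads/example-thread.123/",
    "/forum/threads/example-thread.123/latest",
    "/forum/threads/example-thread.123/post-456",
    "/forum/members/someone.1/",
    "/forum/forums/general.2/",
]

for path in samples:
    print("blocked" if is_blocked(path) else "allowed", path)
```

With these sample paths, the plain thread page and the forum listing stay crawlable, while the /latest, /post-456, and member URLs are reported as blocked. Treat the result as a quick check only; the robots.txt tester in Google Search Console is the real authority on how Googlebot reads your file.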
I hope this information helps you. If you're using noindex on some pages and are allowing them to be crawled by Google, I can tell you that I've never had any luck doing that. I've always seen ranking drops when I allowed pages like that to be crawled. It's a waste of crawl budget. In my opinion, it's always better to block in the robots.txt file. In this case, I haven't seen any ranking increases, but I'm hopeful that will happen during the next few Google updates.