Hi - I don't know what your robots.txt file looks like, but my sites have been suffering from a low page indexation rate since I installed XenForo (in 2019, when I transitioned from another CMS). I had a hunch that it was because Google was crawling a whole lot of unnecessary pages, and I think I was right. I have two articles for you to refer to:
Discover why indexable URLs are crucial for SEO success. Learn how to minimize "noise" on your website for optimal search engine visibility. (www.botify.com)
Unlock the potential of your classified website with crawl budget optimization. Learn how to improve SEO on millions of pages efficiently. (www.botify.com)
Apparently, crawl budget is big these days, even on smaller websites.
Up until November 2021, I didn't block much in the robots.txt file, but on November 9, 2021, I decided to give some blocking a try. I blocked the /members/ and /attachments/ pages. Those are very important to block. They used to give 403 errors (because I don't allow guests to see them), but now they aren't crawled at all. Other very important directories to block are /whats-new/, /search/, /goto/, and /posts/. I also blocked the /threads/*/post and /threads/*/latest URLs.
This is what I'm seeing so far. It's only been a month and some change.
Crawl rate is beginning to pick up. These are the pages returning a 200 status code.
View attachment 262379
Crawled - Currently Not Indexed pages are going down, which is good.
View attachment 262380
And most importantly, valid pages are increasing.
View attachment 262381
This same thing is occurring across eight of my forums. I did the same thing to all of them.
Check this one out. This is a different site that's been languishing for years. Take a look at the valid pages jump.
View attachment 262382
If you're interested, this is what I blocked in my robots.txt files.
User-agent: *
Disallow: /forum/account/
Disallow: /forum/admin.php
Disallow: /forum/attachments/
Disallow: /forum/conversations/
Disallow: /forum/find-threads/
Disallow: /forum/forums/*/create-thread
Disallow: /forum/forums/*/post-thread
Disallow: /forum/goto/
Disallow: /forum/job.php
Disallow: /forum/login/
Disallow: /forum/logout/
Disallow: /forum/lost-password/
Disallow: /forum/members/
Disallow: /forum/misc/
Disallow: /forum/online/
Disallow: /forum/posts/
Disallow: /forum/profile-posts/
Disallow: /forum/register/
Disallow: /forum/search/
Disallow: /forum/search-forums/location
Disallow: /forum/threads/*/add-reply
Disallow: /forum/threads/*/approve
Disallow: /forum/threads/*/draft
Disallow: /forum/threads/*/latest
Disallow: /forum/threads/*/post
Disallow: /forum/threads/*/reply
Disallow: /forum/threads/*/unread
Disallow: /forum/whats-new/
My goal was to not allow Googlebot to crawl any page that I didn't want in the index.
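If you want to sanity-check rules like these before uploading them, here's a rough Python sketch. It assumes Google-style matching, where a rule is a prefix match on the URL path, `*` matches any run of characters, and a trailing `$` anchors the end. It only looks at Disallow lines (no Allow precedence), it uses just a handful of the rules from my list, and the example.com URLs and thread names are made up for illustration. As far as I know, Python's built-in urllib.robotparser doesn't expand the `*` wildcard, which is why this does the matching by hand.

```python
import re
from urllib.parse import urlsplit

# A few of the Disallow patterns from the list above (not the full set).
DISALLOW = [
    "/forum/members/",
    "/forum/attachments/",
    "/forum/whats-new/",
    "/forum/search/",
    "/forum/goto/",
    "/forum/posts/",
    "/forum/threads/*/post",
    "/forum/threads/*/latest",
]


def rule_to_regex(rule):
    """Turn one robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the path.
    Everything else is matched literally from the start of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [rule_to_regex(r) for r in DISALLOW]


def is_blocked(url):
    """Return True if the URL's path (plus query) matches any Disallow rule."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # re.match anchors at the start of the path, giving prefix semantics.
    return any(rx.match(path) for rx in RULES)


if __name__ == "__main__":
    # Hypothetical URLs, only for illustration.
    for url in [
        "https://example.com/forum/threads/some-title.123/",
        "https://example.com/forum/threads/some-title.123/latest",
        "https://example.com/forum/threads/some-title.123/post-456",
        "https://example.com/forum/members/someone.1/",
    ]:
        print("blocked" if is_blocked(url) else "allowed", url)
```

The point of running something like this is to confirm that the thread pages themselves (and their page-2, page-3 URLs) stay crawlable while the /latest and /post variants don't.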
I hope this information helps you. If you're using noindex on some pages and are allowing them to be crawled by Google, I can tell you that I've never had any luck doing that. I've always seen ranking drops when I allowed pages like that to be crawled. It's a waste of crawl budget. In my opinion, it's always better to block in the robots.txt file. In this case, I haven't seen any ranking increases, but I'm hopeful that will happen during the next few Google updates.