Page indexing issue from Google

Alvin63

Active member
Google Search Console tells me there are page indexing issues with three of my pages. I'm not quite sure what that means or what to do about it. It says those pages are excluded by a "noindex" tag and so Google can't crawl them. It gave these three:

whatsnew/latest activity
search&type-resource
members/"my name"/trophies
 

Alvin63

Active member
Further down it says I have 10 not indexed pages and 14 indexed pages. All I've done is insert the Google tag they gave me.
 

Nicolas FR

Well-known member
"
It says those pages are excluded by a "noindex" tag and so google can't crawl them. It gave these three

whatsnew/latest activity
search&type-resource
members/"my name"/trophies
"

Perfect, leave it like that. Nobody wants these pages indexed. It's not really an error: Google wanted to crawl them, but you told it not to in the robots.txt file. We all do the same.

"
Further down it says I have 10 not indexed pages and 14 indexed pages. All I've done is insert the google tag they gave me.
"

Once your forum has grown you will have thousands of unindexed pages, and that is normal too. Part of this comes from your robots.txt file, and we all have much the same file, with a few differences. So again, don't worry about that.
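
For anyone who wants to confirm where the "noindex" comes from, it can be read straight out of the page source. Below is a minimal Python sketch, assuming the page sets it through a robots meta tag (it can also arrive as an X-Robots-Tag HTTP header, which this does not check). The sample HTML strings are made up for illustration, not copied from any real forum page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every robots/googlebot meta tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content looks like "noindex, follow"; split it into directives
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML carries a noindex directive in a meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Made-up sample pages (assumptions, for illustration only).
sample_noindex = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
sample_plain = "<html><head><title>A thread</title></head></html>"

print(has_noindex(sample_noindex))
print(has_noindex(sample_plain))
```

Paste the source of one of the reported pages (viewed while logged out) into `has_noindex` to see whether the tag is really there.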
 

Alvin63

Active member
Cheers. Thanks very much. I kind of guessed there was some inbuilt thing telling Google not to crawl those pages - private info perhaps? I didn't put anything in robots.txt, so I'm guessing it's in there as standard to not crawl certain pages.
 

Nicolas FR

Well-known member
Not private information, but pages that have no SEO value.

The robots.txt file, for example, prevents posts from being indexed, unlike threads, which are. This avoids diluting your forum's good information across thousands, tens of thousands, or millions of posts.
The purpose of the robots.txt file, among other things, is to tell Google what you want done with your different content: index it or not.
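
To see what a given robots.txt actually lets a crawler fetch, Python's standard `urllib.robotparser` can evaluate the rules offline. This is only a sketch: the rules and the `forum.example` URLs below are hypothetical and are not the contents of anyone's real file. Strictly speaking, robots.txt controls crawling, while "noindex" is a separate per-page signal.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not any real site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /whats-new/
Disallow: /search/
Disallow: /posts/
"""


def build_parser(text):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# Hypothetical paths: a thread, a single post, and a search page.
for path in ("/threads/example.1/", "/posts/5/", "/search/?type=resource"):
    allowed = parser.can_fetch("Googlebot", "https://forum.example" + path)
    print(path, "allowed" if allowed else "blocked")
```

Swapping `SAMPLE_ROBOTS` for the text of your own file shows which of the reported URLs are blocked by it and which are not.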
 

Wildcat Media

Well-known member
Google has gotten so fussy with robots.txt lately that I'm afraid to do anything with it. I had a crawl delay set up on an older site because the search engine bots were bogging it down. A few years later I noticed the site wasn't being crawled anymore, and it was due to that crawl delay. Even though Google didn't support it, they were penalizing sites that had this line in their robots.txt file.
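
If you are not sure whether an old robots.txt still carries a crawl delay, a quick check is possible with the standard library. A rough sketch, assuming Python 3.6 or later (where `crawl_delay` exists); the sample file text is invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt text with a crawl delay, for illustration only.
SAMPLE_WITH_DELAY = """\
User-agent: *
Crawl-delay: 10
Disallow: /search/
"""

SAMPLE_WITHOUT_DELAY = """\
User-agent: *
Disallow: /search/
"""


def find_crawl_delay(robots_text, user_agent="Googlebot"):
    """Return the Crawl-delay that applies to user_agent, or None if absent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(user_agent)


print(find_crawl_delay(SAMPLE_WITH_DELAY))
print(find_crawl_delay(SAMPLE_WITHOUT_DELAY))
```

A result other than `None` means the line is still in the file and may be worth removing.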
 

Alvin63

Active member
I am wondering if I have something set up wrong. I have done nothing with robots.txt since installing, and have just inserted the Google code into the search engine settings as Google suggested. Google does seem to be complaining about the number of pages not indexed and sent me this notification:

"
To the owner of https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Search Console has identified that your site is affected by 2 Coverage issue(s). The following issues were found on your site.

Top Issues

  • Blocked due to access forbidden (403)
  • Blocked due to other 4xx issue

We recommend that you fix these issues when possible to enable the best experience and coverage in Google Search."

I am really not sure what they are on about! The site loads fine and Google Analytics doesn't seem to have picked up any issues.
 

Wildcat Media

Well-known member
Do what @nicodak says, and check the individual pages linked in the search errors, following those links to your forum while logged out (which is how Googlebot sees it). Occasionally there are errors we don't see since we are logged in.
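
The logged-out check can also be scripted. Here is a minimal sketch using only Python's standard library: it requests a URL with no cookies (so, not logged in) and labels the status code in roughly the way the notification does. The label wording and the example URL in the usage note are my own assumptions, and a server may still treat this script differently from the real Googlebot.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status for url, requested with no cookies.

    urlopen follows redirects, so this reports the status of the last hop.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (compatible; status-check)"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want
        return err.code


def describe(status):
    """Map a status code to a short label (wording is an assumption)."""
    if status < 400:
        return "OK"
    if status == 401:
        return "Unauthorized (401)"
    if status == 403:
        return "Access forbidden (403)"
    if status < 500:
        return "Other 4xx issue"
    return "Server error (5xx)"


# No network access here: just show how a few codes would be labelled.
for code in (200, 403, 404):
    print(code, describe(code))
```

To test a real page, call something like `describe(fetch_status("https://forum.example/threads/example.1/"))` with one of the URLs listed in Search Console in place of the placeholder.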
 

Alvin63

Active member
Thank you. According to my server "If you have pages that are blocked via .htaccess they will show a 403 but on purpose"

Now that he has mentioned .htaccess, I remember that I ticked the box to show friendly URLs (which got rid of the index.php bit after my domain name in the address bar), and there was a comment under that tick box about maybe needing to set something up with .htaccess.
 

Alvin63

Active member
Where would I see the errors in GSC? All I can see is the notification I posted above. Is there a particular section to look in?
 

Alvin63

Active member
I remember that when I was asking about ticking the "use full friendly urls" box, there was a bit underneath that said "If you enable this option, the links generated by the system will not include "index.php?". However, to enable this, mod_rewrite must be available and an appropriate .htaccess file (or the equivalent for your web server) must be in place."

And Brogan, or someone else, commented that I might need to rename htaccess.txt to .htaccess if there were any issues. I am a bit nervous of doing that in file manager as I have no idea what it means!
 

Alvin63

Active member
Server says mod_rewrite is enabled. Inside htaccess.txt it says this

"
# Mod_security can interfere with uploading of content such as attachments. If you
# cannot attach files, remove the "#" from the lines below.
#<IfModule mod_security.c>
# SecFilterEngine Off
# SecFilterScanPOST Off
#</IfModule>

ErrorDocument 401 default
ErrorDocument 403 default
ErrorDocument 404 default
ErrorDocument 405 default
ErrorDocument 406 default
ErrorDocument 500 default
ErrorDocument 501 default
ErrorDocument 503 default

<IfModule mod_rewrite.c>
RewriteEngine On

# If you are having problems with the rewrite rules, remove the "#" from the
# line that begins "RewriteBase" below. You will also have to change the path
# of the rewrite to reflect the path to your XenForo installation.
#RewriteBase /xenforo

# This line may be needed to workaround HTTP Basic auth issues when using PHP as a CGI.
#RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^(data/|js/|styles/|install/|favicon\.ico|crossdomain\.xml|robots\.txt) - [NC,L]
RewriteRule ^.*$ index.php [NC,L]
</IfModule>"
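
For anyone unsure what those rewrite rules do, here is a rough Python model of the decision they make for each request. It is only a sketch of my reading of the file quoted above: `existing_paths` is a hypothetical stand-in for the real file, symlink, and directory checks, and paths are given relative to the forum root with no leading slash.

```python
import re

# Pattern copied from the second RewriteRule in the quoted htaccess.txt.
STATIC_PATTERN = re.compile(
    r"^(data/|js/|styles/|install/|favicon\.ico|crossdomain\.xml|robots\.txt)",
    re.IGNORECASE,  # mirrors the [NC] (no case) flag
)


def route(path, existing_paths):
    """Return what ends up serving `path` under the quoted rules."""
    if path in existing_paths:
        # RewriteCond -f / -l / -d: real files, links and folders are served as-is
        return path
    if STATIC_PATTERN.match(path):
        # static folders and a few named files are also left alone
        return path
    # everything else, including friendly URLs, is handed to XenForo
    return "index.php"


# Hypothetical set of files that exist on disk.
on_disk = {"index.php", "styles/default/logo.png"}

for requested in ("threads/welcome.1/", "styles/default/logo.png", "robots.txt"):
    print(requested, "->", route(requested, on_disk))
```

This is why friendly URLs need the file to be active: without it, a request like `threads/welcome.1/` never reaches `index.php`.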
 

Alvin63

Active member
Apparently Hostinger supports both Apache and NGINX. But elsewhere it says this:

"Hostinger uses LiteSpeed Web Servers (LSWS) as the main web server software instead of Nginx or Apache. Compared to the other popular web servers,"
 

Alvin63

Active member
Up to date info on Hostinger site says this:

"At the beginning of 2019, Hostinger decided to try LiteSpeed Web Servers (LSWS) as the main web server instead of Apache. LiteSpeed is a drop-in replacement for Apache with .htaccess file support."

This .htaccess keeps being mentioned. I wish I knew what it was and whether I need it!
 

Nicolas FR

Well-known member
It's LSWS on your server, not Apache or Nginx.
But for the moment, just log in to GSC, go to Pages, and check the errors shown there.
 

Alvin63

Active member
It doesn't say errors on the Pages report. It says:

Not indexed 53 pages "These URLs are not indexed by Google. In some cases, this may be your intent; in other cases, it might be an error. Examine the issues in the table below to decide whether you need to fix these URLs."

Indexed 17 pages

Underneath that it lists the pages that aren't indexed, in different categories. There are five categories:

1) Excluded by "noindex" tag
2) Blocked due to access forbidden (403)
3) Page with redirect
4) Blocked due to other 4xx issue
5) Crawled - currently not indexed.

If I click on that last one, "Crawled - currently not indexed", it lists 44 pages. Some of them are forum software pages like trophies and smilies, but most are ordinary forum threads, and those should be getting indexed.
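
A list like that is easier to judge once it is grouped by forum section. Here is a small Python sketch that counts URLs by their first path segment; the `forum.example` URLs are invented samples, and it assumes friendly URLs (no `index.php?` prefix).

```python
from collections import Counter
from urllib.parse import urlparse


def section_of(url):
    """Return the first path segment of a URL, e.g. 'threads' or 'members'."""
    path = urlparse(url).path.strip("/")
    return path.split("/", 1)[0] or "(home)"


def summarise(urls):
    """Count how many URLs fall in each forum section."""
    return Counter(section_of(url) for url in urls)


# Invented sample of URLs, standing in for the list shown in Search Console.
sample_urls = [
    "https://forum.example/threads/welcome.1/",
    "https://forum.example/threads/rules.2/",
    "https://forum.example/members/someone.1/trophies",
    "https://forum.example/help/smilies/",
]

for section, count in summarise(sample_urls).most_common():
    print(section, count)
```

Anything counted under `threads` is worth a closer look; entries under sections such as `members` or `help` are the kind usually left unindexed.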
 