XF 2.0 SEO Problems

Chad

Active member
Seems to me that Google is not picking up my site's threads/posts. I searched for multiple forum categories and multiple older threads' titles on Google (copy/paste) and none of them show up at all. These are all unique titles, too. I'm baffled.

Site: https://www.talkjesus.com/

Confirming:
- sitemap.php is in the root directory
- /public_html/internal_data/sitemaps has 2 sitemap files there too (see attached)
- The domain is 16 years old
- SEO is set up correctly in the admin panel
- Google Search Console: site verified + sitemap listed as "Success"

I would appreciate some help here. Thanks.
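
For reference, the reachability of what the sitemap lists can be spot-checked from outside. A minimal Python sketch (it assumes sitemap.php returns a plain <urlset>; a sitemap index would need one extra level of fetching, and requests is a third-party package):

Code:
# Spot-check a live sitemap: fetch it, sample the listed URLs,
# and report HTTP status plus any X-Robots-Tag noindex header.
import xml.etree.ElementTree as ET
import requests

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
SITEMAP = "https://www.talkjesus.com/sitemap.php"

resp = requests.get(SITEMAP, timeout=30)
resp.raise_for_status()
urls = [loc.text for loc in ET.fromstring(resp.content).findall(".//sm:loc", NS)]
print(len(urls), "URLs listed in the sitemap")

for url in urls[:20]:  # sample the first 20
    r = requests.get(url, timeout=30, allow_redirects=False)
    noindex = "noindex" in r.headers.get("X-Robots-Tag", "").lower()
    print(r.status_code, "noindex!" if noindex else "", url)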

Google Coverage

Saw this just now.

How do I fix this? 65,000 pages excluded cannot be right.

This is my .htaccess:

Code:
RewriteEngine On

RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.talkjesus.com/$1 [R,L]
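
# Note: a bare [R] flag issues a 302 (temporary) redirect by default.
# For a permanent http -> https move, [R=301,L] is generally preferred
# so search engines treat the https URLs as the canonical ones.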

# Mod_security can interfere with uploading of content such as attachments. If you
# cannot attach files, remove the "#" from the lines below.
#<IfModule mod_security.c>
# SecFilterEngine Off
# SecFilterScanPOST Off
#</IfModule>

## EXPIRES CACHING ##
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access 1 year"
ExpiresByType image/jpeg "access 1 year"
ExpiresByType image/gif "access 1 year"
ExpiresByType image/png "access 1 year"
ExpiresByType text/css "access 1 month"
ExpiresByType text/html "access 1 month"
ExpiresByType application/pdf "access 1 month"
ExpiresByType text/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType image/x-icon "access 1 year"
ExpiresDefault "access 1 month"
</IfModule>
## EXPIRES CACHING ##

# TN - BEGIN Cache-Control Headers
<IfModule mod_headers.c>
<FilesMatch "\.(ico|jpe?g|png|gif|swf)$">
Header set Cache-Control "public"
</FilesMatch>
<FilesMatch "\.(css)$">
Header set Cache-Control "public"
</FilesMatch>
<FilesMatch "\.(js)$">
Header set Cache-Control "private"
</FilesMatch>
<FilesMatch "\.(x?html?|php)$">
Header set Cache-Control "private, must-revalidate"
</FilesMatch>
</IfModule>
# TN - END Cache-Control Headers

Header unset Pragma
FileETag None
Header unset ETag

<IfModule mod_gzip.c>
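# mod_gzip is a legacy Apache 1.x module; on a modern Apache build this
# block is usually inert and the mod_deflate section further down does
# the actual compression.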
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/x-javascript.*
mod_gzip_item_exclude mime ^image/.*
mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</IfModule>

<IfModule mod_deflate.c>
# Compress HTML, CSS, JavaScript, Text, XML and fonts
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/vnd.ms-fontobject
AddOutputFilterByType DEFLATE application/x-font
AddOutputFilterByType DEFLATE application/x-font-opentype
AddOutputFilterByType DEFLATE application/x-font-otf
AddOutputFilterByType DEFLATE application/x-font-truetype
AddOutputFilterByType DEFLATE application/x-font-ttf
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE font/opentype
AddOutputFilterByType DEFLATE font/otf
AddOutputFilterByType DEFLATE font/ttf
AddOutputFilterByType DEFLATE image/svg+xml
AddOutputFilterByType DEFLATE image/x-icon
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml

# Remove browser bugs (only needed for really old browsers)
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
Header append Vary User-Agent
</IfModule>

ErrorDocument 401 default
ErrorDocument 403 default
ErrorDocument 404 default
ErrorDocument 500 default

<IfModule mod_rewrite.c>
RewriteEngine On

# If you are having problems with the rewrite rules, remove the "#" from the
# line that begins "RewriteBase" below. You will also have to change the path
# of the rewrite to reflect the path to your XenForo installation.
#RewriteBase /

# This line may be needed to enable WebDAV editing with PHP as a CGI.
#RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]

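# The three conditions below are added by cPanel: they exempt AutoSSL /
# Comodo domain-control-validation files and ACME (Let's Encrypt)
# challenge paths from the rewrites, so SSL issuance/renewal keeps working.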
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
RewriteRule ^(data/|js/|styles/|install/|favicon\.ico|crossdomain\.xml|robots\.txt) - [NC,L]
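# Everything that wasn't excluded above is routed through XenForo's
# front controller.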
RewriteRule ^.*$ index.php [NC,L]

</IfModule>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

# php -- BEGIN cPanel-generated handler, do not edit
# NOTE this account's php is controlled via FPM and the vhost, this is a place holder.
# Do not edit. This next line is to support the cPanel php wrapper (php_cli).
# AddType application/x-httpd-ea-php72 .php .phtml
# php -- END cPanel-generated handler, do not edit
 
I looked at one thread and had to leave after about ten seconds due to the eye strain of dark purple text on a dark grey background.

Maybe Google is penalising you for that, as it's very close to masking the text.
 
Where do you see dark purple text on a dark grey background?

There are over 300,000 posts and nearly 60,000 threads. Google is not penalizing me for that one mysterious page you saw. Plus, there's no grey background. It's black, aka a "night mode" style design.
 
You appear to have additional rewrite rules that are not part of the normal XF .htaccess:

Code:
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
 
When I do a Google search for pages from your site [site:talkjesus.com], I get about 7,200 results. When I manually check some of the older threads on your site [info:<insert URL here>], they have not been indexed.

Have you moved from http to https recently? Have you changed site style?
 
When I do a Google search for pages from your site [site:talkjesus.com], I get about 7,200 results. When I manually check some of the older threads on your site [info:<insert URL here>], they have not been indexed.

Have you moved from http to https recently? Have you changed site style?

HTTPS has been enabled for at least one year. The site was redesigned back in August (light mode), and the new night mode was added around September.
 
Do you know how many indexed pages you had before those changes? I know from experience that it has taken nearly three years for our main site to recover from the HTTPS change, indexing-volume-wise.
 
Ok, I removed those 3 lines. They're inserted by cPanel automatically.
There's no point in blindly making changes without evidence to justify them. Why are they added by cPanel? What do they do? They might be important.

The information as to what makes up the 60K+ is right there. The majority of them are listed as "Crawled - currently not indexed". I'm not certain, but I believe you should be able to drill down into each of these to the specific URLs.

It's just going to be a case of going through the different error categories, trying to decipher what those Google messages are trying to tell you, and also analysing any of the URLs to see if there is anything pertinent that can be improved.

There's not going to be any silver bullet nugget of advice to give you on how you can fix it, and in some cases, there might not even be anything you can do. From what I've read, the "Currently not indexed" error is fairly generic and essentially boils down to "Google's prerogative".
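
If it helps with the analysing, the pertinent per-URL signals (status code, robots headers, meta robots, declared canonical) can be checked in bulk. A rough Python sketch, with a placeholder list standing in for the URLs exported from Search Console:

Code:
# Rough triage for "Crawled - currently not indexed" URLs: for each
# URL report the status code, X-Robots-Tag header, meta robots tag
# and the canonical URL the page declares.
import re
import requests

urls = [
    "https://www.talkjesus.com/",  # placeholder; paste exported URLs here
]

for url in urls:
    r = requests.get(url, timeout=30)
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', r.text, re.I)
    canon = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)',
                      r.text, re.I)
    print(url)
    print("  status:      ", r.status_code)
    print("  x-robots-tag:", r.headers.get("X-Robots-Tag", "-"))
    print("  meta robots: ", meta.group(0) if meta else "-")
    print("  canonical:   ", canon.group(1) if canon else "-")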
 
Why don't you go into Google Webmaster Tools (the Search Console) and see if there are any errors? There is a place to see which URLs are not getting crawled if they have errors. At least then you would be able to find out which ones have errors and which ones don't.
 
There's no point in blindly making changes without evidence to justify them. Why are they added by cPanel? What do they do? They might be important.

The information as to what makes up the 60K+ is right there. The majority of them are listed as "Crawled - currently not indexed". I'm not certain, but I believe you should be able to drill down into each of these to the specific URLs.

It's just going to be a case of going through the different error categories, trying to decipher what those Google messages are trying to tell you, and also analysing any of the URLs to see if there is anything pertinent that can be improved.

There's not going to be any silver bullet nugget of advice to give you on how you can fix it, and in some cases, there might not even be anything you can do. From what I've read, the "Currently not indexed" error is fairly generic and essentially boils down to "Google's prerogative".

I clicked on "Crawled - currently not indexed" and see this: all random threads/posts/categories, so nothing specific to pinpoint.

[screenshot: Coverage]

Their explanation is as vague as can be: https://support.google.com/webmasters/answer/7440203#crawled

Crawled - currently not indexed: The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

So, I started clicking on the link examples and got more info.

Example link: https://www.talkjesus.com/posts/322075/

Error:

[screenshot: Coverage]


Another Example: https://www.talkjesus.com/threads/word-for-today-stop-striving.63424/
Result: "URL is on Google"

Another one: https://www.talkjesus.com/posts/322119/

Redirect issue.

I would appreciate help fixing this. I provided the XF config screenshot and a copy of my .htaccess in my OP.
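
For the redirect one, it's worth seeing the full chain a crawler gets. A minimal Python sketch using the requests package (/posts/322119/ is the example from above):

Code:
# Print the redirect chain for a URL, hop by hop.
import requests

url = "https://www.talkjesus.com/posts/322119/"
r = requests.get(url, timeout=30, allow_redirects=True)

for hop in r.history:  # each intermediate redirect response
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print(r.status_code, r.url)  # final destination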
 

Why don't you go into Google Webmaster Tools (the Search Console) and see if there are any errors? There is a place to see which URLs are not getting crawled if they have errors. At least then you would be able to find out which ones have errors and which ones don't.

Did you actually read anything in this thread and view the screenshots? This is where I'm getting all the "errors": the Search Console.
 
Did you actually read anything in this thread and view the screenshots? This is where I'm getting all the "errors": the Search Console.

Yes, I just noticed that; I had missed it. It's odd that there aren't any errors at all in your Search Console. I suppose you have a robots.txt file, though Google will crawl you without one. Not sure what the issue is.
 
Curious: since my robots.txt was missing for however long and I just replaced it an hour ago, do you think this will remove a good portion of the "Crawled - currently not indexed" warnings and improve the SEO?

I found another issue in the "Coverage" area of Google Search Console:

URL is not available to Google
It cannot be indexed.
Page fetch - Failed: Crawl anomaly

Same with many other member links. Is that normal?

Again, this is my robots.txt:

Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Allow: /

Sitemap: https://www.talkjesus.com/sitemap.php
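
One way to test what this file actually permits is Python's standard-library robots parser. A small sketch (the two URLs are the examples flagged earlier in the thread; note that with /posts/ disallowed, a False for the /posts/ URL would be expected):

Code:
# Check what the live robots.txt allows for Googlebot.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.talkjesus.com/robots.txt")
rp.read()

for url in ("https://www.talkjesus.com/posts/322075/",
            "https://www.talkjesus.com/threads/word-for-today-stop-striving.63424/"):
    print(rp.can_fetch("Googlebot", url), url)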
 