XF 2.0 SEO Problems

Chad

Active member
Seems to me that Google is not picking up my site's threads/posts. I searched Google for multiple forum categories and multiple older threads' titles (copy/paste), and none of them show up at all. These are all unique titles, too. I'm baffled.

Site: https://www.talkjesus.com/

Confirming:
sitemap.php is in the root directory.
/public_html/internal_data/sitemaps has 2 sitemap files too (see attached).
Domain is 16 years old.
SEO is set up correctly in the admin panel.
Google Search Console: site verified, and the sitemap shows as "success".
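If you want to confirm that the sitemap Google marks as "success" actually lists your thread URLs, you can parse it and look at its <loc> entries. The Python sketch below assumes you have saved the output of sitemap.php (or one of the files it links to) as text; the sample XML and example.com URLs are made up, and no particular URL layout from XenForo is assumed.

```python
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol, shared by <urlset> and <sitemapindex>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_text):
    """Return (root type, list of <loc> URLs) for a sitemap or a sitemap index."""
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")  # "urlset" or "sitemapindex"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


# Hypothetical sample: a sitemap index pointing at two child sitemaps.
SAMPLE = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    kind, locs = sitemap_locs(SAMPLE)
    print(kind, len(locs))
    for url in locs:
        print(url)
```

If the root is a sitemapindex, repeat the check on each child file. A urlset with few or no /threads/ URLs would point to a sitemap setup problem rather than a Google one.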

I would appreciate some help here. Thanks.

Google Coverage

Saw this just now.

How do I fix this? 65,000 pages excluded cannot be right.

This is my .htaccess:

Code:
RewriteEngine On

# [R] alone sends a temporary 302 redirect; R=301 tells crawlers the HTTPS move is permanent.
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.talkjesus.com/$1 [R=301,L]

# Mod_security can interfere with uploading of content such as attachments. If you
# cannot attach files, remove the "#" from the lines below.
#<IfModule mod_security.c>
# SecFilterEngine Off
# SecFilterScanPOST Off
#</IfModule>

## EXPIRES CACHING ##
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access 1 year"
ExpiresByType image/jpeg "access 1 year"
ExpiresByType image/gif "access 1 year"
ExpiresByType image/png "access 1 year"
ExpiresByType text/css "access 1 month"
ExpiresByType text/html "access 1 month"
ExpiresByType application/pdf "access 1 month"
ExpiresByType text/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType image/x-icon "access 1 year"
ExpiresDefault "access 1 month"
</IfModule>
## EXPIRES CACHING ##

# TN - BEGIN Cache-Control Headers
<ifModule mod_headers.c>
<filesMatch "\.(ico|jpe?g|png|gif|swf)$">
Header set Cache-Control "public"
</filesMatch>
<filesMatch "\.(css)$">
Header set Cache-Control "public"
</filesMatch>
<filesMatch "\.(js)$">
Header set Cache-Control "private"
</filesMatch>
<filesMatch "\.(x?html?|php)$">
Header set Cache-Control "private, must-revalidate"
</filesMatch>
</ifModule>
# TN - END Cache-Control Headers

Header unset Pragma
FileETag None
Header unset ETag

<IfModule mod_gzip.c>
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/x-javascript.*
mod_gzip_item_exclude mime ^image/.*
mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</IfModule>

<IfModule mod_deflate.c>
# Compress HTML, CSS, JavaScript, Text, XML and fonts
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/vnd.ms-fontobject
AddOutputFilterByType DEFLATE application/x-font
AddOutputFilterByType DEFLATE application/x-font-opentype
AddOutputFilterByType DEFLATE application/x-font-otf
AddOutputFilterByType DEFLATE application/x-font-truetype
AddOutputFilterByType DEFLATE application/x-font-ttf
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE font/opentype
AddOutputFilterByType DEFLATE font/otf
AddOutputFilterByType DEFLATE font/ttf
AddOutputFilterByType DEFLATE image/svg+xml
AddOutputFilterByType DEFLATE image/x-icon
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml

# Remove browser bugs (only needed for really old browsers)
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
Header append Vary User-Agent
</IfModule>

ErrorDocument 401 default
ErrorDocument 403 default
ErrorDocument 404 default
ErrorDocument 500 default

<IfModule mod_rewrite.c>
RewriteEngine On

# If you are having problems with the rewrite rules, remove the "#" from the
# line that begins "RewriteBase" below. You will also have to change the path
# of the rewrite to reflect the path to your XenForo installation.
#RewriteBase /

# This line may be needed to enable WebDAV editing with PHP as a CGI.
#RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]

RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
RewriteRule ^(data/|js/|styles/|install/|favicon\.ico|crossdomain\.xml|robots\.txt) - [NC,L]
RewriteRule ^.*$ index.php [NC,L]

</IfModule>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

# php -- BEGIN cPanel-generated handler, do not edit
# NOTE this account's php is controlled via FPM and the vhost, this is a place holder.
# Do not edit. This next line is to support the cPanel php wrapper (php_cli).
# AddType application/x-httpd-ea-php72 .php .phtml
# php -- END cPanel-generated handler, do not edit

nodle

Well-known member
Don't feel bad. I went through the same thing as you and messed with everything I could. I tried to research the errors as well and basically got a generic answer from Google. Honestly, the only thing I can narrow it down to is that Google just doesn't like your content.

[Screenshot: Search Console Coverage report, 2019-02-20]

djbaxter

Well-known member
You appear to have additional rewrite rules that are not part of the normal XF .htaccess:

Code:
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
Ok, I removed those 3 lines. They're inserted by cPanel automatically.

Hopefully I can get more help on figuring out the rest of my problem.
There's no point in blindly making changes without evidence to justify making those changes. Why are they added by cPanel? What do they do? They might be important.

The information as to what makes up the 60K+ is right there. The majority of them are listed as "Crawled - currently not indexed". I'm not certain, but I believe you should be able to drill down into each of these categories to see the specific URLs.

It's just going to be a case of going through the different error categories, trying to decipher what those Google messages are trying to tell you, and also analysing any of the URLs to see if there is anything pertinent that can be improved.
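To see which kinds of pages make up one of those buckets, it can help to export the example URLs for a category and group them by URL section. A rough Python sketch, assuming the URLs are already loaded into a list (reading the export file itself is not covered, and the sample URLs are hypothetical):

```python
from collections import Counter
from urllib.parse import urlparse


def tally_by_section(urls):
    """Count URLs by their first path segment (e.g. 'threads', 'members', 'posts')."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path.strip("/")
        section = path.split("/", 1)[0] if path else "(root)"
        counts[section] += 1
    return counts


# Hypothetical URLs standing in for one exported "Crawled - currently not indexed" list.
SAMPLE = [
    "https://www.example.com/threads/some-topic.101/",
    "https://www.example.com/threads/some-topic.101/page-2",
    "https://www.example.com/members/someone.55/",
    "https://www.example.com/posts/9001/",
    "https://www.example.com/",
]

if __name__ == "__main__":
    for section, n in tally_by_section(SAMPLE).most_common():
        print(f"{section}: {n}")
```

If most excluded URLs turn out to be member profiles, extra thread pages, or /posts/ links rather than thread starts, that is useful context for the raw 60K+ figure.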

There's not going to be any silver-bullet advice on how to fix it, and in some cases there might not be anything you can do. From what I've read, the "Currently not indexed" status is fairly generic and essentially boils down to "Google's prerogative".
You probably want to put those lines back.

Yes, they are inserted by cPanel when you enable the AutoSSL domain security certificate by Comodo.
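Before removing lines like these, it helps to check what they actually match. In Apache, RewriteCond lines apply only to the single RewriteRule that immediately follows them, so these three only affect the `data/|js/|styles/|...` pass-through rule. The Python sketch below reuses the same three patterns (Python's `re` treats these particular expressions the same way Apache's PCRE does) to show that they target domain-validation files used during certificate issuance, not forum URLs. The sample paths are made up.

```python
import re

# The three patterns from the cPanel-added RewriteCond lines. Each condition is
# negated (!), so a request matching any of them skips the rule that follows.
DCV_PATTERNS = [
    re.compile(r"^/[0-9]+\..+\.cpaneldcv$"),
    re.compile(r"^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$"),
    re.compile(r"^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$"),
]


def is_dcv_request(uri):
    """True if the request URI looks like a certificate validation file."""
    return any(p.search(uri) for p in DCV_PATTERNS)


if __name__ == "__main__":
    for uri in [
        "/.well-known/acme-challenge/AbC_12-x",  # hypothetical ACME token
        "/12345.abcdef.cpaneldcv",               # hypothetical cPanel DCV file
        "/threads/some-title.123/",              # hypothetical forum URL
    ]:
        print(uri, is_dcv_request(uri))
```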
 

djbaxter

Well-known member
Curious,

So because my robots.txt was missing for however long and I just replaced it an hour ago, do you think this will remove a good portion of the "crawled but not indexed" warnings and improve the SEO?

I found another issue in the Coverage area of Google Search Console:

URL is not available to Google
It cannot be indexed.
Page fetch - Failed: Crawl anomaly

Same with many other member links. Is that normal?

Again, this is my robots:

Code:
User-agent: *
Disallow: find-new/
Disallow: account/
Disallow: attachments/
Disallow: goto/
Disallow: posts/
Disallow: login/
Disallow: admin.php
Allow: /

Sitemap: https://www.talkjesus.com/sitemap.php
Why would you disallow posts?

Chad

Active member
I made an edit to ours shortly after it was posted. As long as it roughly matches that, I’d say it’s ok.

Thanks. Updated mine to the below since my installation is in root, not "community".

Code:
User-agent: *
Disallow: whats-new/
Disallow: account/
Disallow: attachments/
Disallow: goto/
Disallow: posts/
Disallow: login/
Disallow: admin.php
Allow: /

Sitemap: https://www.talkjesus.com/sitemap.php

Is the trailing slash needed, or is this correct?
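On the slash question: robots.txt rules are matched as prefixes of the URL path, and that path always starts with "/". Google's robots.txt documentation says path rules should begin with "/", so a rule written as `posts/` may not match anything. Python's standard-library `urllib.robotparser` can serve as a quick local check; it is not Google's parser (it applies the first matching rule, while Google picks the most specific one), so treat this as a sanity check only. The post URL below is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule as posted (no leading slash) versus the same rule with one.
AS_POSTED = """User-agent: *
Disallow: posts/
Allow: /
"""

WITH_SLASH = """User-agent: *
Disallow: /posts/
Allow: /
"""

# Hypothetical post URL on the site.
POST_URL = "https://www.talkjesus.com/posts/12345/"

if __name__ == "__main__":
    print("as posted:", allowed(AS_POSTED, POST_URL))    # not blocked
    print("with slash:", allowed(WITH_SLASH, POST_URL))  # blocked
```

With prefix matching, the trailing slash only narrows a rule: `Disallow: /posts/` blocks `/posts/12345/` but not `/posts` itself, while `Disallow: /posts` would also block any path that merely starts with those characters, such as `/postscript`.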
 

usAdultAds

Active member
Don't feel bad. I went through the same thing as you and messed with everything I could. I tried to research the errors as well and basically got a generic answer from Google. Honestly, the only thing I can narrow it down to is that Google just doesn't like your content.

I have the same issue. However, say you have five people having similar discussions over and over: Google is going to choose what it thinks is the best content based on several factors, including engagement, and then ignore the rest. Even five people can create hundreds of posts in a short period of time, but that does not mean Google will keep every one of them, as some or most could be similar in nature. Forums also have to deal with another issue, called "thin" content: if a post appears too shallow, Google may ignore it. Three months ago I had 10,000 results for my forum; today I have 2,300, and every blog post I have made is still indexed.

Ludachris

Well-known member
Just saw this in Google Search Console. Apparently it started dropping steeply around the end of December. I don't understand why.


View attachment 196280
I know this is old, but I wanted to chime in to say that, Chad, the drop in coverage may not have been due to anything you did on your end. It could very well have been an algorithm update on Google's side that negatively impacted your rankings/indexing. To find out what the impact is, you have to research their algo updates, see if there is any information you can find about them, and then see if you can remedy the situation. It's not always easy, and in many cases you simply have to accept it and find ways to improve elsewhere.

As for having a big chunk of pages not indexed, I'll say from experience that this can be pretty normal. Google is known to not always index all pages on a website. They focus on what they feel is the best and most "valuable" content on your site based on their algorithm. As someone said above, if Google feels that a lot of threads are focused on the same topic (repetitive in nature) the algo might decide to choose one thread to represent several of them in the search results. This is not only common on forums but also e-commerce sites where a lot of products have very similar naming schemes and descriptions. But there are other reasons the algo omits pages besides "duplicate content" too. Again, you have to keep up with all of the algo changes - and unfortunately, since they don't publish what has been changed, you have to rely on "SEO experts" who publish articles on what they think changed.
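If you want to gauge how much of your own content overlaps in this way, one rough approach is to compare thread titles for near-duplicates and consider merging or consolidating them. The Python sketch below uses difflib's SequenceMatcher; Google's actual duplicate detection is not public, so the 0.8 threshold and the sample titles are arbitrary assumptions, and comparing every pair is only practical for a few thousand titles at a time.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_titles(titles, threshold=0.8):
    """Return (ratio, a, b) for title pairs whose similarity ratio is >= threshold."""
    # Compare case-insensitively with collapsed whitespace.
    normalized = [(t, " ".join(t.lower().split())) for t in titles]
    pairs = []
    for (a, na), (b, nb) in combinations(normalized, 2):
        ratio = SequenceMatcher(None, na, nb).ratio()
        if ratio >= threshold:
            pairs.append((ratio, a, b))
    return sorted(pairs, reverse=True)  # most similar pairs first


# Hypothetical thread titles.
TITLES = [
    "What does John 3:16 mean?",
    "What does John 3:16 mean",
    "Prayer request for my family",
]

if __name__ == "__main__":
    for ratio, a, b in similar_titles(TITLES):
        print(f"{ratio:.2f}  {a!r}  <->  {b!r}")
```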

It's frustrating to hear this, but sometimes there truly is very little you can do to counteract these indexing and ranking problems. You just have to make sure you have all the technical settings right on the back end, make sure Google can spider the site effectively, make sure you're not violating any obvious rules, and then just focus on good content.
 