Robots.txt and sitemap questions

Ok, so I asked AI and it suggested a combination of Cloudflare Zero Trust plus editing .htaccess with deny all except my IP address (rather than limiting the server to only allowing Cloudflare IP addresses). Does that sound like a plan?
 
Also, I still have this zip file in public_html at the bottom of the list. When I first started the site I hadn't a clue how to upload it to the server, so the server did it for me or walked me through it (can't remember), and I think it involved unzipping the file within public_html. I assume that shouldn't still be there! Is it ok just to delete it?

 
I haven't followed this thread, so I'm not sure whether this has all been covered, but you can regenerate the sitemap using a cron task, then take the sitemap URL and submit it to Google Search Console (GSC). After that it will be generated automatically.

Regarding robots.txt: it is very important for forums because of crawl budget, especially on large forums. We don't want Google wasting time crawling portions of the site that are not useful to search.
 
Apparently if I secure admin.php and /install via Cloudflare Zero Trust, then I need to ensure that only Cloudflare IP addresses pass through the server. Is that right?
No.

Otherwise someone could bypass Cloudflare to do something. But they also said there are downsides and limitations to restricting all IP addresses to the server going through Cloudflare.
No. We are just addressing securing two sections of your website and nothing more here.

Ok, so I asked AI and it suggested a combination of Cloudflare Zero Trust plus editing .htaccess with deny all except my IP address (rather than limiting the server to only allowing Cloudflare IP addresses). Does that sound like a plan?
No. To secure these two locations of your site you don't need to limit the server to only allowing Cloudflare IP addresses. That has no bearing on this at all. As for .htaccess, you could apply a similar IP-only restriction there, but that would have nothing to do with email addresses. If you set up the security for /admin.php and /install correctly through Cloudflare Zero Trust, there is no need to mess with .htaccess at all.
 
Also, I still have this zip file in public_html at the bottom of the list. When I first started the site I hadn't a clue how to upload it to the server, so the server did it for me or walked me through it (can't remember), and I think it involved unzipping the file within public_html. I assume that shouldn't still be there! Is it ok just to delete it?

Yes, it is safe to delete this file.
 
No. To secure these two locations of your site you don't need to limit the server to only allowing Cloudflare IP addresses. That has no bearing on this at all. As for .htaccess, you could apply a similar IP-only restriction there, but that would have nothing to do with email addresses. If you set up the security for /admin.php and /install correctly through Cloudflare Zero Trust, there is no need to mess with .htaccess at all.
Thank you. So there is no risk from someone getting into the server bypassing Cloudflare then?
 
Yes, it is safe to delete this file.
It didn't like that at all! When I went to delete it, I got a big red warning saying I don't have permission to access this, and everything disappeared. I reloaded the file manager and it all seemed normal, but the zip file was still in there. I think I'll just leave it there if it's not a security risk!
 
Thank you. So there is no risk from someone getting into the server bypassing Cloudflare then?
No risk? We're dealing with the internet here; there is no such thing.

Your site has sat for years without either of these sections secured, without issue. You are now adding another layer of security through Cloudflare to improve the robustness of your site.

Can Cloudflare be bypassed? Yes, it is possible, but you will always have your username/password combination as the first layer of security as a deterrent, the same way your site was secured before adding any of this.

If you are worried about Cloudflare being bypassed, you can add additional security through .htaccess (directly on the server). Just search Google for securing files and directories with .htaccess; it is pretty straightforward.

This is getting way off topic from the original thread. You should do some searches for site security and/or start a new thread if you want additional site security help.
 
It didn't like that at all! When I went to delete it, I got a big red warning saying I don't have permission to access this, and everything disappeared. I reloaded the file manager and it all seemed normal, but the zip file was still in there. I think I'll just leave it there if it's not a security risk!
It's just a permission issue. The file is safe to delete.

The file is currently residing in public_html so there is a risk of someone accessing and running that file - you really should get it deleted when you can.
 
Just an update: since adding the robots.txt, the site has gone from about 56 bots/guests at any given time to 180, most of whom are guests rather than robots, as I disallowed some of the robots. Also noting that Anthropic ignores robots.txt (it followed it for about two days and now it's back).
 
I think it does, yes. I left it out because Google snippets use images.
Yeah, I would definitely keep attachments out of your robots.txt, especially if you SEO those images and want them indexed. I tested this for 48 hours: I created threads with attachments, and none of those images were indexed, as opposed to before, without the disallow. Also, for Google Carousel or News, those images won’t show up otherwise.
 
This was my final version (below). However, Anthropic is ignoring it. I've been quite alarmed at the sharp increase in bots crawling since adding the robots.txt. The recognised bots are just the same, so I should say a sharp increase in "guests" who are probably bots. It's tripled. Hence now looking at site security! I guess it depends what kind of images are on your site. Some may not want them appearing all over the place. Mine are just of pets.



User-agent: AspiegelBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: MauiBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: ImageSift
Disallow: /

User-agent: AnthropicBot
Disallow: /

User-agent: *
Disallow: /admin.php
Disallow: /account/
Disallow: /goto/
Disallow: /login/
Disallow: /register/
Disallow: /search/
Disallow: /help/
Disallow: /members/


Sitemap: https://www.xxxxxxxxxxxxxx.com/sitemap.xml
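For anyone wanting to sanity-check rules like these before relying on them, here is a rough Python sketch using the standard library's urllib.robotparser. It only shows how a well-behaved parser would read a shortened copy of the file above; it says nothing about whether a given crawler actually obeys it. The sample paths are made up for illustration. One assumption worth checking against your own access logs: as far as I know, Anthropic's crawler identifies itself with the token ClaudeBot rather than AnthropicBot, in which case the AnthropicBot group would never match it and it would fall through to the `*` rules.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt above (a subset of the groups, for brevity).
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: AnthropicBot
Disallow: /

User-agent: *
Disallow: /admin.php
Disallow: /members/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, user_agent, path):
    """Return True if this parser would let user_agent fetch path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical paths, chosen only to exercise the rules.
    checks = [
        ("AhrefsBot", "/threads/example.1/"),
        ("AnthropicBot", "/threads/example.1/"),
        ("ClaudeBot", "/threads/example.1/"),  # assumed token, see note above
        ("Googlebot", "/threads/example.1/"),
        ("Googlebot", "/members/someone.1/"),
    ]
    for agent, path in checks:
        verdict = "allowed" if is_allowed(parser, agent, path) else "blocked"
        print(f"{agent:12} {path:22} {verdict}")
```

If the ClaudeBot line comes back as allowed, that would suggest the crawler is not being told to stay away at all, rather than ignoring the file.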
 
so I should say a high increase in "guests" who are probably bots
I would recommend using Known Bots if you want to know which bots are actually visiting.
It is probably the best option for keeping the bot data updated. It does nothing more than identify and tell you which (known) bots are coming to your site.
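If you also have access to the raw server access log, a rough way to see who the "guests" are is to tally user-agent strings yourself. The Python sketch below is not how Known Bots works; it is just an illustration, and it assumes the common Apache/Nginx "combined" log format. The token list and the sample log lines are made up and should be replaced with what your own logs show.

```python
import re
from collections import Counter

# Hypothetical token list for illustration; extend it from your own logs.
KNOWN_BOT_TOKENS = [
    "Googlebot", "bingbot", "AhrefsBot", "SemrushBot",
    "MJ12bot", "DotBot", "ClaudeBot",
]

# In the combined log format each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def classify(user_agent):
    """Return the first known token found in the user agent, else 'unidentified'."""
    lowered = user_agent.lower()
    for token in KNOWN_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return "unidentified"


def tally(log_lines):
    """Count requests per known bot token across the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


# Made-up sample lines using documentation IP addresses.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /threads/1/ HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:02 +0000] "GET /forums/ HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"',
    '192.0.2.9 - - [01/Jan/2025:00:00:03 +0000] "GET /threads/2/ HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]

if __name__ == "__main__":
    for name, count in tally(SAMPLE_LINES).most_common():
        print(f"{name}: {count}")
```

Anything landing in "unidentified" is either a real visitor or a bot that does not announce itself, so this only narrows things down.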
 