Bot Management using robots.txt in XFcloud

CTS

Up front: I'm using XFcloud for my instance, so I do not have server access or .htaccess.

I know my ability to manage bots is limited, so my question revolves around the editing of robots.txt from within ACP.

I wish to "ask" Bytespider to stop indexing my site.

I would like to use this code:

Code:
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:

Now, if I were to append this to the bottom of the existing robots.txt, would it conflict with the default robots.txt that comes with the XFcloud instance?

Does anybody have experience with safe ways to add to or modify robots.txt on XFcloud?
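One way to sanity-check the proposed rules before saving them in the ACP is Python's standard-library `urllib.robotparser`, which applies robots.txt rules roughly the way a compliant crawler would. This is only a sketch: it shows what a well-behaved bot *should* do with the file, not what Bytespider will actually do, and `example.com` and the test path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed above, exactly as they would appear in robots.txt.
PROPOSED = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(PROPOSED)
    # robotparser matches on the crawler's product token (e.g. "Bytespider"),
    # not on a full browser-style User-Agent string.
    for agent in ("Bytespider", "Googlebot", "Bingbot"):
        allowed = rp.can_fetch(agent, "https://example.com/threads/example.1/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it should report Bytespider as blocked and the other two agents as allowed, since the empty `Disallow:` in the `*` group places no restrictions on them.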
 
I can't speak to modifying robots.txt on XFcloud, but Bytespider ignores that file.
Can you add to the .htaccess file in your XF root directory?
Code:
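# Set bad_bot when the User-Agent matches (case-insensitive)
BrowserMatchNoCase "Bytedance" bad_bot
BrowserMatchNoCase "Bytespider" bad_bot
BrowserMatchNoCase "Baiduspider" bad_bot
# Apache 2.2-style access control (needs mod_access_compat on Apache 2.4)
Order Deny,Allow
Deny from env=bad_bot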
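`BrowserMatchNoCase` sets the `bad_bot` environment variable when the request's User-Agent matches the given regular expression anywhere in the string, ignoring case. The Python sketch below mimics only that matching step (not Apache itself) so you can see which User-Agent strings would be tagged; the sample strings are made up for illustration.

```python
import re

# Same patterns as the BrowserMatchNoCase lines above.
BAD_BOT_PATTERNS = ["Bytedance", "Bytespider", "Baiduspider"]


def is_bad_bot(user_agent: str) -> bool:
    """True if any pattern matches anywhere in the User-Agent, case-insensitively."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in BAD_BOT_PATTERNS)


if __name__ == "__main__":
    samples = [
        # Hypothetical crawler string that names itself (lowercase on purpose).
        "Mozilla/5.0 (compatible; bytespider; hypothetical-example)",
        # Hypothetical generic browser string.
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    ]
    for ua in samples:
        print(is_bad_bot(ua), ua)
```

The first sample is tagged despite the different capitalization; the second is not, which is the limitation of this approach: a crawler that sends an ordinary browser User-Agent slips past rules like these.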
 
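In the PAGE_CONTAINER template, add the following inside the <head> section and modify it as needed. Note that, as written, it asks every crawler that honours these tags (including Googlebot and Bingbot) not to index your pages, so keep only the lines for bots you actually want to exclude.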

Code:
<meta name="robots" content="noindex, nofollow, noarchive, noodp, nosnippet, notranslate, noimageindex">
<meta name="googlebot" content="noindex, nofollow">
<meta name="googlebot-news" content="nosnippet">
<meta name="googlebot-video" content="noindex">
<meta name="googlebot-image" content="noindex">
<meta name="bingbot" content="noindex, nofollow">
<meta name="bingpreview" content="noindex, nofollow">
<meta name="msnbot" content="noindex, nofollow">
<meta name="slurp" content="noindex, nofollow">
<meta name="teoma" content="noindex, nofollow">
<meta name="Yandex" content="noindex, nofollow">
<meta name="baidu" content="noindex, nofollow">
<meta name="Yeti" content="noindex, nofollow">
<meta name="ia_archiver" content="noindex, nofollow">
<meta name="facebook" content="noindex, nofollow">
<meta name="twitter" content="noindex, nofollow">
<meta name="rogerbot" content="noindex, nofollow">
<meta name="LinkedInBot" content="noindex, nofollow">
<meta name="embedly" content="noindex, nofollow">
<meta name="slackbot" content="noindex, nofollow">
<meta name="W3C_Validator" content="noindex, nofollow">
<meta name="redditbot" content="noindex, nofollow">
<meta name="discordbot" content="noindex, nofollow">
<meta name="applebot" content="noindex, nofollow">
<meta name="pinterest" content="noindex, nofollow">
<meta name="smtbot" content="noindex, nofollow">
<meta name="googlewebmaster" content="noindex, nofollow">
<meta name="twitterbot" content="noindex, nofollow">
<meta name="tumblr" content="noindex, nofollow">
<meta name="flipboard" content="noindex, nofollow">
<meta name="qualaroo" content="noindex, nofollow">
<meta name="opensearch" content="noindex, nofollow">
<meta name="sogou" content="noindex, nofollow">
<meta name="exabot" content="noindex, nofollow">
<meta name="duckduckbot" content="noindex, nofollow">
<meta name="taptu" content="noindex, nofollow">
<meta name="outbrain" content="noindex, nofollow">
<meta name="Bytespider" content="noindex, nofollow">
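If you trim this list down, generating the tags from a single list makes it easy to keep exactly one line per crawler. The Python sketch below uses a hypothetical helper (`render_meta_tags` is not part of XenForo) and a shortened, illustrative list; you would paste its output into the template.

```python
from html import escape

# Crawler names to emit; edit to taste.
# dict.fromkeys() keeps the first occurrence of each name and drops exact repeats.
BOTS = list(dict.fromkeys([
    "Bytespider",
    "slackbot",
    "bingbot",
    "slackbot",  # accidental repeat, dropped by dict.fromkeys()
]))


def render_meta_tags(bots, content="noindex, nofollow"):
    """Return one <meta> line per crawler name, with attribute values HTML-escaped."""
    return "\n".join(
        f'<meta name="{escape(bot)}" content="{escape(content)}">' for bot in bots
    )


if __name__ == "__main__":
    print(render_meta_tags(BOTS))
```

Deduplication here is case-sensitive, so "Slackbot" and "slackbot" would still both be emitted.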
 
Just to clarify, you are not limited in editing your robots.txt at all. It is just done through an option in the admin CP, both for convenience and to work around the lack of direct server access. Anything you want to put in robots.txt is fine and will work exactly the same way as editing the file directly.

Unfortunately, that is one of the very few drawbacks of the cloud: no .htaccess access.
To be fair, if we used Apache, we'd probably have a UI that lets you edit the .htaccess file. The bigger problem is that we don't use Apache; we use Nginx, so a .htaccess file does nothing, as it is essentially an Apache-only mechanism.

It is concerning to me that Bytedance/spider are ignoring robots.txt. We may look at a more robust solution for this that we can implement centrally for all customers.
 
Bytedance/Bytespider doesn't even always use its own User-Agent; it also uses generic ones (like Chrome, etc.).
 
So, in the interim, would this be suitable for the short term if it is added to the default (cloud) robots.txt?

Code:
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:

Until better solutions are implemented, I wish to make sure I do not hinder any of the other desired bots either.

Thanks
 
This is the default robots.txt:

Code:
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot 
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /misc/language
Disallow: /misc/style
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /whats-new/
Disallow: /admin.php
Allow: /

Sitemap: {sitemap_url}

It's sufficient for most cases. If you want to add Bytespider, it changes to:

Code:
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot 
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /misc/language
Disallow: /misc/style
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /whats-new/
Disallow: /admin.php
Allow: /

Sitemap: {sitemap_url}
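To confirm that this change only affects Bytespider, you can feed both versions to Python's `urllib.robotparser` and compare its answers for a few agents and paths. This is a sketch with `example.com` and the sample paths as placeholders (thread URLs are assumed to live under `/threads/`). Python's parser is simpler than Google's (it applies the first matching rule rather than the most specific one), but for these rules and paths the answers agree, and it says nothing about whether a given bot actually honours the file.

```python
from urllib.robotparser import RobotFileParser

# The default XFcloud robots.txt quoted above ({sitemap_url} is left as a literal).
DEFAULT = """\
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /misc/language
Disallow: /misc/style
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /whats-new/
Disallow: /admin.php
Allow: /

Sitemap: {sitemap_url}
"""

# The suggested change: add Bytespider to the first, fully blocked group.
MODIFIED = DEFAULT.replace(
    "User-agent: MJ12bot\n", "User-agent: MJ12bot\nUser-agent: Bytespider\n"
)


def parser_for(text: str) -> RobotFileParser:
    """Parse robots.txt text locally."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def verdicts(rp: RobotFileParser, agents, paths):
    """Map (agent, path) -> True if the parser allows that agent to fetch the path."""
    return {(a, p): rp.can_fetch(a, "https://example.com" + p) for a in agents for p in paths}


if __name__ == "__main__":
    agents = ["Googlebot", "Bingbot", "Bytespider"]
    paths = ["/threads/example.1/", "/account/"]
    before = verdicts(parser_for(DEFAULT), agents, paths)
    after = verdicts(parser_for(MODIFIED), agents, paths)
    # Report only the (agent, path) pairs whose answer changed.
    for key in before:
        if before[key] != after[key]:
            print("changed:", key, before[key], "->", after[key])
```

With these rules, the only difference reported should be Bytespider on the thread URL; Googlebot and Bingbot get the same answers before and after. Adding Bytespider to the existing blocked group also avoids a second `User-agent: *` group, which is what appending the original snippet to the end of the file would have created.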
 
@Chris D

I've been watching over time since adding your suggestion to robots.txt...

... and it "appears" Bytespider may be complying. Their traffic has slowed to a crawl (pun intended), so fingers crossed.
 