XF 2.0 Bot/Crawler Detection

Lukas W.

Judging from a few hints, I'm pretty sure there is a way, but I can't figure out exactly how. Is there a way to determine whether the current visitor is a crawler? Does this also work for Facebook's Open Graph crawler/user agent?
 
I doubt there's any need to build in anything extra. The online list is already capable of filtering robots, so if that functionality is somehow available for the current visitor as well, that'd be all I need.
 
@katsulynx The XF bot detection is fairly limited, but if you are only interested in the major, well-behaved bots, it should work.

Yes. I only want to filter them out so they're not shown the age gate. I've tried to show the page under the following condition, but it still blocks the Facebook Open Graph crawler, even though it is listed as a robot on the online list afterwards.
PHP:
if (empty(\XF::visitor()->Activity->robot_key))
{
    // User is not a robot.
}
 
Check the session object instead:
PHP:
/** @var \XF\Session\Session $session */
$session = \XF::session();
if (!empty($session['robot']))
{
    // is a robot
}
Note: robot may not be set if the session hasn't been started, so use isset/empty.
 
Hm. I'm running the code inside a listener that is bound to the controller_post_dispatch event, so I'd assume the session and user are already set at that point (and dumping them on the page delivers the expected results). However, your code still blocks the Facebook Open Graph crawler. It seems the robot entry is empty for it at that point in time.
 
If that is the case, the Facebook Open Graph crawler isn't being detected as a bot. Sounds like a bug report!
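Until detection is sorted out, one possible stopgap is a user-agent check alongside the session check. This is only a minimal sketch and not XF's own detection: isFacebookCrawler() is a hypothetical helper, and it assumes the crawler identifies itself with "facebookexternalhit" or "Facebot" in its user agent, which are the strings Facebook documents for its crawlers.

```php
<?php
/**
 * Hypothetical helper (not part of XF): returns true when the user agent
 * string looks like one of Facebook's link-preview crawlers.
 */
function isFacebookCrawler(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false;
    }

    // Assumed identifiers; matched case-insensitively as substrings.
    foreach (['facebookexternalhit', 'facebot'] as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }

    return false;
}

// Possible use inside the listener, combined with the session check:
// $session = \XF::session();
// $isRobot = !empty($session['robot'])
//     || isFacebookCrawler($_SERVER['HTTP_USER_AGENT'] ?? null);
```

User agents are trivially spoofed, so this would only be suitable for skipping something like an age gate, not for anything security-sensitive.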
 
Still can't get it working. @Chris D, @Mike any idea why the robot_key could be empty there?
Curious, did you ever figure this out, Lukas? I also have an age gate that I'm trying to keep out of search indexing. I tried the bit below, but I guess it's not working. I'm starting to see the age gate show up on some of our indexed pages in search results, and it's visible when testing with Google's search inspection tool as well.

Code:
<xf:if is="!{$session.robotId}">
</xf:if>
 