chillibear
Well-known member
Totally agree. Were it not for the quantity and speed of the AI bot "browsing" and the subsequent load, it wouldn't get flagged for the most part. Whilst annoying, I tend to only cull the visits that cause heavy load at present, whilst we (like the ES Dev Team) ponder updating some of our code (also using ClickHouse, FWIW) to be a little more intelligent.

It does seem, at least for now, that they are still going for direct requests, so an analysis of traffic for a given "AI scraper" doesn't look like a normal user. However, given the photos you see online of automated mobile phone farms, and of course conventional headless browsers, I'd not be surprised if we saw more traffic that is indistinguishable from a normal visitor (except perhaps in speed). Then again, I suppose making those extra requests for JS/images/CSS and so forth must add up on the scraper's side, so maybe they'll stick with what they have. It's actually been quite quiet on the scraping front for us the last week or two, just the normal better-behaved bots. That said, modern bots can slip past CDN-level checks, which is exactly why the sort of application-layer approach the ES Dev Team outlined is becoming increasingly important.
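For what it's worth, the "doesn't fetch assets" pattern is easy enough to check against ordinary access logs. Here's a rough sketch of the idea (plain Python over a combined-format log; the regex, extensions, and thresholds are all illustrative, not tuned values, and you'd obviously adapt it to wherever your logs live, e.g. ClickHouse):

```python
# Rough sketch: flag client IPs whose traffic is almost entirely page
# requests with no accompanying JS/CSS/image fetches -- the "direct
# requests only" scraper pattern described above.
import re
from collections import defaultdict

# Matches the IP and request path of a combined-format access log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')
ASSET_EXT = ('.js', '.css', '.png', '.jpg', '.jpeg', '.gif',
             '.webp', '.svg', '.ico', '.woff2')

def suspicious_ips(log_path, min_requests=50, max_asset_ratio=0.05):
    """Yield (ip, page_hits, asset_hits) for IPs that request plenty of
    pages but almost no static assets. Thresholds are illustrative."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    with open(log_path) as fh:
        for line in fh:
            m = LOG_LINE.match(line)
            if not m:
                continue
            ip, path = m.groups()
            path = path.split('?', 1)[0].lower()
            if path.endswith(ASSET_EXT):
                assets[ip] += 1
            else:
                pages[ip] += 1
    for ip, page_hits in pages.items():
        total = page_hits + assets[ip]
        if total >= min_requests and assets[ip] / total <= max_asset_ratio:
            yield ip, page_hits, assets[ip]

if __name__ == '__main__':
    for ip, p, a in suspicious_ips('access.log'):
        print(f'{ip}: {p} page hits, {a} asset hits')
```

Of course, the moment they move to headless browsers that ratio evaporates, which is rather the point above.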
It'd be quite interesting if XF internally had more of a sense of "usage checking" to distinguish real visitors from bots and so forth. I did start writing some statistical analysis code (outside of the XF codebase) for user accounts at the start of the year, to idly see if we might use it to highlight suspicious accounts, but alas "real work" got in the way and I've not gotten back to it yet.
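The kind of signal I was poking at was along these lines (very much a sketch, and the account/timestamp data would come from your own tracking tables): humans browse in bursts, while simple bots tend to be metronomic, so very low variance in the gaps between a user's page views is one cheap tell.

```python
# Sketch of one bot-likeness signal: coefficient of variation (CV) of
# the gaps between an account's page views. Near 0 = metronomic
# (bot-like); real people are bursty. Data below is made up.
from statistics import mean, pstdev

def regularity_score(timestamps):
    """CV of inter-request gaps in seconds, or None if too few samples."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 5 or mean(gaps) == 0:
        return None  # not enough data to say anything useful
    return pstdev(gaps) / mean(gaps)

# Perfectly even 30-second gaps score 0.0; jittery human-ish browsing
# scores far higher.
bot_like = [t * 30.0 for t in range(20)]
human_ish = [0, 4, 9, 70, 75, 300, 310, 900, 905, 911]
print(regularity_score(bot_like))   # 0.0 -- suspicious
print(regularity_score(human_ish))  # well above 1 -- looks human
```

On its own that's obviously gameable (add jitter and it's gone), so the plan was to combine a handful of such scores rather than lean on any single one.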