Xrumer 16.0 spam now targeting hidden fields / honeypots (core antispam)

Other than spam posts and reviewing ACP newest registrations, core has no other way to identify spam registrations?
 
Other than spam posts and reviewing ACP newest registrations, core has no other way to identify spam registrations?

The core doesn't have logs about what hits the honeypots, which is why I was sad tenants didn't update the plugin back then. With XenForo you don't see the hundreds of bots per hour or second it's keeping away.
 
Other than spam posts and reviewing ACP newest registrations, core has no other way to identify spam registrations?

nope, that's why using undetected IPs and mass registering before doing the spam work is so effective. This is also why the core should never have used honeypots; they've effectively made them a huge target, which is far less effective (at least the core honeypots are, anyway)

How on earth can you log spammers if you have no way to detect them? And if you can detect them, why aren't you using that as a prevention mechanism?
As far as the core honeypots, registration timer and APIs are concerned, there is nothing to detect with this new wave of clean-IP bots.
fbhp's classical honeypots, proxy detection, browser bot detection and non-browser bot detection will still show this information (and block).
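For anyone wondering what that logging would even look like, here's a minimal sketch (not fbhp's actual code; the field names and log format are made up) of recording hidden-field honeypot hits instead of silently discarding them:

```python
# Minimal sketch: log and block any POST that fills a hidden honeypot
# field that real browsers never display. (Field names are illustrative.)
import logging

logging.basicConfig(filename="honeypot_hits.log", level=logging.INFO)

# Hypothetical hidden field names rendered into the registration form
HONEYPOT_FIELDS = {"website_url", "fax_number", "nickname_confirm"}

def check_registration(form: dict, ip: str) -> bool:
    """Return True if the submission looks human, False if it hit a honeypot."""
    hits = [f for f in HONEYPOT_FIELDS if form.get(f)]
    if hits:
        # This is exactly the evidence the core never records:
        # which bots the honeypots are silently keeping away.
        logging.info("honeypot hit from %s: fields=%s", ip, hits)
        return False
    return True
```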

You could check your AWStats and look for recent activity over the last 6 weeks from certain countries.

Staying ahead of the game and guessing what these bot applications are going to do before they even mention it is useful; thankfully I have my crystal ball.

This was always coming!
 
The core doesn't have logs about what hits the honeypots, which is why I was sad tenants didn't update the plugin back then. With XenForo you don't see the hundreds of bots per hour or second it's keeping away.

There was little point in updating it at that point, just for logs? The fbhp mechanisms have always been about blocking 100% of bots elegantly; if the core does it, then fbhp would have only been about logging.
Once the core copied many of fbhp's mechanisms, it was obvious the classical honeypots were then a target. I just had to wait, watch and see how the bots were going to bypass them; just before they released, it was then worth striking again.

However, the xf core didn't copy everything, and made quite a few mistakes that make it incredibly easy to target; fbhp does not make these mistakes.
I'm also keeping good methods secret from now on. The core should not make the same mistake, not while there are still mechanisms they should be putting into the core! (customImgCaptcha being one of many; it simply can't be targeted, by virtue of it being customisable)

Now fbhp is becoming necessary again, hence I'm updating it. I suspect the next fbhp will block 100% of bots elegantly again; I'm just testing it with 2 other forum admins now...
 
This is also why the core should never have used honeypots; they've effectively made them a huge target, which is far less effective (at least the core honeypots are, anyway)

Surely it will always be a cat-and-mouse game? Better to have it in the core and working for a couple of years before needing a core upgrade to keep up with the bots.

It's fascinating reading about your work in this field, thanks for fighting the fight for us!
 
Surely it will always be a cat-and-mouse game? Better to have it in the core and working for a couple of years before needing a core upgrade to keep up with the bots.

It's fascinating reading about your work in this field, thanks for fighting the fight for us!

They didn't need to do it; there are other mechanisms that are soft, currently almost impossible to bypass, and can't be targeted ... like APIs, or customImgCaptcha to some degree (I can send the xenforo devs ideas if they run out).
Using only mechanisms that can't be targeted and putting them in the core means other methods that can be targeted survive much longer.

If mechanisms that can be targeted are put in the core, they will be targeted.
If mechanisms that can be bypassed/faked are used widely, they will be bypassed/faked ... this is the history of spam/security, going back to before we used to look at header information (which can be, and now always is, faked by bots to look like browsers).

  • headers... can be targeted, were widely used and thus were targeted <now dead as an anti-spam technique>
  • proxy detection... can be targeted, was widely used and thus was targeted <now fairly dead as an anti-spam technique>
  • Google's original reCaptcha... can be targeted, was WIDELY used and thus was targeted MANY times <now dead as an anti-spam technique>
  • custom text captcha... can be targeted, was widely used (xf core) and was targeted (xrumer actually held competitions solving these; even if you have an original question, it is manually solved and shared with thousands of botters) <now dead as an anti-spam technique>
  • xenforo-style honeypots (basic hidden fields)... can be targeted, were widely used (xf core) and thus were targeted <now dead as an anti-spam technique>
  • js detection... can be targeted, was widely used and thus was targeted <now dead as an anti-spam technique>
  • registration timer... can be targeted, was widely used (xf core) and thus was targeted <now dead as an anti-spam technique>
  • IP logging by APIs... is hard to target, very WIDELY used; it will be targeted, there is a way around it (mass IP sharing, using API look-ups; see the sketch after this list), and one day this too will be a dead technique (to some degree this new wave of bots is already doing it, with clean IPs and holding back their spam work until reaching a registration threshold)
  • deep custom image-based problem solving... is hard to target, but there is a way (neural networks; once they get to the point of being trained on just about everything, it won't matter that your image is custom anymore). It may also go the same way as text captcha, but there are reasons this is harder (each image has 360,000 versions of itself)
  • Google's new NoCaptcha reCaptcha... is WIDELY used, thus CURRENTLY targeted (it is passed by some browser bots), has been broken many times, and will be broken again by xrumer soon (predicting we are about 3 months off)
Obviously there are more techniques that have in the past been widely used, and are now dead because of this.
Think of it like handing out antibiotics: for the sake of mankind, don't hand them out to every spotty little oik, or you will make the mechanism ineffective faster than it needs to be.
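For illustration, the kind of API look-up mentioned in the list above could be as simple as the sketch below, using StopForumSpam's public query endpoint (check their docs for the current parameters and rate limits; the frequency threshold is arbitrary):

```python
# Rough sketch of an IP-reputation API look-up against StopForumSpam.
# Verify the endpoint and response fields against their current docs.
import json
import urllib.request

def ip_listed(ip: str, min_frequency: int = 1) -> bool:
    """True if StopForumSpam reports recent spam activity for this IP."""
    url = f"https://api.stopforumspam.org/api?ip={ip}&json"
    with urllib.request.urlopen(url, timeout=5) as resp:
        data = json.load(resp)
    entry = data.get("ip", {})
    # "appears" flags a listing; "frequency" is how often it was reported
    return bool(entry.get("appears")) and entry.get("frequency", 0) >= min_frequency
```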


When we run out of anti-spam mechanisms, what do you think will happen to all CMSs, to every site that allows user interaction?

We will one day run out of hard mechanisms... it is unavoidable. I have never come across an idea that I could not figure out a way around with AI (some harder than others, but all possible given the effort).
At the point we run out of mechanisms, every site is in danger, from the smallest to the biggest (even the likes of Facebook will have mass, uncontrollable bot trouble in the future ... predicting we are 6-10 years off)
 
There was little point in updating it at that point, just for logs? The fbhp mechanisms have always been about blocking 100% of bots elegantly; if the core does it, then fbhp would have only been about logging.
Once the core copied many of fbhp's mechanisms, it was obvious the classical honeypots were then a target. I just had to wait, watch and see how the bots were going to bypass them; just before they released, it was then worth striking again.

The thing is, once it became a core method, even though it was bad, losing the logs meant you saw absolutely nothing. Millions of bots, no evidence, unless we're going to parse server logs.

Right after 1.4 came out we started getting hit with spam. No real fingerprint to investigate from the ACP, but it was obvious there was a pattern. Captchas didn't matter, and we were told this is because it was humans, not bots. But 23-year-old female users are a bit odd for us, all using Gmail and coming from either one of two Indian ISPs or one of two Pakistani IPs. It was a problem that wasn't hard to solve.

The solution gave me back logging as well:
[attached screenshot of the registration log]


Open ports are unreliable, but they point to data center spam operations. Even if this guy didn't hit on SFS at all, I would still know from the logs it was a spammer. That AS is a data center; I can't tell you the last time I had a person sign up from RDP lol.

I'm not suggesting your product get into the business of logging these things. It's a slow process to get the data, and it can sometimes be a burden on the registration system. But to me the logs tell me what's going on, and when I lost them I lost my eye on the machine-generated spam altogether. When I got spam I was in the dark as to what the cause was.
 
Yup [from the date and the level to which the IP was detected (elsewhere), it would have been a non-JS (non-browser-based) bot, probably from GSA
... at that point GSA weren't presenting a huge challenge (and still don't, if I'm honest), and xrumer had gone quiet ... no challenge ... no work for me]
Oh, invisible text!


edit (what happened to strikethrough?): scrub that, from the date it could have been either GSA or xrumer


Honestly, there wasn't much call for fbhp in 1.4. Updating it has taken quite a bit of time and effort, because the core registration methods changed quite a lot in 1.4; updating it for just logs would be a lot of effort for little reward/need/demand (most people would have stopped using it regardless, the core replaced its main purpose, to stop bots 100% elegantly ... for a short time)

Open ports are unreliable; the big problem is that it takes a long while to scan every port, and even if you target only the common ones, it produces a big lag on registration (see the sketch below).
Not all botters use common ports.
And not all bots require open host ports (locally running Selenium, for example).
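A quick sketch of why that lag happens: a plain connect scan pays the full timeout for every closed or filtered port, so even a short list of common ports can add many seconds to a registration request unless the scan runs concurrently (the port list and timeout below are illustrative):

```python
# Why port-scanning registrants lags registration: each closed/filtered
# port can eat the whole timeout. 7 ports x 1 s = up to 7 s added to the
# request, and a fuller list is far worse, unless scanned concurrently.
import socket

COMMON_PORTS = [22, 80, 443, 1080, 3128, 3389, 8080]  # SSH, web, proxies, RDP

def open_ports(ip: str, ports=COMMON_PORTS, timeout=1.0):
    """Connect-scan an IP; returns the ports that accepted a connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((ip, port)) == 0:  # 0 == connection succeeded
                found.append(port)
    return found
```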

Anyway, I'm sorry to hear about your struggles, but the call for fbhp has really only just started again now that xrumer is back with a vengeance. It gives me something to get my teeth into (I like a challenge; bot logging would not have been very in demand, and certainly not a challenge).
 
We have no issues blocking data centers. Worst case scenario we block registration via VPN but people can still connect after registration.

The pattern we saw was that they moved from their home-nation ISP to rented servers, and once they lost those, they were using infected PCs with US home ISP IP addresses. It got a little nuts at that point, but I think they might have lost their botnet, because it stopped.

I did a test registration earlier and I typed so fast I had to wait out the timer. The submission was quick. Assuming a minimum registration timer, I don't think asynchronously pulling this data is usually going to cause problems, but the truth is that has nothing to do with machine spam, does it? With non-browser, non-JS bots you can't do the look-ups until after submission.
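For reference, a minimum registration timer along those lines can be sketched like this (the field and key names are made up, and the core's actual implementation may differ): the form carries a server-signed timestamp, and anything submitted back too fast is rejected.

```python
# Sketch of a minimum registration timer with a tamper-proof timestamp.
import hashlib
import hmac
import time

SECRET = b"change-me"   # per-site secret, never sent to the client
MIN_SECONDS = 10        # humans rarely finish a registration form this fast

def issue_token() -> str:
    """Embed this in a hidden form field when rendering the form."""
    ts = str(int(time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def timer_ok(token: str) -> bool:
    """Reject forged timestamps and submissions that come back too fast."""
    ts, _, sig = token.partition(":")
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered timestamp
    return time.time() - int(ts) >= MIN_SECONDS
```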
 
[attached screenshot]


I'm really not even sure what to say about this one lol

Although I just thought of something clever to do with the TPU add-on: instead of blocking email addresses with the core, just auto-reject with TPU based on those domains, so they aren't getting messages that make it obvious which emails are banned.
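A rough sketch of that idea (this is not TPU's actual API; the domain list and the "awaiting approval" routing are illustrative):

```python
# Silently reject by email domain instead of using the core banned-email
# list, so spammers never see a message revealing which domains are blocked.
REJECT_DOMAINS = {"example-spam.biz", "another-burner.example"}  # illustrative

def silently_reject(email: str) -> bool:
    """True -> route into a generic 'awaiting approval' state rather than
    an explicit rejection, keeping the banned domains secret."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in REJECT_DOMAINS
```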
 
:) Most spammers don't use spam email accounts, not like that; that's a nice way of telling you iAmSpam.
There are quite a few on mail.ru, and they can also automate registration against gmail, outlook, etc.

But even when they automate against gmail, to make sure their email hasn't been registered before and to loop easily, they can do some spectacularly spammy-looking things (unfortunately not all spammers are this dumb):

[attached screenshot]


Oh... just noticed, this was another one that got past the core honeypots, though not JS-enabled
 
This is a spammer that knows what they are doing, even switching IP addresses when re-attempting (trying to avoid) the core honeypots... still gets caught by fbhp :p
- The username is almost the same each time, but the last part switches from a pre-defined list. Hard to know it's spam from the username/email ... easy to know from just about everything else

[attached screenshot]
 
Ignoring the outliers, you can start to see the beginning of a pattern.

The average registration speed of bots is rising, and these aren't going to suddenly drop (these are not peaks from 1 or 2 slow bots; these bots are purposely bypassing the registration timer):

[attached graph of average bot registration speeds]
 
Ignoring the outliers, you can start to see the beginning of a pattern.

The average registration speed of bots is rising, and these aren't going to suddenly drop (these are not peaks from 1 or 2 slow bots; these bots are purposely bypassing the registration timer):

[attached graph of average bot registration speeds]

They have been bypassing the timer for quite some time; it only catches older ones now.
 
Yes, the odd one or two, since the beginning.
This was sometimes due to lag (site lag or bot lag), and sometimes because rarer bots purposely paused (custom non-default settings / bot plugins / custom bots).

Now it is not the odd one or two rare bots, it is the average; it is becoming common for newer bot versions to do this by default.

You're right, it's only going to be catching old bots pretty soon
 
Yeah. When I started my anti-spam bot add-on for vB, I noticed they were passing that often. But the way I was using the honeypots, a bot never bypassed all of them; sometimes they missed some, but they never passed all of them, due to the way I set them. You have to fool them as much as you can.
 
The other thing I did was not tell the bot they failed. Once they did fail, upon submitting the info, it would be stored, and the bot was served the successful registration page. They would then immediately start trying to post, but of course could not, because they had no account. They would assume something was wrong with the site's registration process. :)
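A minimal sketch of that "fake success" trick, with made-up field and function names:

```python
# A bot that trips a honeypot is shown the normal success page, but no
# account is created, so it wastes time posting with credentials that
# don't exist while the attempt is quietly recorded.
failed_attempts = []  # stand-in for a real log table

def hit_honeypot(form: dict) -> bool:
    return bool(form.get("website_url"))  # illustrative hidden field

def register(form: dict, ip: str) -> str:
    if hit_honeypot(form):
        failed_attempts.append((ip, dict(form)))  # quietly keep the evidence
        return "registration_success_page"        # lie: no account was made
    # ... real account creation would go here ...
    return "registration_success_page"
```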
 
Yeah. When I started my anti-spam bot add-on for vB, I noticed they were passing that often. But the way I was using the honeypots, a bot never bypassed all of them; sometimes they missed some, but they never passed all of them, due to the way I set them. You have to fool them as much as you can.
- I agree, strongly
In as many different ways as you can, as many times as you can, with no flags that tell anyone it's a honeypot
- hidden fields are not the only types of honeypots; I still have quite a few tricks up my sleeve
Bots don't get bored; if they can re-attempt they will, often with different strategies, sometimes by trial and error. It's insane how few honeypots the core uses! (and only one is a good honeypot with a real hp field name)

The other thing I did was not tell the bot they failed. Once they did fail, upon submitting the info, it would be stored, and the bot was served the successful registration page. They would then immediately start trying to post, but of course could not, because they had no account. They would assume something was wrong with the site's registration process. :)

- I like this idea, but when things go wrong, botters (humans) look through the logs (and sometimes watch) and see what's happening

I make sure all the responses go back as if it was a real "password field" / "username" etc; bots look for responses, and they often use them to verify the type of field.
But my way of preventing them from re-attempting is two-fold:
1) cache their IP and automatically block them out for several days (they don't get a 2nd chance; if I am really sure it's a bot, it's not having another go)
- they only see a 403 Forbidden / Unauthorised (can't remember which now). This reduces queries and server resources
2) send their IP to an API that I own (stopbotters); this API then prevents re-attempts on all forums using that API (they won't even get a 2nd chance on other forums)
... I don't keep these IP addresses for more than a couple of days, yet it's catching a higher % of bots than stopforumspam (who often get false positives). IP addresses really shouldn't be used as an anti-spam detection method once they're more than a couple of days old.
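Point 1 could be sketched roughly like this (an in-memory dict standing in for a shared cache such as Redis; the block duration is illustrative):

```python
# Once an IP is confidently flagged as a bot, cache it and answer every
# request with a bare 403 for a few days, then let it age out.
import time

BLOCK_SECONDS = 3 * 24 * 3600        # "several days", then the IP expires
_blocked: dict[str, float] = {}      # stand-in for Redis/memcached

def block_ip(ip: str) -> None:
    _blocked[ip] = time.time() + BLOCK_SECONDS

def is_blocked(ip: str) -> bool:
    """True -> answer with a bare 403 before touching the database."""
    expiry = _blocked.get(ip)
    if expiry is None:
        return False
    if time.time() > expiry:
        del _blocked[ip]             # stale entries expire automatically
        return False
    return True
```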
 
I make sure all the responses go back as if it was a real "password field" / "username" etc; bots look for responses, and they often use them to verify the type of field.

Agreed. The way I did it was to assign a random string to all normal fields and all hidden fields. Hidden fields would appear the same as a normal field. They could not suss out which password, email, username etc. was the real one and which was fake. ;)
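A rough sketch of that approach (the field names, token length and session storage are illustrative):

```python
# Every form render maps random tokens to both real and decoy fields,
# stored server-side, so a bot cannot tell "username" from a honeypot
# by the field name alone.
import secrets

REAL_FIELDS = ["username", "email", "password"]
DECOY_COUNT = 12  # cf. the probability argument a few posts down

def build_field_map() -> dict:
    """Random token -> real field name (None marks a honeypot); store per session."""
    mapping = {secrets.token_hex(8): field for field in REAL_FIELDS}
    for _ in range(DECOY_COUNT):
        mapping[secrets.token_hex(8)] = None
    return mapping

def decode(form: dict, mapping: dict):
    """Recover the real fields; return None if any honeypot was filled."""
    out = {}
    for token, value in form.items():
        if token not in mapping:
            continue
        if mapping[token] is None and value:
            return None  # a decoy was filled in: treat as a bot
        if mapping[token] is not None:
            out[mapping[token]] = value
    return out
```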
 
Yeah, but it's important for a good chunk of your honeypots to look like UUIDs too
and a good chunk to look like real fields (not just one, as the core does)
=> When bots switch strategies (and they do), both strategies will then hit the honeypots
... one strategy might look for labels, another the response, another the nearest text, another the form names... etc. (honeypots must not give themselves away to any of these)

And the number of honeypots needs to be higher for random attempts:

Randomising and missing one honeypot out of six fields, yet still hitting the other 5, is easy (probability = a 1 in 6 chance (0.1667): (5/6)·(4/5)·(3/4)·(2/3)·(1/2))
<= thus the core honeypot"s" is/are easy to bypass with simple random attempts!
Randomising and missing 12 honeypots out of 17 fields, yet still hitting the other 5, is hard (probability = a 1 in 6,188 chance (0.000162): (5/17)·(4/16)·(3/15)·(2/14)·(1/13))

... still, the core honeypots are dead; they need to think a bit more outside the box for things that won't be targeted
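Those products are just 1/C(n, 5), the chance of randomly picking exactly the 5 real fields out of n, which a couple of lines will confirm:

```python
# Chance a bot that randomly fills exactly 5 of the n fields picks the
# 5 real ones (everything else being a honeypot): 1 / C(n, 5).
from math import comb

print(1 / comb(6, 5))    # 1 honeypot,   6 fields -> 0.1667   (1 in 6)
print(1 / comb(17, 5))   # 12 honeypots, 17 fields -> 0.000162 (1 in 6,188)
```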
 