Xrumer

ManagerJosh

So before you get your panties in a bunch, this was done in a controlled competition environment, targeting only boards in the competition and nothing outside that environment.

Anyways, I was at the Western Regional Collegiate Cyber Defense Competition this week, and there was a team of end-users designed to generate traffic and noise. One of the members had XRumer and I was shocked at how sophisticated the program is.

It's amazing what it can do. Simply drop in a text file with URLs you want to target, compose the message you want, click a button, and BAM. All thirteen student teams competing in the competition had their boards spammed. It was crazy!
 
Not sure if it would generate any response, honestly, but from a forum admin perspective it's rather scary... I knew it was easy. I didn't know it was THAT easy.

TBH I only know of one member who used to talk about Xrumer (I'm not 100% sure what it's used for) but what I got from it is that it was a spam utility or something along those lines and that's where my interest in it dropped.
 
XRumer lacks certain features.

Its ability to look for topic-related forums, then look for topic-related threads/posts and reply to those, is impressive (the way you can use this reminds me of when I used to play with AIML).

The ability to spam multiple forums all at once (firing many thousands of threads at once) is impressive, but it might be one of its downfalls (I can't go into why)

It can avoid many APIs for a lot longer now, since many users are adopting xblack.txt to avoid spam reporting sites (so their proxies remain unreported for longer)

It can break QA CAPTCHA easily, since XRumer users use their local Textcaptcha to input the answer for any forum where XRumer fails the text CAPTCHA (this is then shared centrally, so all bot users can get past the QA).
bumff... QA is no longer a viable method (unless you like playing Russian roulette, and updating the QA frequently)

But..
  • For registration form filling, it doesn't really have a browser method to jump into, it can't be automated as if it was a browser.. on registration it only looks for certain ids/ names/order of fields (Although it does have a browser method for solving CAPTCHA)
  • and it also doesn't extend itself very well to plug-ins (if people could easily create plug-ins for it, all forums/CMS sites would be in a lot of trouble)
Once those two features are possible, forums are in a very vulnerable position... and I don't think it's that far away
Imagine if users were creating pro-spam plug-ins for XRumer quicker than anti-spam plug-ins were created, that's a scary situation.
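To illustrate why matching on fixed ids/names is brittle, here is a minimal sketch of a form whose field names change per session. This is not something proposed in the thread, and it is not a XenForo feature; the function names, the `f_` prefix and the field list are all hypothetical. It is also a weak mechanism in the sense used in this thread: a bot that drives a real browser, or that reads field order and labels, would get around it.

```python
import hashlib
import hmac

# Hypothetical logical field names for a registration form
LOGICAL_FIELDS = ("username", "email", "password")

def field_alias(secret: bytes, session_id: str, logical_name: str) -> str:
    """Derive a per-session alias for a logical field name."""
    msg = f"{session_id}:{logical_name}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return "f_" + digest[:12]

def alias_map(secret: bytes, session_id: str) -> dict:
    """Map each alias back to its logical name for one session."""
    return {field_alias(secret, session_id, name): name for name in LOGICAL_FIELDS}

def decode_submission(secret: bytes, session_id: str, posted: dict) -> dict:
    """Keep only fields posted under this session's aliases."""
    mapping = alias_map(secret, session_id)
    return {mapping[key]: value for key, value in posted.items() if key in mapping}
```

A client that posts a hard-coded `email` field gets nothing through, because the server only accepts the aliases it rendered for that session.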

The most common CAPTCHA methods are already beaten (ReCaptcha etc.). This isn't because the sets are particularly easy to train against (in fact, it was very hard, because the sets were updated frequently) but because there are sets available to train against. This is the problem with all public CAPTCHAs that provide the sets, even for a multibillion-dollar corporation such as Google. If the data is available, you can train a neural network to work out most image-based CAPTCHA sets (and this isn't just true for images, but JS/Flash games too). If a set is available, you can train against it, so the process can be automated... this is why customisation (not a common CAPTCHA) needs to be adopted to stop spam bots.

Those text files you mentioned: there are already many thousands publicly available. In fact, there are sites dedicated to them.

Just Google: XRumer linklist

Some link lists are pure XenForo link lists (don't be surprised if your own forum is listed on them)

It is impressive, and it will get worse. Knowing your enemy is important... knowing your enemy's next move and what they are likely to do is just as important (script pausing will now be adopted to get around the core registration timer)....

For a system that has a valuable enough population, any weak mechanism* that is added to the core will be coded against. QA was coded against using the Textcaptcha mechanism; registration timers will now be coded against using script pausing (slowing thousands of threads down by a total of 10 seconds at the start... which is nothing)


* When I say weak mechanism, I mean any mechanism that can be coded around. A weak mechanism can be 100% effective, up until the time it is coded around. APIs are not inherently weak mechanisms (but are rarely even close to 100% effective); however, many APIs do rely on weak mechanisms to gain their data. Once all weak mechanisms have been coded against, we will be left with a smaller set of effective APIs
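The registration timer point above can be sketched from the server side. This is a minimal illustration, not XenForo's actual implementation; the function names and the 10-second threshold are assumptions taken loosely from the figure mentioned in this post.

```python
import time

MIN_SECONDS = 10  # hypothetical threshold, borrowed from the "10 seconds" above

def issue_form_token(now=None):
    """Record when the registration form was served."""
    return {"served_at": time.time() if now is None else now}

def passes_timer(token, now=None, min_seconds=MIN_SECONDS):
    """True when at least min_seconds passed between serving and submitting."""
    submitted_at = time.time() if now is None else now
    return (submitted_at - token["served_at"]) >= min_seconds

# A submission half a second after the form was served is rejected...
token = issue_form_token(now=1000.0)
instant = passes_timer(token, now=1000.5)
# ...but a client that simply waits out the threshold is accepted.
patient = passes_timer(token, now=1010.0)
```

The check only measures elapsed time, so it cannot tell a slow human from a script that pauses. That is what makes it a weak mechanism in the sense of the footnote.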
 
But..
  • It doesn't really have a browser method to jump into, it can't be automated as if it was a browser.. on registration it only looks for certain ids/ names/order of fields
  • and it also doesn't extend itself very well to plug-ins (if people could easily create plug-ins for it, all forums/CMS sites would be in a lot of trouble)

Have you noticed any similarities in the user agent displayed for these bots? I'm wondering if it can be nailed by mod_security.
 
They use browser user_agents (and have done for many years). FoolBotHoneyPot does record quite a lot, one of the things it records is user agent. XRumer never looks like anything other than a Browser. Oddly, some browser based bots or Java bots forget to update their user_agent... but not XRumer bots, not for a long time


Anything that can be spoofed to prevent detection, will be spoofed to prevent detection. user_agent is a header value and can be altered
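To make the user agent point concrete, here is a small sketch of a header-based filter of the kind a mod_security rule might express. The blocklist entries and the function name are hypothetical, and this is plain Python rather than real mod_security rule syntax.

```python
# Hypothetical substrings that give away clients using a default user agent
BLOCKED_SUBSTRINGS = ("curl", "python-requests", "java/")

def ua_looks_automated(headers):
    """Flag requests with a missing or obviously non-browser User-Agent."""
    ua = headers.get("User-Agent", "").lower()
    if not ua:
        return True
    return any(marker in ua for marker in BLOCKED_SUBSTRINGS)

# A bot that forgot to change its default user agent is caught...
lazy_bot = {"User-Agent": "Java/1.8.0_201"}
# ...but one that sends a browser string looks the same as a real visitor.
spoofing_bot = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}
```

A filter like this only catches the bots that forget to update their user agent, which, as said above, XRumer has not done for a long time.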
 
XRumer is the one that attacked XF-based forum sites globally a few months back.
All of us were affected by that XRumer update.
 
QA is no longer a viable method (unless you like playing Russian roulette, and updating the QA frequently)
The problem there is people still think that using "what's 2 + 2" math questions or "Are you human?" yes or no answers are viable protection against bots. To get around that, I first disabled CAPTCHA (since the only ones being slowed by this method are the actual humans) and came up with my own image set. I have pictures of shapes and objects, and I ask people to identify something in the picture. Something simple like, "how many squares are in this picture?" Then I reuse the pictures multiple times, asking for different things from the same pic. If it's a picture of someone that anyone visiting the site should be able to identify (i.e. the ever popular name-this-president pic), I ask "what's this person's last name?" Not "This president's name is George _____". That's info that the bots can and have shared already. I never put in any information that's clearly an identifier of the picture. Keep your questions super generic. Don't have only yes/no answers, or only left/right answers. Mix it up, so there's nothing there for the bots to compare and properly guess.
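A rough sketch of how an image question pool like that could be organised. The file names, questions and answers are made up for illustration, and this is not tied to any particular forum software.

```python
import random

# Hypothetical pool: each image is reused with several unrelated questions,
# and the answers are a mix of numbers, colors and shape names.
POOL = {
    "shapes_01.png": [
        ("How many squares are in this picture?", {"3", "three"}),
        ("What color is the largest circle?", {"red"}),
    ],
    "shapes_02.png": [
        ("How many triangles are in this picture?", {"2", "two"}),
        ("Which shape is in the top left corner?", {"star"}),
    ],
}

def pick_challenge(rng=random):
    """Choose a random image and one of the questions attached to it."""
    image = rng.choice(sorted(POOL))
    index = rng.randrange(len(POOL[image]))
    question, _ = POOL[image][index]
    return image, index, question

def check_answer(image, index, answer):
    """Compare the answer case-insensitively against the accepted set."""
    _, accepted = POOL[image][index]
    return answer.strip().lower() in accepted
```

Because the same picture comes back with different questions, an answer harvested for one image/question pair is no use for the next one.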
 
Let's see...

You're an independent contractor specializing in "X". You write a great tutorial on the topic, taking a complex, often-performed task and making it simple, with supporting scripts. You want to use this to drive traffic to your site and generate business.

You have a list of forums and wikis that pertain to "X", so you use it to post to all of them. It's not unique content, but it's not spam either. You trade information for advertising.

Legitimate use, yes?
 