XF 2.1 Link/BBCode processing in a post, and spam detection

Wildcat Media · Jun 1, 2020

We have been using a regex in "Spam Phrases" to catch any link pasted into a post where the string starts with https:// or http://, and it has worked well: /^https?:\/\/\S+\n/si
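A quick way to see what this pattern does and does not catch is to run it outside the forum. The sketch below uses Python's `re` module as a stand-in for the regex engine behind Spam Phrases (an assumption, as are the sample messages and the helper name `is_flagged`). Because there is no `m` flag, `^` anchors only to the very start of the message, and the trailing `\n` requires a line break right after the link.

```python
import re

# Same pattern as in the post; the s and i flags map to re.S and re.I.
LINK_AT_START = re.compile(r"^https?://\S+\n", re.S | re.I)

def is_flagged(message: str) -> bool:
    """Return True when the pattern matches the message (hypothetical helper)."""
    return LINK_AT_START.search(message) is not None

# Made-up sample messages, not taken from a real forum.
samples = {
    "link first, then text": "https://example.com/page\nCheck this out",
    "link in the middle": "Check this out https://example.com/page\n",
    "link with no line break": "https://example.com/page",
    "bare www link": "www.example.com\nCheck this out",
}
for label, text in samples.items():
    print(f"{label}: {is_flagged(text)}")
```

Under these assumptions only the first sample is flagged, which suggests the pattern catches posts that open with a full link on its own line and little else.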

I found a new angle. If someone pastes in a link to a site without the https:// or http:// prefix, like www.xenforo.com, it is still converted into a hyperlink in the post. Here's one I have found twice already (the BBCode view):

This is me favorite vapeshop [URL='http://www.aquavape.co.uk']www.aquavape.co.uk[/URL]

I tried a test post and in fact, I can do that here if I type in www.xenforo.com -- it will automatically morph into a hyperlink.

I would like to eliminate this source of spam. Now that they've detected how some of us are preventing spam, we need to mitigate this. There are two ideas I came up with:

Can we somehow disable this automatic hyperlinking when it is not preceded by http or https? (I'm not sure if it's a function of the editor, or something in XenForo that processes it.)
Does XenForo process the Spam Phrases after BBCode is generated? I'm thinking that if this is the case, could we filter on the [url= BBCode to trap the spam?

Or, any other ideas?

Paul B · Jun 9, 2020

You can add multiple strings to the spam phrases to catch other instances.

For example:

Code:

/\[url=("|')?([^"'\]]+)("|')?\].*\[url\]\2\[/si
/\[url=("|')?([^"'\]]+)("|')?\].*\[url=("|')?\2("|')?\]/si
/^[a-z0-9-]+\.(com|net|org)\/\S+\n/si
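Reading the three patterns closely: the `\2` back-reference in the first two means they only fire when the same URL appears twice in a post (once in a `[url=...]` tag and again in a `[url]` or second `[url=...]` tag), and the third only fires when the post opens with a bare `domain.com/path` line. The sketch below checks that reading in Python's `re` module, which is assumed here to behave like the forum's engine for these patterns; the labels and sample posts are made up for illustration.

```python
import re

FLAGS = re.S | re.I  # equivalent of the /si flags

# The three patterns from the post, with hypothetical descriptive labels.
PATTERNS = {
    "same URL in [url=] and [url]": re.compile(
        r"""\[url=("|')?([^"'\]]+)("|')?\].*\[url\]\2\[""", FLAGS),
    "same URL in two [url=] tags": re.compile(
        r"""\[url=("|')?([^"'\]]+)("|')?\].*\[url=("|')?\2("|')?\]""", FLAGS),
    "bare domain with path at start": re.compile(
        r"^[a-z0-9-]+\.(com|net|org)/\S+\n", FLAGS),
}

def matching(message: str) -> list:
    """Return the labels of every pattern that matches the message."""
    return [label for label, rx in PATTERNS.items() if rx.search(message)]

repeated_mixed = "[url='http://spam.example']Buy[/url] or [url]http://spam.example[/url]"
repeated_same = "[url=http://spam.example]one[/url] [url=http://spam.example]two[/url]"
bare_domain = "spam-shop.com/deals\nGreat prices"
single_tag = "This is me favorite vapeshop [URL='http://www.aquavape.co.uk']www.aquavape.co.uk[/URL]"

for post in (repeated_mixed, repeated_same, bare_domain, single_tag):
    print(matching(post))
```

Note that the last sample, a post with a single `[URL=...]` tag like the one quoted at the top of the thread, matches none of the three under these assumptions.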

Wildcat Media · Jun 9, 2020

Brogan said:
You can add multiple strings to the spam phrases to catch other instances.

For example:

Code:

/\[url=("|')?([^"'\]]+)("|')?\].*\[url\]\2\[/si
/\[url=("|')?([^"'\]]+)("|')?\].*\[url=("|')?\2("|')?\]/si
/^[a-z0-9-]+\.(com|net|org)\/\S+\n/si

That's an interesting batch of regex.

The third one makes sense (and I would add quite a few TLDs to that list). The first two seem to capture the [url=] in posts. Those will give me a few options to try in our spam filtering. Thanks much!

Chromaniac · Aug 27, 2020

Code:

/\[url=("|')?([^"'\]]+)("|')?\].*\[url\]\2\[/si
/\[url=("|')?([^"'\]]+)("|')?\].*\[url=("|')?\2("|')?\]/si
/^[a-z0-9-]+\.(com|net|org)\/\S+\n/si
/^https?:\/\/\S+\n/si

So I have this in Spam phrases, and Maximum messages to check for spam is set at 50. What happens with this configuration? Does any post with a link in it, from any member with fewer than 50 posts, go to moderation?

Paul B · Aug 27, 2020

Yes.

Chromaniac · Aug 27, 2020

Thanks. Is there any configuration that would moderate posts containing particular keywords without any post limits? Censor does not work here because it replaces keywords.

Paul B · Aug 27, 2020

There's nothing built in which will allow the first 50 posts to be moderated for all links, and all subsequent posts to be moderated based on keywords.

Max Taxable · Aug 27, 2020

Chromaniac said:
So I have this in Spam phrases. And Maximum messages to check for spam is set at 50.

I set this to just 1. Posts aren't counted until they are published (approved), so even if a spammer makes 10000 posts, the post count is still zero anyway.

Chromaniac · Aug 27, 2020

I am sorry to take a bit more of your time. Just one more query on the same topic. What does "Submit content without approval" in group permissions do? If I disable it for a user, do all of his posts go to moderation? Or is this connected to the node's moderation settings as well?

Thanks a lot!

Paul B · Aug 27, 2020

It allows members to bypass the moderation queue.

We use it here for verified license holders who may otherwise be caught by the spam phrases when posting the first few messages.

Chromaniac · Aug 30, 2020

Brogan said:
Yes.

This is weird. I haven't seen any post landing in moderation since I added these to the spam phrases box. So I decided to make a test account and test through it. I posted the following content (to test all three types of links):

Code:

[url=https://google.com/]Google[/url]
https://google.com
www.google.com

And the post was published instantly. Didn't land in moderation. I wonder what I am doing wrong.
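One possible explanation can be checked offline. The sketch below runs the four Spam Phrases entries against this exact test post in Python's `re` module, assuming it behaves like the forum's engine for these patterns and that the check sees the raw BBCode text. Neither assumption is confirmed in this thread, and the helper name `hits` is made up. The second call adds the multiline `m` flag as a hypothetical variant; whether XenForo honours that flag in Spam Phrases is untested here.

```python
import re

# The four Spam Phrases entries quoted earlier in the thread.
RAW_PATTERNS = [
    r"""\[url=("|')?([^"'\]]+)("|')?\].*\[url\]\2\[""",
    r"""\[url=("|')?([^"'\]]+)("|')?\].*\[url=("|')?\2("|')?\]""",
    r"^[a-z0-9-]+\.(com|net|org)/\S+\n",
    r"^https?://\S+\n",
]

# The test post, one link style per line.
TEST_POST = (
    "[url=https://google.com/]Google[/url]\n"
    "https://google.com\n"
    "www.google.com"
)

def hits(message: str, extra_flags: int = 0) -> list:
    """Return the 1-based positions of the patterns that match the message."""
    flags = re.S | re.I | extra_flags
    return [i for i, raw in enumerate(RAW_PATTERNS, start=1)
            if re.search(raw, message, flags)]

print(hits(TEST_POST))        # as written, with /si only
print(hits(TEST_POST, re.M))  # hypothetical: same patterns with the m flag added
```

As written, none of the four match: the first two need the same URL to appear twice, and the last two are anchored to the start of the message, which here begins with `[url=`. With the `m` flag only the fourth pattern matches (the `https://` line), and `www.google.com` is still missed because `[a-z0-9-]+` cannot span the dot in `www.google`.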

Chromaniac · Aug 30, 2020

I am kind of baffled now. Shouldn't it work if I just add [url to the spam phrases? It seems to but OP says it doesn't! Was this behavior modified in 2.2? Thanks!

Wildcat Media · Nov 24, 2020

I was revisiting this today and I came to the same conclusion--the above regex is not catching any example starting with [url. However, I was playing around at Regex 101 and came up with this:

/\[url(.*)\]/is

And that seems to work...as regex. Untested in XF 2.2 though. The other examples above did not work at Regex 101. Essentially it will catch [url followed by a closing ]. So any combination of URL and quotes, it seems to catch it. Maybe someone with more regex knowledge than myself can improve on it.
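As a rough check of that pattern, and one possible tightening, here is a sketch in Python's `re` module (assumed to behave like Regex 101's PCRE flavour for these patterns). `BROAD` is the pattern above. `TIGHT` is a hypothetical alternative, not something tested in XenForo: `\b` stops it matching tag names that merely begin with "url", and `[^\]]*` keeps the match inside a single tag instead of running to the last `]` in the post.

```python
import re

# The pattern from the post: greedy, and with the s flag it can span lines.
BROAD = re.compile(r"\[url(.*)\]", re.I | re.S)

# Hypothetical tighter variant: an opening [url], [url=...] or [url ...] tag only.
TIGHT = re.compile(r"\[url\b[^\]]*\]", re.I)

# Made-up sample messages covering tagged links and non-links.
samples = [
    "[url=https://google.com/]Google[/url]",
    "[URL='http://www.aquavape.co.uk']www.aquavape.co.uk[/URL]",
    "[url]https://google.com[/url]",
    "[urlish] is not a link tag",
    "www.google.com",
]

for text in samples:
    broad = BROAD.search(text)
    tight = TIGHT.search(text)
    print(text)
    print("  broad:", broad.group(0) if broad else None)
    print("  tight:", tight.group(0) if tight else None)
```

Both versions flag the three tagged links. Neither flags a plain `www.google.com`, so catching auto-linked text this way would only work if the spam check runs after the editor has converted it to BBCode, which this thread does not settle.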

Likewise, /^https?:\/\/\S+\n/si still seems to work.

So I also might guess that the spam filtering is done after the post is parsed with BBCode? I mean, some spammers out there may just dump in text containing a link, but a few others might include the BBCode in their post text, knowing that most forums will render it.
