
As Designed spam phrases: won't match 微信

Discussion in 'Resolved Bug Reports' started by rebelde, Jun 28, 2015.

  1. rebelde

    rebelde Member

    I tried to block 微信 in the Spam Phrases, but I couldn't get it to block the posts.

    The work-around is to make it a regular expression:

    Additional documentation suggestion:
    Also (as if you didn't have enough things to do!), I recommend that you create more extensive documentation and link to it from the AdminCP, especially for Unicode matching. This regex match seems to work without the /u, but others require it. It was not easy to figure this out.
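A note on why some patterns need /u and others appear to work without it. The sketch below is a Python analogy, not XenForo or PHP code: matching on `bytes` stands in for PCRE without /u (pattern and subject treated as raw bytes), and matching on `str` stands in for PCRE with /u. The sample text is made up for illustration.

```python
import re

text = "请加我微信吧"  # made-up sample: Chinese only, no Korean
raw = text.encode("utf-8")

# Unicode-aware matching (stand-in for /u): the class is a range of code points.
hangul_unicode = re.compile("[가-힣]")

# Byte-wise matching (stand-in for no /u): the same class becomes a set of
# single bytes plus the byte range 0x80-0xED, which covers most UTF-8 text.
hangul_bytes = re.compile("[가-힣]".encode("utf-8"))

# A literal phrase is just a fixed byte sequence, so it matches either way.
literal_bytes = re.compile("微信".encode("utf-8"))

print(hangul_unicode.search(text) is None)  # True: no Korean in the text
print(bool(hangul_bytes.search(raw)))       # True: false positive on Chinese
print(bool(literal_bytes.search(raw)))      # True: literal match still works
```

This would be consistent with a literal phrase such as 微信 matching with or without /u, while a character class such as [가-힣] only behaves sensibly with it.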
  2. Chris D

    Chris D XenForo Developer Staff Member

    This seems to work as expected for me:



    Each spam phrase that isn't already a regex is normalised into one, e.g. the regex that runs on that example above is:

    Note the Unicode modifier at the end.

    Are you certain, in your testing, it was with a user who will have their messages checked for spam? e.g. moderator/admin users and users who have exceeded certain criteria will not have their messages checked for spam.
  3. rebelde

    rebelde Member

    I'm very surprised that you can't replicate this. It happens on both my test forum and my active forums.

    Yes, using the same user with 6 posts (my limit is 10), I edit a post.

    If it has 微信, it does not match - edit allowed
    If it is /微信/u it matches and blocks - edit blocked.

    I change it back and forth, over and over and get the same results.

    Chris, I can let you into my test forums if it helps.
  4. Chris D

    Chris D XenForo Developer Staff Member

    Checking it on your test forum may be useful.

    Submit a ticket from your customer area with details and I will take a quick look.
  5. rebelde

    rebelde Member

    Ticket submitted. Thanks.
  6. Chris D

    Chris D XenForo Developer Staff Member

    Just an update on this.

    I found a specific reproduction case.

    I found that something like:

    Would work fine and the message would be rejected accordingly.


    Would cause the match to fail and the post be allowed.

    I haven't looked into the specifics yet.
  7. Mike

    Mike XenForo Developer Staff Member

    If you block "test", it won't match "test2" as it looks for non-word characters afterwards. In CJK languages this is potentially problematic, but I'm not aware of a definitive way to have it work as expected in both cases.
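The word-boundary behaviour Mike describes can be sketched roughly as follows. This is Python rather than the PHP that XenForo runs, and `naive_phrase_pattern` is a hypothetical helper, not the regex XenForo actually generates. It only assumes that a plain phrase must not have a word character directly before or after it.

```python
import re

def naive_phrase_pattern(phrase: str) -> re.Pattern:
    # Hypothetical normalisation (an assumption, not XenForo's real regex):
    # no word character may sit directly before or after the phrase.
    return re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)

latin = naive_phrase_pattern("test")
cjk = naive_phrase_pattern("微信")

print(bool(latin.search("this is a test")))  # True: phrase stands alone
print(bool(latin.search("this is test2")))   # False: followed by a word character
print(bool(cjk.search("QQ / 微信 12345")))    # True: surrounded by spaces
print(bool(cjk.search("请加我微信吧")))        # False: neighbours are word characters
```

Because Chinese is written without spaces and ideographs count as word characters, a phrase embedded in running text never sees a boundary, which would explain why the plain phrase fails while /微信/u matches.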
  8. rebelde

    rebelde Member

    Here are a few ideas. The last one (#3) is the easiest:

    1. You could test for CJK characters first. If the phrase has CJK, then match the string instead of the word.
    We currently use this to catch any Korean due to Korean spam: /[가-힣]/u
    You could probably expand that to match Chinese and Japanese.

    2. Or just give an error or warning when somebody enters non-regex CJK into the Spam Phrases: "Only use regular expressions for CJK phrases."
    3. Just change the text in the AdminCP: "Only use regular expressions for CJK phrases."

    Additional text that can reduce confusion: "Use regular expressions to match a string of characters. Phrases without regular expressions will only match exact words."
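Idea #1 above could look something like this minimal Python sketch. The helper `phrase_matches` is hypothetical, the word-boundary rule is an assumed stand-in for the real normalisation, and the character ranges (Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables) are one possible way to expand /[가-힣]/u, not an exhaustive definition of CJK.

```python
import re

# Assumed ranges: Hiragana + Katakana, CJK Unified Ideographs, Hangul Syllables.
CJK_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]")

def phrase_matches(phrase: str, message: str) -> bool:
    """Hypothetical check: substring match for CJK phrases, whole-word otherwise."""
    if CJK_CHARS.search(phrase):
        # CJK text has no spaces between words, so match the raw string.
        return phrase in message
    # Non-CJK phrases keep the whole-word behaviour (assumed rule).
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, message, re.IGNORECASE) is not None

print(phrase_matches("微信", "请加我微信吧"))  # True
print(phrase_matches("test", "this is test2"))  # False
print(phrase_matches("test", "just a test."))   # True
```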
  9. Mike

    Mike XenForo Developer Staff Member

    You don't actually have to use a regex -- simply using * around the words is sufficient. The system is basically the same as censoring in this regard (in terms of word detection).

    Based on this, I don't think anything is going to explicitly be changed here. I believe this is the only time we've actually had a question surrounding this (regarding CJK) in either censoring or spam prevention.
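To illustrate why wrapping the phrase in * helps, here is a rough Python sketch. It assumes that `*` expands to "any run of word characters" and that plain phrases are matched as whole words; `wildcard_phrase_pattern` is a hypothetical helper and the exact expansion XenForo uses may differ.

```python
import re

def wildcard_phrase_pattern(phrase: str) -> re.Pattern:
    # Assumption: each * stands for zero or more word characters.
    body = r"\w*".join(re.escape(part) for part in phrase.split("*"))
    # Assumption: the whole phrase must still be delimited by non-word characters.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)

message = "请加我微信吧"  # made-up sample post

print(bool(wildcard_phrase_pattern("微信").search(message)))    # False
print(bool(wildcard_phrase_pattern("*微信*").search(message)))  # True
```

With the wildcards, the surrounding ideographs are absorbed into the match, so the boundary check only applies at the edges of the whole run of characters.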
  10. cclaerhout

    cclaerhout Well-Known Member

    • If you want to target any sentence containing the word 微信, you can try this:
      => this regex will still allow your members to quote those characters, i.e.:
      This word "微信"
    • If you want to target any CJK sentence containing the word 微信, you can try this:
      This regex will still accept this kind of text, though:
    • If you want to block all CJK words on your board (and full-width characters), you can try this:
  11. rebelde

    rebelde Member

    Thanks Cédric (and everybody). We like CJK on our board, we just need to keep out the spammers. The Chinese ones mention QQ/微信, so we are blocking that. We haven't found an easy pattern to the Korean ones, so we match against the Unicode range: /[가-힣]/u. I would test your {Hangul} regex, but what we have is working well.

    We allow all this, but it is all sent to moderation to see if it is spam. It works great, as long as I can remember how the non-regex matches work...
