Won't fix: Whitespace username possible (aka Unicode)

Xon

Well-known member
XF's Unicode validation is permissive enough to accept "︈ ︈ ︈", which is three U+FE08 VARIATION SELECTOR-9 characters separated by spaces (UTF-8 hex: EFB88820EFB88820EFB888).
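A quick way to see what those bytes actually are is to decode them and ask Python's standard unicodedata module for each character's name and general category. This is only a diagnostic sketch run outside the forum, not anything XF itself does:

```python
import unicodedata

# UTF-8 bytes from the report above
raw = bytes.fromhex("EFB88820EFB88820EFB888")
name = raw.decode("utf-8")

for ch in name:
    # Code point, general category (e.g. Mn = nonspacing mark) and official name
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# Variation selectors are not whitespace, so a plain strip() leaves the
# "blank-looking" name completely unchanged
print(name.strip() == name)
```

The selectors are classed as marks rather than spaces, which is why whitespace trimming alone does not catch this name.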
 

Mike

XenForo developer
Staff member
I don't think we're going to make any changes here. Clearly Unicode offers a huge range of characters, some of which aren't directly visible.

In this case, they're using variation selectors and there are some legit uses of this: https://en.wikipedia.org/wiki/Variant_form_(Unicode)

Examples of their use: http://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt and http://unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt

While not all of the variation selectors are used right now, they do have a legitimate use case.
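For a concrete example of the legitimate use, the sketch below builds two sequences of the kind listed in emoji-variation-sequences.txt: U+2764 HEAVY BLACK HEART followed by VARIATION SELECTOR-16 (requesting emoji-style presentation) or VARIATION SELECTOR-15 (requesting text-style presentation). It only shows that the selector is a separate, invisible code point attached to a visible one; how each sequence actually renders depends on the font and platform.

```python
import unicodedata

heart = "\u2764"                # HEAVY BLACK HEART
emoji_heart = heart + "\ufe0f"  # + VS16: request emoji-style (colour) presentation
text_heart = heart + "\ufe0e"   # + VS15: request text-style (monochrome) presentation

for seq in (emoji_heart, text_heart):
    # Each sequence is two code points: the visible base plus an invisible selector
    print(len(seq), [unicodedata.name(c) for c in seq])

# The strings differ only by the selector, yet they are distinct sequences
print(emoji_heart == text_heart)
```

Blanket-stripping every variation selector would collapse both sequences to the bare heart and lose the requested presentation.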

Realistically, I think this may be something that has to be dealt with through active moderation (or by applying much more restrictive user name limits, if that's appropriate for your forum).
 

Sim

Well-known member
Is there no way to programmatically detect unprintable characters and restrict their usage?

How are we supposed to detect unprintable characters from a moderation point of view?

A "blank" username is fairly obvious, but how about a username with arbitrary whitespace padding?
 

Mike

XenForo developer
Staff member
> Is there no way to programmatically detect unprintable characters and restrict their usage?
"Unprintable" isn't really a directly meaningful thing. Even characters that have no visible output of their own can be used legitimately; examples include emoji color variants and emoji flags. As noted, the particular characters here do change how text is displayed when supported, and in some cases that even applies when they're combined with basic ASCII characters.

The exact rendering varies from font to font and system to system. In theory there may be some things that can be done, but Unicode currently defines over 275,000 code points and over 137,000 characters, so there is a huge range of what could be considered valid.
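As a rough illustration of "some things that can be done", the sketch below treats a character as visible only if its Unicode general category is a letter, number, punctuation or symbol (L*, N*, P*, S*), so marks (M*), separators (Z*) and control/format/private-use characters (C*) never count on their own. This is a heuristic under that assumption, not XF behaviour, and the function names are hypothetical. It also has the false positives described above: a name ending in an emoji variation selector, or in a combining accent that has no precomposed form, would be flagged as padded.

```python
import unicodedata

# Assumption: "visible" means the general category starts with
# L (letter), N (number), P (punctuation) or S (symbol).
VISIBLE_MAJOR_CATEGORIES = {"L", "N", "P", "S"}

def is_visible(ch: str) -> bool:
    return unicodedata.category(ch)[0] in VISIBLE_MAJOR_CATEGORIES

def has_visible_character(name: str) -> bool:
    """True if at least one character in the name is (heuristically) visible."""
    return any(is_visible(ch) for ch in name)

def has_invisible_padding(name: str) -> bool:
    """True if a non-empty name starts or ends with a non-visible character."""
    return bool(name) and not (is_visible(name[0]) and is_visible(name[-1]))

print(has_visible_character("\ufe08 \ufe08 \ufe08"))  # selectors and spaces only
print(has_visible_character("Sim"))
print(has_invisible_padding("Sim\u200b"))  # trailing ZERO WIDTH SPACE (category Cf)
```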

> How are we supposed to detect unprintable characters from a moderation point of view?
>
> A "blank" username is fairly obvious, but how about a username with arbitrary whitespace padding?
Well, presumably usernames are the primary situation where this comes up, and you can restrict those to a particular set of characters, or a pattern, via the user name regex option.

Unicode also theoretically allows plenty of other challenges that have nothing to do with characters lacking printable output, such as confusables (characters that look alike): http://www.unicode.org/Public/security/8.0.0/confusables.txt
 

Sim

Well-known member
> Well, presumably usernames are the primary situation where this comes up, and you can restrict those to a particular set of characters, or a pattern, via the user name regex option.
Yes, usernames are my primary concern.

Can you give us an example username regex for basic alphanumeric characters and spaces?

Could we just use something like /[:ascii:]/i - would that work? Would we be better using /[:graph:]\s/i to avoid control characters included in :ascii:?

Then if we want to exclude @ symbols, how about /[:graph:]\s^@/i ?
 

Alfa1

Well-known member
It seems to me that this will allow users to register an account with the same name as another member, or even a staff member, but with an invisible character added to it. That seems like a security/abuse issue.
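One common mitigation for this kind of impersonation (an assumption here, not something XF is stated to do) is to compare a normalised "comparison key" of each new name against the keys of existing names, rather than comparing raw strings. The sketch applies NFKC normalisation (which folds, for example, fullwidth letters to ASCII), drops format characters (category Cf, such as ZERO WIDTH SPACE) and variation selectors, collapses whitespace runs and case-folds. It does not handle cross-script confusables such as Cyrillic "а" versus Latin "a"; that needs the confusables data linked earlier. comparison_key and is_taken are hypothetical names.

```python
import unicodedata

def _is_variation_selector(ch: str) -> bool:
    cp = ord(ch)
    # VS1-VS16 and the supplementary VS17-VS256 block
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def comparison_key(name: str) -> str:
    # NFKC folds compatibility variants, e.g. fullwidth letters to ASCII
    normalised = unicodedata.normalize("NFKC", name)
    # Drop format characters (Cf) and variation selectors, which render invisibly
    kept = "".join(
        ch for ch in normalised
        if unicodedata.category(ch) != "Cf" and not _is_variation_selector(ch)
    )
    # Collapse whitespace runs, trim the ends, then case-fold for caseless comparison
    return " ".join(kept.split()).casefold()

def is_taken(candidate: str, existing_names: list[str]) -> bool:
    existing_keys = {comparison_key(n) for n in existing_names}
    return comparison_key(candidate) in existing_keys

existing = ["Mike", "Brogan"]
print(is_taken("Mike\ufe08", existing))                 # trailing variation selector
print(is_taken("Mi\u200bke", existing))                 # embedded ZERO WIDTH SPACE
print(is_taken("\uff2d\uff49\uff4b\uff45", existing))   # fullwidth "Mike"
print(is_taken("Alfa1", existing))
```

Because the key is only used for the comparison, the stored user name itself is left exactly as entered.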
 

Mike

XenForo developer
Staff member
> Can you give us an example username regex for basic alphanumeric characters and spaces?
This should work (the option currently adds the delimiters and makes the match case-insensitive):
Code:
^[a-z0-9 ]+$
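To try candidate names against that pattern outside the forum, a rough Python equivalent follows. It is a sketch under stated assumptions: re.IGNORECASE stands in for the case-insensitive flag the option adds, re.fullmatch is used so a trailing newline cannot slip past $, and re.ASCII is added because Python's Unicode-mode IGNORECASE otherwise lets [a-z] match a few non-ASCII letters such as U+212A KELVIN SIGN. matches_username_regex is a hypothetical helper name.

```python
import re

# The pattern from the post; IGNORECASE plays the role of the added /i flag,
# ASCII keeps case-insensitive matching of [a-z] to plain ASCII letters
USERNAME_PATTERN = re.compile(r"^[a-z0-9 ]+$", re.IGNORECASE | re.ASCII)

def matches_username_regex(name: str) -> bool:
    # fullmatch: the whole string must match, so "abc\n" is rejected even
    # though $ on its own may match just before a trailing newline
    return USERNAME_PATTERN.fullmatch(name) is not None

for candidate in ["Mike", "Star Army 2", "\ufe08 \ufe08 \ufe08", "Sim@home", "\u212aelvin"]:
    print(repr(candidate), matches_username_regex(candidate))
```

The pattern alone still accepts a name made only of spaces; rejecting that case relies on the name being trimmed before it is checked.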
 

StarArmy

Well-known member
Whoa whoa whoa, hold up.

Does this also mean that users are going to be able to register with emoji usernames? Like, I could register as "🏁?"
 

Brogan

XenForo moderator
Staff member
Would it be better in the example above if the member registered with a user name of "Flag of South Korea Flag of South Korea Flag of South Korea" ?

;)
 

Brogan

XenForo moderator
Staff member
Just handle it as you would any other moderation issue, assuming it's against the terms and rules of the site.

I've seen some sites where that would be considered positively benign.
 

Xon

Well-known member
> Could we just use something like /[:ascii:]/i - would that work? Would we be better using /[:graph:]\s/i to avoid control characters included in :ascii:?
>
> Then if we want to exclude @ symbols, how about /[:graph:]\s^@/i ?
gogo regex abuse:
Code:
^[ -~]+$
This matches every ASCII character from space to tilde, which happens to be exactly the printable ASCII characters. The naive trim() that XF applies to usernames then rules out all-whitespace usernames.
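A small Python sketch of how that pattern and the trim interact follows. It assumes XF's trim() behaves like PHP's default trim(), which strips only space, tab, newline, carriage return, NUL and vertical tab, so php_style_trim mimics that rather than Python's Unicode-aware str.strip(). It also shows one way to handle the earlier question about excluding @. POSIX classes such as [:graph:] are only recognised inside a bracket expression (e.g. [[:graph:]]), and Python's re has no POSIX classes at all, so the printable range is simply split around @ (0x40) as [ -?A-~]. Both helper names are hypothetical.

```python
import re

PRINTABLE_ASCII = re.compile(r"[ -~]+")           # space (0x20) through tilde (0x7E)
PRINTABLE_ASCII_NO_AT = re.compile(r"[ -?A-~]+")  # same range with @ (0x40) cut out

def php_style_trim(s: str) -> str:
    # Mimics PHP trim() with its default character list " \t\n\r\0\x0B"
    return s.strip(" \t\n\r\0\x0b")

def is_valid_username(name: str, pattern: re.Pattern = PRINTABLE_ASCII) -> bool:
    trimmed = php_style_trim(name)
    # An all-whitespace name trims to "", which fails the "+" (one or more) quantifier
    return pattern.fullmatch(trimmed) is not None

print(is_valid_username("   "))                   # all spaces
print(is_valid_username("\ufe08 \ufe08 \ufe08"))  # variation selectors
print(is_valid_username("  Xon  "))               # padded, otherwise fine
print(is_valid_username("Xon@example", PRINTABLE_ASCII_NO_AT))
```

Spelling out the ranges explicitly also avoids \s, which would let tabs and newlines into the middle of a name.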
 