Browser issue: URLs constructed from thread titles that contain an "em dash" are not turned into links within a post.

Affected version
2.2.0
Let me explain this better. If a user puts an "em dash", which looks like this —, in a thread title, whatever functionality constructs the URL from the title treats the "em dash —" as a word. Since words are separated by regular hyphens in the URL, you end up with the sequence "-—-" in the URL.

It then appears that XenForo has some kind of internal URL validation that does not accept "—" as a word, because a URL containing "-—-" won't show up in a message. You'll see it in the message editor, but once posted it seems to be stripped out of the message altogether.

It took me a long time to discover what was going on here; at first I thought my permissions had gotten messed up. This seems like an "edge case" that might be rare enough not to have been caught yet.
 
I think you're talking about two distinct things here, but we'd probably need you to show us an example to truly confirm. You can either link to one on your site or post an example in our testing forum.

In terms of the URL generation, we only have special handling for a small set of (basic ASCII) punctuation characters. Unless you enable the Romanization options for URLs, all other characters will be left in the URL and simply encoded. If the user put spaces around the em dash, then -—- in the URL would be expected.
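To make the slug behaviour described above concrete, here is a minimal Python sketch. It assumes a routine that collapses whitespace and a small, hypothetical set of basic ASCII punctuation into hyphens, leaves every other character alone, and percent-encodes the result. The names `make_slug` and `slug_for_url` and the exact punctuation set are illustrative assumptions, not XenForo's actual code.

```python
import re
from urllib.parse import quote

# Hypothetical stand-in for the "small set of basic ASCII punctuation" with special
# handling: whitespace and these characters collapse into a single hyphen.
_SEPARATORS = re.compile(r"[\s!#&'(),./:;?-]+")


def make_slug(title: str) -> str:
    """Lower-case the title; runs of whitespace/basic ASCII punctuation become one hyphen."""
    return _SEPARATORS.sub("-", title.lower()).strip("-")


def slug_for_url(title: str) -> str:
    """Percent-encode whatever survives slugging that is not URL-safe (e.g. an em dash)."""
    return quote(make_slug(title), safe="-")


with_spaces = "Elvis Trail \u2014 Florence Arizona December 5th"
without_spaces = "Elvis Trail\u2014Florence Arizona December 5th"

print(make_slug(with_spaces))        # hyphen, em dash, hyphen between "trail" and "florence"
print(slug_for_url(without_spaces))  # elvis-trail%E2%80%94florence-arizona-december-5th
```

With spaces around the em dash the slug contains the "-—-" sequence; without spaces it matches the thread URLs quoted later in this report.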

In terms of autolinking, it's very possible this has already changed in the 2.2.2 development version as we loosened our URL validation before linking URLs. If the URL didn't display a link, this likely means that the URL that was pasted wasn't actually 100% valid. An emdash in a URL should be encoded. Browsers may display the URL in an unencoded form but copying it should return the encoded version. So while this has been tweaked, that isn't entirely unexpected.
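As a quick illustration of "an em dash in a URL should be encoded": a non-ASCII character is written in a URL as the percent-encoded form of its UTF-8 bytes. This sketch uses Python's standard `urllib.parse.quote`; the CJK character is just an extra example of the same rule.

```python
from urllib.parse import quote

# Non-ASCII characters may not appear literally in a URL; each one is written as the
# percent-encoded form of its UTF-8 bytes.
for ch in ["\u2014", "\u4e2d"]:  # EM DASH, and the CJK character for "middle"
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"), quote(ch))
```

This prints `%E2%80%94` for the em dash and `%E4%B8%AD` for the CJK character.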
 
I've found an interesting issue when a URL title has an em dash in it.

If I post the link to one of those URLs, it doesn't automatically convert to a clickable link.

Case in point:

https://wranglertjforum.com/threads/elvis-trail—florence-arizona-december-5th.43584/

And another:

https://wranglertjforum.com/threads/dana-35-from-1997-going-into-a-2003—brake-upgrade-to-include-abs—disc-or-drum.43877/#post-738500
 
Replace the em dash with a hyphen and the URL works fine. However, when you look at the URL in the address bar, it is an emdash.

Here is the same URL copied and pasted with a hyphen instead (I had to replace it with a hyphen myself):

 
I have just merged these two similar bug reports relating to the em dash issues.

We hadn't yet received any feedback from the OP on my earlier reply above about URL generation and autolinking.
 
Using the more recent example mentioned, the URL is actually this: https://wranglertjforum.com/threads/elvis-trail%E2%80%94florence-arizona-december-5th.43584/ You can confirm that by going to the thread and copying the URL out of the address bar. I'm not sure which browser is copying the em dash literally, as you can see it's explicitly encoded (since it's not a basic ASCII character).

My feedback from before generally still applies, though I'd like to know how the URL was obtained without the character escaped. Note that browsers generally do unescaping in the UI so that you see the emdash in the URL bar, but when you copy it, it should be the value that I showed (where it's encoded).
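The difference between what the address bar shows and what should be copied can be approximated with Python's `urllib.parse.unquote`. Real browsers apply their own, more selective unescaping rules, so treat `display_url` here as a rough stand-in for the displayed form only.

```python
from urllib.parse import unquote

encoded_url = ("https://wranglertjforum.com/threads/"
               "elvis-trail%E2%80%94florence-arizona-december-5th.43584/")

# Roughly what an address bar shows: escapes decoded back into readable characters.
display_url = unquote(encoded_url)

print(display_url == encoded_url)  # False
print("\u2014" in display_url)     # True: the em dash is shown literally
print(encoded_url.isascii())       # True: this is the form that should be copied
print(display_url.isascii())       # False: pasting this form is what trips up autolinking
```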
 
I just copied and pasted it from the latest version of Safari on my iMac.
 
Can confirm this seems to be a Safari thing.

Chrome:

Safari:
https://wranglertjforum.com/threads/elvis-trail—florence-arizona-december-5th.43584/
 
Yep, I can also confirm that this is a Safari thing. In both Firefox and Chrome it doesn’t happen. This is on my iMac running the latest version of each browser.
 
Unfortunately, this is realistically a Safari bug, and one that has been around for a while. It's not specific to an em dash; it applies to any character in the URL that is encoded. You can see an example with Chinese characters here:


The problem is simply that what Safari copies from the address bar isn't actually a valid URL. Our autolinking code essentially has to guess what is a URL and where it ends, generally erring on the side of not linking things it shouldn't. A character that isn't allowed in a URL is a clear signal to either end the link there or not link at all. There are even some edge cases where a URL containing spaces gets pasted with a literal space instead of %20.
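The following Python sketch shows one strict heuristic of the kind described: a whitespace-delimited token is linked only if every character in it is legal in a URI per RFC 3986. The function `find_linkable_urls` is a hypothetical simplification for illustration, not XenForo's autolinking code.

```python
import re
from typing import List

# Characters RFC 3986 allows to appear literally in a URI: unreserved, reserved, and "%".
_URI_CHARS = r"A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%"
_STRICT_URL = re.compile(rf"https?://[{_URI_CHARS}]+")


def find_linkable_urls(text: str) -> List[str]:
    """Return whitespace-delimited tokens that consist entirely of legal URI characters."""
    links = []
    for token in text.split():
        # fullmatch: a single disallowed character (e.g. a raw em dash) means no link at all.
        if _STRICT_URL.fullmatch(token):
            links.append(token)
    return links


encoded = ("See https://wranglertjforum.com/threads/"
           "elvis-trail%E2%80%94florence-arizona-december-5th.43584/ for details")
pasted_from_safari = ("See https://wranglertjforum.com/threads/"
                      "elvis-trail\u2014florence-arizona-december-5th.43584/ for details")

print(find_linkable_urls(encoded))             # a list containing the encoded URL
print(find_linkable_urls(pasted_from_safari))  # []
```

A looser policy could instead stop the link at the first disallowed character, but then the link would point at a truncated, wrong URL, which is why refusing to link is the safer default in this sketch.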

The primary workaround in XF would be to enable the "Romanize titles in URLs" option which would take all special characters out of titles used in URLs, essentially mitigating the issue, though there are situations where this may change the URLs in a way that isn't desired.
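To illustrate why romanizing avoids the problem, here is a crude ASCII-folding slug in Python. It is an assumption-laden stand-in, not XenForo's "Romanize titles in URLs" implementation: it strips accents via Unicode NFKD decomposition and treats any other non-ASCII character as a word break. A real romanizer transliterates scripts such as Chinese, whereas this sketch simply drops them, which is one way URLs can change undesirably.

```python
import re
import unicodedata


def romanize_slug(title: str) -> str:
    """Hypothetical ASCII-only slug: fold accents, treat other non-ASCII as a word break."""
    decomposed = unicodedata.normalize("NFKD", title)  # e.g. "é" -> "e" + combining acute
    chars = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent marks produced by decomposition
        chars.append(ch if ch.isascii() else " ")  # em dash, CJK, etc. become separators
    text = "".join(chars).lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


print(romanize_slug("Elvis Trail\u2014Florence Arizona December 5th"))
# elvis-trail-florence-arizona-december-5th
print(romanize_slug("Caf\u00e9 \u2014 Cr\u00e8me"))
# cafe-creme
```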

Other browsers appear to copy the URLs in their raw/valid fashion. Though a change doesn't seem likely, this does seem like something Safari should fix to improve interoperability.
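If a forum wanted to tolerate Safari's clipboard output anyway, one option would be to percent-encode illegal characters in a pasted URL candidate before validating it. Nothing in this thread says XenForo does this; `normalize_pasted_url` is a hypothetical pre-processing step built on Python's `urllib.parse.quote`.

```python
from urllib.parse import quote

# Characters that may stay literal in a URI (RFC 3986 reserved set), plus "%" so that
# escapes which are already present are not encoded a second time.
_KEEP_LITERAL = ":/?#[]@!$&'()*+,;=%"


def normalize_pasted_url(token: str) -> str:
    """Hypothetical pre-pass: percent-encode characters not legal in a URI (e.g. a raw em dash)."""
    # quote() encodes to UTF-8 and escapes everything outside letters, digits, "_.-~" and _KEEP_LITERAL.
    return quote(token, safe=_KEEP_LITERAL)


safari_copy = ("https://wranglertjforum.com/threads/dana-35-from-1997-going-into-a-2003"
               "\u2014brake-upgrade-to-include-abs\u2014disc-or-drum.43877/#post-738500")
print(normalize_pasted_url(safari_copy))
```

Because `%` is kept literal, already-encoded URLs pass through unchanged. Two limitations remain: a stray `%` not followed by two hex digits is left as-is (still invalid), and a non-ASCII hostname would need IDNA encoding rather than percent-encoding.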
 