Fixed vB4 Import: Method "rewriteQuotes" does not handle usernames that contain square brackets

Steffen

Well-known member
Affected version
2.0.2
Quote tags with usernames that contain square brackets are not rewritten by the "rewriteQuotes" method. Example:
Code:
[QUOTE='User [Name];123']Foo[/QUOTE]

The code currently uses a single regex:
Code:
'/\[quote=("|\'|)(?P<username>[^;\n\]]*);\s*n?(?P<postid>\d+)\s*\1\]/siU'

One solution would be to use two regexes:
Code:
'/\[quote=("|\')(?P<username>[^;\n]*);\s*n?(?P<postid>\d+)\s*\1\]/siU'
'/\[quote=(?P<username>[^;\n\]]*);\s*n?(?P<postid>\d+)\s*\]/siU'

The first regex allows usernames with square brackets if the argument is surrounded by quotes. The second regex handles cases without quotes.
 
@Steffen I'm thinking that a regex with a conditional might work - would you mind giving this a spin as the $quoteEqualsPattern in rewriteQuotes?
PHP:
$quoteEqualsPatterns = [
    '/\[QUOTE=(?P<quoted>\'|"|)(?(quoted)(?P<username>[^\][;\n]+|[^;\n]+))(;(?P<postid>\d+))?\1\]/siU' => function ($match)
    {
        return sprintf('[QUOTE="%s, post: %d"]',
            $match['username'],
            $this->lookupId('post', $match['postid'])
        );
    }
];
 
Using a conditional inside a regex sounds like an elegant solution, I didn't know this feature.

However, something is odd. Demo script with your pattern:
PHP:
$pattern = '/\[QUOTE=(?P<quoted>\'|"|)(?(quoted)(?P<username>[^\][;\n]+|[^;\n]+))(;(?P<postid>\d+))?\1\]/siU';
$m = null; var_dump(preg_match($pattern, "[QUOTE='User [Name];123']Foo[/QUOTE]", $m), $m);
$m = null; var_dump(preg_match($pattern, "[QUOTE=User Name;123]Foo[/QUOTE]", $m), $m);

This code is working fine although the conditional always chooses the "else" clause. Proof: The following code with an intentionally broken "if" clause (notice the "zzz") produces the same result:
PHP:
$pattern = '/\[QUOTE=(?P<quoted>\'|"|)(?(quoted)(?P<username>zzz|[^;\n]+))(;(?P<postid>\d+))?\1\]/siU';
$m = null; var_dump(preg_match($pattern, "[QUOTE='User [Name];123']Foo[/QUOTE]", $m), $m);
$m = null; var_dump(preg_match($pattern, "[QUOTE=User Name;123]Foo[/QUOTE]", $m), $m);

I suspect the reason that even this intentionally broken version works may be backtracking: The else clause initially matches the "]" as part of the username but then the regex engine backtracks because it realizes that there is no "]" left to be matched at the end of the pattern.

The PHP manual says that "the condition is satisfied if the capturing subpattern [...] has previously matched" (Conditional subpatterns). I think a possibly empty subpattern like (?P<quoted>\'|"|) always matches and should be rewritten to (?P<quoted>\'|")?.

However, this causes a problem with the back reference \1 at the end of the regex because "If a subpattern has not actually been used in a particular match, then any back references to it always fail" (Back references). I've found a workaround in replacing (?P<quoted>\'|")? with ((?P<quoted>\'|")?).

The third thing that needs to be changed is to flip ?P<username> and ?(quoted) because conditionals seem to work in capturing subpatterns but not the other way round. (This may be the reason why the regex engine always chose the "else" clause in the first example above. From the perspective of the regex engine, there simply was no "else" clause to begin with but only a subpattern with an alternation.)

The final pattern:
Code:
'/\[QUOTE=((?P<quoted>\'|")?)(?P<username>(?(quoted)[^;\n]+|[^\][;\n]+))(;(?P<postid>\d+))?\1\]/siU'

Phew! :)
 
I'm going to be totally honest - I'll take your word for it and stuff your version into the next XF release!
 
One more thing: You've made the post id optional in your pattern by appending a ? (and I've kept it that way). Therefore, $match['postid'] won't exist for QUOTE tags without post-ids and cause an "[E_NOTICE] Undefined index: postid" and a somewhat broken QUOTE tag with an empty post id.

I think the ? should be removed such that matching the post id is non-optional because we are only interested in rewriting QUOTE tags with post ids, right?
 
Back
Top Bottom