Fixed Highly-nested bbcode post causing php to crash due to stack overflow

Xon

Well-known member
From this thread it's been diagnose that some of the regex in 'filterFinalOutput' of the wysiwyg formatter will cause stack exhaustion (default CentOS 6.x limit of 10mb isn't sufficient).

Thanks to @cclaerhout for helping debug this issue.

The only notice that this occurring is something like the following in php-fpm logs:
Code:
[27-Jul-2014 08:15:39] WARNING: [pool www] child 23811 exited on signal 11 (SIGSEGV) after 5458.336405 seconds from start
[27-Jul-2014 08:15:39] NOTICE: [pool www] child 32674 started

The bbcode which is sufficient to trigger this bug:
Code:
[font=trebuchet ms]a17
[font=trebuchet ms]a16
[font=trebuchet ms]a15
[font=trebuchet ms]a14
[font=trebuchet ms]a13
[font=trebuchet ms]a12
[font=trebuchet ms]a11
[font=trebuchet ms]a10
[font=trebuchet ms]a9
[font=trebuchet ms]a8
[font=trebuchet ms]a7
[font=trebuchet ms]a6
[font=trebuchet ms]a5
[font=trebuchet ms]a4
[font=trebuchet ms]a3
[font=trebuchet ms]a2
[font=courier new]a1
[/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font][/font]

If not, add more nesting.

A post length can't be used to preventing a user from entering something like this.
 
PHP crashing in this case is actually PCRE crashing, almost certainly because if the pcre.recursion_limit value: http://php.net/manual/en/pcre.configuration.php#ini.pcre.recursion-limit You should probably lower that to fit with your stack size.

Using that test case, I get around 1300 recursions used before one of the regexes fails. A 8MB stack is estimated to support about 16000 recursions in PCRE (using their rule of thumb, give or take with other elements on the stack). The recursions there are way below that level and I'm not able to reproduce this issue myself (further nesting leads to tags being ignored, though the issue isn't just pure nesting). The test case really shouldn't be significant enough to cause any issue unless the stack size is very small.

Larger posts can potentially cause some issues, though I should note that the default post limit is set to 10000 characters which helps to protect against a few possible performance issues.
 
@Mike
You should also take a look at the regex mentioned in the other thread. To take only the first part of it:

Test string: <xf100:font blabla>
Original regex: <(xf100:((?>[a-z0-9-]+?)))([^>]*)(?<!/)>
Patched regex: <(xf100:((?>[a-z0-9-]+)))([^>]*)(?<!/)>
Replace pattern (there are 3 captering groups in it): \1\2\3
With the original regex, here's what I get:
Code:
xf100:f
f
ont blabla
With the modified regex:
Code:
xf100:font
font
blabla

As you can see:
- the first capturing group seems to be fixed, which is important since it is used in the full regex to match the ending tag
- the second capturing group seems also to be fixed, which is important since it is used by the "_replaceTagFilter" function

If an atomic group hasn't been used the "?" would have worked, but with it, it seems to mess the matches.
 
The ? is supposed to invert the greediness, which makes it greedy here. Based on that, the ? should help it but I'll need to check if PCRE isn't working how I expect.
 
Once applying; ini_set("pcre.recursion_limit", "16777"); the original post stops causing php-fpm to crash.

Except this is a silent failure, so what is breaking?

PHP crashing in this case is actually PCRE crashing, almost certainly because if the pcre.recursion_limit value: http://php.net/manual/en/pcre.configuration.php#ini.pcre.recursion-limit You should probably lower that to fit with your stack size.
Default stack in CentOS 6.5 is ~10mb. Going by the rule of thumb (which I've found elsewhere too) the limit should be a more sane ~16777 rather than 100000

Larger posts can potentially cause some issues, though I should note that the default post limit is set to 10000 characters which helps to protect against a few possible performance issues.
The original post was ~100kb* and contained some other minor formatting, and required upping the stack size to about ~20mb from 10mb to prevent php-fpm from crashing.

*My site's users love their rather long story/fanfic posts.
 
Last edited:
I'm outside so the answer needs to be checked. The atomic group doesn't remind the stack position in the regex so the ? only gets the first character after the begin of the match. Without an atomic group it would have taken the last stack position and would have matched the full tag name.
 
There's a couple of expressions in that regular expression that was posted in the original thread that I don't really understand. For instance, isn't (?>[a-z0-9-]+?) the same thing as [a-z0-9-]++ except that it uses an atomic group instead of a possessive quantifier? I don't know PCRE's internals (their JIT even less so) so I don't know whether they get optimized to the same form. If they don't, the second form may be more efficient.

The other expression is this:
Code:
((?:(?>[^<]*?)|<)*)
Isn't it functionally equivalent to a catch-all such as the following? (.*+)
The original expression matches as many non-left-bracket characters as possible, or a left bracket, and both as many times as possible. Isn't that the same as matching anything as many times as possible?

If so, then the following lines are functionally equivalent but the second one should use much less stack:
PHP:
        '#<(xf' . $i . ':((?>[a-z0-9-]+?)))([^>]*)(?<!/)>((?:(?>[^<]*?)|<)*)</\\1>#siU',
PHP:
        '#<(xf' . $i . ':([a-z0-9-]++))([^>]*)(?<!/)>(.*+)</\\1>#siU',
 
Test string: <xf100:font blabla>
Original regex: <(xf100:((?>[a-z0-9-]+?)))([^>]*)(?<!/)>
Patched regex: <(xf100:((?>[a-z0-9-]+)))([^>]*)(?<!/)>

The ? is supposed to invert the greediness, which makes it greedy here. Based on that, the ? should help it but I'll need to check if PCRE isn't working how I expect.
Something important that might have been missed is that the original regex is run with "siU", The U inverts the match to be lazy by default. So the ? is making is greedy. (I used http://regex101.com/ to pull apart exactly what the regex is doing)

This means cclaerhout changes don't realty work with the rest of the regex due to this.

@JoshyPHP, Your regex changes doesn't match "<xf100:font blabla> aa </xf100:font>". I suspect due to the "U" modifier.
 
I had factored the greediness being inverted, the problem is with (.*+) -- it consumes everything till the end of the string with no backtracking.

I still can't wrap my head about that alternation that matches everything in the original regexp, but I don't know its purpose so that doesn't help. :) It looks like it should be (.*?) -- greedily matching everything but with backtracking.
 
@Mike, Why is _replaceTagFilter applied as a post processing step rather than as part of _wrapInHtml which is called by render* functions ?

I would have thought that by the time _wrapInHtml is called, the subtree rendering is finished, and the inner text of the tag can't change without invalidating a lot of other assumptions.

I can't see anything in _replaceTagFilter which is dependant on the outside context (besides _blockTagsEnd which can be setup on first use or something). Even the inspection of the <break> & <break-start> tags look like they are only operating on the inner-most text.
 
Last edited:
Something important that might have been missed is that the original regex is run with "siU", The U inverts the match to be lazy by default. So the ? is making is greedy.
You're totally right, which means the original regex pattern is correct. My change on your text only worked because the regex didn't match some elements, so the engine was not as much solicited.
 
I had factored the greediness being inverted, the problem is with (.*+) -- it consumes everything till the end of the string with no backtracking.
I still can't wrap my head about that alternation that matches everything in the original regexp, but I don't know its purpose so that doesn't help. :) It looks like it should be (.*?) -- greedily matching everything but with backtracking.
The atomic group are supposed to speed up the regex, but here's what gives me Regexbuddy (In the next steps, I got rid of the ungreedy modifier that makes the regex harder to read and that can't work with some regex tools, such as Regexbuddy)

#Content: (see Xon first post)
#Patterns:
Code:
<(xf10:((?>[a-z0-9-]+)))([^>]*)(?<!/)>((?:(?>[^<]*)|<)*)</\1>
371 steps (s modifier not needed here) - >= 0.0030sec in php
Code:
<(xf10:([a-z0-9-]+))([^>]*)(?<!/)>((?:(?>[^<]*)|<)*)</\1>
370 steps (s modifier not needed here) - >= 0.0030sec in php
Code:
<(xf10:([a-z0-9-]+))([^>]*)(?<!/)>(([^<]*|<)*)</\1>
284 steps (s modifier not needed here) >= 0.0030sec in php
Code:
<(xf10:([a-z0-9-]+))([^>]*)(?<!/)>(.*)</\1>
322 steps >= 0.0020sec in php

Code:
<(xf10:([a-z0-9-]+))([^>]*)(?<!/)>(.*?)</\1>
1524 steps >= 0.0040sec in php
Code:
<(xf10:([a-z0-9-]+?))([^>]*)(?<!/)>(.*?)</\1>
7323 steps >= 0.011sec in php

I have no idea if these steps impact on the recursion_limit setting you were talking about.
Another solution to reduce the steps is not to capture the inner (match[4]) using this regex:
Code:
<(xf10:([a-z0-9-]+))([^>]*)(?<!/)>.*</\1>
225 steps - 0.009sec in php
The inner can be resolved by something like this "str_replace(array($open, $close), '', $match[0]);"

#######
If I revert changes made to my apache configuration and apply this regex on the original post of Xon (ref), I can edit the post:
PHP:
'#<(xf' . $i . ':([a-z0-9-]+))([^>]*)(?<!/)>(.*)</\\1>#si',
Time of the tested function: 1,6 sec

With this regex as well and this modification (not perfect but should work since user can't type directly < or >):
PHP:
'#<(xf' . $i . ':([a-z0-9-]+))([^>]*)(?<!/)>.*</\\1>#si',
PHP:
$inner = str_replace(array($open, $close), '', $match[0]);
Time: 1sec
 
Last edited:
@Mike, Why is _replaceTagFilter applied as a post processing step rather than as part of _wrapInHtml which is called by render* functions ?

I would have thought that by the time _wrapInHtml is called, the subtree rendering is finished, and the inner text of the tag can't change without invalidating a lot of other assumptions.
I think it probably was down to the thought that we couldn't guarantee there wouldn't be other manipulation, though it could just be down to how the code changed over time. I am experimenting with a new method for this for 1.4 and I haven't found any cases that generate different output from before.

Could you send me the BB code of the post that was triggering the segfault for you? It may not be relevant, but please disable the RTE in your options and then go to edit it so we get the exact text as it's stored. (You can send it to me via a PM.)
 
I think it probably was down to the thought that we couldn't guarantee there wouldn't be other manipulation, though it could just be down to how the code changed over time. I am experimenting with a new method for this for 1.4 and I haven't found any cases that generate different output from before.
My original thought was it was leftovers from a previous approach which hasn't been refactored away. It is basically guaranteed for a codebase to have that.

But great to hear there is a better solution coming down the pipeline! 1.4 looks to be great improvement on XenForo's feature set.

Could you send me the BB code of the post that was triggering the segfault for you? It may not be relevant, but please disable the RTE in your options and then go to edit it so we get the exact text as it's stored. (You can send it to me via a PM.)
Here is a pastebin of the original post; http://pastebin.com/saGys5sK
 
New version seems to parse the same on that code (once I got it running :)), so will give that a shot for 1.4.
 
Top Bottom