Fixed [Speedup] Search Result Render - bbCodeStrip

Xon

I've noticed that searching for a particular user's threads is really slow when the first post is long and contains a significant number of nested bbcode tags.

I timed the search results page at 19-25 seconds to display 25 results as threads. Using xdebug, I isolated it to XenForo_Helper_String::bbCodeStrip calling preg_replace in a loop over the entire message.

PHP:
        while ($string != ($newString = preg_replace('#\[([a-z0-9]+)(=[^\]]*)?\](.*)\[/\1\]#siU', '\3', $string)))
        {
            $string = $newString;
        }

Replacing that loop with the following code made the same results page render in 0.5-0.8 seconds.
PHP:
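        $string = self::bbCodeStripTag($string);
...
    public static function bbCodeStripTag($string)
    {
        // Collect every [tag]...[/tag] match in one regex pass.
        preg_match_all('#\[([a-z0-9]+)(=[^\]]*)?\](.*)\[/\1\]#siU', $string, $matches, PREG_SET_ORDER);
        foreach ($matches as $val)
        {
            // Recurse into non-empty tag content to strip nested tags.
            if (trim($val[3]) != '')
            {
                $innerTag = self::bbCodeStripTag($val[3]);
            }
            else
            {
                $innerTag = $val[3];
            }
            // Note: str_replace's fourth argument only receives the number of
            // replacements made; it does not limit them.
            $count = 1;
            $string = str_replace($val[0], $innerTag, $string, $count);
        }
        return $string;
    }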

For a sufficiently long post with enough nested bbcode tags, the current code will spend more time shuffling string data around than anything else.
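To illustrate why nesting depth matters for the original loop, here is a small sketch that counts how many full passes preg_replace makes over the message. countStripPasses is a hypothetical helper written for this example, not part of XenForo; it uses the same pattern as the loop above. Each pass removes at most one level of tags at a given position and rescans the whole string, so a chain of tags nested N deep needs N + 1 passes (the last one only confirms nothing changed).

```php
<?php
// Hypothetical instrumented copy of the original loop (not a XenForo function).
// Returns the stripped string and the number of preg_replace passes it took.
function countStripPasses(string $string): array
{
    $pattern = '#\[([a-z0-9]+)(=[^\]]*)?\](.*)\[/\1\]#siU';
    $passes = 0;
    do {
        $passes++;
        $previous = $string;
        // Every pass scans the entire message and builds a new string.
        $string = preg_replace($pattern, '\3', $previous);
    } while ($string !== $previous);

    return [$string, $passes];
}

// Three levels of nesting: three stripping passes plus one final no-op pass.
[$stripped, $passes] = countStripPasses('[b][i][u]x[/u][/i][/b]');
echo $stripped, ' in ', $passes, " passes\n"; // x in 4 passes
```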
 
Unfortunately, your code doesn't behave the same way in all cases. The notable case is when the same tag is nested within. The existing code will strip that as expected.

Additionally, the original version is actually running 10-20x faster in my local tests. Can you provide the strings that are being processed?

Obviously, the more BB code the slower it will be. I do recall a post that was something like 900,000 characters which is basically a novel... :)
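The nested same-tag case can be reproduced with a short sketch. stripLoop and stripRecursive are hypothetical standalone copies of the two versions posted above, written as plain functions for this example. With [b] nested inside [b], the ungreedy pattern pairs the outer opening tag with the inner closing tag; the loop cleans up the leftovers on its next pass, while the recursive version never revisits them.

```php
<?php
// Hypothetical standalone copies of the two approaches from this thread.
const STRIP_PATTERN = '#\[([a-z0-9]+)(=[^\]]*)?\](.*)\[/\1\]#siU';

// Original approach: repeat preg_replace until the string stops changing.
function stripLoop(string $string): string
{
    while ($string !== ($newString = preg_replace(STRIP_PATTERN, '\3', $string))) {
        $string = $newString;
    }
    return $string;
}

// Proposed approach: match once, then recurse only into each match's content.
function stripRecursive(string $string): string
{
    preg_match_all(STRIP_PATTERN, $string, $matches, PREG_SET_ORDER);
    foreach ($matches as $val) {
        $inner = (trim($val[3]) != '') ? stripRecursive($val[3]) : $val[3];
        $string = str_replace($val[0], $inner, $string);
    }
    return $string;
}

$input = '[b][b]x[/b][/b]';
echo stripLoop($input), "\n";      // x
echo stripRecursive($input), "\n"; // [b]x[/b]
```

For differently named nested tags such as [b][i]x[/i][/b], both functions return x, which is why the difference is easy to miss.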
 
> Unfortunately, your code doesn't behave the same way in all cases. The notable case is when the same tag is nested within. The existing code will strip that as expected.

Ugh. I thought that was working properly. I'll have to tinker with it then.

> Additionally, the original version is actually running 10-20x faster in my local tests. Can you provide the strings that are being processed?
>
> Obviously, the more BB code the slower it will be. I do recall a post that was something like 900,000 characters which is basically a novel... :)

The post in question wasn't terribly long, but had a lot of bbcode. I'll send a PM with info about my test site with the original post(s) and conditions.
 
First, I want to note that one of the posts consisted of approximately 200,000 characters. The default character limit (including BB code) is 10,000 so increasing that so significantly may cause performance challenges in a few cases.

Nonetheless, I have implemented an alternative BB code stripping algorithm. Previously, that 200,000 character test case took approximately 3.2 seconds to run. The new algorithm achieves the same result in 0.025 seconds. Decent gain there. :)

(I have created a case where it is marginally slower, but that's unlikely to come up in reality; it should be roughly the same speed in most normal cases.)
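The post above doesn't say what the new algorithm is, so the following is only a sketch of one possible single-pass approach, not the actual XenForo change. The function name stripPairedTagsSinglePass and its pairing rule (a closing tag pairs with the most recent unclosed opening tag of the same name) are assumptions made for this example. It finds all tags once, pairs them with per-name stacks, and rebuilds the string without the paired tags, so the work no longer grows with nesting depth. Its results may differ from the regex loop in unusual edge cases.

```php
<?php
// Hypothetical single-pass stripper; an illustration, not XenForo's implementation.
function stripPairedTagsSinglePass(string $string): string
{
    // Group 1 is set for closing tags ([/name]); group 2 for opening tags
    // ([name] or [name=option]).
    $found = preg_match_all(
        '#\[(?:/([a-z0-9]+)|([a-z0-9]+)(?:=[^\]]*)?)\]#i',
        $string,
        $tags,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );
    if (!$found) {
        return $string;
    }

    $open = []; // lowercase tag name => indexes of still-unclosed opening tags
    $drop = []; // index into $tags => true when the tag belongs to a pair

    foreach ($tags as $i => $tag) {
        $isClose = ($tag[1][0] !== '');
        $name = strtolower($isClose ? $tag[1][0] : $tag[2][0]);
        if (!$isClose) {
            $open[$name][] = $i;
        } elseif (!empty($open[$name])) {
            // Pair this closing tag with the latest unclosed opener of that name.
            $drop[array_pop($open[$name])] = true;
            $drop[$i] = true;
        }
    }

    // Rebuild the string once, skipping only the paired tags.
    $out = '';
    $pos = 0;
    foreach ($tags as $i => $tag) {
        if (!isset($drop[$i])) {
            continue; // unmatched tags stay in the text
        }
        $start = $tag[0][1];
        $out .= substr($string, $pos, $start - $pos);
        $pos = $start + strlen($tag[0][0]);
    }
    return $out . substr($string, $pos);
}

echo stripPairedTagsSinglePass('[quote=Xon][b][b]x[/b][/b][/quote] done'), "\n"; // x done
```

Unmatched tags are left in place, as with the regex loop: '[b]x' comes back unchanged.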
 