Not planned Parser modification

cclaerhout · Jan 26, 2013

Description: this code allows normal bbcodes to be used inside the option of a bbcode's opening tag, e.g. [mybbcode=[b]title[/b]]content[/mybbcode]. By default, this inner bbcode is not parsed.

File: {yourForum}\library\XenForo\BbCode\Parser.php
Function: protected function _parseTag()

Search:

Code:

$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition);

Replace with (code on pastebin):

PHP:

        //Modification starts
        $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
        if(preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition) && isset($matches['closingBracket'][1]))
        {
            $tagContentEndPosition = $matches['closingBracket'][1];
        }
        else
        {
            $tagContentEndPosition = false;
        }
        //Modification ends
 
        //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference

Last edit: 2013/02/15 @thanks to Volion
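
To see what the replacement changes outside of XenForo, here is a minimal standalone sketch. The helper name findTagEnd is hypothetical (it is not part of XenForo); it only wraps the same pattern and the same preg_match call as the modification, and compares the result with the original strpos lookup on the demo string.

```php
<?php
/**
 * Hypothetical helper (not a XenForo method): returns the offset of the
 * bracket that closes the tag starting at $tagStartPosition, or false.
 */
function findTagEnd($text, $tagStartPosition)
{
    // Same pattern as in the modification above.
    $pattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';

    if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition)
        && isset($matches['closingBracket'][1])
    ) {
        return $matches['closingBracket'][1];
    }

    return false;
}

$text = '[article=[b]source[/b]]This is a test article[/article]';

// Original behaviour: the first "]" found is the one that closes the inner [b].
echo strpos($text, ']', 0), "\n";   // 11

// Modified behaviour: the inner [b]...[/b] pair is skipped.
echo findTagEnd($text, 0), "\n";    // 22
```

For a tag without inner bbcodes in its option, both approaches should return the same offset, so only tags like the one above are affected.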

cclaerhout · Jan 26, 2013

Demo
Simple example:

Code:

[article=[b]source[/b]]This is a test article[/article]

Results without the above modification (and without the Stop AutoLinking Patch addon which I want to get rid of):

Formatter tree element:

Code:

array(1) {
  [0] => array(4) {
    ["tag"] => string(7) "article"
    ["option"] => string(2) "[b"
    ["original"] => array(2) {
      [0] => string(12) "[article=[b]"
      [1] => string(10) "[/article]"
    }
    ["children"] => array(1) {
      [0] => string(33) "source[/b]]This is a test article"
    }
  }
}

Visual

Results with the above modification

Formatter tree element

Code:

array(1) {
  [0] => array(4) {
    ["tag"] => string(7) "article"
    ["option"] => string(13) "[b]source[/b]"
    ["original"] => array(2) {
      [0] => string(23) "[article=[b]source[/b]]"
      [1] => string(10) "[/article]"
    }
    ["children"] => array(1) {
      [0] => string(22) "This is a test article"
    }
  }
}

Visual

cclaerhout · Jan 26, 2013

Detailed Regex (not exactly the same but almost - just one supplementary capturing group at the beginning):

This regex is no longer used; it has been updated in the first post.

PHP:

        $Pattern = '#(?x)                                              \#activate regex comments
                \[(.+?)=                                              \#opening tag starts with an option
                (                                                      \#starts the capturing repeating group
                \[([\w\d]+)(?:=.+?)?\].+?\[/\3\]                      \#repeating group 1 - standard bbcode structure... in theory no need for a recursive pattern
                |                                                      \#or
                [^\[\]]                                                \#repeating group 2 - everything else except the closing bracket of the opening tag
                )+?                                                    \#closes the capturing repeating group and repeats it
                \]                                                    \#opening tag ends
                #iu';                                                  //Options: case insensitive + unicode

cclaerhout · Jan 26, 2013

One last note about parsing the inner bbcodes inside options:
Do NOT do this directly in the parser class. Here is an example of what must not be done:

protected function _pushTagOpen

PHP:

        if($tagOption !== null && $this->_hasTagsInOptions === true)
        {
            $that = new ReflectionClass($this);
            $clone = $that->newInstance($this->_formatter);
            $tagOption = $clone->render($tagOption);
        }

Why? Because when you create a customized BbCode with a callback, every user input must be escaped with the PHP function "htmlspecialchars" to prevent HTML injection. This means the bbcodes inside the options would be parsed, but the callback would no longer receive the option in raw form. The only solution (that I've found) is the one already used in the Stop AutoLinking Patch: apply htmlspecialchars to the user input, get the Parser back from the Formatter, and parse the escaped input; the resulting HTML is then safe and ready to be used.
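
The order of operations described above (escape first, parse second) can be sketched without any XenForo class. Both function names below are hypothetical, and renderBoldOnly is only a stand-in for a real BB code renderer that handles a single tag; the point is just to show why the order matters.

```php
<?php
/**
 * Hypothetical stand-in for a BB code renderer: it only knows [b]...[/b].
 * It expects text that has ALREADY been escaped.
 */
function renderBoldOnly($escapedText)
{
    return preg_replace('#\[b\](.+?)\[/b\]#is', '<b>$1</b>', $escapedText);
}

/**
 * Hypothetical callback helper: escape the raw user option, then parse it.
 */
function renderOption($rawOption)
{
    // 1. Neutralise any HTML typed by the user.
    $escaped = htmlspecialchars($rawOption, ENT_QUOTES, 'UTF-8');

    // 2. Only now turn the bbcodes into HTML, so the generated tags stay intact.
    return renderBoldOnly($escaped);
}

echo renderOption('<script>x</script>[b]source[/b]'), "\n";
// &lt;script&gt;x&lt;/script&gt;<b>source</b>
```

If the two steps were swapped, htmlspecialchars would also escape the generated <b> tags, which is the problem described above.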

Brent W · Jan 26, 2013

If the day ever comes when xenForo is released from the burden of the lawsuit... we are going to have one hell of an update.

cclaerhout · Jan 27, 2013

My previous code was working... in theory... but with only 1 bbcode. Translation: it was a mess on a live website.

I've updated it in the first post. Considering that a regex call such as preg_match will always be slower than a strpos (cf. for example this article), I don't think XenForo should include it by default.
Nevertheless, I've compared both on a page with a lot of bbcodes (I only use the stats from the debug mode...) and the results were the same.

So it would be nice to have the option of changing this behaviour by extending the parser.
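
For anyone who wants to repeat this comparison outside of the debug mode, here is a rough timing sketch. The sample text, the iteration count and the helper name timeIt are arbitrary assumptions; the numbers will vary by server and PHP version, so treat them as an indication only.

```php
<?php
// Arbitrary sample: a long post with many simple bbcodes.
$text = str_repeat('[b]bold[/b] plain text ', 2000);
$pattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
$iterations = 1000;

/** Hypothetical helper: runs $fn $iterations times and returns the elapsed seconds. */
function timeIt($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Original lookup.
$strposTime = timeIt(function () use ($text) {
    return strpos($text, ']', 0);
}, $iterations);

// Lookup used by the modification.
$pregTime = timeIt(function () use ($text, $pattern) {
    return preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE, 0);
}, $iterations);

printf("strpos:     %.6f s\n", $strposTime);
printf("preg_match: %.6f s\n", $pregTime);
```

This only measures the bracket lookup in isolation; on a real page the rest of the parsing and rendering is likely to dominate, which may explain why the debug stats looked the same.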

cclaerhout · Feb 14, 2013

The code in the first post has been edited for those who are using PHP 5.4 and who would otherwise get errors in their logs (from the thread preview - full template; example).
If you have already done the modification, here is the way to update it:
Search:

PHP:

        //Modification starts
        $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
        preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition);
        $tagContentEndPosition = $matches['closingBracket'][1];
        //Modification ends
 
        //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference

Replace with:

PHP:

        //Modification starts
        $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
        if(preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition) && isset($matches['closingBracket'][1]))
        {
            $tagContentEndPosition = $matches['closingBracket'][1];
        }
        else
        {
            $tagContentEndPosition = false;       
        }
        //Modification ends
 
        //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference

tyteen4a03 · Apr 11, 2013

Bumping this.

Does this support nested codes (depth = 3+)?

cclaerhout · Apr 11, 2013

tyteen4a03 said:
Does this support nested codes (depth = 3+)?

Since we are inside the options of the opening tag we don't need to make a crazy regex that matches nested Bb Codes with a recursive pattern.
All we needed to do was to target the ending bracket "]" of a tag and add two repeated rules to get to it:

one is the basic pattern of a bbcode: \[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]
the other one is to match anything else except a closing bracket: [^\[\]]

These two rules must be repeated, which explains this: (rule1|rule2)+?
This global pattern must be made optional, since it matches the options of the opening tag: (?:=(repeated rules))?

To finish the explanation, there are still two parts to explain:

The beginning one: \[(?:/)?[\w\d]+?
The ending one: (?P<closingBracket>\])

Both work together of course. The beginning one is the opening bracket of the bbcode BUT since we are inside the parser of XenForo(*), we need to match both opening tags and closing tags, even if a closing tag will never have options. This explains the (?:/)?
The ending part is easy to understand: it's just a capturing group with the closing bracket, the one that has been targeted since the beginning. This capturing group has been named "closingBracket", which makes it easier to retrieve afterwards.

So no, there is no nested codes here, just some repeated patterns.

* The XenForo parser reads the code character by character and needs the position of the closing bracket... both for opening & closing tags.
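
As an illustration of the breakdown above, the pattern can be rebuilt from its named pieces. The variable names and the helper closingBracketOffset are only labels chosen for this sketch; the assembled string should be identical to the pattern in the first post.

```php
<?php
// The pieces, in the order they are explained above.
$begin  = '\[(?:/)?[\w\d]+?';                    // "[" or "[/", then the tag name
$rule1  = '\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]';   // a basic bbcode: [x]...[/x]
$rule2  = '[^\[\]]';                             // anything that is not a bracket
$option = '(?:=(' . $rule1 . '|' . $rule2 . ')+?)?'; // optional "=" followed by the repeated rules
$end    = '(?P<closingBracket>\])';              // the targeted closing bracket

$pattern = '#' . $begin . $option . $end . '#iu';

/** Hypothetical helper: offset of the closing bracket for the tag at $start, or false. */
function closingBracketOffset($pattern, $text, $start)
{
    if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE, $start)) {
        return $matches['closingBracket'][1];
    }
    return false;
}

$text = '[article=[b]source[/b]]x[/article]';

echo closingBracketOffset($pattern, $text, 0), "\n";   // 22 (opening tag with an option)
echo closingBracketOffset($pattern, $text, 24), "\n";  // 33 (closing tag, no option)
```

The second call starts at the "[" of [/article], which is the case that the (?:/)? part is there for.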

Mike · Apr 11, 2013

Going to basically "no thanks" this. You can do nested BB code without issue if you wrap the option in " or '. Options are not designed for advanced BB code nesting like that.

AlexT · Apr 11, 2013

I wouldn't use regex-only for parsing BB codes - it's too easy to accidentally "parse away" information that wasn't part of the BB code.

cclaerhout · Apr 11, 2013

AlexT said:
I wouldn't use regex-only for parsing BB codes - it's too easy to accidentally "parse away" information that wasn't part of the BB code.

Agreed, but this modification is not a parser. It is just a way to skip the inner brackets used by a Bb Code. The parsing comes after. There is an option in bbcm for this (it just uses the XenForo parser, so it's still not a regex parser).

Mike said:
You can do nested BB code without issue if you wrap the option in " or '.

Thanks, I didn't know this code would work:

Content

But it's still not very user-friendly. Would there be a listener to extend the parser, as there is one for the formatter?
