
Not planned Parser modification

cclaerhout

Well-known member
#1
Description: this code allows normal BB codes inside the option of a BB code's opening tag, e.g. [mybbcode=[b]title[/b]]content[/mybbcode]. By default, this inner BB code is not parsed.

File: {yourForum}\library\XenForo\BbCode\Parser.php
Function: protected function _parseTag()

Search:
Code:
$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition);
Replace with (code on Pastebin):
PHP:
        //Modification starts
        $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
        if(preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition) && isset($matches['closingBracket'][1]))
        {
            $tagContentEndPosition = $matches['closingBracket'][1];
        }
        else
        {
            $tagContentEndPosition = false;
        }
        //Modification ends
 
        //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference
Last edit: 2013/02/15 @thanks to Volion
 

cclaerhout

Well-known member
#2
Demo
Simple example:
Code:
[article=[b]source[/b]]This is a test article[/article]
Results without the above modification (and without the Stop AutoLinking Patch addon which I want to get rid of):
  1. Formatter tree element:
    Code:
    array(1) {
      [0] => array(4) {
        ["tag"] => string(7) "article"
        ["option"] => string(2) "[b"
        ["original"] => array(2) {
          [0] => string(12) "[article=[b]"
          [1] => string(10) "[/article]"
        }
        ["children"] => array(1) {
          [0] => string(33) "source[/b]]This is a test article"
        }
      }
    }
  2. Visual
    without.png
Results with the above modification
  1. Formatter tree element
    Code:
    array(1) {
      [0] => array(4) {
        ["tag"] => string(7) "article"
        ["option"] => string(13) "[b]source[/b]"
        ["original"] => array(2) {
          [0] => string(23) "[article=[b]source[/b]]"
          [1] => string(10) "[/article]"
        }
        ["children"] => array(1) {
          [0] => string(22) "This is a test article"
        }
      }
    }
  2. Visual
    with.png
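
For anyone who wants to reproduce the difference outside XenForo, here is a minimal standalone sketch. It is not the parser itself: the variable names ($stockEnd, $modifiedEnd, $stockTag, $modifiedTag) are made up for the illustration, and only the strpos call and the pattern come from the first post.

```php
<?php
// Standalone sketch (not XenForo code): compares the stock strpos() lookup
// with the regex from the first post on the demo string.
$text = '[article=[b]source[/b]]This is a test article[/article]';
$tagStartPosition = 0; // position of the "[" that opens [article=...]

// Stock behaviour: stop at the first "]" after the tag start.
$stockEnd = strpos($text, ']', $tagStartPosition);

// Modified behaviour: skip complete inner BB codes inside the option.
$bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
if (preg_match($bbCodesOptionsPattern, $text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition)
    && isset($matches['closingBracket'][1])
) {
    $modifiedEnd = $matches['closingBracket'][1];
} else {
    $modifiedEnd = false;
}

// The opening tag as each approach sees it.
$stockTag = substr($text, $tagStartPosition, $stockEnd - $tagStartPosition + 1);
$modifiedTag = substr($text, $tagStartPosition, $modifiedEnd - $tagStartPosition + 1);

echo $stockTag, "\n";    // [article=[b]
echo $modifiedTag, "\n"; // [article=[b]source[/b]]
```

The two strings correspond to the first "original" entry in each of the dumps above.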
 

cclaerhout

Well-known member
#3
Detailed regex (not exactly the same, but almost: there is just one additional capturing group at the beginning):

This regex is no longer used; it has been updated in the first post.
PHP:
        // "~" is the delimiter here: with "#" as the delimiter, an escaped "\#" is a literal "#" and not a comment
        $Pattern = '~(?x)                                              # enable extended mode (regex comments)
                \[(.+?)=                                              # opening tag starts with an option
                (                                                      # start of the repeated capturing group
                \[([\w\d]+)(?:=.+?)?\].+?\[/\3\]                      # alternative 1 - standard bbcode structure; in theory no recursive pattern is needed
                |                                                      # or
                [^\[\]]                                                # alternative 2 - anything else except a bracket
                )+?                                                    # close the repeated capturing group (lazy, one or more times)
                \]                                                    # opening tag ends
                ~iu';                                                  //Options: case-insensitive + unicode
 

cclaerhout

Well-known member
#4
One last note on parsing the inner BB codes inside options:
Do NOT do this directly in the parser class. Here is an example of what must not be done:

protected function _pushTagOpen
PHP:
        if($tagOption !== null && $this->_hasTagsInOptions === true)
        {
            $that = new ReflectionClass($this);
            $clone = $that->newInstance($this->_formatter);
            $tagOption = $clone->render($tagOption);
        }
Why? Because when you create a custom BB code with a callback, every user input must be escaped with PHP's htmlspecialchars() to prevent HTML injection. This means the BB codes inside the options would be parsed, but their HTML output would then be escaped instead of being rendered raw. The only solution I've found is the one already used in the Stop AutoLinking Patch: apply htmlspecialchars() to the user input, get the parser back from the formatter, and parse the escaped input. The resulting HTML is then safe and ready to be used.
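
The order of operations can be illustrated with a small standalone sketch. The function renderInlineBbCode() is a hypothetical stand-in that only handles [b]...[/b]; in a real callback its role would be played by the parser obtained from the formatter, which is not shown here.

```php
<?php
// Hypothetical stand-in for the real parser/formatter: it only renders [b]...[/b].
function renderInlineBbCode($text)
{
    return preg_replace('#\[b\](.+?)\[/b\]#is', '<b>$1</b>', $text);
}

$tagOption = '[b]<script>alert(1)</script>[/b]'; // untrusted user input

// Wrong order: render first, escape afterwards. The generated <b> tag is escaped too.
$renderedThenEscaped = htmlspecialchars(renderInlineBbCode($tagOption), ENT_QUOTES, 'UTF-8');

// Order described above: escape the raw input first, then render the BB codes.
$escapedThenRendered = renderInlineBbCode(htmlspecialchars($tagOption, ENT_QUOTES, 'UTF-8'));

echo $renderedThenEscaped, "\n"; // &lt;b&gt;&lt;script&gt;alert(1)&lt;/script&gt;&lt;/b&gt;
echo $escapedThenRendered, "\n"; // <b>&lt;script&gt;alert(1)&lt;/script&gt;</b>
```

In the second case only the markup produced by the renderer is raw HTML; everything the user typed stays escaped.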
 

cclaerhout

Well-known member
#6
My previous code worked... in theory... but only with a single BB code :rolleyes:. Translation: it was a mess on a live website.

I've updated it in the first post. That said, since a regex call (preg_match here) will always be slower than strpos (see for example this article), I don't think XenForo should include it by default.
Nevertheless, I compared both on a page with a lot of BB codes (using only the stats from debug mode...) and the results were the same.

So it would be nice to have the option to change this behaviour by being able to extend the parser.
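
A rough way to compare the two lookups in isolation is sketched below. It assumes nothing about XenForo: it only times the strpos call and the preg_match call from the first post on the demo string, so it says nothing about a full page render, and the absolute numbers depend on the PHP version and the machine. The iteration count is arbitrary.

```php
<?php
// Rough micro-benchmark sketch: times only the two lookups, not the whole parser.
$pattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
$text = '[article=[b]source[/b]]This is a test article[/article]';
$iterations = 100000; // arbitrary

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $end = strpos($text, ']', 0);
}
$strposSeconds = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE, 0);
}
$pregSeconds = microtime(true) - $start;

printf("strpos:     %.4f s\n", $strposSeconds);
printf("preg_match: %.4f s\n", $pregSeconds);
```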
 

cclaerhout

Well-known member
#7
The code in the first post has been edited for those who use PHP 5.4 and get errors in their logs (from the thread preview - full template; example).
If you have already made the modification, here is how to update it:
Search:
PHP:
        //Modification starts
        $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
        preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition);
        $tagContentEndPosition = $matches['closingBracket'][1];
        //Modification ends
 
        //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference
Replace with:
PHP:
        //Modification starts
        $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
        if(preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition) && isset($matches['closingBracket'][1]))
        {
            $tagContentEndPosition = $matches['closingBracket'][1];
        }
        else
        {
            $tagContentEndPosition = false;       
        }
        //Modification ends
 
        //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference
 

cclaerhout

Well-known member
#9
This supports nested codes (depth = 3+), does it?
Since we are inside the option of the opening tag, we don't need a crazy regex that matches nested BB codes with a recursive pattern.
All we need to do is target the closing bracket "]" of a tag and add two repeated rules to get to it:
  1. one is the basic pattern of a bbcode: \[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]
  2. the other one is to match anything else except a bracket: [^\[\]]
These two rules must be repeated, which explains this: (rule1|rule2)+?
This whole pattern must be made optional, since it matches the option of the opening tag: (?:=(rule1|rule2)+?)?

To finish, two parts remain to be explained:
  1. The beginning one: \[(?:/)?[\w\d]+?
  2. The ending one: (?P<closingBracket>\])
Both work together, of course. The beginning part matches the opening bracket of the BB code, BUT since we are inside the XenForo parser(*), we need to match both opening and closing tags, even though a closing tag never has options. This explains the (?:/)?.
The ending part is easy to understand: it is just a capturing group containing the closing bracket, the one we have been targeting from the start. This capturing group has been named "closingBracket", which makes it easier to retrieve afterwards.

So no, there are no nested codes here, just some repeated patterns.

* The XenForo parser reads the text character by character and needs the position of the closing bracket... for both opening and closing tags.
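
To make the explanation above concrete, here is a small sketch that rebuilds the pattern from its parts and runs it on a few tags. The variable names and the helper closingBracketOffset() are hypothetical and exist only for this illustration; the assembled pattern is the one from the first post.

```php
<?php
// Sketch only: rebuilds the pattern of the first post from the parts explained above.
$rule1 = '\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]';        // a complete inner BB code
$rule2 = '[^\[\]]';                                   // any character that is not a bracket
$beginning = '\[(?:/)?[\w\d]+?';                      // "[" or "[/" followed by the tag name
$option = '(?:=(' . $rule1 . '|' . $rule2 . ')+?)?';  // optional "=option"
$ending = '(?P<closingBracket>\])';                   // the bracket we are looking for

$pattern = '#' . $beginning . $option . $ending . '#iu';

// Hypothetical helper: offset of the closing bracket, or false when nothing matches.
function closingBracketOffset($pattern, $text, $tagStartPosition = 0)
{
    if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition)
        && isset($matches['closingBracket'][1])
    ) {
        return $matches['closingBracket'][1];
    }
    return false;
}

var_dump(closingBracketOffset($pattern, '[b]bold[/b]'));                           // int(2)
var_dump(closingBracketOffset($pattern, '[/article]'));                            // int(9)
var_dump(closingBracketOffset($pattern, '[url=http://example.com]link[/url]'));    // int(23)
var_dump(closingBracketOffset($pattern, '[article=[b]source[/b]]text[/article]')); // int(22)
var_dump(closingBracketOffset($pattern, 'no tag here'));                           // bool(false)
```

Plain tags, closing tags and simple options still end at their first "]"; only an option that contains a complete inner BB code moves the offset further.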
 

Mike

XenForo developer
Staff member
#10
Going to basically "no thanks" this. You can do nested BB code without issue if you wrap the option in " or '. Options are not designed for advanced BB code nesting like that.
 

AlexT

Well-known member
#11
I wouldn't use regex-only for parsing BB codes - it's too easy to accidentally "parse away" information that wasn't part of the BB code.
 

cclaerhout

Well-known member
#12
I wouldn't use regex-only for parsing BB codes - it's too easy to accidentally "parse away" information that wasn't part of the BB code.
Agreed, but this modification is not a parser. It is just a way to skip inner brackets that belong to a BB code. The parser comes afterwards. There is an option in bbcm for this (it just uses the XenForo parser, so it's still not a regex parser).

You can do nested BB code without issue if you wrap the option in " or '.
Thanks, I didn't know this code would work:
Content
But it's still not very user-friendly. Could there be a listener to extend the parser, as there is one for the formatter?