
Not Planned Parser modification

Discussion in 'Closed Suggestions' started by cclaerhout, Jan 26, 2013.

  1. cclaerhout

    cclaerhout Well-Known Member

    Description: this code allows normal BB codes to be used inside the option of a BB code's opening tag, e.g. [mybbcode=[b]title[/b]]content[/mybbcode]. By default, this inner BB code is not parsed.

    File: {yourForum}\library\XenForo\BbCode\parser.php
    Function: protected function _parseTag()

    Search:
    Code:
    $tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition);
    Replace with (code on pastebin):
    PHP:
            //Modification starts
            $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
            if(preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition) && isset($matches['closingBracket'][1]))
            {
                $tagContentEndPosition = $matches['closingBracket'][1];
            }
            else
            {
                $tagContentEndPosition = false;
            }
            //Modification ends

            //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference
    Last edit: 2013/02/15 @thanks to Volion
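
    To see what the replacement changes, the lookup can be tried outside XenForo. The sketch below is only an illustration: findTagEnd() is a hypothetical helper, not a XenForo method, and it simply wraps the same preg_match call so its result can be compared with the original strpos call on the [article] example.

```php
<?php
// Pattern copied from the modification above.
const BBCODE_OPTIONS_PATTERN = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';

// Hypothetical helper (not part of XenForo): returns the offset of the
// bracket that closes the tag starting at $tagStartPosition, or false.
function findTagEnd($text, $tagStartPosition)
{
    if (preg_match(BBCODE_OPTIONS_PATTERN, $text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition)
        && isset($matches['closingBracket'][1]))
    {
        return $matches['closingBracket'][1];
    }
    return false;
}

$text = '[article=[b]source[/b]]This is a test article[/article]';

var_dump(strpos($text, ']', 0)); // int(11): the first "]", which belongs to the inner [b]
var_dump(findTagEnd($text, 0));  // int(22): the "]" that closes "[article=...]"
```

    With the strpos result the opening tag is cut at "[article=[b]"; with the regex result it covers the whole "[article=[b]source[/b]]".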
     
  2. cclaerhout

    cclaerhout Well-Known Member

    Demo
    Simple example:
    Code:
    [article=[b]source[/b]]This is a test article[/article]
    Results without the above modification (and without the Stop AutoLinking Patch addon which I want to get rid of):
    1. Formatter tree element:
      Code:
      array(1) {
        [0] => array(4) {
          ["tag"] => string(7) "article"
          ["option"] => string(2) "[b"
          ["original"] => array(2) {
            [0] => string(12) "[article=[b]"
            [1] => string(10) "[/article]"
          }
          ["children"] => array(1) {
            [0] => string(33) "source[/b]]This is a test article"
          }
        }
      }
      
    2. Visual
      [Screenshot: without.png]
    Results with the above modification
    1. Formatter tree element
      Code:
      array(1) {
        [0] => array(4) {
          ["tag"] => string(7) "article"
          ["option"] => string(13) "[b]source[/b]"
          ["original"] => array(2) {
            [0] => string(23) "[article=[b]source[/b]]"
            [1] => string(10) "[/article]"
          }
          ["children"] => array(1) {
            [0] => string(22) "This is a test article"
          }
        }
      }
      
    2. Visual
      [Screenshot: with.png]
     
    Adam Howard likes this.
  3. cclaerhout

    cclaerhout Well-Known Member

    Detailed regex (not exactly the same, but almost: there is just one extra capturing group at the beginning):

    This regex is no longer used; it has been updated in the first post.
    PHP:
            $Pattern = '~(?x)                                         #activate regex comments
                    \[(.+?)=                                          #opening tag starts with an option
                    (                                                 #start of the repeated capturing group
                    \[([\w\d]+)(?:=.+?)?\].+?\[/\3\]                  #alternative 1 - standard bbcode structure... in theory no recursive pattern is needed
                    |                                                 #or
                    [^\[\]]                                           #alternative 2 - anything else except a bracket
                    )+?                                               #close the repeated capturing group and repeat it
                    \]                                                #opening tag ends
                    ~iu';                                             //Options: case insensitive + unicode ("~" is the delimiter so that "#" can open a comment)
     
    Adam Howard likes this.
  4. cclaerhout

    cclaerhout Well-Known Member

    One last piece of feedback on parsing the inner bbcodes inside options:
    Do NOT do this directly in the parser class. An example of what must not be done:

    protected function _pushTagOpen
    PHP:
            if($tagOption !== null && $this->_hasTagsInOptions === true)
            {
                $that = new ReflectionClass($this);
                $clone = $that->newInstance($this->_formatter);
                $tagOption = $clone->render($tagOption);
            }
    Why? Because when you create a custom BB code with a callback, every user input must be secured with the PHP function "htmlspecialchars" to prevent HTML injection. This means the BB codes inside the options would be parsed... but their HTML would then be escaped rather than output raw. The only solution (that I've found) is the one already used in the Stop AutoLinking Patch: apply the htmlspecialchars protection to the user input, get the Parser back from the Formatter, parse the protected input, and then the HTML is safe and ready to be used.
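
    The order of the two steps is the whole point, so here is a minimal sketch of it. It does not use the XenForo Parser or Formatter at all: renderBold() is a hypothetical stand-in renderer that only understands [b]...[/b], and the option string is made up for the example.

```php
<?php
// Hypothetical stand-in for a BB code renderer (NOT the XenForo parser):
// it only turns [b]...[/b] into <b>...</b>.
function renderBold($text)
{
    return preg_replace('#\[b\](.+?)\[/b\]#i', '<b>$1</b>', $text);
}

// Made-up user input for a tag option, including an injection attempt.
$option = '[b]source[/b] <script>alert(1)</script>';

// What must not be done: parse first, then let the callback escape the result.
// The <b> produced by the parser gets escaped together with the user input.
$renderedThenEscaped = htmlspecialchars(renderBold($option));

// The order described above: escape the user input first, then parse it.
// Only the HTML produced by the parser stays raw.
$escapedThenRendered = renderBold(htmlspecialchars($option));

echo $renderedThenEscaped, "\n";
// &lt;b&gt;source&lt;/b&gt; &lt;script&gt;alert(1)&lt;/script&gt;
echo $escapedThenRendered, "\n";
// <b>source</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

    In the second result the bold markup is real HTML while the script tag from the user stays neutralised, which is the behaviour wanted for the option.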
     
    Adam Howard likes this.
  5. BamaStangGuy

    BamaStangGuy Well-Known Member

    If the day ever comes when xenForo is released from the burden of the lawsuit... we are going to have one hell of an update.
     
    Adam Howard and Insy like this.
  6. cclaerhout

    cclaerhout Well-Known Member

    My previous code was working... in theory... but with only 1 bbcode :rolleyes:. Translation: it was a mess on a live website.

    I've updated it in the first post. Now, considering that a preg_match will always be slower than a strpos (cf. for example this article), I don't think XenForo should have it by default.
    Nevertheless, I've compared both on a page with a lot of bbcodes (I only used the stats from the debug mode...) and the results were the same.

    So it would be nice if we could choose to modify this behaviour by being able to extend the parser.
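
    For anyone who wants to repeat the comparison outside the debug mode, a rough timing sketch could look like the one below. It is an assumption-laden micro-benchmark, not a measurement of XenForo: the text is a synthetic string of simple tags, only the closing-bracket lookup is timed, and the numbers will differ from one machine and PHP version to another.

```php
<?php
// Pattern copied from the first post.
const BBCODE_OPTIONS_PATTERN = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';

// Synthetic post body (an assumption for this sketch): 500 simple tag pairs.
$text = str_repeat('[b]bold[/b] plain text ', 500);

// Collect the offset of every "[" so both approaches do the same lookups.
$tagStarts = array();
$offset = 0;
while (($pos = strpos($text, '[', $offset)) !== false) {
    $tagStarts[] = $pos;
    $offset = $pos + 1;
}

// Original behaviour: first "]" after the tag start.
$start = microtime(true);
$strposEnds = array();
foreach ($tagStarts as $tagStart) {
    $strposEnds[] = strpos($text, ']', $tagStart);
}
$strposTime = microtime(true) - $start;

// Modified behaviour: closing bracket found by the regex.
$start = microtime(true);
$pregEnds = array();
foreach ($tagStarts as $tagStart) {
    $found = preg_match(BBCODE_OPTIONS_PATTERN, $text, $m, PREG_OFFSET_CAPTURE, $tagStart);
    $pregEnds[] = $found ? $m['closingBracket'][1] : false;
}
$pregTime = microtime(true) - $start;

printf("strpos:     %.4f s for %d tags\n", $strposTime, count($tagStarts));
printf("preg_match: %.4f s for %d tags\n", $pregTime, count($tagStarts));
```

    On text without BB codes inside options both loops find the same positions, so any difference is purely in the time spent.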
     
  7. cclaerhout

    cclaerhout Well-Known Member

    The code in the first post has been edited for those who are using PHP 5.4 and who would get errors in their logs (from the thread preview - full template; example).
    If you have already done the modification, here is the way to update it:
    Search:
    PHP:
            //Modification starts
            $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
            preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition);
            $tagContentEndPosition = $matches['closingBracket'][1];
            //Modification ends

            //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference
    Replace with:
    PHP:
            //Modification starts
            $bbCodesOptionsPattern = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';
            if(preg_match($bbCodesOptionsPattern, $this->_text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition) && isset($matches['closingBracket'][1]))
            {
                $tagContentEndPosition = $matches['closingBracket'][1];
            }
            else
            {
                $tagContentEndPosition = false;
            }
            //Modification ends

            //$tagContentEndPosition = strpos($this->_text, ']', $tagStartPosition); //for reference
     
  8. tyteen4a03

    tyteen4a03 Well-Known Member

    Bumping this.

    Does this support nested codes (depth = 3+)?
     
  9. cclaerhout

    cclaerhout Well-Known Member

    Since we are inside the options of the opening tag, we don't need a crazy regex that matches nested BB codes with a recursive pattern.
    All we need to do is target the closing bracket "]" of a tag and add two repeated rules to reach it:
    1. one is the basic pattern of a BB code: \[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]
    2. the other matches anything else except a bracket: [^\[\]]
    These two rules must be repeated, which explains this: (rule1|rule2)+?
    This whole pattern must be optional, since it describes the options of the opening tag: (?:=(repeated rules))?

    To finish the explanation, two parts remain:
    1. The beginning: \[(?:/)?[\w\d]+?
    2. The end: (?P<closingBracket>\])
    Both work together, of course. The beginning matches the opening bracket of the BB code BUT, since we are inside the XenForo parser(*), we need to match both opening tags and closing tags, even though a closing tag will never have options. This explains the (?:/)?
    The end is easy to understand: it is just a capturing group containing the closing bracket, the one we have been targeting since the beginning. This capturing group has been named "closingBracket", which makes it easier to retrieve afterwards.

    So no, there are no nested codes here, just some repeated patterns.

    * The XenForo parser reads the text character by character and needs the position of the closing bracket... for both opening & closing tags.
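
    The parts explained above can be tried on a few small inputs. This is only a sketch: closingBracketOffset() is a hypothetical helper wrapping the pattern from the first post, and the sample strings are invented for the illustration.

```php
<?php
// Pattern copied from the first post.
const BBCODE_OPTIONS_PATTERN = '#\[(?:/)?[\w\d]+?(?:=(\[([\w\d]+?)(?:=.+?)?\].+?\[/\2\]|[^\[\]])+?)?(?P<closingBracket>\])#iu';

// Hypothetical helper: offset of the named "closingBracket" group, or false.
function closingBracketOffset($text, $tagStartPosition = 0)
{
    if (preg_match(BBCODE_OPTIONS_PATTERN, $text, $matches, PREG_OFFSET_CAPTURE, $tagStartPosition)
        && isset($matches['closingBracket'][1]))
    {
        return $matches['closingBracket'][1];
    }
    return false;
}

$samples = array(
    '[b]text[/b]',                                    // opening tag without option  -> 2
    '[/article]',                                     // closing tag, the (?:/)? part -> 9
    '[quote=Mike]text[/quote]',                       // option made of rule 2 only   -> 11
    '[article=[b]a[/b] and [i]b[/i]]text[/article]',  // rule 1 and rule 2 repeated   -> 30
    '[article=[b][i]x[/i][/b]]body[/article]',        // a pair inside a pair         -> 24
);

foreach ($samples as $sample) {
    printf("%-48s %s\n", $sample, var_export(closingBracketOffset($sample), true));
}
```

    In the last sample there is no recursion involved: the lazy .+? of rule 1 simply runs over the inner [i]x[/i] until it reaches [/b].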
     
  10. Mike

    Mike XenForo Developer Staff Member

    Going to basically "no thanks" this. You can do nested BB code without issue if you wrap the option in " or '. Options are not designed for advanced BB code nesting like that.
     
  11. AlexT

    AlexT Well-Known Member

    I wouldn't use regex-only for parsing BB codes - it's too easy to accidentally "parse away" information that wasn't part of the BB code.
     
  12. cclaerhout

    cclaerhout Well-Known Member

    Agreed, but this modification is not a parser. It is just a way to skip inner brackets used as BB code. The parser comes afterwards. There is an option in bbcm for this (it just uses the XenForo parser, so it's still not a regex parser).

    Thanks, I didn't know this code would work :
    Content
    . But it's still not very user friendly, would there be a listener to extend the parser as there is one for the formatter?
     
    Volion and AlexT like this.
