
Non Greedy Regular Expression Template Modification Match

Discussion in 'XenForo Development Discussions' started by tenants, Mar 14, 2014.

  1. tenants

    tenants Well-Known Member

    Hello, this might be a misunderstanding on my part, or a XenForo regex match bug (more likely the former).

    I'm trying to use a non greedy regex match for the template registration_form

    I'm just trying to find this:
    Code:
        <dl class="ctrlUnit">
            <dt><label for="ctrl_email">Email:</label></dt>
            <dd><input type="email" name="email" value="" dir="ltr" class="textCtrl" id="ctrl_email" /></dd>
        </dl>
    
    from this:

    Code:
    <dl class="ctrlUnit">
            <dt><label for="ctrl_username">Name:</label></dt>
            <dd>
                <input type="text" name="username" value="" class="textCtrl" id="ctrl_username" autofocus="true" autocomplete="off" />
                <p class="explain">This is the name that will be shown with your messages. You may use any name you wish. Once set, this cannot be changed.</p>
            </dd>
        </dl>
    
        <dl class="ctrlUnit">
            <dt><label for="ctrl_email">Email:</label></dt>
            <dd><input type="email" name="email" value="" dir="ltr" class="textCtrl" id="ctrl_email" /></dd>
        </dl>
    
        <fieldset>
            <dl class="ctrlUnit">
                <dt><label for="ctrl_password">Password:</label></dt>
                <dd><input type="password" name="password" class="textCtrl OptOut" id="ctrl_password" autocomplete="off" /></dd>
            </dl>
    
            <dl class="ctrlUnit">
                <dt><label for="ctrl_confirm_password">Confirm Password:</label></dt>
                <dd>
                    <input type="password" name="password_confirm" class="textCtrl OptOut" id="ctrl_confirm_password" />
                    <p class="explain">Enter your password in the first box and confirm it in the second.</p>
                </dd>
            </dl>
        </fieldset>
    
    I've used a non greedy expression:

    Code:
    #<dl([\S\s].+?)>([\S\s].+?)(id="ctrl_email")([\S\s].+?)/dl>#s
    
    But this seems to return the largest match starting from the leftmost position, i.e. this:

    Code:
    <dl class="ctrlUnit">
            <dt><label for="ctrl_username">Name:</label></dt>
            <dd>
                <input type="text" name="username" value="" class="textCtrl" id="ctrl_username" autofocus="true" autocomplete="off" />
                <p class="explain">This is the name that will be shown with your messages. You may use any name you wish. Once set, this cannot be changed.</p>
            </dd>
        </dl>
    
        <dl class="ctrlUnit">
            <dt><label for="ctrl_email">Email:</label></dt>
            <dd><input type="email" name="email" value="" dir="ltr" class="textCtrl" id="ctrl_email" /></dd>
        </dl>
    
    Is there something wrong with the way I'm using this regular expression (shouldn't it be non-greedy, since the quantifiers end in ?), or is this a XenForo bug?
     
    Last edited: Mar 14, 2014
  2. tenants

    tenants Well-Known Member

    Okay, this is a misunderstanding on my behalf (it's been a while since I've had to think about regex)

    Even lazy quantifiers work from left to right and match as soon as possible, so the match still begins at the first <dl. Hmm, I will need to do a bit of research to get this match correct.
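For anyone wanting to reproduce this outside XenForo, here is a minimal Python sketch. It assumes a shortened stand-in for the registration_form template (the trimmed markup and the variable names are illustrative only), and it uses Python's re module rather than PHP's PCRE, with re.DOTALL standing in for the s modifier. The lazy quantifiers only keep each piece as short as possible; the engine still accepts the first (leftmost) <dl from which the whole pattern can succeed.

```python
import re

# Shortened stand-in for the registration_form template (assumption: trimmed for brevity).
TEMPLATE = """<dl class="ctrlUnit">
    <dt><label for="ctrl_username">Name:</label></dt>
    <dd><input type="text" name="username" id="ctrl_username" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>
"""

# The pattern from the first post; PHP's #...#s delimiters become re.DOTALL here.
LAZY = re.compile(
    r'<dl([\S\s].+?)>([\S\s].+?)(id="ctrl_email")([\S\s].+?)/dl>',
    re.DOTALL,
)

match = LAZY.search(TEMPLATE)

# The match begins at the very first <dl, not at the email block.
print("match starts at offset:", match.start())
print("number of <dl blocks swallowed:", match.group(0).count("<dl"))
```

The second lazy group has to stretch across the whole username block to reach id="ctrl_email", because nothing in the pattern forbids it from crossing a </dl>.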
     
  3. Adrian Schneider

    Adrian Schneider Active Member

    tenants likes this.
  4. cclaerhout

    cclaerhout Well-Known Member

    You're using single-line mode (#...#s). That makes things hard, because you first need to pin down the beginning of the string you want to match. Once you have matched it, you can enable single-line mode and have fun with it. To enable single-line mode manually (which means the dot also matches newlines, so the string is treated as one big single line), use (?s) inside the regex. To disable it, use (?-s).

    Here's what you want, or at least one possible solution (do not enable single-line mode for the whole pattern, otherwise your regex will over-match as before):
    Code:
    #<dl[^>]*?>[\s]*<dt>.*ctrl_password(?s).*?</dl>#
    
    P.S.: by the way, instead of enabling single-line mode manually, you can also use what you already had, [\s\S]* (it matches both whitespace and non-whitespace characters). It does the same thing, so there is no need for both, and it is very convenient in JavaScript (which doesn't have single-line mode), i.e.:
    Code:
    #<dl[^>]*?>[\s]*<dt>.*ctrl_password[\s\S]*?</dl>#
     
    tenants likes this.
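A rough Python sketch of the two patterns from the post above, under a few assumptions: the template is a shortened stand-in, the target is swapped from ctrl_password to ctrl_email (the block the original question asked for), and the variable names are made up. Recent Python versions reject a bare (?s) in the middle of a pattern, so the sketch uses the scoped form (?s:...) to switch single-line mode on only for the tail.

```python
import re

# Shortened stand-in for the registration_form template (assumption: trimmed for brevity).
TEMPLATE = """<dl class="ctrlUnit">
    <dt><label for="ctrl_username">Name:</label></dt>
    <dd><input type="text" name="username" id="ctrl_username" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_password">Password:</label></dt>
    <dd><input type="password" name="password" id="ctrl_password" /></dd>
</dl>
"""

# Head of the pattern: the dot does NOT cross newlines, so <dt> and ctrl_email
# must sit on the same line. Tail: dot-matches-newline is enabled only inside
# the (?s:...) group, and the lazy quantifier stops at the first </dl>.
SCOPED_DOTALL = re.compile(r'<dl[^>]*?>[\s]*<dt>.*ctrl_email(?s:.*?)</dl>')

# Same idea without any mode switch: [\s\S] matches any character.
ANY_CHAR_CLASS = re.compile(r'<dl[^>]*?>[\s]*<dt>.*ctrl_email[\s\S]*?</dl>')

scoped = SCOPED_DOTALL.search(TEMPLATE).group(0)
any_char = ANY_CHAR_CLASS.search(TEMPLATE).group(0)

print(scoped)
print("both variants agree:", scoped == any_char)
```

Because the head of the pattern cannot cross a line break, the username block is rejected straight away and the search moves on to the email block.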
  5. EQnoble

    EQnoble Well-Known Member

    This will match the entire ctrlUnit for the email block, with the whole match available as \0
    Code:
    #<dl[^>]+>[^<]+<dt><label for="ctrl_email.+ctrl_email[^<]+</dd>[^<]+</dl>#siu
    Replace it like this for example if you want to add something after it.
    Code:
    \0
    <yourNewStuff>
        I am new stuff.
    </yourNewStuff>
    If you want to match a specific piece inside the matched block and replace only that piece, add capture groups to your match expression and use the \references to replace the fragments as you see fit. It's hard to give you an example of that without knowing the exact situation.
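To try the match-and-append idea outside the Template Modification System, a small Python sketch might look like this. The template is a shortened stand-in and the names are hypothetical; the siu modifiers map to re.DOTALL | re.IGNORECASE (Python 3 strings are Unicode already), and the whole-match reference \0 is spelled \g<0> in Python replacement strings.

```python
import re

# Shortened stand-in for the registration_form template (assumption: trimmed for brevity).
TEMPLATE = """<dl class="ctrlUnit">
    <dt><label for="ctrl_username">Name:</label></dt>
    <dd><input type="text" name="username" id="ctrl_username" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_password">Password:</label></dt>
    <dd><input type="password" name="password" id="ctrl_password" /></dd>
</dl>
"""

# The pattern from the post above; [^<]+ and [^>]+ count the angle brackets,
# so the match cannot spill over into a neighbouring block.
EMAIL_BLOCK = re.compile(
    r'<dl[^>]+>[^<]+<dt><label for="ctrl_email.+ctrl_email[^<]+</dd>[^<]+</dl>',
    re.DOTALL | re.IGNORECASE,
)

# Text appended after the matched block (placeholder markup from the post).
NEW_STUFF = "\n<yourNewStuff>\n    I am new stuff.\n</yourNewStuff>"

# \g<0> re-emits the whole match, then the new markup follows it.
result, count = EMAIL_BLOCK.subn(r"\g<0>" + NEW_STUFF, TEMPLATE)

print(result)
print("replacements made:", count)
```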
     
    tenants likes this.
  6. tenants

    tenants Well-Known Member

    @Adrian Schneider
    Funny link ;), and good point, although this is not really HTML (it might look like it). These are template strings, handled by the XenForo core Template Modification System, which has a regex matching option.
    No user data is pushed through the regex, only the admin's template code.
    The admin template code is finite (much smaller than usual HTML).

    @cclaerhout
    That's a clever solution, I've learnt something new about single line mode... I'm going to have to do some reading around this (I clearly wasn't using single line as intended, and just grabbed some partial regex that I had used previously)

    @EQnoble
    That's also a really intelligent solution: basically making sure the correct number of < and > characters occur to get the smallest match

    I have some food for thought now :)
     
    Last edited: Mar 14, 2014
    MattW likes this.
  7. tenants

    tenants Well-Known Member

    Just for information, I've found this to work quite well (it is very close to @EQnoble's method; thanks for the help everyone :) )

    Code:
    #<dl[^>]*>[^<]+<dt[^>]*>[^<]*<label for="ctrl_email.+ctrl_email(.*?)</dl>#siu
    
    I add the (.*?) after ctrl_email so that users can add additional nodes and the match will still occur. It's non-greedy, so it stops at the first </dl> that follows.
    This means users can make quite a lot of template changes, and it should still find a match without issue.

    I use * instead of + since zero occurrences are possible, for instance: <dl[^>]*> should match <dl> (but <dl[^>]+> would not)

    The only situations where this regex won't find a match are if additional nodes are added between <dl> and <dt> or between <dt> and <label, or if something is added between <label and for.
    I think this should be fairly rare.

    So now users can add classes to dl and dt, and add any nodes such as <i></i>, <span>, <div> etc., as long as these are added after the <label tag, which lets this regex match with quite a lot of flexibility.
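A quick way to sanity-check those claims is to run the final pattern against a few variations. The sketch below is in Python rather than PHP, and the three template fragments, the extra class names and the helper find_email_block are all made up for illustration; siu maps to re.DOTALL | re.IGNORECASE.

```python
import re

# The final pattern from this post, with PHP's #...#siu expressed as Python flags.
FINAL = re.compile(
    r'<dl[^>]*>[^<]+<dt[^>]*>[^<]*<label for="ctrl_email.+ctrl_email(.*?)</dl>',
    re.DOTALL | re.IGNORECASE,
)


def find_email_block(template):
    """Return the matched email block, or None when the pattern does not match."""
    m = FINAL.search(template)
    return m.group(0) if m else None


# Hypothetical fragments: the stock block, a customised one, and one that breaks the match.
STOCK = """<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>
"""

# Extra classes on dl/dt and extra nodes added after the <label tag.
CUSTOMISED = """<dl class="ctrlUnit wide">
    <dt class="narrow"><label for="ctrl_email">Email:</label> <i>required</i></dt>
    <dd><input type="email" name="email" id="ctrl_email" /><span>hint</span></dd>
</dl>
"""

# A node inserted between <dl> and <dt>: the documented failure case.
NODE_BEFORE_DT = """<dl class="ctrlUnit">
    <div class="note"></div>
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>
"""

for name, fragment in [
    ("stock", STOCK),
    ("customised", CUSTOMISED),
    ("node before dt", NODE_BEFORE_DT),
]:
    status = "matched" if find_email_block(fragment) else "no match"
    print(name + ": " + status)
```

One thing to keep in mind: the greedy .+ before the second ctrl_email runs to the last ctrl_email in the whole template, so this relies on that string not appearing again further down.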
     
    Last edited: Mar 15, 2014
