Non Greedy Regular Expression Template Modification Match

tenants

Hello, this might be a misunderstanding on my part or a XenForo regex match bug (more likely the former).

I'm trying to use a non-greedy regex match for the template registration_form.

I'm just trying to find this:
Code:
    <dl class="ctrlUnit">
        <dt><label for="ctrl_email">Email:</label></dt>
        <dd><input type="email" name="email" value="" dir="ltr" class="textCtrl" id="ctrl_email" /></dd>
    </dl>

from this:

Code:
<dl class="ctrlUnit">
        <dt><label for="ctrl_username">Name:</label></dt>
        <dd>
            <input type="text" name="username" value="" class="textCtrl" id="ctrl_username" autofocus="true" autocomplete="off" />
            <p class="explain">This is the name that will be shown with your messages. You may use any name you wish. Once set, this cannot be changed.</p>
        </dd>
    </dl>

    <dl class="ctrlUnit">
        <dt><label for="ctrl_email">Email:</label></dt>
        <dd><input type="email" name="email" value="" dir="ltr" class="textCtrl" id="ctrl_email" /></dd>
    </dl>

    <fieldset>
        <dl class="ctrlUnit">
            <dt><label for="ctrl_password">Password:</label></dt>
            <dd><input type="password" name="password" class="textCtrl OptOut" id="ctrl_password" autocomplete="off" /></dd>
        </dl>

        <dl class="ctrlUnit">
            <dt><label for="ctrl_confirm_password">Confirm Password:</label></dt>
            <dd>
                <input type="password" name="password_confirm" class="textCtrl OptOut" id="ctrl_confirm_password" />
                <p class="explain">Enter your password in the first box and confirm it in the second.</p>
            </dd>
        </dl>
    </fieldset>

I've used a non-greedy expression:

Code:
#<dl([\S\s].+?)>([\S\s].+?)(id="ctrl_email")([\S\s].+?)/dl>#s

But this seems to return the largest match starting from the leftmost <dl, so I get this:

Code:
<dl class="ctrlUnit">
        <dt><label for="ctrl_username">Name:</label></dt>
        <dd>
            <input type="text" name="username" value="" class="textCtrl" id="ctrl_username" autofocus="true" autocomplete="off" />
            <p class="explain">This is the name that will be shown with your messages. You may use any name you wish. Once set, this cannot be changed.</p>
        </dd>
    </dl>

    <dl class="ctrlUnit">
        <dt><label for="ctrl_email">Email:</label></dt>
        <dd><input type="email" name="email" value="" dir="ltr" class="textCtrl" id="ctrl_email" /></dd>
    </dl>

Is there something wrong with the way I'm using this regular expression (shouldn't it be non-greedy, since each quantifier ends in ?), or is this a XenForo bug?
 
Okay, this is a misunderstanding on my part (it's been a while since I've had to think about regex).

Even lazy quantifiers scan from left to right and match at the earliest possible starting position, so the match begins at the first <dl. I will need to do a bit of research to get this match correct.
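
The leftmost-start behaviour can be reproduced outside XenForo. This is only a minimal sketch in Python rather than XenForo's PHP/PCRE (an assumption made to keep the example self-contained): the #...#s delimiters and flag become re.DOTALL, and TEMPLATE is a shortened, hypothetical stand-in for registration_form.

```python
import re

# Shortened, hypothetical stand-in for the registration_form template.
TEMPLATE = """<dl class="ctrlUnit">
    <dt><label for="ctrl_username">Name:</label></dt>
    <dd><input type="text" name="username" id="ctrl_username" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>
"""

# The pattern from the post; PCRE's trailing "s" flag becomes re.DOTALL.
ORIGINAL = re.compile(
    r'<dl([\S\s].+?)>([\S\s].+?)(id="ctrl_email")([\S\s].+?)/dl>',
    re.DOTALL,
)

match = ORIGINAL.search(TEMPLATE)
matched_text = match.group(0)

# The engine tries the leftmost starting position first (the username <dl).
# Lazy quantifiers only keep the match short from that start; they never
# move the start forward while a match from the first <dl is still possible.
starts_at_first_dl = match.start() == 0
includes_username_block = "ctrl_username" in matched_text
print(starts_at_first_dl, includes_username_block)
```

Both values come out true here: the match starts at the username block and runs through the email block, which is the over-match described in the first post.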
 
You're using single-line mode (#...#s). It makes things harder, since you first need to pin down the beginning of the string you want to match. Once you have matched that, you can enable single-line mode for the rest of the pattern. To enable single-line mode manually (which means the dot also matches newlines, so the string is treated as one big line), use (?s) in the regex. To disable it, use (?-s).

Here's what you want, or at least one possible solution (do not enable single-line mode for the whole pattern, otherwise your regex will over-match as before):
Code:
#<dl[^>]*?>[\s]*<dt>.*ctrl_password(?s).*?</dl>#

P.S.: by the way, instead of switching single-line mode on manually, you can also use what you already had, [\s\S]* (it matches whitespace and non-whitespace characters, i.e. anything, newlines included). It does the same job, so there is no need for both, and it is very convenient in JavaScript (which had no single-line mode before ES2018). I.e.:
Code:
#<dl[^>]*?>[\s]*<dt>.*ctrl_password[\s\S]*?</dl>#
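
The idea of letting the dot cross newlines only in the tail of the pattern can be tried out as follows. This is a rough sketch in Python, not XenForo's PHP/PCRE, and it rests on a few assumptions: recent Python versions reject a (?s) placed in the middle of a pattern, so the scoped form (?s:...) is used instead; the target id is ctrl_email, since that is the block the question asks for; and TEMPLATE is a shortened, hypothetical stand-in for the real template.

```python
import re

# Shortened, hypothetical stand-in for the registration_form template.
TEMPLATE = """<dl class="ctrlUnit">
    <dt><label for="ctrl_username">Name:</label></dt>
    <dd><input type="text" name="username" id="ctrl_username" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_password">Password:</label></dt>
    <dd><input type="password" name="password" id="ctrl_password" /></dd>
</dl>
"""

# The dot stays line-bound up to the id, so <dt> and ctrl_email must share
# a line; (?s:...) then lets only the lazy tail span lines up to </dl>.
SCOPED = re.compile(r'<dl[^>]*?>[\s]*<dt>.*ctrl_email(?s:.*?)</dl>')

# Same idea without a mode switch: [\s\S] matches any character at all.
PORTABLE = re.compile(r'<dl[^>]*?>[\s]*<dt>.*ctrl_email[\s\S]*?</dl>')

scoped_match = SCOPED.search(TEMPLATE).group(0)
portable_match = PORTABLE.search(TEMPLATE).group(0)
print(scoped_match)
```

Because the first part of the pattern cannot cross a line break, the username block fails early and the engine moves on to the email block instead of stretching across both.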
 
This will match the entire ctrlUnit for the email block, with the whole match available as \0:
Code:
#<dl[^>]+>[^<]+<dt><label for="ctrl_email.+ctrl_email[^<]+</dd>[^<]+</dl>#siu
Replace it like this, for example, if you want to add something after it:
Code:
\0
<yourNewStuff>
    I am new stuff.
</yourNewStuff>
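
As a rough illustration of this append-after-match replacement, here is the same pattern applied in Python (an assumption; XenForo itself runs PHP/PCRE). Python spells the whole-match reference \g<0> instead of \0, re.DOTALL | re.IGNORECASE stand in for the s and i flags (the u flag is dropped because Python 3 strings are already Unicode), and TEMPLATE is a shortened, hypothetical stand-in for the real template.

```python
import re

# Shortened, hypothetical stand-in for the registration_form template.
TEMPLATE = """<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>

<dl class="ctrlUnit">
    <dt><label for="ctrl_password">Password:</label></dt>
    <dd><input type="password" name="password" id="ctrl_password" /></dd>
</dl>
"""

EMAIL_BLOCK = re.compile(
    r'<dl[^>]+>[^<]+<dt><label for="ctrl_email.+ctrl_email[^<]+</dd>[^<]+</dl>',
    re.DOTALL | re.IGNORECASE,
)

# \g<0> re-emits the matched block unchanged; the new markup follows it.
REPLACEMENT = "\\g<0>\n<yourNewStuff>\n    I am new stuff.\n</yourNewStuff>"

result = EMAIL_BLOCK.sub(REPLACEMENT, TEMPLATE, count=1)
print(result)
```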

If you want to match a specific piece inside the matched block and replace only that piece, add capture groups to your match expression and use the \1, \2, ... references in the replacement to rebuild the fragments as you see fit. It's hard to give you an example of that without knowing the exact situation.
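
Since the exact situation is not given, the following is only a hypothetical case: changing the label text while leaving the rest of the block untouched. It is sketched in Python (an assumption; XenForo uses PHP/PCRE, where the references would be written \1 and \3), and both BLOCK and the new wording are made up for illustration.

```python
import re

# Hypothetical email block, shortened from the template in the thread.
BLOCK = """<dl class="ctrlUnit">
    <dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>"""

# Group 1: everything up to the label text, group 2: the text itself,
# group 3: the rest of the block through the first </dl>.
GROUPED = re.compile(
    r'(<dl[^>]+>[^<]+<dt><label for="ctrl_email[^>]*>)([^<]*)(.+?</dl>)',
    re.DOTALL | re.IGNORECASE,
)

# Keep groups 1 and 3 and swap only the label text (hypothetical wording).
updated = GROUPED.sub(r"\g<1>Email address:\g<3>", BLOCK)
print(updated)
```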
 
@Adrian Schneider
Funny link ;), and good point, although this is not really HTML (it might look like it). These are template strings, processed by the XenForo core Template Modification System, which has a regex matching option.
No user data is pushed through the regex, only the admin's template code.
The admin template code is finite (much smaller than usual HTML).

@cclaerhout
That's a clever solution, I've learnt something new about single line mode... I'm going to have to do some reading around this (I clearly wasn't using single line as intended, and just grabbed some partial regex that I had used previously)

@EQnoble
That's also a really intelligent solution, basically making sure the right number of < and > characters occur so that you get the smallest match.

I have some food for thought now :)
 
Just for information, I've found this to work quite well (it's very close to @EQnoble's method, thanks for the help everyone :) )

Code:
#<dl[^>]*>[^<]+<dt[^>]*>[^<]*<label for="ctrl_email.+ctrl_email(.*?)</dl>#siu

I added the (.*?) after ctrl_email so users can add additional nodes and the match will still occur. It's non-greedy, so it stops at the first </dl> after that point.
This means users can make quite a lot of template changes, and it should still find a match without issue.

I use * instead of + since 0 matches could occur, for instance: <dl[^>]*> should match <dl> (but <dl[^>]+> would not)
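
The difference between the two quantifiers is easy to check in isolation. A tiny sketch in Python (chosen only for illustration; this particular syntax behaves the same in PCRE), with two made-up tags as input:

```python
import re

BARE_TAG = "<dl>"
TAG_WITH_CLASS = '<dl class="ctrlUnit">'

STAR = re.compile(r"<dl[^>]*>")  # zero or more characters before >
PLUS = re.compile(r"<dl[^>]+>")  # at least one character before >

star_results = [bool(STAR.search(tag)) for tag in (BARE_TAG, TAG_WITH_CLASS)]
plus_results = [bool(PLUS.search(tag)) for tag in (BARE_TAG, TAG_WITH_CLASS)]
print(star_results, plus_results)
```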

The only situation where this regex won't find a match is if additional nodes are added between <dl> and <dt>, or between <dt> and <label, or if something is added between <label and for.
I think this should be fairly rare.

So now users can add classes to dl and dt, and add any nodes such as <i></i>, <span> or <div>, as long as these are added after the <label tag. That lets this regex match with quite a lot of flexibility.
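
These tolerance claims can be checked with a quick sketch. It is written in Python rather than XenForo's PHP/PCRE (an assumption), re.DOTALL | re.IGNORECASE stand in for the s and i flags, and both CUSTOMISED and BROKEN are hypothetical template edits invented for the test.

```python
import re

FINAL = re.compile(
    r'<dl[^>]*>[^<]+<dt[^>]*>[^<]*<label for="ctrl_email.+ctrl_email(.*?)</dl>',
    re.DOTALL | re.IGNORECASE,
)

# Hypothetical customised template: extra classes plus nodes after <label>.
CUSTOMISED = """<dl class="ctrlUnit wide">
    <dt class="required"><label for="ctrl_email">Email:</label> <i>*</i></dt>
    <dd><input type="email" name="email" id="ctrl_email" /><span>hint</span></dd>
</dl>"""

# Hypothetical breaking change: a node inserted between <dl> and <dt>.
BROKEN = """<dl class="ctrlUnit">
    <div></div><dt><label for="ctrl_email">Email:</label></dt>
    <dd><input type="email" name="email" id="ctrl_email" /></dd>
</dl>"""

customised_match = FINAL.search(CUSTOMISED)
broken_match = FINAL.search(BROKEN)
print(customised_match is not None, broken_match is None)
```

The customised block still matches in full, while the block with a node between <dl> and <dt> is not found, which lines up with the limitation described above.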
 