Preprocessing BbCode-like markup with different syntax

Bloodcinder

Let's say I want to create something similar to a custom BbCode that uses a different kind of syntax, specifically not based on bracketed tags, in order to mark up post text. Note that this question is not about how to do BbCode callbacks. I know how to do those.

Suppose I want to devise markup whose syntax is a set of curly braces with an optional opening symbol, a phrase, a colon, and then a number.
Code:
The arrow struck its target {+Attack Roll:20} but barely missed the vitals {!Critical Threat:6} {Damage:4}.

And I want it to be parsed and displayed in the post (technically cached, as with a normal BbCode) as follows, where the symbol or absence thereof selects a predefined color.

example.webp
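
For readers following along, here is a minimal sketch of the syntax described above (optional opening symbol, phrase, colon, number), written in Python purely for illustration. The symbol set (`+` and `!`) comes from the example line; the color names and the function name are assumptions, since the actual colors appear only in the screenshot.

```python
import re

# Optional leading symbol (+ or !), a phrase, a colon, then an integer.
SHORT_CODE = re.compile(r"\{([+!]?)([^{}:]+):(\d+)\}")

# Hypothetical colors; the real forum's palette is not stated in the post.
COLOR_BY_SYMBOL = {"+": "green", "!": "red", "": "gray"}


def find_short_codes(text):
    """Return a (symbol, phrase, number, color) tuple for each short code."""
    return [
        (
            m.group(1),
            m.group(2).strip(),
            int(m.group(3)),
            COLOR_BY_SYMBOL[m.group(1)],
        )
        for m in SHORT_CODE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "The arrow struck its target {+Attack Roll:20} but barely missed "
        "the vitals {!Critical Threat:6} {Damage:4}."
    )
    for code in find_short_codes(sample):
        print(code)
```

The missing symbol is handled as the empty string, so the "absence thereof" case is just one more key in the lookup table.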

Note that I already have a set of existing custom BbCodes on my forum that works that way (which is where I got the screenshot), but it requires a more cumbersome tag-based BbCode syntax. Also note that this question is not about how to design a regular expression to match the markup. I know how to do those.

Given that I'm not using the standard [tag] and [/tag] BbCode syntax with square brackets, what's the appropriate strategy to use here for arbitrary regex replacement? Is there a way to use a standard BbCode callback that somehow uses a different syntax, or do I need to use a different replacement technique?

It looks to me like the BbCode parser script is hard-coded to use square brackets, so I'm guessing I have to work outside that system. A reasonable solution would be something that converts the shorter custom syntax into the corresponding preexisting BbCodes at post/edit time, so that the BbCode system can handle the rest. The drawback is that future edits to the post would show the BbCodes instead of the shorter codes, so this would not be ideal.

TL;DR: Suppose I want to run an arbitrary regex replacement in a post similar in almost every way to a BbCode except with a custom non-bracketed syntax. What's the right way?
 
I may have found the solution.

If I override the preFilterText method of the BbCode formatter, I can execute arbitrary code just before any BbCode is processed. I currently have a proof-of-concept that does a regex replacement to substitute the long-form standard BbCode equivalents in for any of the short-form custom syntax it encounters. Then when the BbCode is processed, it is as if the standard BbCodes were used all along, so it parses them like normal. Meanwhile, if you go back and edit the post, the original short-codes are still in the text. Basically, it's a preprocessor.
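
The preprocessing step described above can be sketched roughly as follows. This is a Python illustration of the idea only, not the XenForo formatter API: the function name `pre_filter_text` merely echoes the method mentioned in the post, and the long-form tag names (`roll`, `rollgood`, `rollwarn`) are hypothetical stand-ins for the forum's existing custom BbCodes, which are not shown here.

```python
import re

# Short-form syntax: {[+|!]Phrase:Number}
SHORT_CODE = re.compile(r"\{([+!]?)([^{}:\[\]]+):(\d+)\}")

# Hypothetical long-form tag names, one per symbol.
TAG_BY_SYMBOL = {"+": "rollgood", "!": "rollwarn", "": "roll"}


def pre_filter_text(text):
    """Return a copy of text with short codes expanded to long-form BbCode.

    The stored post text is never modified; only the copy handed to the
    BbCode parser contains the expanded tags.
    """

    def to_bbcode(match):
        symbol, phrase, number = match.groups()
        tag = TAG_BY_SYMBOL[symbol]
        return f"[{tag}={phrase.strip()}]{number}[/{tag}]"

    return SHORT_CODE.sub(to_bbcode, text)


if __name__ == "__main__":
    stored = "Hit {+Attack Roll:20} for {Damage:4}."
    print(pre_filter_text(stored))
    print(stored)  # the original short codes are still what gets edited
```

Because the expansion happens on a copy just before parsing, the short codes stay in the saved post, which matches the behavior described when editing.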

So, I can do exactly what I want to do. Now I have a different question, probably easier to answer than my last one: should I do it that way?

How inefficient is it to run a regex every time preFilterText is called? I presume that the final version gets cached, because why wouldn't it, but for times when it's not cached, will that be substantially slower than the process XenForo already goes through to parse the BbCodes?

Here's the only instance on the entire Internet of anybody mentioning preFilterText...
For the record, with XenForo 1.2, you can now extend the function "preFilterText" from the formatter: (code omitted)

Maybe @cclaerhout knows, or one of the devs?
 
I've implemented the above idea and can confirm that it's fairly efficient and does use the BbCode cache. So unless anybody points out any issues with security or standard practices related to the use of the preFilterText method, I'll consider this problem solved.
 