nrep
I'm considering importing data from some old custom forum software, where some of the really old posts contain poorly formatted data (mainly because this data has already been imported several times before).
The problem is that many of the old posts have quote tags, often nested, trailing below the main content. For example:
Code:
Here is the main part of the post - useful information.
[QUOTE=User 2]
Old quote post, not useful to keep below the text
[QUOTE=User 3]
A useless nested quote
[/QUOTE]
[/QUOTE]
[QUOTE=User 4]Separate useless quote[/QUOTE]
I'm struggling to find a way to remove these. I haven't been able to come up with a regex that strips any quotes or nested quotes placed at the end of the post content. The closest I've got is the following, but it only works when there are no nested quotes at the end:
Code:
/(.*)\n\[QUOTE=(.+)\](.+?)\[\/QUOTE\]$/is
I would then keep only the text captured by the first (.*) group.
However, on posts with multiple or nested quotes at the end, I can't find a regex that fully works: it becomes too greedy and fails under certain conditions.
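To make the failure concrete, here is the attempt reproduced in Python, assuming the `/is` modifiers map to `re.IGNORECASE | re.DOTALL`. The greedy `(.*)` can only backtrack as far as the last `\n[QUOTE=` in the post, so it removes just the final quote; when that last opening tag is an inner nested one, the cut lands halfway through the outer block. `keep_first_group` is a hypothetical helper name used only for this illustration.

```python
import re

# Sample post from the example above: useful text, then a nested quote
# block and a separate quote at the end.
POST = (
    "Here is the main part of the post - useful information.\n"
    "[QUOTE=User 2]\n"
    "Old quote post, not useful to keep below the text\n"
    "[QUOTE=User 3]\n"
    "A useless nested quote\n"
    "[/QUOTE]\n"
    "[/QUOTE]\n"
    "[QUOTE=User 4]Separate useless quote[/QUOTE]"
)

# The attempted pattern; the /is modifiers become IGNORECASE | DOTALL.
ATTEMPT = re.compile(r"(.*)\n\[QUOTE=(.+)\](.+?)\[\/QUOTE\]$", re.IGNORECASE | re.DOTALL)


def keep_first_group(post: str) -> str:
    """Return what the first (.*) captures, or the post unchanged if no match."""
    match = ATTEMPT.search(post)
    return match.group(1) if match else post


# Only the User 4 quote is removed; the nested User 2/User 3 block survives.
print(keep_first_group(POST))

# With only the nested block at the end, the last "\n[QUOTE=" is the inner
# one, so the result keeps a dangling "[QUOTE=User 2]" opening tag.
print(keep_first_group(POST.rsplit("\n", 1)[0]))
```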
I'd be grateful for a fresh pair of eyes on this problem, in case I've missed an easier way to do this.
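One possibly easier route, sketched here under stated assumptions: rather than asking a single regex to match arbitrarily nested tags, use the regex only to find the individual tags and count nesting depth by hand. (PCRE, which PHP uses, does support recursive patterns via `(?R)`, but a depth counter is simpler to reason about.) The sketch assumes tags look like `[QUOTE]`, `[QUOTE=...]` and `[/QUOTE]`, and that a quote block should only be dropped when nothing but whitespace follows it; `strip_trailing_quotes` is a hypothetical helper name.

```python
import re

# An opening [QUOTE] / [QUOTE=...] tag, or a closing [/QUOTE] tag.
QUOTE_TAG = re.compile(r"\[(/?)QUOTE(?:=[^\]]*)?\]", re.IGNORECASE)


def strip_trailing_quotes(post: str) -> str:
    """Remove top-level quote blocks (and any quotes nested inside them) that
    are followed only by whitespace, repeating until real text is reached.
    Trailing whitespace is trimmed as a side effect."""
    # Pass 1: record the (start, end) span of each top-level quote block by
    # counting nesting depth instead of relying on regex greediness.
    blocks = []
    depth = 0
    start = 0
    for tag in QUOTE_TAG.finditer(post):
        if tag.group(1):          # closing [/QUOTE]
            if depth == 0:
                continue          # stray closing tag: ignore it
            depth -= 1
            if depth == 0:
                blocks.append((start, tag.end()))
        else:                     # opening [QUOTE...]
            if depth == 0:
                start = tag.start()
            depth += 1
    # An unclosed quote never becomes a block, so it (and everything before
    # it) is left alone.

    # Pass 2: peel blocks off the end while only whitespace separates them.
    end = len(post.rstrip())
    for block_start, block_end in reversed(blocks):
        if post[block_end:end].strip():
            break                 # real content follows this block: keep it
        end = len(post[:block_start].rstrip())
    return post[:end]


SAMPLE = (
    "Here is the main part of the post - useful information.\n"
    "[QUOTE=User 2]\n"
    "Old quote post, not useful to keep below the text\n"
    "[QUOTE=User 3]\n"
    "A useless nested quote\n"
    "[/QUOTE]\n"
    "[/QUOTE]\n"
    "[QUOTE=User 4]Separate useless quote[/QUOTE]"
)

print(strip_trailing_quotes(SAMPLE))
# → Here is the main part of the post - useful information.
```

Quotes that appear in the middle of a post, followed by a reply, are kept, since the loop stops at the first block with real text after it.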