
Preg_replace

Chris D

XenForo developer
Staff member
#1
HTML:
<br clear="both" style="clear: both;"/>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:ea02d7ea91ed2d99fb6c6ea207565aff:bx3z8VZ1DMiEUKZ%2BLAlsHPgBbnqglPHpKpvGQEELiE%2FCdgn7Tv2yRxCIuL3UrcrXGXEEx%2FCTN0jADA%3D%3D'><img border='0' title='Email this Article' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:36ae735fce9e9cb922cd8f4385f568e3:Q4Hx1R2t4EIooQXmdImKy%2F%2Bwkc3J56YG%2B3SPxqTJhEzie51%2FZhKY6B1MKDdlyTfyIW3LE5wAwp3eig%3D%3D'><img border='0' title='Add to del.icio.us' alt='Add to del.icio.us' src='http://images.pheedo.com/images/mm/delicious.gif'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:a5a95c2d0566ee119c290d1d86081f56:eFP7oQsJX1Z8M3TpyUvVSCq6DlnbVuyKyek6Uc1nt%2Fk%2BJ2gShB6OkAQA9%2BNej9Yvtr9Y%2B0CiwNyIrA%3D%3D'><img border='0' title='Add to digg' alt='Add to digg' src='http://images.pheedo.com/images/mm/digg.gif'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:0b92b5dbeeb51dabbe960fab009bc5aa:%2BCzl%2BuCS5W40TY1AfKkW3s1UU%2BMJ6qM0VzhN5%2FcUFrpxcdUCJjYqpeYKAnfLAe22Z8kk3oeu5QozImk%3D'><img border='0' title='Add to Facebook' alt='Add to Facebook' src='http://images.pheedo.com/images/mm/facebook.gif'/></a>
<br clear="both" style="clear: both;"/>
Can anyone help with a preg_replace expression that will remove everything between the beginning and end <br clear... tags?

Cheers!
 

Chris D

XenForo developer
Staff member
#2
I think I may have already had a regex that would work with the above, but I've just realised that actually the code is this:

Code:
<br clear="both" style="clear: both;"/>
<br clear="both" style="clear: both;"/>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:ea02d7ea91ed2d99fb6c6ea207565aff:bx3z8VZ1DMiEUKZ%2BLAlsHPgBbnqglPHpKpvGQEELiE%2FCdgn7Tv2yRxCIuL3UrcrXGXEEx%2FCTN0jADA%3D%3D'><img border='0' title='Email this Article' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:36ae735fce9e9cb922cd8f4385f568e3:Q4Hx1R2t4EIooQXmdImKy%2F%2Bwkc3J56YG%2B3SPxqTJhEzie51%2FZhKY6B1MKDdlyTfyIW3LE5wAwp3eig%3D%3D'><img border='0' title='Add to del.icio.us' alt='Add to del.icio.us' src='http://images.pheedo.com/images/mm/delicious.gif'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:a5a95c2d0566ee119c290d1d86081f56:eFP7oQsJX1Z8M3TpyUvVSCq6DlnbVuyKyek6Uc1nt%2Fk%2BJ2gShB6OkAQA9%2BNej9Yvtr9Y%2B0CiwNyIrA%3D%3D'><img border='0' title='Add to digg' alt='Add to digg' src='http://images.pheedo.com/images/mm/digg.gif'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:0b92b5dbeeb51dabbe960fab009bc5aa:%2BCzl%2BuCS5W40TY1AfKkW3s1UU%2BMJ6qM0VzhN5%2FcUFrpxcdUCJjYqpeYKAnfLAe22Z8kk3oeu5QozImk%3D'><img border='0' title='Add to Facebook' alt='Add to Facebook' src='http://images.pheedo.com/images/mm/facebook.gif'/></a>
<br clear="both" style="clear: both;"/>
My code was looking to replace anything between <br clear="both" style="clear: both;"/> and <br clear="both" style="clear: both;"/> so it was probably operating on the first two lines... Either that or I was just wrong anyway! (possible)
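A minimal sketch of why that approach stops early, assuming the earlier attempt was a lazy match between two literal copies of the tag (the actual pattern isn't shown here, so `$pattern` and the shortened `$html` sample are hypothetical):

```php
<?php
// Hypothetical, shortened sample with the same shape as the snippet above:
// two adjacent <br> tags, the links, then a closing <br>.
$br   = '<br clear="both" style="clear: both;"/>';
$html = $br . "\n" . $br . "\n  <a href='#'>share</a>\n" . $br;

// Assumed naive pattern: lazily match everything between two identical tags.
$quoted  = preg_quote($br, '/');
$pattern = '/' . $quoted . '.*?' . $quoted . '/s';

// The lazy match ends at the nearest following tag, i.e. the second line.
preg_match($pattern, $html, $m);

// Only the first two <br> tags are removed; the link is left in place.
$result = preg_replace($pattern, '', $html);
```

With only one tag left after the first match, there is nothing for a second match to pair with, so the sharing links are never touched.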

If anyone can help with the correct regex for preg_replace for the code in this post it would be greatly appreciated.

To be clear, I would like to replace the above code with:

Code:
<br clear="both" style="clear: both;"/>
<br clear="both" style="clear: both;"/>
<br clear="both" style="clear: both;"/>
Or, replacing ALL of that code above with '' would be even better.
 

Naatan

Well-known member
#3
The problem here is that you have multiple occurrences of the br element, which makes it pretty complicated to tell the regex which br it should start at and where it should stop.

Can you assume that the elements you want to strip are always "a" elements? And is the HTML you pasted the entire string the regex would run against, or did you cut out just the relevant portion?
 

Chris D

XenForo developer
Staff member
#4
The elements I want to strip are always a elements, yes. I am doing this for Jeff Fuqua and I am trying to strip out some unnecessary content from one of his feeds, namely the four a elements which are sharing buttons for the site the feed comes from.

A typical feed item looks like:

Code:
<atom:link rel="self" href="http://feeds.cbssports.com/cbssportsline/nfl_ten_rapidreports" type="application/rss+xml"/>
<item>
<title><![CDATA[Beddingfield named Director of College Scouting]]></title>
<description><![CDATA[The Titans announced on Wednesday that Blake Beddingfield has been promoted to Director of College Scouting. Beddingfield is in his 14th season with Tennessee and has served as the team's scouting coordinator for the last five. &ldquo;I'm happy for Blake that we were able to get this done,&rdquo; GM Ruston Webster said. &ldquo;He's been a key piece of our scouting puzzle.&rdquo;<br clear="both" style="clear: both;"/>
<br clear="both" style="clear: both;"/>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:d23a51d25b244be58f8da229a9a0d7b6:9MP7DW6FcEhES%2FelVL2kUnqG4J70oYaOewONF6xNT2EA2St5aRgAQLBWPmD2%2BKxiOjdQGoofj2RWXg%3D%3D'><img border='0' title='Email this Article' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:4995ac10fd1cc5723ff2ec51d009b4df:eCvx3iMGnpZBaZSAbwXLDV3xdRYHFZeX0NabbCWr5nQiYz6Nr4GFVUdoDU9YIMVHGkIyJ8q8LXbDaA%3D%3D'><img border='0' title='Add to del.icio.us' alt='Add to del.icio.us' src='http://images.pheedo.com/images/mm/delicious.gif'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:b818fc7d9ea6ddcd55301a23a7b36134:vJ4UfGFoIQVkHMmwzq82uNNcAlTHn%2F7JaxJYj4cts6pdUMb9OCFnzroU%2Bz0%2Bwi0rKtgRWyNlZc9USw%3D%3D'><img border='0' title='Add to digg' alt='Add to digg' src='http://images.pheedo.com/images/mm/digg.gif'/></a>
  <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:646a53e89fced4ad67275e1f7525e93b:EDfrNkxHUh0kRpp00ZiffhRePDajC9IQeUsSEIO%2B2VMW9ZKvqpSbhLfTfkifamlZZNLWzsdZbMvjPb8%3D'><img border='0' title='Add to Facebook' alt='Add to Facebook' src='http://images.pheedo.com/images/mm/facebook.gif'/></a>
<br clear="both" style="clear: both;"/>
<a href="http://ads.pheedo.com/click.phdo?s=d134935ba12864acb047a8d407c1a804&p=1"><img alt="" style="border: 0;" border="0" src="http://ads.pheedo.com/img.phdo?s=d134935ba12864acb047a8d407c1a804&p=1"/></a>
<img alt="" height="0" width="0" border="0" style="display:none" src="http://tags.bluekai.com/site/5148"/><img alt="" height="0" width="0" border="0" style="display:none" src="http://insight.adsrvr.org/track/evnt/?ct=0:nbu4f5v&adv=wouzn4v&fmt=3"/>]]></description>
<link>http://feeds.cbssports.com/click.phdo?i=d134935ba12864acb047a8d407c1a804</link>
<pheedo:origLink>http://www.cbssports.com/nfl/rapid-reports/post/19281852</pheedo:origLink>
<guid isPermaLink="false">http://www.cbssports.com/nfl/rapid-reports/post/19281852</guid>
<category>Tennessee Titans</category>
<pubDate>Wed, 06 Jun 2012 18:25:56 EST</pubDate>
</item>
The bit I don't need is the run of <br clear...> tags together with the four sharing-button a elements between them.

As you can see, every single instance of <br clear can be removed safely. It's not as if they're littered all over the place, so hopefully that makes it easier?
 

Naatan

Well-known member
#5
This is a VERY basic regex and will certainly not handle every use case you can throw at it, but given that the content is always more or less what you pasted above, the following ought to work.

Code:
/\<br.*?\<a.*?\<br.*?\/\>/si
Example usage:

PHP:
// Assign the result back; preg_replace() returns the modified string.
$string = preg_replace('/\<br.*?\<a.*?\<br.*?\/\>/si', '', $string);
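For anyone wanting to try it in isolation, here is a rough, self-contained sketch. The `$string` sample is a hypothetical, shortened stand-in for a feed description (the example.com URLs and image names are made up); only the pattern itself is the one given above.

```php
<?php
// Hypothetical, shortened stand-in for a feed item's description.
$string = 'Story text.<br clear="both" style="clear: both;"/>' . "\n"
    . '<br clear="both" style="clear: both;"/>' . "\n"
    . "  <a href='http://example.com/email'><img src='email.png'/></a>\n"
    . "  <a href='http://example.com/digg'><img src='digg.gif'/></a>\n"
    . '<br clear="both" style="clear: both;"/>' . "\n"
    . 'Trailing content.';

// s: let "." match newlines; i: case-insensitive tag names.
// Matches from the first <br, through the first <a, to the end of the
// next <br .../> tag that follows it.
$clean = preg_replace('/\<br.*?\<a.*?\<br.*?\/\>/si', '', $string);

// preg_replace() returns null on error, so keep the original in that case.
if ($clean === null) {
    $clean = $string;
}

echo $clean;
```

Because the second `.*?` stops at the first `<br` after the first `<a`, the match ends at the closing br of the sharing block and anything after it is kept.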
 

Chris D

XenForo developer
Staff member
#7
It works like a charm :) Well done.

I've ensured it only runs the replace on a specific feed by wrapping it in

PHP:
if ($feed['title'] == 'Whatever Feed')
So as long as they don't majorly change their format, we should be ok :D
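Roughly, the wrapped version looks something like the sketch below. `$feed['title']` and 'Whatever Feed' are as above; the function name `stripShareLinks` and the `$description` parameter are made up for illustration and are not part of any XenForo API.

```php
<?php
// Hypothetical helper: only strips the sharing block for one named feed.
function stripShareLinks(array $feed, string $description): string
{
    // Leave every other feed untouched.
    if ($feed['title'] !== 'Whatever Feed') {
        return $description;
    }

    $clean = preg_replace('/\<br.*?\<a.*?\<br.*?\/\>/si', '', $description);

    // Fall back to the original text if preg_replace() reports an error (null).
    return $clean ?? $description;
}

// Example call with made-up data.
$sample = "Text.<br clear=\"both\"/>\n<a href='#'>share</a>\n<br clear=\"both\"/>";
$kept     = stripShareLinks(['title' => 'Some Other Feed'], $sample);
$stripped = stripShareLinks(['title' => 'Whatever Feed'], $sample);
```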

Thanks again, Naatan.