
Preg_replace

Discussion in 'XenForo Development Discussions' started by Chris D, Jun 6, 2012.

  1. Chris D

    Chris D XenForo Developer Staff Member

    HTML:
    <br clear="both" style="clear: both;"/>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:ea02d7ea91ed2d99fb6c6ea207565aff:bx3z8VZ1DMiEUKZ%2BLAlsHPgBbnqglPHpKpvGQEELiE%2FCdgn7Tv2yRxCIuL3UrcrXGXEEx%2FCTN0jADA%3D%3D'><img border='0' title='Email this Article' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:36ae735fce9e9cb922cd8f4385f568e3:Q4Hx1R2t4EIooQXmdImKy%2F%2Bwkc3J56YG%2B3SPxqTJhEzie51%2FZhKY6B1MKDdlyTfyIW3LE5wAwp3eig%3D%3D'><img border='0' title='Add to del.icio.us' alt='Add to del.icio.us' src='http://images.pheedo.com/images/mm/delicious.gif'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:a5a95c2d0566ee119c290d1d86081f56:eFP7oQsJX1Z8M3TpyUvVSCq6DlnbVuyKyek6Uc1nt%2Fk%2BJ2gShB6OkAQA9%2BNej9Yvtr9Y%2B0CiwNyIrA%3D%3D'><img border='0' title='Add to digg' alt='Add to digg' src='http://images.pheedo.com/images/mm/digg.gif'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:0b92b5dbeeb51dabbe960fab009bc5aa:%2BCzl%2BuCS5W40TY1AfKkW3s1UU%2BMJ6qM0VzhN5%2FcUFrpxcdUCJjYqpeYKAnfLAe22Z8kk3oeu5QozImk%3D'><img border='0' title='Add to Facebook' alt='Add to Facebook' src='http://images.pheedo.com/images/mm/facebook.gif'/></a>
    <br clear="both" style="clear: both;"/>
    Can anyone help with a preg_replace expression that will remove everything between the beginning and end <br clear... tags?

    Cheers!
     
  2. Chris D

    Chris D XenForo Developer Staff Member

    I think I may have already had a regex that would work with the above, but I've just realised that actually the code is this:

    Code:
    <br clear="both" style="clear: both;"/>
    <br clear="both" style="clear: both;"/>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:ea02d7ea91ed2d99fb6c6ea207565aff:bx3z8VZ1DMiEUKZ%2BLAlsHPgBbnqglPHpKpvGQEELiE%2FCdgn7Tv2yRxCIuL3UrcrXGXEEx%2FCTN0jADA%3D%3D'><img border='0' title='Email this Article' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:36ae735fce9e9cb922cd8f4385f568e3:Q4Hx1R2t4EIooQXmdImKy%2F%2Bwkc3J56YG%2B3SPxqTJhEzie51%2FZhKY6B1MKDdlyTfyIW3LE5wAwp3eig%3D%3D'><img border='0' title='Add to del.icio.us' alt='Add to del.icio.us' src='http://images.pheedo.com/images/mm/delicious.gif'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:a5a95c2d0566ee119c290d1d86081f56:eFP7oQsJX1Z8M3TpyUvVSCq6DlnbVuyKyek6Uc1nt%2Fk%2BJ2gShB6OkAQA9%2BNej9Yvtr9Y%2B0CiwNyIrA%3D%3D'><img border='0' title='Add to digg' alt='Add to digg' src='http://images.pheedo.com/images/mm/digg.gif'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:0b92b5dbeeb51dabbe960fab009bc5aa:%2BCzl%2BuCS5W40TY1AfKkW3s1UU%2BMJ6qM0VzhN5%2FcUFrpxcdUCJjYqpeYKAnfLAe22Z8kk3oeu5QozImk%3D'><img border='0' title='Add to Facebook' alt='Add to Facebook' src='http://images.pheedo.com/images/mm/facebook.gif'/></a>
    <br clear="both" style="clear: both;"/>
    My code was looking to replace anything between <br clear="both" style="clear: both;"/> and <br clear="both" style="clear: both;"/> so it was probably operating on the first two lines... Either that or I was just wrong anyway! (possible)

    If anyone can help with the correct regex for preg_replace for the code in this post it would be greatly appreciated.

    To be clear, I would like to replace the above code with:

    Code:
    <br clear="both" style="clear: both;"/>
    <br clear="both" style="clear: both;"/>
    <br clear="both" style="clear: both;"/>
    Or, replacing ALL of that code above with '' would be even better.
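Why a "between two br tags" expression stops early can be sketched as follows. The pattern here is only a guess at that kind of expression (the original attempt was not posted), and the example.com link is a shortened stand-in for the real sharing links.

```php
<?php
// Shortened stand-in for the feed markup: two <br> tags, one link, then a third <br>.
$br = '<br clear="both" style="clear: both;"/>';
$html = $br . "\n" . $br . "\n"
    . "  <a href='http://example.com/share'>Email this Article</a>\n"
    . $br;

// Assumed pattern: lazily match from one <br> tag to the next one.
$quoted = preg_quote($br, '~');
$pattern = '~' . $quoted . '.*?' . $quoted . '~s';

// The first two <br> tags pair up and are removed together;
// the link and the third <br> have no partner, so they are left behind.
$result = preg_replace($pattern, '', $html);

echo $result, "\n";
```

With this input the link survives, which fits the "operating on the first two lines" suspicion above.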
     
  3. Naatan

    Naatan Well-Known Member

    The problem here is that you have multiple occurrences of the br element, which makes it pretty complicated to tell the regex which br it should start at and where it should stop.

    Can you assume that the elements you want to strip are always "a" elements? And is the HTML code you pasted the entire bit that the regex would be performed on, or did you cut out the relevant portion?
     
  4. Chris D

    Chris D XenForo Developer Staff Member

    The elements I want to strip are always a elements, yes. I am doing this for Jeff Fuqua and I am trying to strip out some unnecessary content from one of his feeds, namely the four a elements which are sharing buttons for the site the feed comes from.

    A typical feed item looks like:

    Code:
    <atom:link rel="self" href="http://feeds.cbssports.com/cbssportsline/nfl_ten_rapidreports" type="application/rss+xml"/>
    <item>
    <title><![CDATA[Beddingfield named Director of College Scouting]]></title>
    <description><![CDATA[The Titans announced on Wednesday that Blake Beddingfield has been promoted to Director of College Scouting. Beddingfield is in his 14th season with Tennessee and has served as the team's scouting coordinator for the last five. &ldquo;I'm happy for Blake that we were able to get this done,&rdquo; GM Ruston Webster said. &ldquo;He's been a key piece of our scouting puzzle.&rdquo;<br clear="both" style="clear: both;"/>
    <br clear="both" style="clear: both;"/>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:d23a51d25b244be58f8da229a9a0d7b6:9MP7DW6FcEhES%2FelVL2kUnqG4J70oYaOewONF6xNT2EA2St5aRgAQLBWPmD2%2BKxiOjdQGoofj2RWXg%3D%3D'><img border='0' title='Email this Article' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:4995ac10fd1cc5723ff2ec51d009b4df:eCvx3iMGnpZBaZSAbwXLDV3xdRYHFZeX0NabbCWr5nQiYz6Nr4GFVUdoDU9YIMVHGkIyJ8q8LXbDaA%3D%3D'><img border='0' title='Add to del.icio.us' alt='Add to del.icio.us' src='http://images.pheedo.com/images/mm/delicious.gif'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:b818fc7d9ea6ddcd55301a23a7b36134:vJ4UfGFoIQVkHMmwzq82uNNcAlTHn%2F7JaxJYj4cts6pdUMb9OCFnzroU%2Bz0%2Bwi0rKtgRWyNlZc9USw%3D%3D'><img border='0' title='Add to digg' alt='Add to digg' src='http://images.pheedo.com/images/mm/digg.gif'/></a>
      <a style='font-size: 10px; color: maroon;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:646a53e89fced4ad67275e1f7525e93b:EDfrNkxHUh0kRpp00ZiffhRePDajC9IQeUsSEIO%2B2VMW9ZKvqpSbhLfTfkifamlZZNLWzsdZbMvjPb8%3D'><img border='0' title='Add to Facebook' alt='Add to Facebook' src='http://images.pheedo.com/images/mm/facebook.gif'/></a>
    <br clear="both" style="clear: both;"/>
    <a href="http://ads.pheedo.com/click.phdo?s=d134935ba12864acb047a8d407c1a804&p=1"><img alt="" style="border: 0;" border="0" src="http://ads.pheedo.com/img.phdo?s=d134935ba12864acb047a8d407c1a804&p=1"/></a>
    <img alt="" height="0" width="0" border="0" style="display:none" src="http://tags.bluekai.com/site/5148"/><img alt="" height="0" width="0" border="0" style="display:none" src="http://insight.adsrvr.org/track/evnt/?ct=0:nbu4f5v&adv=wouzn4v&fmt=3"/>]]></description>
    <link>http://feeds.cbssports.com/click.phdo?i=d134935ba12864acb047a8d407c1a804</link>
    <pheedo:origLink>http://www.cbssports.com/nfl/rapid-reports/post/19281852</pheedo:origLink>
    <guid isPermaLink="false">http://www.cbssports.com/nfl/rapid-reports/post/19281852</guid>
    <category>Tennessee Titans</category>
    <pubDate>Wed, 06 Jun 2012 18:25:56 EST</pubDate>
    </item>
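    The bit I don't need is the block of <br clear...> tags and the four sharing links between them (it was highlighted in the original post).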

    As you can see, every single instance of <br clear can be removed safely. They aren't littered all over the place, so hopefully that makes it easier?
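Since every <br clear tag can go and the sharing buttons all point at pheedcontent.com, one possible approach is to strip the two kinds of element separately instead of matching the whole block at once. This is a sketch under those assumptions, not something taken from the feed's documentation; the function name `stripSharingBlock` and the shortened URLs in the sample are made up for illustration.

```php
<?php
// Hypothetical helper: strip the spacer tags and the sharing links.
function stripSharingBlock(string $html): string
{
    // Remove each sharing link: an <a> whose opening tag mentions pheedcontent.com.
    // "?? $html" keeps the input if the regex engine reports an error.
    $html = preg_replace('~<a\b[^>]*pheedcontent\.com[^>]*>.*?</a>~si', '', $html) ?? $html;
    // Remove every <br clear=... /> spacer tag.
    $html = preg_replace('~<br\s+clear=[^>]*>~i', '', $html) ?? $html;
    return $html;
}

// Shortened stand-in for one feed description.
$sample = "Story text.<br clear=\"both\" style=\"clear: both;\"/>\n"
    . "<br clear=\"both\" style=\"clear: both;\"/>\n"
    . "  <a style='font-size: 10px;' href='http://www.pheedcontent.com/hostedMorselClick.php?hfmm=v3:abc'>"
    . "<img border='0' alt='Email this Article' src='http://images.pheedo.com/images/mm/emailthis.png'/></a>\n"
    . "<br clear=\"both\" style=\"clear: both;\"/>\n"
    . "<a href=\"http://ads.pheedo.com/click.phdo?s=abc\">ad</a>";

$clean = stripSharingBlock($sample);
echo $clean, "\n";
```

The ads.pheedo.com link is left alone because its opening tag never mentions pheedcontent.com. The leftover newlines and spaces stay in the output, so a trim or whitespace cleanup may still be wanted afterwards.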
     
  5. Naatan

    Naatan Well-Known Member

    This is a VERY basic regex, and it will certainly not handle every use case you can throw at it, but given that the content is always more or less what you pasted above, the following ought to work.

    Code:
    /\<br.*?\<a.*?\<br.*?\/\>/si
    Example usage:

    PHP:
    $string = preg_replace('/\<br.*?\<a.*?\<br.*?\/\>/si', '', $string);
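To see what the expression actually consumes, here is a small run on a cut-down feed description. The sample string is a shortened stand-in for the real item (the short URLs and image names are made up), so treat it as a sketch rather than a test against the live feed.

```php
<?php
// Pattern from the post above: start at a <br, lazily run to the first <a,
// then lazily to the next <br, and stop at the end of that tag (/>).
// The s flag lets . match newlines; the i flag ignores case.
$pattern = '/\<br.*?\<a.*?\<br.*?\/\>/si';

$string = "Story text.<br clear=\"both\" style=\"clear: both;\"/>\n"
    . "<br clear=\"both\" style=\"clear: both;\"/>\n"
    . "  <a href='http://www.pheedcontent.com/x'><img alt='Email this Article' src='emailthis.png'/></a>\n"
    . "  <a href='http://www.pheedcontent.com/y'><img alt='Add to digg' src='digg.gif'/></a>\n"
    . "<br clear=\"both\" style=\"clear: both;\"/>\n"
    . "<a href=\"http://ads.pheedo.com/click.phdo?s=abc\">ad</a>";

$cleaned = preg_replace($pattern, '', $string);

// Everything from the first <br through the third <br .../> is gone;
// the story text and the trailing ad link remain.
echo $cleaned, "\n";
```

One caveat: `\<a` has no word boundary, so it would also match tags such as `<abbr`, and if an item has no sharing links the match could run from a `<br` to some later, unrelated `<a` and `<br`. That is part of why it depends on the feed keeping its current layout.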
     
  6. Chris D

    Chris D XenForo Developer Staff Member

    I will try it and let you know :D Thanks.
     
  7. Chris D

    Chris D XenForo Developer Staff Member

    It works like a charm :) Well done.

    I've ensured it only runs the replace on a specific feed by wrapping it in

    PHP:
    if ($feed['title'] == 'Whatever Feed')
    So as long as they don't majorly change their format, we should be ok :D

    Thanks again, Naatan.
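For anyone wanting the feed-title check and the replace in one place, it could look roughly like this. Only `$feed['title']` and the 'Whatever Feed' placeholder come from the post above; the function name `cleanFeedContent` and the idea of passing the content in as a string are assumptions for the sake of the example.

```php
<?php
// Hypothetical wrapper: only clean items that belong to one named feed.
function cleanFeedContent(array $feed, string $content): string
{
    // 'Whatever Feed' is the placeholder title used in the post above.
    if (($feed['title'] ?? '') === 'Whatever Feed') {
        // "?? $content" keeps the original text if the regex engine reports an error.
        return preg_replace('/\<br.*?\<a.*?\<br.*?\/\>/si', '', $content) ?? $content;
    }

    // Other feeds pass through unchanged.
    return $content;
}

$markup = "Text<br clear=\"both\"/>\n<a href='#'>share</a>\n<br clear=\"both\"/>";

$fromTarget = cleanFeedContent(['title' => 'Whatever Feed'], $markup);
$fromOther  = cleanFeedContent(['title' => 'Some Other Feed'], $markup);
```

Here `$fromTarget` ends up as just the text, while `$fromOther` is returned untouched.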
     