RSS parsing issue

Jeff Fuqua

I just switched to XF and am loving it.

One small snag: any RSS feed I bring into the forum has the blank lines between paragraphs removed. Any suggestions on how to correct this?

I think this is a bug.

The HTML of the first item of that feed is here:

<div><strong>AFC Scenarios</strong>: <a href="" target="_blank">East</a> | <a href="" target="_blank">West</a> | <a href="" target="_blank">North</a> | <a href="" target="_blank">South</a><p><em>Yes, the start of training camps is two months away, but it’s never too early to consider the coming season. A look at the best-case and worst-case scenarios for the Titans in 2012.</em></p><p><img src="" alt="" width="80" height="80" class="floatright"/><strong>Dream scenario (11-5):</strong> <a href="">Jake Locker</a> beats out <a href="">Matt Hasselbeck</a> in the training camp quarterback battle and never looks back. The second-year signal-caller provides huge energy for the Titans, alleviating concerns about his accuracy. He spreads the ball around to a nice stable of receivers, including <a href="">Kenny Britt</a>, who stays healthy all season; <a href="">Nate Washington</a>, who matches last year’s effort; and <a href="">Kendall Wright</a>, who catches on quickly and doesn’t look like a rookie.</p><p>With a running quarterback under center and all those receivers helping stretch the field, <a href="">Chris Johnson</a> gets room and has a big rebound year. Defenses have to decide: Stack the box and risk yielding big passes or keep numbers in coverage and see CJ break off chunks.</p><p>The pass rush fares far better than last season because <a href="">Kamerion Wimbley</a> proves to be a great signing -- one that's made even more so because the offense gives Tennessee leads that make opponents one-dimensional.</p><p>Mike Munchak is a coach of the year candidate in line for an extension as he takes the Titans to the playoffs.</p><p><strong>Nightmare scenario (5-11):</strong> They head into camp thinking they have two quarterbacks but wind up with one getting hurt and the other struggling. 
Britt’s not healthy, Wright’s not effective and Johnson doesn’t rebound from last year, prompting speculation that his time as a playmaker has passed.</p><p>With inconsistent offense and not a lot of points, too much falls on the defense.</p><p>Teams get them in nickel and attack the guy in the slot. The Titans roll through several options there and none of them prove nearly as effective as <a href="">Cortland Finnegan</a> was. <a href="">Derrick Morgan</a> can’t mount the healthy and productive pass-rush campaign the team was banking on and Wimbley is also unable to lead any sort of consistent charge at opposing quarterbacks.</p><p>The Titans finish the year talking about how much better Locker will be in 2013. They also enter an uncertain time with Munchak and his staff, which head into the final year of their contracts not having shown they warrant extensions.</p></div><img src="" border="0" height="1" width="1" />

If you follow the below link, this is how it should output:

This is how it actually outputs:

That to me is a bug.
The underlying problem is the HTML isn't converted properly.

<p>This is text</p>
<p>This is also text</p>

Gets output as:

This is text
This is also text

Instead of:

This is text

This is also text

I'm trawling through code to find how it parses this and where it converts content enclosed in <p> tags to have a single <br> but I haven't found it yet. If anyone else knows that'd be great :D
Still think this should be fixed properly.

The main issue is that <p> is treated as a block element by the HTML renderer which converts HTML to BB Code, I think.

I considered writing an add-on for the renderer to fix this issue, but I was unsure how that would then affect other parts of the software as I imagine that the XenForo_Html_Renderer_BbCode renderer is used in multiple features and I didn't want that to affect anything else.

From what I can tell looking at the code at


for these tags:

protected $_blockTags = array(
    'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'dl', 'dt', 'dd', 'ol', 'ul', 'li',
    'address', 'blockquote', 'del', 'div', 'hr', 'ins', 'pre',
    'table', 'thead', 'tbody', 'tfoot', 'tr',
    'header', 'nav', 'footer', 'article'
);

For all of these tags, amongst the other stuff that goes on, only a single "\n" is added. That would be correct for some, but definitely not all of them.

I don't fully understand everything that goes on in the renderTag function, but definitely part of it is only adding one line break after some elements which actually require two. Or maybe this is just really tricky and there are some instances where only one line break would be required for <p> elements.
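
To make the one-versus-two line break point concrete, here is a toy sketch in PHP. It is not XenForo's renderer and does not reproduce renderTag; the function name toyRenderBlocks and the $doubleBreakTags parameter are invented for this illustration, and it only handles simple, non-nested tags.

```php
<?php
// Toy model only: NOT XenForo's renderer. It mimics the behaviour described
// above, where every block tag contributes a single "\n" by default.
function toyRenderBlocks(string $html, array $doubleBreakTags = []): string
{
    // Match simple, non-nested <tag>...</tag> pairs for a few block tags.
    $pattern = '#<(p|div|h[1-6])\b[^>]*>(.*?)</\1>#is';

    $text = preg_replace_callback(
        $pattern,
        function (array $m) use ($doubleBreakTags): string {
            $tag = strtolower($m[1]);
            // Hypothetical rule: tags listed in $doubleBreakTags are followed
            // by a blank line, everything else by a single line break.
            $breaks = in_array($tag, $doubleBreakTags, true) ? "\n\n" : "\n";
            return trim(strip_tags($m[2])) . $breaks;
        },
        $html
    );

    // Drop the trailing break(s) left after the last block.
    return rtrim((string) $text, "\n");
}

$html = '<p>This is text</p><p>This is also text</p>';

// Every block tag gets one "\n": the paragraphs run together.
echo toyRenderBlocks($html), "\n";
echo "---\n";
// <p> treated as needing two breaks: a blank line separates the paragraphs.
echo toyRenderBlocks($html, ['p']), "\n";
```

The first call models what the feed importer appears to do today; the second models what I would expect for <p>. Whether other tags in the list should also get two breaks is exactly the open question.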

Anyway: I will leave all the complex stuff to people smarter than me :D

I have written a fix that specifically targets the prepareFeedEntry function. It replaces the existing function with the same function but with some additional replacements in $entry['content_html'] before it is then sent to the BBCode renderer.
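
For anyone who wants the general idea without opening the attachment, the kind of replacement involved could look roughly like the sketch below. This is an assumption about the approach, not the attached code: padBlockBreaks is a made-up helper name, and it assumes the renderer emits one extra line break for an explicit <br />.

```php
<?php
// Hypothetical helper, not part of XenForo and not the attached fix.
// It pads closing block tags with an explicit <br /> so that a converter
// which emits one line break per block and one per <br /> ends up with a
// blank line between paragraphs.
function padBlockBreaks(string $html, array $tags = ['p']): string
{
    // Build an alternation such as "p|div" from the tag list.
    $alternation = implode('|', array_map(
        fn (string $tag): string => preg_quote($tag, '#'),
        $tags
    ));

    // Match a closing tag unless it is already followed by <br> or sits at
    // the very end of the string (no trailing break is needed there).
    $pattern = '#</(' . $alternation . ')\s*>(?!\s*(?:<br\b|\z))#i';

    return (string) preg_replace($pattern, '</$1><br />', $html);
}

$html = '<p>This is text</p><p>This is also text</p>';
echo padBlockBreaks($html), "\n";

// Assumed usage, applied before the HTML is handed to the BB code renderer:
// $entry['content_html'] = padBlockBreaks($entry['content_html']);
```

The lookahead keeps the replacement idempotent, so running it twice does not stack up extra breaks. Other tags can be passed in the second argument, but nested blocks (a </p> directly followed by a closing tag of another listed element) would each get their own <br />, so the list is best kept short.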

With the RSS feed that Jeff provided, the same problem that affects <p> tags also seems to occur with <div> tags. I have also identified another issue: an image with float: left applied in its style has that stripped, which causes a bit of a layout malfunction. I will probably look at this too. But for now this solves the immediate issue.

The fix is attached.

