Problem importing RSS Items

robdog

Well-known member
I am importing news from an RSS feed, and for some reason certain stories get imported over and over again. Here is an example of a story:

Code:
<item>
<title>
Crosby is Hall of Fame's featured player of the week
</title>
<link>
http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4
</link>
<description>
Kicker Mason Crosby is the Packers Hall of Fame’s Featured Player of the week for Week 7. In Sunday afternoon’s 33-27 victory over the Minnesota Vikings, he had four successful extra points and fo...
</description>
<pubDate>Mon, 24 Oct 2011 18:59:41 GMT</pubDate>
<guid>
http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4
</guid>
<dc:date>2011-10-24T18:59:41Z</dc:date>
</item>

Any idea why the above story will get imported numerous times? Really causing problems with my site and my Twitter feed, lol.
 
I am unable to reproduce the problem on my test forum using this feed. I am not getting any double posts. The guids look good. I will keep the feed running to pull in new items as they are posted. I will report back.

You say the Crosby is Hall of Fame's featured player of the week story keeps getting posted? Are there any others?

This may be a problem that is specific to your forum. I can take a look if you are comfortable giving me access to your server and forum.
 
After doing some digging, I am getting an invalid index error when the importer tries to unset duplicate RSS feed entries.

Here is the index value:
Code:
http://www.packers.com/news-and-events/article-1/Crosby-is-hall-of-fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4

Here is the array of indexes that it is looking through to unset:
Code:
Array
(
    [http://www.packers.com/news-and-events/article_zeller/article-1/Inside-Slant-AJ-Hawk/e58db9a8-1011-4548-acd7-b3620e6e1d11] => 0
    [http://www.packers.com/news-and-events/article_zeller/article-1/Burnetts-career-on-the-rise/da8fabd0-cadd-4d82-a221-ef7ead32fc70] => 1
    [http://www.packers.com/news-and-events/article-1/James-Westrich-named-HS-Coach-of-the-Week/61f124cf-aa8b-4935-8873-f8997fef31aa] => 2
    [http://www.packers.com/news-and-events/article-1/Packers-Bye-Week-Dope-Sheet/53ed3dc8-26ff-46b6-ab96-b2a1cdb1d4cd] => 3
    [http://www.packers.com/news-and-events/article_spofford/article-1/Crosby-wins-second-weekly-award-of-2011/fb15e234-67e7-4c29-96aa-0af2799f0a61] => 4
    [http://www.packers.com/news-and-events/article-1/Point-counterpoint-/d0ca20a6-5258-4b7f-83fe-81c0ac062bcd] => 5
    [http://www.packers.com/news-and-events/article-1/Tuesdays-with-McCarthy/f5296a20-8df2-4da1-9952-7849e0f3f26d] => 6
    [http://www.packers.com/news-and-events/article-1/Aaron-Rodgers-nominated-for-Air-NFL-Player-of-the-Week/d01af2b3-c0da-4ec3-adae-1fff7a76cd1a] => 7
    [http://www.packers.com/news-and-events/article_spofford/article-1/McCarthy-says-Rodgers-best-decision-maker/a5c0c4cd-2473-42f1-83ea-68f8c0072751] => 8
    [http://www.packers.com/news-and-events/article_ketchman/article-1/Capers-says-defense-can-get-better/1a1ebea2-6f4f-44ed-bc0d-d78396fac006] => 9
    [http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4] => 10
    [http://www.packers.com/news-and-events/article_spofford/article-1/Game-notes-Crosby-kicks-franchise-record-FG/280db62f-1391-47ad-a33c-e7739b910d0c] => 11
    [http://www.packers.com/news-and-events/article_spofford/article-1/Woodson-interceptions-spark-third-quarter-spurt/8a8ae9fb-2cb3-4ff8-9a50-afc417d86839] => 12
    [http://www.packers.com/news-and-events/article_ketchman/article-1/What-if-defense-comes-to-life-too/f5d1a0cc-761a-4e3f-aa9a-29b752cf5e14] => 13
    [http://www.packers.com/news-and-events/article_ketchman/article-1/Life-is-good-after-win-in-Minnesota/b9e00978-e87f-4c04-a01a-63fa52c97f3a] => 14
)

As you can see, the index value is in the array, but it is not finding it for some reason. Really weird...
 
Can you post the complete error message from the log? I assume that error is from this function:

XenForo_Model_Feed::_checkProcessedEntries

I would like to see the exact error though.

I still haven't encountered any errors with that feed on my forum.
 
I think this is what you want:

Code:
Server Error

Undefined index: http://www.packers.com/news-and-events/article-1/Crosby-is-hall-of-fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4

XenForo_Application::handlePhpError() in XenForo/Model/Feed.php at line 405
XenForo_Model_Feed->_checkProcessedEntries() in XenForo/Model/Feed.php at line 537
XenForo_Model_Feed->importFeedData() in XenForo/ControllerAdmin/Feed.php at line 164
XenForo_ControllerAdmin_Feed->actionImport() in XenForo/FrontController.php at line 310
XenForo_FrontController->dispatch() in XenForo/FrontController.php at line 132
XenForo_FrontController->run() in /home/packerfo/public_html/admin.php at line 13

But I modified the Feed.php file to output some debug information. Basically the error line looks like this:

echo $ids[$id] . "<br /><br />";

Here is the entire function just in case you want to see it. (lots of debug code, lol. YEAH ECHO!)

PHP:
    protected function _checkProcessedEntries(array $feedData, array $feed)
    {
        $ids = array();

        foreach ($feedData['entries'] AS $i => &$entry)
        {
            $ids[$entry['id']] = $i;

            $entry['hash'] = md5($entry['id'] . $entry['title'] . $entry['content_html']);
        }

        if (!$ids)
        {
            return $feedData;
        }

        $existing = $this->_getDb()->fetchCol('
            SELECT unique_id
            FROM xf_feed_log
            WHERE feed_id = ?
                AND unique_id IN (' . $this->_getDb()->quote(array_keys($ids)) . ')
        ', $feed['feed_id']);
        echo "Feed Data Entries:<br />";
        print_r($feedData['entries']);
        echo "<br /><br />";
        echo "Existing Entries:<br />";
        print_r($existing);
        echo "<br /><br />";

        foreach ($existing AS $id)
        {
            echo "ID Entries:<br />";
            print_r($ids);
            echo "<br /><br />";

            $output = $id;
            $output = preg_replace('/[^(\x20-\x7F)]*/','', $output);
            echo $output . "<br />";
            echo $id . "<br />";
            echo $ids[$id] . "<br /><br />";

            if (isset($ids[$id]))
            {
                unset($feedData['entries'][$ids[$id]]);
            }
        }

        $feedData['entries'] = $this->_limitEntries($feedData['entries'], self::$_maxEntriesPerImport);

        return $feedData;
    }
 
Line 405 falls in the middle of block comments.

What version of XenForo? Have you modified that file? (library/XenForo/Model/Feed.php)

I updated the Feed.php to output debug information. You can see the modified function in my last reply. Very weird that this whole thing is not working right with an array index problem. :(

I mean the feed entry is in the DB and inside the existing item array, but the index is just invalid when the checking happens. :(
 
Which line is 405? There are lots of arrays being handled in that function. I need to narrow this down so I know what index and where.

edit - ideally I need the full error message using a default Feed.php file.

edit 2 - and I still want to confirm the version of XenForo, just to be sure.
 
That is part of the problem. There is no error message if I use the default Feed.php, since the isset() check on the array index always returns false for the entry that gets inserted over and over. I am only getting the error about the unset array index when I do the echo on that index.

I am using 1.1 Beta 3.
 
I am only getting the error about the unset array index when I do the echo on that index.

That is a problem with your debug code. This line will raise an "undefined index" error if $ids[$id] is not set, which is the case for all "new" RSS items in the normal execution of this function:

Rich (BB code):
        foreach ($existing AS $id)
        {
            echo "ID Entries:<br />";
            print_r($ids);
            echo "<br /><br />";

            $output = $id;
            $output = preg_replace('/[^(\x20-\x7F)]*/','', $output);
            echo $output . "<br />";
            echo $id . "<br />";
            echo $ids[$id] . "<br /><br />";

            if (isset($ids[$id]))
            {
                unset($feedData['entries'][$ids[$id]]);
            }
        }

This error has nothing to do with the duplicate RSS posts. Back to square one.

I am still not having any problems with this feed. I will test it on 1.1 now...
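
For anyone adding similar debug output, here is a minimal sketch of a lookup that reports a missing key instead of reading it, so the debug line itself cannot raise the "undefined index" error. The `$ids` and `$existing` values are made-up stand-ins, and `describeLookup()` is a hypothetical helper, not part of XenForo.

```php
<?php
// Made-up stand-ins for the $ids map and the $existing list built inside
// _checkProcessedEntries(); real keys would be feed entry ids.
$ids = array('guid-one' => 0, 'guid-two' => 1);
$existing = array('guid-one', 'GUID-TWO');

// Hypothetical helper: describe a lookup without reading a missing key.
function describeLookup(array $ids, $id)
{
    if (array_key_exists($id, $ids)) {
        return $id . ' => ' . $ids[$id];
    }
    return $id . ' => (no matching key)';
}

foreach ($existing as $id) {
    echo describeLookup($ids, $id), "<br />\n";
}
```

The second id is deliberately written in a different case than its key to show that the helper reports the miss rather than failing.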
 
Yeah, I understand the error, but when I do a print_r on $ids the key is in there. For some reason, it is just not finding that one specific key. Oh well, I will keep digging and see what I can find out.
 
Here is what I had to change it to in order for it to work:
PHP:
    protected function _checkProcessedEntries(array $feedData, array $feed)
    {
        $ids = array();

        foreach ($feedData['entries'] AS $i => &$entry)
        {
            //$ids[(string)$entry['id'] . ''] = $i;

            $entry['hash'] = md5($entry['id'] . $entry['title'] . $entry['content_html']);
            $ids[$entry['hash']] = $i;
        }

        if (!$ids)
        {
            return $feedData;
        }

        $existing = $this->_getDb()->fetchCol('
            SELECT hash
            FROM xf_feed_log
            WHERE feed_id = ?
                AND hash IN (' . $this->_getDb()->quote(array_keys($ids)) . ')
        ', $feed['feed_id']);

        foreach ($existing AS $id)
        {
            if (isset($ids[$id]))
            {
                unset($feedData['entries'][$ids[$id]]);
            }
        }

        $feedData['entries'] = $this->_limitEntries($feedData['entries'], self::$_maxEntriesPerImport);

        return $feedData;
    }

Basically, I switched the array index to the hash instead of the id. I still need to figure out whether I want the entry title and entry content HTML in the hash, but I am just going to leave it that way for now.
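
To illustrate the open question about what goes into the hash, here is a small sketch using the same md5 formula as the modified function. The entry values are invented for the example and `entryHash()` is a hypothetical helper. It only shows that a hash built from id, title and content changes when the publisher edits the story, while a hash of the id alone does not.

```php
<?php
// Same formula as in the modified function: id + title + content_html.
function entryHash(array $entry)
{
    return md5($entry['id'] . $entry['title'] . $entry['content_html']);
}

// Hypothetical entry; the field values are made up for illustration.
$original = array(
    'id'           => 'http://example.com/story/1',
    'title'        => 'Crosby is featured player of the week',
    'content_html' => '<p>First version of the story.</p>',
);

// Same story after the publisher corrects the text; the id is unchanged.
$edited = $original;
$edited['content_html'] = '<p>Corrected version of the story.</p>';

// Compare the full hash and an id-only hash across the two versions.
var_dump(entryHash($original) === entryHash($edited));
var_dump(md5($original['id']) === md5($edited['id']));
```

With the full hash, an edited story would look like a new entry on the next import. Whether that is wanted depends on the site.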
 
Ok I see what you are getting at.

The id must not be equal then. Maybe there is an extra space character, or some entities or something. But it works fine on my forum. I would have to examine your database and forum more closely to make a determination.

That modification is a good alternative for processing dupes.
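
One way to check whether two ids are really equal is to compare them byte by byte. The sketch below uses the two values posted earlier in this thread (the id from the error message and the key printed from the $ids array); `firstDifference()` is a hypothetical helper. PHP array keys are matched exactly, including letter case, whereas a MySQL column with a `_ci` collation compares case-insensitively. Whether xf_feed_log.unique_id uses such a collation on this forum is an assumption that nobody in the thread verified.

```php
<?php
// The id reported in the error message and the key printed from $ids,
// both copied from earlier posts in this thread.
$fromError = 'http://www.packers.com/news-and-events/article-1/Crosby-is-hall-of-fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4';
$fromArray = 'http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4';

// Hypothetical helper: byte offset of the first difference, or -1 if identical.
function firstDifference($a, $b)
{
    $max = min(strlen($a), strlen($b));
    for ($i = 0; $i < $max; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? -1 : $max;
}

$offset = firstDifference($fromError, $fromArray);

var_dump($fromError === $fromArray);                // exact comparison, as array keys use
var_dump(strcasecmp($fromError, $fromArray) === 0); // comparison ignoring ASCII case

if ($offset >= 0) {
    // Hex output also exposes invisible characters such as stray spaces.
    echo 'First difference at byte ', $offset, ': ',
        bin2hex(substr($fromError, $offset, 1)), ' vs ',
        bin2hex(substr($fromArray, $offset, 1)), "\n";
}
```

Running the same check on an id fetched from the database and the matching id from the feed would show whether the mismatch is letter case, whitespace, or something else.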
 