
Problem importing RSS Items

Discussion in 'Troubleshooting and Problems' started by robdog, Oct 26, 2011.

  1. robdog

    robdog Well-Known Member

    I am importing news from an RSS feed and for some reason certain stories will get imported over and over and over again. Here is an example of a story:

    Code:
    <item>
    <title>
    Crosby is Hall of Fame's featured player of the week
    </title>
    <link>
    http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4
    </link>
    <description>
    Kicker Mason Crosby is the Packers Hall of Fame’s Featured Player of the week for Week 7. In Sunday afternoon’s 33-27 victory over the Minnesota Vikings, he had four successful extra points and fo...
    </description>
    <pubDate>Mon, 24 Oct 2011 18:59:41 GMT</pubDate>
    <guid>
    http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4
    </guid>
    <dc:date>2011-10-24T18:59:41Z</dc:date>
    </item>
    Any idea why the above story will get imported numerous times? Really causing problems with my site and my Twitter feed, lol.
     
  2. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

  3. robdog

    robdog Well-Known Member

  4. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    I am unable to reproduce the problem on my test forum using this feed. I am not getting any double posts. The guids look good. I will keep the feed running to pull in new items as they are posted. I will report back.

    You say the Crosby is Hall of Fame's featured player of the week story keeps getting posted? Are there any others?

    This may be a problem that is specific to your forum. I can take a look if you are comfortable giving me access to your server and forum.
     
  5. robdog

    robdog Well-Known Member

    After doing some digging, I am getting an invalid index error when it is trying to UNSET duplicate RSS Feed Entries.

    Here is the index value:
    Code:
    http://www.packers.com/news-and-events/article-1/Crosby-is-hall-of-fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4


    Here is the array of indexes that it is looking through to unset:
    Code:
    Array
    (
        [http://www.packers.com/news-and-events/article_zeller/article-1/Inside-Slant-AJ-Hawk/e58db9a8-1011-4548-acd7-b3620e6e1d11] => 0
        [http://www.packers.com/news-and-events/article_zeller/article-1/Burnetts-career-on-the-rise/da8fabd0-cadd-4d82-a221-ef7ead32fc70] => 1
        [http://www.packers.com/news-and-events/article-1/James-Westrich-named-HS-Coach-of-the-Week/61f124cf-aa8b-4935-8873-f8997fef31aa] => 2
        [http://www.packers.com/news-and-events/article-1/Packers-Bye-Week-Dope-Sheet/53ed3dc8-26ff-46b6-ab96-b2a1cdb1d4cd] => 3
        [http://www.packers.com/news-and-events/article_spofford/article-1/Crosby-wins-second-weekly-award-of-2011/fb15e234-67e7-4c29-96aa-0af2799f0a61] => 4
        [http://www.packers.com/news-and-events/article-1/Point-counterpoint-/d0ca20a6-5258-4b7f-83fe-81c0ac062bcd] => 5
        [http://www.packers.com/news-and-events/article-1/Tuesdays-with-McCarthy/f5296a20-8df2-4da1-9952-7849e0f3f26d] => 6
        [http://www.packers.com/news-and-events/article-1/Aaron-Rodgers-nominated-for-Air-NFL-Player-of-the-Week/d01af2b3-c0da-4ec3-adae-1fff7a76cd1a] => 7
        [http://www.packers.com/news-and-events/article_spofford/article-1/McCarthy-says-Rodgers-best-decision-maker/a5c0c4cd-2473-42f1-83ea-68f8c0072751] => 8
        [http://www.packers.com/news-and-events/article_ketchman/article-1/Capers-says-defense-can-get-better/1a1ebea2-6f4f-44ed-bc0d-d78396fac006] => 9
        [http://www.packers.com/news-and-events/article-1/Crosby-is-Hall-of-Fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4] => 10
        [http://www.packers.com/news-and-events/article_spofford/article-1/Game-notes-Crosby-kicks-franchise-record-FG/280db62f-1391-47ad-a33c-e7739b910d0c] => 11
        [http://www.packers.com/news-and-events/article_spofford/article-1/Woodson-interceptions-spark-third-quarter-spurt/8a8ae9fb-2cb3-4ff8-9a50-afc417d86839] => 12
        [http://www.packers.com/news-and-events/article_ketchman/article-1/What-if-defense-comes-to-life-too/f5d1a0cc-761a-4e3f-aa9a-29b752cf5e14] => 13
        [http://www.packers.com/news-and-events/article_ketchman/article-1/Life-is-good-after-win-in-Minnesota/b9e00978-e87f-4c04-a01a-63fa52c97f3a] => 14
    )
    As you can see, the index value is in the array, but it is not finding it for some reason. Really weird...
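    A hedged observation on the two listings above: the looked-up value reads `Crosby-is-hall-of-fames` (lowercase) while the array key reads `Crosby-is-Hall-of-Fames`. PHP compares array keys byte for byte, so those are two different keys. The sketch below uses shortened, made-up URLs (not the real feed ids) to show the effect; whether this is what actually happens on this forum is an assumption.

```php
<?php
// Hypothetical, shortened ids for illustration; only the letter case differs.
$ids = array('http://example.com/Crosby-is-Hall-of-Fames/86179382' => 10);
$id  = 'http://example.com/Crosby-is-hall-of-fames/86179382';

// Array keys are case-sensitive, so the lowercase id is not found.
var_dump(isset($ids[$id]));                 // bool(false)

// Ignoring case, the two strings are equal.
var_dump(strcasecmp($id, key($ids)) === 0); // bool(true)
```

    If the `unique_id` column uses a case-insensitive collation, the `IN (...)` lookup could still return a stored id whose case differs from the feed's, and that id would then miss in `$ids`. That is a guess, not something confirmed in this thread.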
     
  6. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Can you post the complete error message from the log? I assume that error is from this function:

    XenForo_Model_Feed::_checkProcessedEntries

    I would like to see the exact error though.

    I still haven't encountered any errors with that feed on my forum.
     
  7. robdog

    robdog Well-Known Member

    I think this is what you want:

    Code:
    Server Error
    
    Undefined index: http://www.packers.com/news-and-events/article-1/Crosby-is-hall-of-fames-featured-player-of-the-week/86179382-7d56-4548-b38a-faf998f834e4
    
    XenForo_Application::handlePhpError() in XenForo/Model/Feed.php at line 405
    XenForo_Model_Feed->_checkProcessedEntries() in XenForo/Model/Feed.php at line 537
    XenForo_Model_Feed->importFeedData() in XenForo/ControllerAdmin/Feed.php at line 164
    XenForo_ControllerAdmin_Feed->actionImport() in XenForo/FrontController.php at line 310
    XenForo_FrontController->dispatch() in XenForo/FrontController.php at line 132
    XenForo_FrontController->run() in /home/packerfo/public_html/admin.php at line 13
    But I modified the Feed.php file to output some debug information. Basically the error line looks like this:

    echo $ids[$id] . "<br /><br />";

    Here is the entire function just in case you want to see it. (lots of debug code, lol. YEAH ECHO!)

    PHP:
    protected function _checkProcessedEntries(array $feedData, array $feed)
    {
        $ids = array();

        foreach ($feedData['entries'] AS $i => &$entry)
        {
            $ids[$entry['id']] = $i;

            $entry['hash'] = md5($entry['id'] . $entry['title'] . $entry['content_html']);
        }

        if (!$ids)
        {
            return $feedData;
        }

        $existing = $this->_getDb()->fetchCol('
            SELECT unique_id
            FROM xf_feed_log
            WHERE feed_id = ?
                AND unique_id IN (' . $this->_getDb()->quote(array_keys($ids)) . ')
        ', $feed['feed_id']);
        echo "Feed Data Entries:<br />";
        print_r($feedData['entries']);
        echo "<br /><br />";
        echo "Existing Entries:<br />";
        print_r($existing);
        echo "<br /><br />";

        foreach ($existing AS $id)
        {
            echo "ID Entries:<br />";
            print_r($ids);
            echo "<br /><br />";

            $output = $id;
            $output = preg_replace('/[^(\x20-\x7F)]*/','', $output);
            echo $output . "<br />";
            echo $id . "<br />";
            echo $ids[$id] . "<br /><br />";

            if (isset($ids[$id]))
            {
                unset($feedData['entries'][$ids[$id]]);
            }
        }

        $feedData['entries'] = $this->_limitEntries($feedData['entries'], self::$_maxEntriesPerImport);

        return $feedData;
    }
     
  8. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Line 405 falls in the middle of block comments.

    What version of XenForo? Have you modified that file? (library/XenForo/Model/Feed.php)
     
  9. robdog

    robdog Well-Known Member

    I updated the Feed.php to output debug information. You can see the modified function in my last reply. Very weird that this whole thing is not working right with an array index problem. :(

    I mean the feed entry is in the DB and inside the existing item array, but the index is just invalid when the checking happens. :(
     
  10. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Which line is 405? There are lots of arrays being handled in that function. I need to narrow this down so I know what index and where.

    edit - ideally I need the full error message using a default Feed.php file.

    edit 2 - and I still want to confirm the version of XenForo, just to be sure.
     
  11. robdog

    robdog Well-Known Member

    That is part of the problem. There is no error message if I use the default Feed.php, since the isset() check on the array index always returns false for the entry that gets inserted over and over. I only get the undefined index error when I echo that index.

    I am using 1.1 Beta 3.
     
  12. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    That is a problem with your debug code. This line will return an "undefined index" error if $ids[$id] is not set which is the case for all "new" RSS items in the normal execution of this function:

    Code:
            foreach ($existing AS $id)
            {
                echo "ID Entries:<br />";
                print_r($ids);
                echo "<br /><br />";
    
                $output = $id;
                $output = preg_replace('/[^(\x20-\x7F)]*/','', $output);
                echo $output . "<br />";
                echo $id . "<br />";
                echo $ids[$id] . "<br /><br />";
    
                if (isset($ids[$id]))
                {
                    unset($feedData['entries'][$ids[$id]]);
                }
            }
    
    This error has nothing to do with the duplicate RSS posts. Back to square one.

    I am still not having any problems with this feed. I will test it on 1.1 now...
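    To illustrate the point about the debug line, here is a minimal sketch with made-up ids (`guid-a`, `guid-b`) and a hypothetical helper `describeLookup()`. Reading `$ids[$id]` directly raises an "Undefined index" notice when the key is missing, while `isset()` simply returns false, so a debug line guarded this way stays quiet:

```php
<?php
// Hypothetical data: one id that is in the feed, one that is not.
$ids = array('guid-a' => 0);

// Hypothetical helper, not part of XenForo.
function describeLookup(array $ids, $id)
{
    // isset() never raises "Undefined index"; it just returns false.
    if (isset($ids[$id])) {
        return 'entry index ' . $ids[$id];
    }
    return 'not in $ids';
}

echo describeLookup($ids, 'guid-a'), "\n"; // entry index 0
echo describeLookup($ids, 'guid-b'), "\n"; // not in $ids
```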
     
  13. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    It works fine in 1.1 too. As before, I will let it continue to run so I can see what happens as new items are added to the feed.
     
  14. robdog

    robdog Well-Known Member

    Yeah, I understand the error, but when I do a print_r on $ids the key is in there. But for some reason, it is not finding that 1 specific key. Oh well, I will keep digging and see what I can find out.
     
  15. robdog

    robdog Well-Known Member

    Here is what I had to change it to in order for it to work:
    PHP:
    protected function _checkProcessedEntries(array $feedData, array $feed)
    {
        $ids = array();

        foreach ($feedData['entries'] AS $i => &$entry)
        {
            //$ids[(string)$entry['id'] . ''] = $i;

            $entry['hash'] = md5($entry['id'] . $entry['title'] . $entry['content_html']);
            $ids[$entry['hash']] = $i;
        }

        if (!$ids)
        {
            return $feedData;
        }

        $existing = $this->_getDb()->fetchCol('
            SELECT hash
            FROM xf_feed_log
            WHERE feed_id = ?
                AND hash IN (' . $this->_getDb()->quote(array_keys($ids)) . ')
        ', $feed['feed_id']);

        foreach ($existing AS $id)
        {
            if (isset($ids[$id]))
            {
                unset($feedData['entries'][$ids[$id]]);
            }
        }

        $feedData['entries'] = $this->_limitEntries($feedData['entries'], self::$_maxEntriesPerImport);

        return $feedData;
    }
    Basically switching the array index to the hash instead of the id. I do need to figure out if I want the entry title and entry content html in the hash, but I am just going to leave it that way for now.
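    On the open question of what goes into the hash, a small sketch of the trade-off, using a made-up entry and a hypothetical helper `entryHash()` that mirrors the md5 call above. Because the hash covers the title and content, an entry the publisher edits later gets a new hash and would presumably be treated as new:

```php
<?php
// Hypothetical helper mirroring the md5() call in the function above.
function entryHash(array $entry)
{
    return md5($entry['id'] . $entry['title'] . $entry['content_html']);
}

// Made-up entries: same id, but the title was edited after first import.
$original = array('id' => 'guid-1', 'title' => 'Story', 'content_html' => '<p>Text</p>');
$edited   = array('id' => 'guid-1', 'title' => 'Story (updated)', 'content_html' => '<p>Text</p>');

var_dump(entryHash($original) === entryHash($original)); // bool(true): unchanged entry is recognised
var_dump(entryHash($original) === entryHash($edited));   // bool(false): edited entry looks new
```

    Hashing only the id would avoid re-importing edited stories. md5() returns lowercase hex, so the hash key is not affected by differences in letter case between the stored value and the feed value.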
     
  16. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Ok I see what you are getting at.

    The id must not be equal then. Maybe there is an extra space character, or some entities or something. But it works fine on my forum. I would have to examine your database and forum more closely to make a determination.
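    One way to check how two ids differ, as a rough sketch: `describeDifference()` is a hypothetical helper (not part of XenForo) that tests the possibilities mentioned above, plus letter case, and falls back to a hex dump so hidden bytes become visible.

```php
<?php
// Hypothetical helper, not part of XenForo: report how two ids differ.
function describeDifference($a, $b)
{
    if ($a === $b) {
        return 'identical';
    }
    if (trim($a) === trim($b)) {
        return 'differ only in surrounding whitespace';
    }
    if (strcasecmp($a, $b) === 0) {
        return 'differ only in letter case';
    }
    if (html_entity_decode($a, ENT_QUOTES, 'UTF-8') === html_entity_decode($b, ENT_QUOTES, 'UTF-8')) {
        return 'differ only in HTML entities';
    }
    // Hex output exposes invisible or non-ASCII bytes.
    return 'differ in content: ' . bin2hex($a) . ' vs ' . bin2hex($b);
}

echo describeDifference('guid-1', " guid-1\n"), "\n";             // differ only in surrounding whitespace
echo describeDifference('Hall-of-Fames', 'hall-of-fames'), "\n";  // differ only in letter case
echo describeDifference('a&amp;b', 'a&b'), "\n";                  // differ only in HTML entities
```

    Passing the id from xf_feed_log and the matching key from `$ids` to a helper like this would show which case applies.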

    That modification is a good alternative for processing dupes.
     
