RSS/XML feed parser

Here's some PHP:

PHP
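// Fetch a feed and return its first $number items as an HTML definition list.
// $tags must list four tag names in this order: title, description, link, date.
function xml_parser($page,$container,$tags,$number,$cdata) {
  if (!$number) {$number=100;} // default limit when none is given
  $xml=file_get_contents($page);
  // s: dot also matches newlines; U: quantifiers are ungreedy by default
  preg_match_all("/<$container>.+<\/$container>/sU",$xml, $items);
  $items=$items[0];
  $itemsArray=array();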
   foreach ($items as $item) {
    // $this is reserved and cannot be reassigned in current PHP, so use a plain array
    $fields=array();
    for($i=0; $i<count($tags); $i++) {
     preg_match("/<$tags[$i](.+)(<\/$tags[$i]>)/sU", $item, $tag);
     $fields[$i]=preg_replace("/<$tags[$i]>(.+)(<\/$tags[$i]>)/sU",'$1',$tag);
     $fields[$i]=array_map('html_entity_decode', $fields[$i]);
    }
     if (count($itemsArray)<$number) {array_push($itemsArray, $fields);}
   }
  $theData="<dl>";
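  foreach ($itemsArray as $item) {
   $data=array();
   for($i=0; $i<count($tags); $i++) {
    // index 0 holds the matched text; use '' when the tag was missing
    $data[$i]=isset($item[$i][0]) ? $item[$i][0] : '';
   }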
   $title=$data[0];
   $dpatterns[0]="/<img(.+)><\/img>/sU"; $dreplacements[0]='<img$1>';
   $dpatterns[1]="/<img(.+)\/>/sU"; $dreplacements[1]='<img$1>';
   $dpatterns[2]="/<(\/|)content?(.+|)>/sU"; $dreplacements[2]='';
   $dpatterns[3]="/border=\"0\"/sU"; $dreplacements[3]='';
   // 'hide' drops CDATA sections entirely; otherwise only the wrapper is removed
   $dpatterns[4]="/<\!\[CDATA\[(.+)\]\]>/sU";
   $dreplacements[4]=($cdata!='hide') ? '$1' : '';
   $description=preg_replace($dpatterns,$dreplacements,$data[1]);
   $link=preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU",'$1',$data[2]);
   $date=$data[3];
   $theData.="
   <dt><a href=\"$link\">$title</a></dt>
   <dd class=\"story\">$description</dd>
   <dd>Date: $date</dd>\n";
  }
  $theData.="</dl>";
  return $theData;
}

$container='item';
$tags=array('title','description','link','pubDate');
$bbc=xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",$container,$tags,10,'');
$cnn=xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss",$container,$tags,10,'');

// Lockergnome twice: once with CDATA content hidden, once with it shown
$tags=array('title','content:encoded','link','pubDate');
$lockergnome1=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'hide');
$lockergnome2=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'');

$container='entry';
$tags=array('title','content','link','published');
$flickr=xml_parser("http://api.flickr.com/services/feeds/photos_public.gne",$container,$tags,10,'');
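
If you want to check the extraction step without fetching a live feed, something like the sketch below may help. It runs the same kind of ungreedy regex over an inline sample string. The function name extract_items, the sample feed and the example.com URLs are made up for illustration and are not part of the parser above; this version also returns the fields keyed by tag name rather than by position.

```php
<?php
// Hypothetical helper for illustration only; not part of xml_parser() above.
function extract_items(string $xml, string $container, array $tags): array
{
    $c = preg_quote($container, '/');
    // s: dot also matches newlines; U: quantifiers are ungreedy by default.
    preg_match_all("/<$c(?:\s[^>]*)?>(.*)<\/$c>/sU", $xml, $matches);
    $items = [];
    foreach ($matches[1] as $body) {
        $row = [];
        foreach ($tags as $tag) {
            $t = preg_quote($tag, '/');
            $found = preg_match("/<$t(?:\s[^>]*)?>(.*)<\/$t>/sU", $body, $m);
            // Missing tags become empty strings instead of raising notices.
            $row[$tag] = $found ? html_entity_decode(trim($m[1])) : '';
        }
        $items[] = $row;
    }
    return $items;
}

// Made-up sample feed, so no network access is needed.
$sample = <<<XML
<rss><channel>
<item><title>First &amp; foremost</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>
XML;

$items = extract_items($sample, 'item', ['title', 'link']);
foreach ($items as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```

Regexes are fine for a quick job like this, but they are fragile against unusual markup, so treat this as a sketch rather than a robust parser.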

Here's some HTML with PHP:

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome (hidden CDATA)</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get... (the latest feeds from the BBC, CNN, Lockergnome - once with CDATA content hidden and once with it shown - and flickr).

bbc

European shares knocked by Spain
European stock markets open lower after a ratings agency downgrades 16 Spanish banks and uncertainty continues about the fate of Greece.
Date: Fri, 18 May 2012 07:40:18 GMT
N Rock rescue 'could cost £2bn'
The taxpayer could lose about £2bn once the assets of collapsed bank Northern Rock are wound down, the National Audit Office estimates.
Date: Thu, 17 May 2012 23:00:31 GMT
Plaque to dead children stolen
Thieves steal a metal plaque erected in memory of two young boys killed by IRA bombs in the Cheshire town of Warrington.
Date: Fri, 18 May 2012 07:54:49 GMT
Float values Facebook at $104bn
Facebook prices its shares at $38 each ahead of one of the most eagerly anticipated share flotations in recent stock market history.
Date: Thu, 17 May 2012 23:35:25 GMT
Dementia patient 'had 106 carers'
A man who died from a dementia-related illness had been assigned 106 different carers in a single year, says his wife.
Date: Fri, 18 May 2012 05:21:54 GMT
Cameron to meet Hollande at G8
Prime Minister David Cameron is to hold his first face-to-face talks with newly elected French President Francois Hollande at the G8 summit.
Date: Fri, 18 May 2012 03:40:00 GMT
Queen's monarch lunch criticised
The King of Bahrain and Swaziland's King Mswati III are among controversial monarchs expected at a Windsor Castle lunch hosted by The Queen later.
Date: Fri, 18 May 2012 03:26:33 GMT
Zimmerman found with bloody nose
A Florida neighbourhood watch volunteer who shot an unarmed black teenager had a bloody nose and a cut on his head, according to police reports.
Date: Fri, 18 May 2012 02:20:18 GMT
Search for missing fishing boat
A major search is under way to find a fishing boat which has disappeared without trace with a crew of three off the Dorset coast.
Date: Fri, 18 May 2012 07:37:45 GMT
Niger malnutrition crisis growing
Months of warnings have failed to stop a major malnutrition crisis in Niger, affecting more than six million people, Save the Children says.
Date: Thu, 17 May 2012 23:00:41 GMT

cnn

New documents, photos shed light on Trayvon Martin case
Newly released material in the Trayvon Martin shooting paints the most complete picture yet of how investigators built the case as well as its complexity.
Date: Fri, 18 May 2012 01:56:50 EDT
Facebook IPO: Internet glee, skepticism
Friends may be priceless. But 'friending' is worth $38 a share.
Date: Thu, 17 May 2012 19:43:56 EDT
Man questioned in highway killings
Ballistics tests have linked the shooting deaths of two people along roadways in Mississippi, a source who has been briefed on the investigation said Thursday.
Date: Thu, 17 May 2012 23:26:43 EDT
Travolta accuser hires Gloria Allred
The remaining plaintiff in the sexual battery lawsuit against John Travolta fired his lawyer, bringing an end to the case, the lawyer told CNN on Thursday.
Date: Thu, 17 May 2012 20:40:50 EDT
Flesh-eating infections: Scores per year
A foundation devoted to education about and treatment of flesh-eating bacteria cites government figures estimating 500 to 1,500 cases occur in the United States each year. But media coverage of these cases is rare, so the story of a Georgia grad student fighting the disease may help raise awareness, the foundation's co-founder says.
Date: Thu, 17 May 2012 23:07:15 EDT
Do voters want moderates out of D.C.?
Republican Speaker of the House John Boehner drew a hard line in the sand this week, renewing a battle over the debt ceiling unless President Barack Obama agreed to significant budget cuts during what may be a lame-duck session after the November elections.
Date: Thu, 17 May 2012 19:00:38 EDT
Court upholds Mississippi pardons
Mississippi's Supreme Court on Thursday denied the state attorney general's attempt that it reconsider its assent to controversial pardons -- several of them for convicted killers -- issued earlier this year by outgoing Gov. Haley Barbour.
Date: Thu, 17 May 2012 19:50:02 EDT
Now it's the jury's turn in Edwards trial
Closing arguments concluded Thursday afternoon in the corruption trial of John Edwards, after his defense team rested its case without calling the former Democratic presidential candidate's ex-mistress to testify.
Date: Thu, 17 May 2012 18:58:50 EDT
Breyer's D.C. home hit by burglary
Thieves must have something against Justice Stephen Breyer.
Date: Thu, 17 May 2012 17:05:42 EDT
Opinion: Dimon's $23M isn't the problem
Ingo Walter and Jennifer Carpenter: Focus on changing the risk incentives of banks rather than the level of executive pay
Date: Thu, 17 May 2012 17:01:14 EDT

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

Warner-0043

Dr. Warner posted a photo:

Warner-0043

Date: 2012-05-18T08:07:23Z
IMGP0206.jpg

ErikHalfacre posted a photo:

IMGP0206.jpg

Date: 2012-05-18T08:07:24Z

PNUT1990 posted a photo:

Date: 2012-05-18T08:07:25Z
UA-263921 001.jpg

serMaks posted a photo:

UA-263921 001.jpg

Date: 2012-05-18T08:07:25Z
IMG_7798

North East And Regional Bus Photos posted a photo:

IMG_7798

Date: 2012-05-18T08:07:26Z

PhotoTrick posted a photo:

Date: 2012-05-18T08:07:26Z
EDD_3296

twistedmx13 posted a photo:

EDD_3296

Date: 2012-05-18T08:07:26Z
IMG_6834

Ghiblidaisuki posted a photo:

IMG_6834

Date: 2012-05-18T08:07:27Z

W1W1N posted a photo:

Date: 2012-05-18T08:07:27Z
Brown-0159

晨暘遊戲學園活動花絮 posted a photo:

Brown-0159

Date: 2012-05-18T08:07:28Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the php or just have it in one page. The page would have a '.php' extension - not '.html.'
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> tagged material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site but then reduce the next site entry to "headlines" only (i.e., "titles" and "links") and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work it out at this stage ... and everything tried brought the larger process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?
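
A minimal sketch of that idea, written along the lines of the $cdata parameter in the function at the top of the page. The helper name handle_cdata and the 'show' default are assumptions made up for this example; only the 'hide' value comes from the code above.

```php
<?php
// Hypothetical helper; the name and the 'show' default are illustrative assumptions.
function handle_cdata(string $text, string $mode = 'show'): string
{
    // s: dot also matches newlines; U: quantifiers are ungreedy by default.
    $pattern = '/<!\[CDATA\[(.*)\]\]>/sU';
    // 'hide' drops the whole CDATA section; any other value keeps the
    // contents and removes only the <![CDATA[ ... ]]> wrapper.
    $replacement = ($mode === 'hide') ? '' : '$1';
    return preg_replace($pattern, $replacement, $text);
}

$raw = 'Intro <![CDATA[<p>Full story</p>]]> outro';
echo handle_cdata($raw), "\n";          // wrapper removed, contents kept
echo handle_cdata($raw, 'hide'), "\n";  // CDATA section removed entirely
```

Passing the mode per call is what lets you show the full description for one feed and hide it for the next.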

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?

#8
2007-11-02 steve says :

thanks sorted out my cdata parsing problem, seems that is not too clear in the docs

s
