RSS/XML feed parser

Here's some PHP:

PHP
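function xml_parser($page,$container,$tags,$number,$cdata) {
  if (!$number) {$number=100;} // default: at most 100 items
  $xml=@file_get_contents($page);
  // If the feed can't be fetched, say so rather than run the regexes on false.
  if ($xml===false) {return "<p>The feed seems to be down.</p>";}
  // Grab every <container>...</container> block (non-greedy, across newlines).
  preg_match_all("/<$container>.+<\/$container>/sU",$xml,$items);
  $items=$items[0];
  $itemsArray=array();
  foreach ($items as $item) {
    if (count($itemsArray)>=$number) {break;} // stop once we have enough items
    // Named $fields rather than $this: $this is reserved and can't be assigned to.
    $fields=array();
    for ($i=0; $i<count($tags); $i++) {
      // Find the element (attributes allowed), then strip a plain <tag>...</tag> wrapper.
      preg_match("/<$tags[$i](.+)(<\/$tags[$i]>)/sU",$item,$tag);
      $fields[$i]=preg_replace("/<$tags[$i]>(.+)(<\/$tags[$i]>)/sU",'$1',$tag);
      $fields[$i]=array_map('html_entity_decode',$fields[$i]);
    }
    $itemsArray[]=$fields;
  }
  // Clean-up rules for the description: normalise <img>, drop <content> wrappers and border="0".
  $dpatterns[0]="/<img(.+)><\/img>/sU"; $dreplacements[0]='<img$1>';
  $dpatterns[1]="/<img(.+)\/>/sU"; $dreplacements[1]='<img$1>';
  $dpatterns[2]="/<(\/|)content?(.+|)>/sU"; $dreplacements[2]='';
  $dpatterns[3]="/border=\"0\"/sU"; $dreplacements[3]='';
  // CDATA: 'hide' drops the whole section, anything else unwraps and shows it.
  $dpatterns[4]="/<\!\[CDATA\[(.+)\]\]>/sU";
  $dreplacements[4]=($cdata!='hide') ? '$1' : '';
  $theData="<dl>";
  foreach ($itemsArray as $item) {
    $data=array();
    for ($i=0; $i<count($tags); $i++) {
      // Element 0 is the tag's content; use an empty string if the tag was missing.
      $data[$i]=isset($item[$i][0]) ? $item[$i][0] : '';
    }
    // $tags is expected in this order: title, description, link, date.
    $title=$data[0];
    $description=preg_replace($dpatterns,$dreplacements,$data[1]);
    // Atom feeds keep the URL in an href attribute; RSS links pass through unchanged.
    $link=preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU",'$1',$data[2]);
    $date=$data[3];
    $theData.="
    <dt><a href=\"$link\">$title</a></dt>
    <dd class=\"story\">$description</dd>
    <dd>Date: $date</dd>\n";
  }
  $theData.="</dl>";
  return $theData;
}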

$container='item';
$tags=array('title','description','link','pubDate');
$bbc=xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",$container,$tags,10,'');
$cnn=xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss",$container,$tags,10,'');

$tags=array('title','content:encoded','link','pubDate');
// Lockergnome with the CDATA content hidden...
$lockergnome1=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'hide');
// ...and with the CDATA content shown.
$lockergnome2=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'');

$container='entry';
$tags=array('title','content','link','published');
$flickr=xml_parser("http://api.flickr.com/services/feeds/photos_public.gne",$container,$tags,10,'');
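
The last argument is what switches the CDATA content on or off. Pulled out on its own, that step might look roughly like the sketch below. The name cdata_filter() and the sample string are made up for illustration and aren't part of the function above; only the regex is taken from it.

```php
<?php
// Hypothetical helper (not part of xml_parser above): isolates the CDATA step.
// $mode 'hide' removes the CDATA section entirely; anything else unwraps it.
function cdata_filter($text, $mode) {
  $replacement = ($mode != 'hide') ? '$1' : '';
  return preg_replace("/<\!\[CDATA\[(.+)\]\]>/sU", $replacement, $text);
}

// Made-up sample, standing in for a feed's description field.
$sample = 'Intro <![CDATA[<p>Full story</p>]]> end';

echo cdata_filter($sample, ''), "\n";     // Intro <p>Full story</p> end
echo cdata_filter($sample, 'hide'), "\n"; // Intro  end
```

Passing 'hide' leaves only whatever sits outside the CDATA section, which is why a feed that wraps its whole description in CDATA ends up showing headlines only.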

Here's some HTML with PHP

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome (hidden CDATA)</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get... (the latest feeds from the BBC, CNN, Lockergnome - once with CDATA hidden and once with it shown - and flickr).

bbc

Four deny charges over expenses
Three MPs and a peer tell a court they are not guilty of charges of false accounting in relation to their expenses claims.
Date: Thu, 11 Mar 2010 19:09:09 GMT
High-speed rail plans announced
Plans for a new high-speed rail line between London and Birmingham are announced by Transport Secretary Lord Adonis.
Date: Thu, 11 Mar 2010 14:51:23 GMT
Man, 64, dies after 'youth abuse'
A man with learning difficulties dies outside his Greater Manchester home after suffering years of abuse from youths.
Date: Thu, 11 Mar 2010 18:25:14 GMT
James Bulger's mother meets Straw
Justice Secretary Jack Straw meets the mother of murdered toddler James Bulger to discuss the return to prison of one of her son's killers.
Date: Thu, 11 Mar 2010 19:59:13 GMT
'No giveaway Budget' says Darling
Chancellor Alistair Darling warns people not to expect a "giveaway" when he unveils his Budget later this month.
Date: Thu, 11 Mar 2010 19:02:19 GMT
Pinera sworn in as new quake hits
Sebastian Pinera is sworn in as president of quake-hit Chile, as a 6.9-magnitude aftershock strikes the centre of the country.
Date: Thu, 11 Mar 2010 20:08:36 GMT
Iraq PM takes early lead in poll
First results from Iraq's election show PM Nouri Maliki's grouping leading in two southern provinces, officials say.
Date: Thu, 11 Mar 2010 15:03:15 GMT
Pakistan kidnap boy not released
The Foreign Office says a five-year-old boy from Oldham kidnapped in Pakistan has not been released.
Date: Thu, 11 Mar 2010 11:38:43 GMT
Ex-Bosnian leader 'owed apology'
Britain should apologise to ex-Bosnian president for "mistreating" him in prison, says chairman of the joint presidency of Bosnia-Herzegovina.
Date: Thu, 11 Mar 2010 18:48:21 GMT
Man jailed for organic egg scam
A businessman is jailed for masterminding a scam which saw tens of millions of battery hen eggs sold as free-range or organic.
Date: Thu, 11 Mar 2010 15:54:54 GMT

cnn

Hustler can't have pics of hiker's body
Gruesome photos of a murdered hiker sought by Hustler magazine will not be released, a Georgia judge ruled. Lawmakers now want a permanent fix.
Date: Thu, 11 Mar 2010 10:09:22 EST
3 strong quakes strike Chile
Three strong earthquakes rocked Chile this morning, just as the country was swearing in a new president.
Date: Thu, 11 Mar 2010 14:10:30 EST
Report: 12-year-olds abusing inhalants
When their kids turn 12, parents are concerned about peers pressuring them to smoke cigarettes, drink and use drugs, but it turns out 12-year-olds are doing something else: getting high on inhalants.
Date: Thu, 11 Mar 2010 14:20:51 EST
Dems go behind doors on health plan
Health care reform takes center stage Thursday as President Obama and top congressional Democrats work behind closed doors to nail down a final agreement.
Date: Thu, 11 Mar 2010 11:55:57 EST
House GOP adopts earmark ban
House Republicans agreed Thursday to adopt a ban on congressional earmarks in spending bills for next year, upping the ante with Democrats in the political battle over fiscal responsibility and pork barrel spending.
Date: Thu, 11 Mar 2010 13:39:18 EST
28 schools closing in Kansas City, MO
Superintendent John Covington called for the closing or consolidation of almost half of the schools in the Kansas City, Missouri, school district.
Date: Thu, 11 Mar 2010 12:04:25 EST
Bacterial meningitis kills Oklahoma girl
An Oklahoma elementary school student has died of bacterial meningitis, officials said Thursday, and two other students are hospitalized with the illness.
Date: Thu, 11 Mar 2010 13:36:15 EST
Early Iraq election results reported
Iraq's election commission is expected Thursday to announce partial results of last week's vote.
Date: Thu, 11 Mar 2010 10:57:43 EST
Former NFL star, actor Merlin Olsen dies
Former football star and television actor Merlin Olsen has died after a long battle with cancer, the St. Louis Rams said Thursday.
Date: Thu, 11 Mar 2010 13:11:38 EST
Friends: Corey Haim was beating drugs
Corey Haim seemed to be winning his battle against drug abuse in the weeks before his death, according to both his manager and closest friend.
Date: Thu, 11 Mar 2010 14:38:42 EST

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

IMG_2010

metroarts yearbok 2008/2009 posted a photo:

IMG_2010

Date: 2010-03-11T20:09:57Z
RSA_2010_212

Qualys, Inc. posted a photo:

RSA_2010_212

Date: 2010-03-11T20:09:57Z
IMG_2883

Phil Neal Walker Law posted a photo:

IMG_2883

Date: 2010-03-11T20:09:58Z
PenelopeAuction2010-104-web

jhc123093 posted a photo:

PenelopeAuction2010-104-web

Date: 2010-03-11T20:10:00Z
Lightroom for Jessca-21

brunopion posted a photo:

Lightroom for Jessca-21

Date: 2010-03-11T20:10:00Z
INV. 11 MARZO 2010 AM 012

ad.chicoloapan posted a photo:

INV. 11 MARZO 2010 AM 012

Date: 2010-03-11T20:10:00Z
l_85a7dee80cfb4763ac84c96bd91c44ea

Zac Clicks Inc posted a photo:

l_85a7dee80cfb4763ac84c96bd91c44ea

Date: 2010-03-11T20:10:00Z
IMG_2756

kosolson1005 posted a photo:

IMG_2756

Date: 2010-03-11T20:09:57Z
20100309BRICPortraits079Ed

WSSU Photography posted a photo:

20100309BRICPortraits079Ed

Date: 2010-03-11T20:09:58Z
DSCF0248

larslehmann05 posted a photo:

DSCF0248

Date: 2010-03-11T20:09:59Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the php or just have it in one page. The page would have a '.php' extension - not '.html.'
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> tagged material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site but then reduce the next site entry to "headlines" only (i.e., "titles" and "links") and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work it out at this stage ... and everything tried brought the larger process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?

#8
2007-11-02 steve says :

thanks, sorted out my cdata parsing problem, seems that is not too clear in the docs

s
