RSS/XML feed parser

Here's some PHP:

PHP
<?php
// Fetch the feed at $page and return up to $number stories as an HTML <dl>.
// $container: element wrapping each story ('item' for RSS, 'entry' for Atom).
// $tags: element names for title, description, link and date, in that order.
// $cdata: 'hide' drops CDATA sections from the description; anything else unwraps them.
function xml_parser($page, $container, $tags, $number, $cdata) {
  if (!$number) { $number = 100; }
  $xml = file_get_contents($page);
  if ($xml === false) { return '<p>The feed could not be loaded.</p>'; }
  $c = preg_quote($container, '/');
  preg_match_all("/<$c>.+<\/$c>/sU", $xml, $items);
  $itemsArray = array();
  foreach ($items[0] as $item) {
    if (count($itemsArray) >= $number) { break; } // enough stories collected
    $fields = array(); // was $this, which PHP 7.1+ refuses to reassign
    foreach ($tags as $i => $name) {
      $t = preg_quote($name, '/');
      // First <tag ...>...</tag>; Atom links are self-closing, so fall back to <tag .../>.
      if (!preg_match("/<$t(.+)(<\/$t>)/sU", $item, $tag)) {
        preg_match("/<$t\b[^>]*\/>/s", $item, $tag);
      }
      $value = isset($tag[0]) ? $tag[0] : '';
      // Strip a bare <tag>...</tag> wrapper, keeping the inner text.
      $value = preg_replace("/<$t>(.+)(<\/$t>)/sU", '$1', $value);
      $fields[$i] = html_entity_decode($value);
    }
    $itemsArray[] = $fields;
  }
  $theData = "<dl>";
  foreach ($itemsArray as $item) {
    list($title, $body, $rawLink, $date) = $item;
    $dpatterns = array(
      "/<img(.+)><\/img>/sU",      // <img ...></img> becomes <img ...>
      "/<img(.+)\/>/sU",           // <img ... /> becomes <img ...>
      "/<(\/|)content?(.+|)>/sU",  // drop <content ...> and </content> wrapper tags
      "/border=\"0\"/sU",          // drop border="0"
      "/<\!\[CDATA\[(.+)\]\]>/sU", // CDATA section: unwrap it, or drop it when hidden
    );
    $dreplacements = array('<img$1>', '<img$1>', '', '', ($cdata != 'hide') ? '$1' : '');
    $description = preg_replace($dpatterns, $dreplacements, $body);
    // RSS links are plain text; for Atom, pull the href out of <link ... href="..."/>.
    $link = preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU", '$1', $rawLink);
    $theData .= "
  <dt><a href=\"$link\">$title</a></dt>
  <dd class=\"story\">$description</dd>
  <dd>Date: $date</dd>\n";
  }
  $theData .= "</dl>";
  return $theData;
}

$container = 'item';
$tags = array('title', 'description', 'link', 'pubDate');
$bbc = xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml", $container, $tags, 10, '');
$cnn = xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss", $container, $tags, 10, '');

// Lockergnome twice: once with CDATA hidden, once with it shown.
$tags = array('title', 'content:encoded', 'link', 'pubDate');
$lockergnome1 = xml_parser("http://feed.lockergnome.com/nexus/all", $container, $tags, 5, 'hide');
$lockergnome2 = xml_parser("http://feed.lockergnome.com/nexus/all", $container, $tags, 5, '');

// The flickr feed is Atom, so stories are <entry> elements.
$container = 'entry';
$tags = array('title', 'content', 'link', 'published');
$flickr = xml_parser("http://api.flickr.com/services/feeds/photos_public.gne", $container, $tags, 10, '');
?>
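The regular expressions above are tuned to these particular feeds. For comparison, here is a minimal sketch of the same job for plain RSS 2.0 using PHP's bundled SimpleXML extension. SimpleXML parses the XML properly, and with `LIBXML_NOCDATA` it treats CDATA contents as ordinary text. The sketch makes several assumptions. It expects well-formed RSS 2.0 with `channel/item` elements, so Atom feeds like flickr's would need different paths. It takes the XML as a string rather than a URL. It needs PHP 7+ for the type declarations. The name `rss_items` is hypothetical and not part of the original code.

```php
<?php
// Sketch: read RSS 2.0 <item> elements with SimpleXML instead of regular expressions.
// Assumes well-formed RSS 2.0; Atom feeds (<feed>/<entry>) would need different paths.
function rss_items(string $xml, int $number = 10): array {
    libxml_use_internal_errors(true); // collect parse errors quietly instead of printing warnings
    // LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
    $feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($feed === false || !isset($feed->channel->item)) {
        return array(); // not XML, or not shaped like RSS 2.0
    }
    $items = array();
    foreach ($feed->channel->item as $item) {
        if (count($items) >= $number) {
            break; // keep only the first $number stories
        }
        $items[] = array(
            'title'       => (string) $item->title,
            'description' => (string) $item->description,
            'link'        => (string) $item->link,
            'date'        => (string) $item->pubDate,
        );
    }
    return $items;
}

$sample = '<rss version="2.0"><channel><title>Example</title>'
    . '<item><title>First</title><description><![CDATA[<p>Hello</p>]]></description>'
    . '<link>http://example.com/1</link><pubDate>Wed, 07 Jan 2009 10:30:24 GMT</pubDate></item>'
    . '<item><title>Second</title><description>Plain text</description>'
    . '<link>http://example.com/2</link><pubDate>Wed, 07 Jan 2009 10:42:04 GMT</pubDate></item>'
    . '</channel></rss>';

foreach (rss_items($sample) as $story) {
    echo $story['title'], ' -> ', $story['link'], "\n"; // prints "First -> http://example.com/1", then the second item
}
```

For a live feed, pass it the string returned by `file_get_contents()` on the feed URL.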

Here's some HTML with PHP:

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome (hidden CDATA)</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get: the latest stories from the BBC, CNN and Lockergnome (with CDATA hidden, then shown) and flickr.

bbc

Israel to pause Gaza bombardment
Israel says it will halt its bombing of the Gaza Strip for a three-hour period each day, as pressure grows for a cease-fire.
Date: Wed, 07 Jan 2009 10:30:24 GMT
M&S to close stores and cut jobs
Marks and Spencer says it plans to close 25 Simply Food stores and two of its regular stores, and cut 1,230 jobs.
Date: Wed, 07 Jan 2009 10:42:04 GMT
Pietersen out as England captain
Kevin Pietersen leaves his position as England captain in the wake of the row between him and coach Peter Moores.
Date: Wed, 07 Jan 2009 11:05:28 GMT
Thousands face further rail chaos
Thousands of rail passengers face long delays after a power failure stops all trains in to and out of London Euston.
Date: Wed, 07 Jan 2009 10:56:45 GMT
Freezing temperatures hit new low
Temperatures plunge to -12C in Oxfordshire on the coldest night so far of Britain's new year big freeze.
Date: Wed, 07 Jan 2009 08:25:07 GMT
Dispute hits Europe gas supplies
Exports of Russian gas to Europe via Ukraine stop altogether with both countries accusing each other of turning off the tap.
Date: Wed, 07 Jan 2009 11:12:22 GMT
UK car sales fall 11.3% in 2008
UK car sales in 2008 are 11.3% lower than they had been in 2007, following a 21.2% drop in December.
Date: Wed, 07 Jan 2009 09:35:22 GMT
Cases have 'cut UK terror threat'
The terrorist threat to the UK has been reduced by a series of successful criminal prosecutions, the head of MI5 says.
Date: Wed, 07 Jan 2009 09:54:51 GMT
Swayze 'may live only two years'
Actor Patrick Swayze, who has pancreatic cancer, admits in a US TV interview he may survive only two years.
Date: Wed, 07 Jan 2009 09:40:32 GMT
Blocking it out - why Tetris could help ease traumatic stress
Playing the computer puzzle game Tetris might help reduce the effects of traumatic stress, say UK researchers.
Date: Wed, 07 Jan 2009 00:29:25 GMT

cnn

Israel offers short respite from strikes
Israel will halt its bombardment of Gaza for three hours every day to allow residents of the Hamas-ruled Palestinian territory to obtain much-needed supplies, a military spokesman said Wednesday.
Date: Wed, 07 Jan 2009 04:16:02 EST
Brown: Let journalists into Gaza
We have been trying to report as accurately as possible on the fighting in Gaza.
Date: Tue, 06 Jan 2009 21:17:49 EST
Al Qaeda message blames Obama for Gaza
An audio message reportedly from al Qaeda's deputy chief vows revenge for Israel's air and ground assault on Gaza and calls the Jewish state's actions against Hamas militants "a gift" from U.S. President-elect Barack Obama.
Date: Tue, 06 Jan 2009 22:37:06 EST
Secret Service to unveil new presidential limo
As a candidate, Barack Obama promoted hybrid cars. Shortly after taking the oath of office, Obama will climb into the mother of all hybrids -- part car, part truck and, from the looks of it, part tank. In keeping with recent tradition, the Secret Service will place a brand-new presidential limousine into service for the inaugural parade down Pennsylvania Avenue.
Date: Tue, 06 Jan 2009 13:15:11 EST
'Let me sleep,' suicidal anthrax suspect wrote
Dr. Bruce Ivins, the former government scientist blamed for a string of deadly 2001 anthrax attacks, behaved oddly and was "sarcastic and nasty" to his wife in the final weeks of his life, police documents said.
Date: Tue, 06 Jan 2009 22:37:18 EST
DNA, reward may hold key to '93 slaying
Gail Parker was an environmentalist who gave speeches about America's polluted waterways. She volunteered at the local hospital, helped the elderly and cancer patients. In 1993, Parker was killed -- most probably for her purse and jewelry -- and her body was left in the Arizona desert. To date, no weapon has been found and no suspects have been arrested. The killing remains a cold case.
Date: Tue, 06 Jan 2009 15:54:49 EST
CNN's Gupta eyed for surgeon general
Dr. Sanjay Gupta, CNN's chief medical correspondent, has been approached by Barack Obama's transition team about the U.S. surgeon general's post, according to sources inside the transition and at CNN. Gupta, a neurosurgeon, has served as a White House fellow and as a special adviser to then-first lady Hillary Clinton.
Date: Tue, 06 Jan 2009 20:09:20 EST
Borger: An Obama team oops on CIA pick?
It had to happen, and it did.
Date: Tue, 06 Jan 2009 17:55:21 EST
Woman, 19, dies of bird flu
A 19-year-old woman who handled ducks in northern China has died in Beijing from bird flu, the World Health Organization said Wednesday.
Date: Wed, 07 Jan 2009 00:29:43 EST
Travolta, Preston return home with son's ashes
John Travolta and Kelly Preston have returned to Ocala, Florida, with the remains of their teenage son, a family friend told CNN on Tuesday.
Date: Tue, 06 Jan 2009 16:22:12 EST

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

DSC_5763

buddyvalero posted a photo:

DSC_5763

Date: 2009-01-07T11:17:13Z
MYDC4347

lavtcbeauty posted a photo:

MYDC4347

Mydc4347.Jpg

Date: 2009-01-07T11:17:13Z
IMG_5451

menniti giovanni posted a photo:

IMG_5451

Date: 2009-01-07T11:17:13Z
DSC_1119

Brad Olcott posted a photo:

DSC_1119

Date: 2009-01-07T11:17:13Z
foto fine 12 2008 - 174

Paolo Motta posted a photo:

foto fine 12 2008 - 174

Date: 2009-01-07T11:17:12Z
P1000283

Photos by Tony posted a photo:

P1000283

Date: 2009-01-07T11:17:12Z
090107_74265_remote_skilift-ongeluk

moniqueschouten posted a photo:

090107_74265_remote_skilift-ongeluk

Date: 2009-01-07T11:17:16Z
IMG_0282

Rolyataylor posted a photo:

IMG_0282

Date: 2009-01-07T11:17:14Z
P1000348

rikbie posted a photo:

P1000348

Date: 2009-01-07T11:17:06Z
0058c7ddde4642214bedd50cf9c87924

winnie8219 posted a photo:

0058c7ddde4642214bedd50cf9c87924

Date: 2009-01-07T11:17:15Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the php or just have it in one page. The page would have a '.php' extension - not '.html.'
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site, reduce the next site's entry to headlines only (i.e., titles and links), and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work out at this stage ... and everything I tried brought the whole process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?
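BonRouge's "extra variable" suggestion can be sketched like this. The function `render_item()` is hypothetical. It takes one already-extracted story as an array with 'title', 'link' and 'description' keys. That layout is an assumption, because `xml_parser()` builds its HTML internally. The function also takes two switches:

- `$mode = 'headlines'` outputs only the linked title.
- `$cdata = 'hide'` drops CDATA sections instead of unwrapping them, following the same convention as the `$cdata` argument of `xml_parser()`.

```php
<?php
// Sketch of the "extra variable" idea: render one story either as a headline only
// or with its description. render_item() and the 'headlines' mode are hypothetical.
function render_item(array $item, string $mode = 'full', string $cdata = ''): string {
    $html = '<dt><a href="' . $item['link'] . '">' . $item['title'] . "</a></dt>\n";
    if ($mode === 'headlines') {
        return $html; // titles and links only
    }
    // The if statement BonRouge describes: unwrap CDATA unless asked to hide it.
    $replacement = ($cdata === 'hide') ? '' : '$1';
    $description = preg_replace('/<!\[CDATA\[(.+)\]\]>/sU', $replacement, $item['description']);
    return $html . '<dd class="story">' . $description . "</dd>\n";
}

$item = array(
    'title'       => 'Example',
    'link'        => 'http://example.com/',
    'description' => '<![CDATA[<p>Full story</p>]]>',
);
echo render_item($item, 'headlines'); // one site as headlines only
echo render_item($item);              // the next with the full description
```

Calling it with a different `$mode` per feed gives the "full, headlines, full" toggling asked about above.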

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?
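One likely cause, not confirmed in this thread, is character encoding rather than location. Before PHP 5.4, `html_entity_decode()` assumed ISO-8859-1 unless told otherwise, so entities in a UTF-8 feed could decode to the wrong bytes. In addition, a page that doesn't declare its charset leaves the browser guessing. Below is a minimal sketch of both fixes. It assumes the feeds are UTF-8, which is common but worth checking in each feed's XML declaration. `decode_utf8` is a hypothetical helper name.

```php
<?php
// Sketch: decode entities explicitly as UTF-8 and declare the page's charset.
// Assumes the feeds are UTF-8; decode_utf8() is a hypothetical helper.
function decode_utf8(string $text): string {
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

if (!headers_sent()) {
    // Must run before any HTML is output.
    header('Content-Type: text/html; charset=UTF-8');
}

echo decode_utf8('Caf&eacute; &amp; cr&egrave;me'), "\n";
```

Inside the parser, this would mean calling `html_entity_decode()` with the same two extra arguments.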

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot, man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?
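Merging several feeds and keeping only the newest items could be sketched as follows. `xml_parser()` returns finished HTML, so this assumes each feed has first been turned into an array of stories. Each story needs a 'date' string that `strtotime()` understands, such as an RSS `pubDate` or an Atom `published` value. The stories are then merged, sorted newest first, and cut to the first ten. `merge_latest` and the item layout are hypothetical.

```php
<?php
// Sketch: merge stories from several feeds and keep the newest $limit of them.
// Assumes each story is an array with a 'date' string; merge_latest() is hypothetical.
function merge_latest(array $feeds, int $limit = 10): array {
    $all = array();
    foreach ($feeds as $items) {
        foreach ($items as $item) {
            $item['timestamp'] = strtotime($item['date']) ?: 0; // unparseable dates sort last
            $all[] = $item;
        }
    }
    // Newest first.
    usort($all, function ($a, $b) {
        return $b['timestamp'] <=> $a['timestamp'];
    });
    return array_slice($all, 0, $limit);
}

$bbc = array(
    array('title' => 'A', 'date' => 'Wed, 07 Jan 2009 10:30:24 GMT'),
    array('title' => 'B', 'date' => 'Wed, 07 Jan 2009 11:12:22 GMT'),
);
$cnn = array(
    array('title' => 'C', 'date' => 'Wed, 07 Jan 2009 04:16:02 EST'), // 09:16:02 GMT
);
foreach (merge_latest(array($bbc, $cnn), 2) as $item) {
    echo $item['title'], "\n"; // B, then A
}
```

Converting to timestamps matters because the feeds use different time zones and formats, so sorting the raw date strings would give the wrong order.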

#8
2007-11-02 steve says :

Thanks, sorted out my CDATA parsing problem; seems that is not too clear in the docs.

s
