RSS/XML feed parser

Here's some PHP:

PHP
function xml_parser($page,$container,$tags,$number,$cdata) {
  // Default to a maximum of 100 items when no limit is given
  if (!$number) {$number=100;}
  $xml=file_get_contents($page);
  // Grab every <container>...</container> block (e.g. each <item> or <entry>)
  preg_match_all("/<$container>.+<\/$container>/sU",$xml,$items);
  $items=$items[0];
  $itemsArray=array();
  foreach ($items as $item) {
    // $row replaces the original $this, which is reserved in PHP
    // and cannot be used as an ordinary variable
    $row=array();
    for ($i=0; $i<count($tags); $i++) {
      preg_match("/<$tags[$i](.+)(<\/$tags[$i]>)/sU",$item,$tag);
      // Strip the surrounding tags, then decode HTML entities
      $row[$i]=preg_replace("/<$tags[$i]>(.+)(<\/$tags[$i]>)/sU",'$1',$tag);
      $row[$i]=array_map('html_entity_decode',$row[$i]);
    }
    if (count($itemsArray)<$number) {array_push($itemsArray,$row);}
  }
  $theData="<dl>";
  foreach ($itemsArray as $item) {
    $data=array();
    for ($i=0; $i<count($tags); $i++) {
      // A tag that was not found leaves an empty array, so fall back to ''
      $data[$i]=isset($item[$i][0]) ? $item[$i][0] : '';
    }
    $title=$data[0];
    // Clean-up patterns for the description: normalise <img> tags,
    // drop <content> wrappers and border="0" attributes
    $dpatterns[0]="/<img(.+)><\/img>/sU"; $dreplacements[0]='<img$1>';
    $dpatterns[1]="/<img(.+)\/>/sU"; $dreplacements[1]='<img$1>';
    $dpatterns[2]="/<(\/|)content?(.+|)>/sU"; $dreplacements[2]='';
    $dpatterns[3]="/border=\"0\"/sU"; $dreplacements[3]='';
    // Unwrap CDATA sections, or remove them entirely when $cdata is 'hide'
    $dpatterns[4]="/<\!\[CDATA\[(.+)\]\]>/sU";
    $dreplacements[4]=($cdata!='hide') ? '$1' : '';
    $description=preg_replace($dpatterns,$dreplacements,$data[1]);
    // Atom feeds keep the URL in <link href="..."/>, so pull it out of the attribute
    $link=preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU",'$1',$data[2]);
    $date=$data[3];
    $theData.="
   <dt><a href=\"$link\">$title</a></dt>
   <dd class=\"story\">$description</dd>
   <dd>Date: $date</dd>\r";
  }
  $theData.="</dl>";
  return $theData;
}

$container='item';
$tags=array('title','description','link','pubDate');
$bbc=xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",$container,$tags,10,'');
$cnn=xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss",$container,$tags,10,'');

$tags=array('title','content:encoded','link','pubDate');
// Same feed twice: first with the CDATA content hidden, then with it shown
$lockergnome1=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'hide');
$lockergnome2=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'');

$container='entry';
$tags=array('title','content','link','published');
$flickr=xml_parser("http://api.flickr.com/services/feeds/photos_public.gne",$container,$tags,10,'');
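
The last argument of xml_parser() decides what happens to CDATA sections in the description. As a rough sketch of just that step on its own (cdata_filter() and the sample string are made up for illustration and are not part of the parser above):

```php
<?php
// Hypothetical helper: isolates the CDATA step that xml_parser()
// applies to each description. Not part of the original listing.
function cdata_filter($text, $cdata = '') {
    // Same pattern as in xml_parser(): s = dot matches newlines, U = ungreedy
    $pattern = '/<!\[CDATA\[(.+)\]\]>/sU';
    // 'hide' drops the whole CDATA section; any other value keeps its contents
    $replacement = ($cdata === 'hide') ? '' : '$1';
    return preg_replace($pattern, $replacement, $text);
}

// Made-up sample text, standing in for a feed description
$sample = 'Intro <![CDATA[<p>Full story</p>]]> outro';

echo cdata_filter($sample), "\n";         // CDATA wrapper removed, contents kept
echo cdata_filter($sample, 'hide'), "\n"; // CDATA section removed entirely
```

Passing 'hide' is what the first Lockergnome call does; passing an empty string keeps the wrapped content, as in the second call.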

Here's some HTML with PHP

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome (hidden CDATA)</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get... (the latest feeds from the BBC, CNN, Lockergnome, once with its CDATA content hidden and once with it shown, and flickr).

bbc

Ferry transcript reveals crew panic
The last communications between the South Korean ferry that sank on Wednesday and traffic services reveal panic and indecision by the crew.
Date: Mon, 21 Apr 2014 00:19:46 GMT
PM's Christianity remarks 'divisive'
Leading public figures claim David Cameron risks causing division in society with his recent comments that the UK is a Christian country.
Date: Sun, 20 Apr 2014 23:50:10 GMT
Skin cancer rates 'surge since 70s'
The incidence of the most serious skin cancer is now five times higher than it was in the 1970s, figures show.
Date: Mon, 21 Apr 2014 00:03:38 GMT
Russian outrage at Ukraine killing
The Russian foreign ministry expresses outrage at a fatal shooting incident in eastern Ukraine which it blames on Ukrainian nationalists.
Date: Sun, 20 Apr 2014 19:46:44 GMT
Finland air crash kills skydivers
Eight skydivers on an Easter Sunday jump are killed as their plane crashes near the town of Jamijarvi in Finland.
Date: Sun, 20 Apr 2014 21:04:23 GMT
Flood defences mended for thousands
Flood defences and other protections are restored for more than 100,000 properties left at risk after this winter's severe weather.
Date: Mon, 21 Apr 2014 00:21:56 GMT
Iran's Rouhani urges women's rights
Iran's President Hassan Rouhani urges equal opportunities and rights for women and condemns discrimination in a speech marking Women's Day.
Date: Sun, 20 Apr 2014 21:33:10 GMT
Egypt's Sisi to face sole poll rival
Left-winger Hamdeen Sabahi will be the only challenger to ex-army chief Abdel Fattah al-Sisi in Egypt's presidential election, as registration ends.
Date: Sun, 20 Apr 2014 22:18:41 GMT
Rubin 'Hurricane' Carter dies
Rubin "Hurricane" Carter, the US boxer whose wrongful conviction for murder caused an international outcry, dies aged 76.
Date: Sun, 20 Apr 2014 16:51:06 GMT
Gibraltar fire hits online gambling
An explosion and fire on Gibraltar cuts power to much of the British territory and disrupts a number of online betting operations around the world.
Date: Mon, 21 Apr 2014 00:02:36 GMT

cnn

Is the galaxy full of Earths?
Jim Bell says NASA's latest discovery support the notion that habitable worlds are probably common in the galaxy.
Date: Fri, 18 Apr 2014 17:09:31 EDT
Ukraine crisis could pull U.S. to war
Graham Allison says if an unchecked and emboldened Russia foments conflict in a nation like Latvia, a NATO member with a Russian speaking population, the West would have to defend it.
Date: Fri, 18 Apr 2014 13:12:15 EDT
See why Navy paid $3 billion for this
The US Navy's newest stealth destroyer is twice as long as the Statue of Liberty but barely shows up on radar.
Date: Sat, 19 Apr 2014 18:43:08 EDT
The best show you're not watching
If fans thought season one of "Orphan Black" was mysterious, get ready to spiral deeper down the rabbit hole for a second season. The show returns this Saturday April 19 on BBC America.
Date: Sat, 19 Apr 2014 17:12:55 EDT
Taylor Swift surprises bride at shower
Date: Wed, 16 Apr 2014 19:25:17 EDT
What a shot! Amazing sports photos
Date: Sun, 20 Apr 2014 16:17:41 EDT
The week in 33 photos
Date: Sun, 20 Apr 2014 16:16:35 EDT
64 dead; 238 still missing
Passengers couldn't board lifeboats as the ferry had listed too much, a crew member said, according to a transcript released today.
Date: Sun, 20 Apr 2014 19:54:00 EDT
'Please hurry' -- transcript of calls
The following is a partial transcript of communication between an unidentified crew member aboard the sinking South Korean ferry Sewol and local maritime traffic control centers on Wednesday.
Date: Sun, 20 Apr 2014 10:35:03 EDT
Challenging conditions
Under the water, conditions for divers carrying out the grim task of looking for survivors from the sunken South Korean ferry are challenging at best.
Date: Sun, 20 Apr 2014 03:18:27 EDT

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

_DSC4740

aarongrenke posted a photo:

_DSC4740

Date: 2014-04-21T00:37:53Z
Singapore Day 2 - 180414.Credit: Louis Zee

Barry Zee posted a photo:

Singapore Day 2 - 180414.Credit: Louis Zee

Singapore Day 2 - 180414
Credit: Louis Zee

Date: 2014-04-21T00:37:36Z
IMG_1995

Marion_Cole posted a photo:

IMG_1995

Date: 2014-04-21T00:37:46Z
DSC_0717

Emery :) posted a photo:

DSC_0717

Date: 2014-04-21T00:37:47Z
_DSC1177

McMoosie posted a photo:

_DSC1177

Date: 2014-04-21T00:37:53Z

J&LWalcott posted a photo:

Date: 2014-04-21T00:37:45Z
P1010113

joedec posted a photo:

P1010113

Date: 2014-04-21T00:37:54Z
Photo

xavierboswell99 posted a photo:

Photo

Date: 2014-04-21T00:37:55Z
IMG_0019

TrevorPR posted a photo:

IMG_0019

Date: 2014-04-21T00:37:54Z
Puspitek Lubana Sengkol 20140420-4267

Arqom Noval posted a photo:

Puspitek Lubana Sengkol 20140420-4267

Date: 2014-04-21T00:37:50Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the php or just have it in one page. The page would have a '.php' extension - not '.html.'
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> tagged material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site but then reduce the next site entry to "headlines" only (i.e., "titles" and "links") and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work it out at this stage ... and everything tried brought the larger process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?

#8
2007-11-02 steve says :

thanks, sorted out my cdata parsing problem, seems that is not too clear in the docs

s
