RSS/XML feed parser

Here's some PHP:

PHP
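<?php
// Fetch a feed and return up to $number of its items as an HTML definition list.
//   $page      - URL of the feed
//   $container - element that wraps each item: 'item' for RSS, 'entry' for Atom
//   $tags      - element names for the title, description, link and date, in that order
//   $number    - maximum number of items (0 or '' means 100)
//   $cdata     - 'hide' removes CDATA sections from the description;
//                anything else removes just the CDATA markers and keeps the content
function xml_parser($page,$container,$tags,$number,$cdata) {
  if (!$number) {$number=100;}
  $xml=@file_get_contents($page);
  if ($xml===false) {return "<p>This feed seems to be down.</p>";}
  // /s lets . match newlines; /U makes .+ ungreedy so each match stops at the first closing tag
  preg_match_all("/<$container>.+<\/$container>/sU",$xml, $items);
  $items=$items[0];
  $itemsArray=array();
  foreach ($items as $item) {
    if (count($itemsArray)>=$number) {break;}
    // $this is reserved in PHP, so each item's fields are collected in $fields
    $fields=array();
    for($i=0; $i<count($tags); $i++) {
      $t=$tags[$i];
      if (preg_match("/<$t(.+)(<\/$t>)/sU", $item, $tag)) {
        // strip the <tag>...</tag> wrapper and decode entities such as &lt; and &amp;
        $fields[$i]=preg_replace("/<$t>(.+)(<\/$t>)/sU",'$1',$tag);
        $fields[$i]=array_map('html_entity_decode', $fields[$i]);
      }
      elseif (preg_match("/<$t\s[^>]*\/>/", $item, $tag)) {
        // self-closing element such as Atom's <link href="..."/>
        $fields[$i]=$tag;
      }
      else {
        $fields[$i]=array('');  // element missing from this item
      }
    }
    $itemsArray[]=$fields;
  }

  // clean-up rules for the description, applied in this order
  $dpatterns=array(
    "/<img(.+)><\/img>/sU",          // <img ...></img> becomes <img ...>
    "/<img(.+)\/>/sU",               // <img ... /> becomes <img ...>
    "/<(\/|)content?(.+|)>/sU",      // remove Atom's <content ...> and </content> wrapper
    "/border=\"0\"/sU",              // remove border="0"
    "/<\!\[CDATA\[(.+)\]\]>/sU"      // CDATA sections (see $cdata)
  );
  $dreplacements=array('<img$1>', '<img$1>', '', '', ($cdata=='hide') ? '' : '$1');

  $theData="<dl>";
  foreach ($itemsArray as $item) {
    $data=array();
    for($i=0; $i<count($tags); $i++) {
      $data[$i]=$item[$i][0];
    }
    $title=$data[0];
    $description=preg_replace($dpatterns,$dreplacements,$data[1]);
    // Atom links arrive as <link ... href="..."/>; RSS links are already plain URLs
    $link=preg_replace("/<link.+href=\"(.+)\"(.+|)\/>/sU",'$1',$data[2]);
    $date=$data[3];
    $theData.="
  <dt><a href=\"$link\">$title</a></dt>
  <dd class=\"story\">$description</dd>
  <dd>Date: $date</dd>\n";
  }
  $theData.="</dl>";
  return $theData;
}

$container='item';
$tags=array('title','description','link','pubDate');
$bbc=xml_parser("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",$container,$tags,10,'');
$cnn=xml_parser("http://rss.cnn.com/rss/cnn_topstories.rss",$container,$tags,10,'');

// Lockergnome twice: first with CDATA sections hidden, then with them shown
$tags=array('title','content:encoded','link','pubDate');
$lockergnome1=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'hide');
$lockergnome2=xml_parser("http://feed.lockergnome.com/nexus/all",$container,$tags,5,'');

$container='entry';
$tags=array('title','content','link','published');
$flickr=xml_parser("http://api.flickr.com/services/feeds/photos_public.gne",$container,$tags,10,'');
?>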
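The two regular expressions do most of the work here. To see what they do without fetching a live feed, here's a small sketch that runs the same item pattern over a made-up two-item feed held in a string. For the tag step it uses a capture group instead of the function's match-then-replace, which gets the same text in one call; the sample XML and the $titles variable are for illustration only.

```php
<?php
// A made-up two-item RSS fragment, so the patterns can be tried without a network call.
$xml = '<rss><channel>
  <item><title>First story</title><link>http://example.com/1</link></item>
  <item><title>Fish &amp; chips</title><link>http://example.com/2</link></item>
</channel></rss>';

// Same item pattern as xml_parser(): /s lets "." match newlines and /U makes ".+"
// ungreedy, so each match ends at the nearest </item> instead of the last one.
preg_match_all("/<item>.+<\/item>/sU", $xml, $items);

$titles = array();
foreach ($items[0] as $item) {
    // xml_parser() matches the element, then strips its tags with preg_replace();
    // a capture group gets the same text in one step here.
    if (preg_match("/<title>(.+)<\/title>/sU", $item, $m)) {
        $titles[] = html_entity_decode($m[1]);
    }
}

print_r($titles);
```

$titles ends up as array('First story', 'Fish & chips'). Dropping the U modifier would turn the two items into a single match running from the first <item> to the last </item>.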

Here's some HTML with PHP to display the feeds:

HTML/PHP
<h2>bbc</h2>
<?php echo $bbc; ?>
<h2>cnn</h2>
<?php echo $cnn; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome1; ?>
<h2>lockergnome</h2>
<?php echo $lockergnome2; ?>
<h2>flickr</h2>
<?php echo $flickr; ?>

Here's what we get: the latest items from the BBC, CNN, Lockergnome (once with the CDATA sections hidden and once with them shown) and Flickr.

bbc

UK soldiers push to clear Taliban
Hundreds of UK soldiers launch an operation to clear Taliban insurgents from a key stronghold in southern Afghanistan.
Date: Fri, 30 Jul 2010 10:43:38 +0000
Prescott Iraq intelligence doubts
The intelligence on Iraq's weapons threat was not "very substantial", former deputy prime minister Lord Prescott says.
Date: Fri, 30 Jul 2010 10:01:04 +0000
French mother 'relieved by truth'
A French mother who admitted killing eight of her newborn babies is relieved that her secret is finally out in the open, her lawyer says.
Date: Fri, 30 Jul 2010 10:38:20 +0000
Strikes and ash extend BA losses
BA reveals a steep quarterly loss of £164m after being hit by cabin crew strikes and disruption caused by the volcanic ash cloud.
Date: Fri, 30 Jul 2010 10:09:02 +0000
Expenses four in appeals defeat
Three ex Labour MPs and an ex-Tory peer lose appeals over a ruling that they are not protected from prosecution over expenses fraud allegations.
Date: Fri, 30 Jul 2010 10:12:42 +0000
Proposal to scrap benefit system
Merging all tax credits and benefits into a single payment is one option being considered by Iain Duncan Smith in a "radical" welfare shake-up.
Date: Fri, 30 Jul 2010 08:31:53 +0000
Police officer in murder arrest
A 26-year-old quizzed over the death of a man at a Merseyside pub was an off-duty police officer, it has emerged.
Date: Fri, 30 Jul 2010 07:39:11 +0000
Syria and Saudi leaders in Beirut
Syria's president heads to Lebanon after years of tension between the two countries, in a visit with the Saudi king to try to avert a looming political crisis.
Date: Fri, 30 Jul 2010 04:12:17 +0000
MoD 'to pay for Trident renewal'
The MoD is facing further pressure on its budget after the chancellor says it will have to pay for new nuclear submarines, and not the Treasury as before.
Date: Fri, 30 Jul 2010 08:39:16 +0000
London saddles up for bike scheme
A bike hire scheme designed to encourage thousands more cycle journeys in central London begins.
Date: Fri, 30 Jul 2010 08:34:47 +0000

cnn

Daughters: Mother accused of killing babies was secretive
A French woman who admitted to giving birth to and smothering eight babies over a 17-year period was secretive but always supportive of her family, according to two of her daughters.
Date: Fri, 30 Jul 2010 03:21:20 EDT
New BP boss to discuss Gulf recovery
Incoming BP CEO Bob Dudley is expected to discuss the oil giant's long-term recovery efforts in the Gulf of Mexico during a news conference in Mississippi on Friday.
Date: Fri, 30 Jul 2010 05:30:11 EDT
Soldier taken to Va. in Wikileaks probe
An Army private suspected of leaking classified material, including videos, has been transferred from Kuwait to a Marine Corps brig in Quantico, Virginia.
Date: Fri, 30 Jul 2010 01:48:24 EDT
New Gulf leak may take 10 days to cap
An abandoned well struck by a barge in southeastern Louisiana early Tuesday is still spewing a mixture of oil, gas and water, and it could take 10 days before it is capped.
Date: Fri, 30 Jul 2010 02:33:40 EDT
Protests mark 1 year of hikers in Iran
A protest Friday afternoon outside Iran's U.N. mission in New York kicks off a weekend of events demanding the country release three American hikers it has held for one year.
Date: Fri, 30 Jul 2010 04:10:28 EDT
Governor appeals immigration law
The legal battle over Arizona's new immigration law entered its next stage when Gov. Jan Brewer filed an expedited appeal.
Date: Fri, 30 Jul 2010 05:33:37 EDT
Ethics panel charges Democrat Rangel
The House ethics panel accused veteran Rep. Charles Rangel of 13 violations of House rules involving alleged financial wrongdoing and harming Congress' credibility.
Date: Fri, 30 Jul 2010 05:23:12 EDT
Police: Ex-NBA player shot in homicide
A body found in Memphis this week was identified as former NBA player Lorenzen Wright, police said. The death was ruled a "homicide by gunshot wound."
Date: Fri, 30 Jul 2010 01:04:15 EDT
Miramax sold for $660M
The Walt Disney Company said Friday it has agreed to sell Miramax Films for around $660 million to an investor group.
Date: Fri, 30 Jul 2010 05:26:28 EDT
Life insurance target of fraud inquiry
New York Attorney General Andrew Cuomo is launching a fraud investigation into the life insurance industry for "practices that appear to have denied grieving military families and others of millions in life-insurance cash," Cuomo's office announced Thursday.
Date: Fri, 30 Jul 2010 02:06:00 EDT

lockergnome (hidden CDATA)

The lockergnome feed seems to be down.

lockergnome

The lockergnome feed seems to be down.

flickr

DSCF3670

ae_greene posted a photo:

DSCF3670

Date: 2010-07-30T10:51:24Z
borneo!!! 324 good

dnfrd_chrs posted a photo:

borneo!!! 324 good

Date: 2010-07-30T10:51:24Z
Disguise Your Beauty

dsjack posted a photo:

Disguise Your Beauty

Bromotions
17th July Creation Studio
BOTB

Date: 2010-07-30T10:51:25Z
nerd1

secretchild posted a photo:

nerd1

Date: 2010-07-30T10:51:25Z
saudade

Tralalá Arte em E.V.A e Scrapbook_ Valdicélia X. posted a photo:

saudade

Date: 2010-07-30T10:51:26Z
TG MR

Cyndy Fassett posted a photo:

TG MR

Date: 2010-07-30T10:51:26Z
DSCF1076

forglemmigej 726 out all day back later posted a photo:

DSCF1076

Date: 2010-07-30T10:51:26Z
4daagse 2010 044

fransmiggelbrink posted a photo:

4daagse 2010 044

Date: 2010-07-30T10:51:27Z
IMG_1574

яαƒ яαƒ♥ posted a photo:

IMG_1574

Date: 2010-07-30T10:51:27Z
P1260140

shanlung posted a photo:

P1260140

Date: 2010-07-30T10:51:27Z

Comments

#1
2007-03-02 dumb_dave says :

Sorry, I'm new to this stuff, willing to learn and all that, but I don't get the idea. Copy that snippet of PHP code into a file and call it, say, parser.php. Copy the other snippet of HTML into a file and call it, for lack of inventiveness, parser.html. Right so far? If so, where's the intermediate step? How does this HTML "call" or "include" the PHP in order to function? Or am I missing something so basic that even asking this will earn me the cherished "Idiot of the Day Award"? Thanks.

#2
2007-03-02 BonRouge says :

dave,
You can include the PHP or just have it all in one page. The page needs a '.php' extension, not '.html'.
Here's a simple example of this page (with no style or anything) in one file.
Save it and change the extension to '.php'. If you don't have a server installed on your machine, you'll have to upload it to a remote server to view it.
If you want, you can take the php code out of that page and save it in a different file and include it into the page - that way, you could use it on more than one page if you wanted.

I hope that makes it a bit clearer.

#3
2007-03-02 dumb_dave says :

Thanks for the explanations. Much clearer now and ... yes, it indeed works like a champ. (Maybe I was just too tired? Putting 1 and 1 together and coming up with 11 instead of two?) Best regards and thanks for all the tips elsewhere as well.

#4
2007-03-07 dumb_dave says :

Useful indeed, BonRouge, but how does one display the <description> tagged material that is buried behind things like <![CDATA[ <p> etc.? Is the PHP code easily modified to handle that? And if so, can one apply it selectively? That is, show the fuller "description" material for one site but then reduce the next site entry to "headlines" only (i.e., "titles" and "links") and then toggle the next one back to fuller details? Hope this is not a major headache, but it's beyond my ability to work it out at this stage ... and everything I tried brought the larger process to a grinding halt. (This isn't a do-my-homework-for-me question. I'm bewildered by the code.) Thanks.

#5
2007-03-07 BonRouge says :

dave,
I thought I'd already sorted out the problem of data wrapped in the CDATA stuff. Does the code have a problem? If you could show me where it's not working, I'll try to improve it.
As for choosing whether to show that particular data or not, yes - I think you could do that by adding another variable. You see near the top where there's a preg_replace() to remove the CDATA tags? You could put that in an if statement - if the variable is not present, remove the CDATA tags, if it is, leave them where they are.
Does that make sense?
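Here's a minimal sketch of the if statement described above, pulled out into a hypothetical helper, cdata_mode(), so it can be tried on its own. It uses the same CDATA pattern as xml_parser(); passing 'hide' drops the whole CDATA section, and anything else keeps what is inside it.

```php
<?php
// Hypothetical helper: the CDATA switch from xml_parser() pulled out on its own.
//   $cdata == 'hide' -> the whole CDATA section is removed
//   anything else    -> only the <![CDATA[ and ]]> markers are removed
function cdata_mode($text, $cdata) {
    $replacement = ($cdata == 'hide') ? '' : '$1';
    return preg_replace("/<\!\[CDATA\[(.+)\]\]>/sU", $replacement, $text);
}

$description = 'Intro. <![CDATA[<p>Full story here</p>]]>';
echo cdata_mode($description, ''), "\n";     // Intro. <p>Full story here</p>
echo cdata_mode($description, 'hide'), "\n"; // "Intro. " (the CDATA part is gone)
```

Because the switch is just a function argument, each call to xml_parser() can choose its own setting, which is how the Lockergnome feed appears once with 'hide' and once with ''.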

#6
2007-03-10 BonRouge says :

dave,
I think I found the problem and sorted it out. As you can see, it seems to work OK now. Some of the characters in the Lockergnome feed don't show right on this page though. I wonder if it's anything to do with me being in Japan. Do you see strange characters?
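One thing worth checking when characters come out wrong (an assumption, not something confirmed in this thread): before PHP 5.4, html_entity_decode() decoded to ISO-8859-1 by default, so entities in a UTF-8 feed could turn into bytes that a UTF-8 page shows as junk. A sketch of naming the charset explicitly, using a hypothetical helper decode_utf8():

```php
<?php
// Hypothetical helper: decode entities explicitly as UTF-8 instead of relying on
// the default charset (ISO-8859-1 before PHP 5.4).
function decode_utf8($text) {
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// In xml_parser() this would stand in for array_map('html_entity_decode', ...):
$fields = array_map('decode_utf8', array('Caf&eacute; &amp; &#8217;bar&#8217;'));
echo $fields[0], "\n"; // Café & ’bar’

// The page should declare the same charset, before any output is sent:
// header('Content-Type: text/html; charset=utf-8');
```

Decoding and the page's declared charset have to agree; fixing only one of the two can still leave odd characters.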

#7
2007-05-01 Ice says :

I have been trawling the web for days looking for something like this. Thanks a WHOLE lot man. I was also wondering if you can modify this parser to merge these fields and display, say, only the latest 10 items?
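One possible approach to merging (a sketch, not part of the parser above): have a modified xml_parser() return its item arrays instead of an HTML string, pool them, sort newest first with strtotime(), and keep the first ten. The array shape with 'title' and 'date' keys and the function latest_items() are hypothetical; the titles and dates come from the sample output above.

```php
<?php
// Hypothetical item arrays, as a modified xml_parser() might return them
// instead of an HTML string. Titles and dates are taken from the sample output above.
$bbc = array(
    array('title' => 'UK soldiers push to clear Taliban', 'date' => 'Fri, 30 Jul 2010 10:43:38 +0000'),
    array('title' => 'Proposal to scrap benefit system', 'date' => 'Fri, 30 Jul 2010 08:31:53 +0000'),
);
$cnn = array(
    array('title' => 'New BP boss to discuss Gulf recovery', 'date' => 'Fri, 30 Jul 2010 05:30:11 EDT'),
);

// Pool all the feeds, sort newest first and keep at most $limit items.
function latest_items(array $feeds, $limit) {
    $all = array();
    foreach ($feeds as $feed) {
        $all = array_merge($all, $feed);
    }
    // strtotime() understands both "+0000" and "EDT" offsets, so mixed feeds compare correctly
    usort($all, function ($a, $b) {
        return strtotime($b['date']) - strtotime($a['date']);
    });
    return array_slice($all, 0, $limit);
}

foreach (latest_items(array($bbc, $cnn), 10) as $item) {
    echo $item['title'], ' (', $item['date'], ")\n";
}
```

05:30 EDT is 09:30 UTC, so the CNN story sorts between the two BBC ones. Atom feeds such as Flickr's use dates like 2010-07-30T10:51:24Z, which strtotime() also parses.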

#8
2007-11-02 steve says :

Thanks, this sorted out my CDATA parsing problem; that doesn't seem too clear in the docs.

s
