Taking apart Google News Feed.

Discussion in 'XML & RSS' started by Skinny, Feb 4, 2007.

  1. #1
    Hey guys,

    Okay, I'm working on displaying a Google News feed, and I can parse it fine with MagpieRSS.

    My problem is that the description shows the image (which I don't mind, but I hate how it's displayed), then the description text (which I want to extract), and then the associated links (which I don't want).

    How do I identify each part of the description (image, actual description, and links) so that I can remove some of them or work with each part separately?

    What are the names of the various parts of the description?

    Skinny
     
    Skinny, Feb 4, 2007
  2. #2
    Well, this time I thought to look at the source code. :)

    It seems there is an HTML table built into the description "field" that holds everything (picture, snippet, and links).

    Does anyone know how I can pull out just the snippet and then display it using MagpieRSS? (I know how to display it; I just need to know how to retrieve it.)

    Skinny
     
    Skinny, Feb 4, 2007
  3. #3
    Alright, here's one example. This is part of the source for one headline in the Google News Sci/Tech feed.

    <item>
      <title>Kids at greater risk of seeing online Internet porn than ever - iTWire</title> 
      <link>http://news.google.ca/news/url?sa=T&ct=ca/0-0&fd=R&url=http://www.itwire.com.au/content/view/9241/53/&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw</link> 
      <guid isPermaLink="false">tag:news.google.com,2005:cluster=425c2ab8</guid> 
      <pubDate>Mon, 05 Feb 2007 12:18:00 GMT</pubDate> 
      <description><br><table border=0 width= valign=top cellpadding=2 cellspacing=7><tr><td width=80 align=center valign=top><a href="http://news.google.ca/news/url?sa=T&ct=ca/0i-0&fd=R&url=http://www.playfuls.com/news_06082_Web_Surfing_Exposes_Children_to_Unwanted_Porn_Content.html&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw"><img src=http://news.google.ca/news?imgefp=G-VSe9BbV2EJ&imgurl=www.playfuls.com/scitech/gimages/modsites2.jpg width=80 height=60 alt="" border=1><br><font size=-2>Playfuls.com</font></a></td><td valign=top><a href="http://news.google.ca/news/url?sa=T&ct=ca/0-0&fd=R&url=http://www.itwire.com.au/content/view/9241/53/&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw"><b>Kids at greater risk of seeing online Internet porn than ever</b></a><br><font size=-1><font color=#6f6f6f><b>iTWire&nbsp;-</font> <nobr>4 hours ago</nobr></b></font><br><font size=-1>By Alex Zaharov-Reutt. According to a report called ‘Taking on the Internet Porn Industry’, and a report from the Pediatrics Journal, kids are at greater risk of being exposed to online porn than ever before, giving parents a tougher time than ever in <b>...</b></font><br><font size=-1><a href="http://news.google.ca/news/url?sa=T&ct=ca/0-1&fd=R&url=http://news.xinhuanet.com/english/2007-02/05/content_5699635.htm&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw">More kids, teens exposed to online porn in US</a> <font size=-1 color=#6f6f6f><nobr>Xinhua</nobr></font></font><br><font size=-1><a href="http://news.google.ca/news/url?sa=T&ct=ca/0-2&fd=R&url=http://www.playfuls.com/news_06082_Web_Surfing_Exposes_Children_to_Unwanted_Porn_Content.html&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw">Web-Surfing Exposes Children to Unwanted Porn Content</a> <font size=-1 color=#6f6f6f><nobr>Playfuls.com</nobr></font></font><br><font size=-1 class=p><a href="http://news.google.ca/news/url?sa=T&ct=ca/0-3&fd=R&url=http://tvnz.co.nz/view/page/411749/978818&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw"><nobr>TVNZ</nobr></a>&nbsp;- <a 
href="http://news.google.ca/news/url?sa=T&ct=ca/0-4&fd=R&url=http://www.latimes.com/news/printedition/asection/la-na-briefs5.2feb05,1,2308979.story%3Fcoll%3Dla-news-a_section&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw"><nobr>Los Angeles Times</nobr></a>&nbsp;- <a href="http://news.google.ca/news/url?sa=T&ct=ca/0-5&fd=R&url=http://qconline.com/archives/qco/display.php%3Fid%3D325726&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw"><nobr>Quad-Cities Online</nobr></a>&nbsp;- <a href="http://news.google.ca/news/url?sa=T&ct=ca/0-6&fd=R&url=http://www.jpost.com/servlet/Satellite%3Fcid%3D1170359781735%26pagename%3DJPost%252FJPArticle%252FShowFull&cid=1113336504&ei=TmHHRZCYK4jyoQKbj_y2Aw"><nobr>Jerusalem Post</nobr></a></font><br/><font class=p size=-1><a class=p href=http://news.google.ca/?ned=ca&ncl=1113336504&hl=en><nobr><b>all 309 news articles</b></nobr></a></font></table></description> 
      </item>
    
    The description I want to display is:

    According to a report called ‘Taking on the Internet Porn Industry’, and a report from the Pediatrics Journal, kids are at greater risk of being exposed to online porn than ever before, giving parents a tougher time than ever in ...

    How do I extract this from the HTML structure created by Google?

    Skinny
     
    Skinny, Feb 5, 2007
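For anyone landing here later: the pieces of the description don't have names of their own. As far as the feed is concerned, the description is a single field, and the image, snippet and related links are just different spots inside one string of HTML. Below is a minimal sketch in Python (the thread uses PHP/MagpieRSS; this only illustrates the parsing idea). It assumes the layout shown in post #3. The snippet is taken to be the longest outermost `<font size=-1>` block that contains no link, and a leading "By Name." byline is stripped. `FontBlockCollector`, `extract_snippet`, `BYLINE` and `SAMPLE_DESCRIPTION` are hypothetical names for illustration. Google can change this markup at any time.

```python
import re
from html.parser import HTMLParser


class FontBlockCollector(HTMLParser):
    """Record the text of every outermost <font size=-1> element.

    Each entry in self.blocks is (text, has_link). Inner <font> tags are
    only counted, so nested source labels stay inside their parent block.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._depth = 0        # how many <font> tags are currently open
        self._parts = None     # text pieces of the block being collected
        self._has_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self._depth += 1
            # Only a top-level <font size=-1> starts a new block.
            if self._depth == 1 and dict(attrs).get("size") == "-1":
                self._parts, self._has_link = [], False
        elif tag == "a" and self._parts is not None:
            self._has_link = True

    def handle_endtag(self, tag):
        if tag != "font" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0 and self._parts is not None:
            # Collapse runs of whitespace (including &nbsp;) to single spaces.
            text = " ".join("".join(self._parts).split())
            self.blocks.append((text, self._has_link))
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


# Assumed byline shape: "By Firstname Lastname. " at the start of the snippet.
BYLINE = re.compile(r"^By [^.]+\.\s+")


def extract_snippet(description_html, drop_byline=True):
    """Return the story snippet from a Google News description, or None.

    Heuristic (an assumption, not a documented format): the snippet is the
    longest outermost <font size=-1> block that contains no <a> link.
    """
    parser = FontBlockCollector()
    parser.feed(description_html)
    parser.close()
    candidates = [text for text, has_link in parser.blocks if not has_link]
    if not candidates:
        return None
    snippet = max(candidates, key=len)
    if drop_byline:
        snippet = BYLINE.sub("", snippet, count=1)
    return snippet


# Trimmed copy of the description in post #3 (hrefs and image URL elided).
SAMPLE_DESCRIPTION = (
    '<br><table border=0 valign=top cellpadding=2 cellspacing=7><tr>'
    '<td width=80 align=center valign=top><a href="..."><img src=... '
    'width=80 height=60 alt="" border=1><br><font size=-2>Playfuls.com'
    '</font></a></td><td valign=top><a href="..."><b>Kids at greater risk '
    'of seeing online Internet porn than ever</b></a><br><font size=-1>'
    '<font color=#6f6f6f><b>iTWire&nbsp;-</font> <nobr>4 hours ago</nobr>'
    '</b></font><br><font size=-1>By Alex Zaharov-Reutt. According to a '
    'report called ‘Taking on the Internet Porn Industry’, and a report '
    'from the Pediatrics Journal, kids are at greater risk of being exposed '
    'to online porn than ever before, giving parents a tougher time than '
    'ever in <b>...</b></font><br><font size=-1><a href="...">More kids, '
    'teens exposed to online porn in US</a> <font size=-1 color=#6f6f6f>'
    '<nobr>Xinhua</nobr></font></font><br/><font class=p size=-1>'
    '<a class=p href="..."><nobr><b>all 309 news articles</b></nobr></a>'
    '</font></table>'
)

if __name__ == "__main__":
    print(extract_snippet(SAMPLE_DESCRIPTION))
```

Run as a script, this should print the "According to a report called..." sentence quoted in post #3. Picking the longest link-free block, rather than a fixed position in the cell, is a judgment call: it tolerates an extra or missing source/date line, but it will grab the wrong text if Google ever adds a longer block without links.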