Welcome to Dream.In.Code
Doing basic parsing of an RSS feed

 

Doing basic parsing of an RSS feed, Sounds straightforward, right?

Imek
26 Jan, 2008 - 09:21 AM
Post #1

Hey,

I'm doing a program in Java that takes data from various XML sources and mashes it together using RDF descriptions. In this particular stage I've been grabbing data from Audioscrobbler (last.fm) web services (http://www.audioscrobbler.net/data/webservices/), but some of it is in RSS form. I tried using the DOM parser to read the RSS feeds, but I couldn't get it to give me anything other than a big pile of nulls. I eventually found RSS Utilities on the Sun website, which kind of works, but there are some non-standard elements that it seems to ignore. For example:

CODE
<item>
  <title>An Evening with Dream Theater on 28 Jan 2008</title>
  <description>Location: They are south of Greenhill Road and west of Goodwood Road in the suburb of Wayville., Adelaide, Australia

</description>
  <link>http://www.last.fm/event/416069</link>
  <guid>http://www.last.fm/event/416069</guid>
  <pubDate>Wed, 14 Nov 2007 12:15:15 +0000</pubDate>

  <xcal:dtstart>2008-01-28T00:00:00Z</xcal:dtstart>
  <xcal:dtend>2008-01-28T23:59:59Z</xcal:dtend>
  <xcal:location>http://www.last.fm/venue/8780432</xcal:location>
</item>


This is an item for an event, and I want to get the start and end dates out of it as well. However, the Item class that comes with the parser only lets me get the standard stuff like title and pubDate - I've messed around with the data I can get out of it, but as far as I can tell the dtstart, dtend and location data isn't even in there anywhere.

Anyone have any advice as to how I could do this? Is there any trick to just getting the DOM parser to read the RSS file? Or is there a better parser out there?

Thanks,

-Joe

Tom9729
RE: Doing Basic Parsing Of An RSS Feed
26 Jan, 2008 - 09:51 AM
Post #2

I've done something like this before, except I just used the String class to "parse" it. ;)
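
For anyone curious what the String approach might look like, here is a minimal sketch. It is not Tom9729's actual code (none was posted); the class name StringExtract and the helper between() are made up for illustration. It only handles a plain, attribute-free tag and ignores CDATA, entities and nesting, so treat it as a quick hack rather than a real parser.

```java
class StringExtract {
    // Hypothetical helper: returns the text between <tag> and </tag>,
    // or null if either marker is missing. Deliberately naive.
    static String between(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = xml.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return xml.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String item = "<item><title>An Evening with Dream Theater on 28 Jan 2008</title>"
                    + "<xcal:dtstart>2008-01-28T00:00:00Z</xcal:dtstart></item>";
        System.out.println(between(item, "xcal:dtstart")); // prints 2008-01-28T00:00:00Z
    }
}
```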

Martyr2
RE: Doing Basic Parsing Of An RSS Feed
26 Jan, 2008 - 10:20 AM
Post #3

You could do the classic DOM parse of this with the following example I created for you as a guide.

CODE

import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;

public class ParseFile {
    public static void main(String[] args) {
        // Read in our XML document
        File thefile = new File("c:\\test.xml");

        // Create our DocumentBuilderFactory object and get an instance
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();

        try {
            // Parse the file into our DOM tree
            DocumentBuilder p = f.newDocumentBuilder();
            Document doc = p.parse(thefile);

            // Get the dtstart element by its full tag name, prefix included.
            NodeList list = doc.getElementsByTagName("xcal:dtstart");
            Element startEl = (Element) list.item(0);
            String startdt = ((Text) startEl.getFirstChild()).getData();

            // Print out its value
            System.out.println("Start date is: " + startdt);

            // Let's change its value to the 26th. The string lives in the
            // element's child text node; setNodeValue on the element itself
            // has no effect.
            startEl.getFirstChild().setNodeValue("2008-01-26T00:00:00Z");

            // Get the value and reprint
            startdt = ((Text) startEl.getFirstChild()).getData();

            System.out.println("Start date is now: " + startdt);
        }

        // Catch any of our exceptions (only reported, for example purposes)
        catch (ParserConfigurationException ex) { ex.printStackTrace(); }
        catch (SAXException ex) { ex.printStackTrace(); }
        catch (IOException ex) { ex.printStackTrace(); }

    }
}


Notice how I yank that start date right out of the tree and even go as far as changing it. You can do the same without having to download anything special. All this comes included with Java 1.4+.

Enjoy!

"At DIC we be XML masters of the universe... that and code ninjas!"

This post has been edited by Martyr2: 26 Jan, 2008 - 10:23 AM
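
A related variant, sketched here as a hedged aside: matching on the literal string "xcal:dtstart" depends on the feed using that exact prefix. If the factory is made namespace-aware, getElementsByTagNameNS can match on the local name instead. The class name NamespaceLookup, the helper firstText() and the namespace URI in the sample XML are assumptions for illustration, not taken from the last.fm feed. getTextContent needs Java 5 or later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceLookup {
    // Hypothetical helper: text of the first element with this local name,
    // in any namespace ("*" is the DOM wildcard), or null if none exists.
    static String firstText(String xml, String localName) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // off by default; needed for the NS lookup
            Document doc = f.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(xml)));
            NodeList list = doc.getElementsByTagNameNS("*", localName);
            return list.getLength() == 0 ? null : list.item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The namespace URI below is an assumed placeholder for the example.
        String xml = "<rss xmlns:xcal=\"urn:ietf:params:xml:ns:xcal\"><channel><item>"
                   + "<xcal:dtstart>2008-01-28T00:00:00Z</xcal:dtstart>"
                   + "</item></channel></rss>";
        System.out.println(firstText(xml, "dtstart")); // prints 2008-01-28T00:00:00Z
    }
}
```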

Programmist
RE: Doing Basic Parsing Of An RSS Feed
26 Jan, 2008 - 03:09 PM
Post #4

I created a GUI-based RSS/Atom reader several years ago and I used SAX for parsing. I made a few posts about it on my blog. I also made a post about it here, which may give you some more info.
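
Since the linked posts aren't reproduced here, this is a rough sketch of what a SAX-based version of the same lookup could look like. It is not Programmist's reader; the class name SaxStartDates and the method startDates() are invented for the example. It assumes the feed uses the xcal prefix, because a default (non-namespace-aware) SAX parser reports the tag name with its prefix in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxStartDates {
    // Hypothetical helper: collects the text of every xcal:dtstart element.
    static List<String> startDates(String xml) {
        final List<String> dates = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside xcal:dtstart

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("xcal:dtstart".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("xcal:dtstart".equals(qName)) {
                    dates.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return dates;
    }

    public static void main(String[] args) {
        // The namespace URI is an assumed placeholder for the example.
        String xml = "<rss xmlns:xcal=\"urn:ietf:params:xml:ns:xcal\"><channel>"
                   + "<item><xcal:dtstart>2008-01-28T00:00:00Z</xcal:dtstart></item>"
                   + "</channel></rss>";
        System.out.println(startDates(xml));
    }
}
```

SAX streams through the document instead of building a tree, so it can suit large feeds, at the cost of tracking state by hand as above.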

Imek
RE: Doing Basic Parsing Of An RSS Feed
27 Jan, 2008 - 01:24 PM
Post #5

Thanks for the replies, guys. I was kind of weirded out as to why it wasn't working with DOM in the first place - I had a method that took a URL as an argument and gave the document node. Was it not working because it couldn't get the document node for an RSS file or something? Not sure why that is, but it's working with the getElementsByTagName method.
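
The thread never pins down where the original nulls came from, so the following is only one common cause, not a diagnosis: in the DOM API, getNodeValue() on an element always returns null, because the string sits in a child text node. A small sketch (the class name NullNodeValue and the method values() are made up for the example):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NullNodeValue {
    // Hypothetical helper: reads the first <title> three different ways.
    static String[] values(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element title = (Element) doc.getElementsByTagName("title").item(0);
            return new String[] {
                title.getNodeValue(),                 // null: elements have no node value
                title.getFirstChild().getNodeValue(), // the child text node holds the string
                title.getTextContent()                // all descendant text (Java 5+)
            };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] v = values("<item><title>Hello</title></item>");
        System.out.println(v[0] + " / " + v[1] + " / " + v[2]); // prints null / Hello / Hello
    }
}
```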

Imek
RE: Doing Basic Parsing Of An RSS Feed
27 Jan, 2008 - 03:34 PM
Post #6

Okay, I have a second question. I'm now fetching XML from Amazon with this example URL:

http://ecs.amazonaws.co.uk/onca/xml?Servic...k,Medium,Tracks

Here's the part of my code that grabs the nodes labelled "Item":

CODE
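// Fetches the XML at the given URL and parses it into a DOM Document.
// "factory" is a field declared elsewhere in the class (not shown here).
public static Document getDOMDocument(String stringurl) throws Exception
{
    URL url = new URL(stringurl);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(url.openStream());
    return doc;
}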


CODE
String dataURL = getAlbumSearchURL(name);
Document document = SiteReader.getDOMDocument(dataURL);
NodeList items = document.getElementsByTagName("Item");

for (int i=0; i<items.getLength() && i<Globals.defaultNoAmazonResults; i++)
{
(...)

Why is this giving me an empty NodeList?

Imek
RE: Doing Basic Parsing Of An RSS Feed
28 Jan, 2008 - 01:45 AM
Post #7

Haha, never mind.. stupid mistake on my part (I had defaultNoAmazonResults set to 0).

I'll stop posting now.
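
To spell out why a cap of 0 looks like an empty NodeList: the loop condition requires both i < items.getLength() and i < the cap, so with a cap of 0 the body never runs, however many items were found. A tiny sketch of the same loop shape (LoopCap and iterations() are illustrative names, not from the original program):

```java
class LoopCap {
    // Counts how many times a loop bounded by both "available" and "cap" runs.
    static int iterations(int available, int cap) {
        int count = 0;
        for (int i = 0; i < available && i < cap; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(iterations(10, 0)); // prints 0: the cap wins
        System.out.println(iterations(10, 5)); // prints 5
        System.out.println(iterations(3, 5));  // prints 3
    }
}
```

Printing items.getLength() before the loop is a quick way to tell an empty list apart from a loop that never runs.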