thr3ads.net - R help - [R] using XML package to read RSS [May 2012]


J Toll

2012-May-17 01:40 UTC

[R] using XML package to read RSS

Hi,

I'm trying to use the XML package to read an RSS feed.  To get
started, I was trying to use this post as an example:

http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/

I can replicate the beginning section of the post, but when I try to
use another RSS feed I have an issue.  The RSS feed I would like to
use is:

> URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"
> library(XML)
> doc <- xmlTreeParse(URL)
> src <- xpathApply(xmlRoot(doc), "//entry")

I get an empty list rather than a list of the "entry" elements:

> src
list()
attr(,"class")
[1] "XMLNodeSet"

I'm not sure how to fix this.  Any suggestions?  Do I need to provide
a namespace, or is the RSS malformed?

Thanks,


James

Duncan Temple Lang

2012-May-17 02:02 UTC

[R] using XML package to read RSS

Hi James.

 Yes, you need to identify the namespace in the query, e.g.

  getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes.

(getNodeSet() is more convenient to use when you don't specify a function
to apply to the nodes. Also, you don't need xmlRoot(doc), as it works on the
entire document with the query "//...".)

 BTW, you want to use xmlParse() and not xmlTreeParse().

   D.
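
The advice above can be tried without network access. What follows is only a minimal sketch: the inline feed is a made-up stand-in for the SEC Atom feed (its titles are hypothetical), and the prefix "x" is an arbitrary label bound to the Atom namespace URI. It contrasts the un-prefixed query with the namespace-aware one, and uses xmlParse() as suggested.

```r
library(XML)

# Hypothetical stand-in for the Atom feed; not the real SEC data.
atom <- '<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First filing</title></entry>
  <entry><title>Second filing</title></entry>
</feed>'

# xmlParse() returns an internal (C-level) document, the form that
# XPath queries operate on.
doc <- xmlParse(atom, asText = TRUE)

# An un-prefixed name in XPath refers to elements in no namespace, so this
# does not match the <entry> elements, which sit in the feed's default
# (Atom) namespace.
no_ns <- getNodeSet(doc, "//entry")

# Bind a prefix of our own choosing to the Atom namespace URI and use it
# in the query.
ns <- c(x = "http://www.w3.org/2005/Atom")
entries <- getNodeSet(doc, "//x:entry", namespaces = ns)

# To apply a function to each match, xpathSApply() takes the same
# namespace mapping; here it pulls out the text of each entry title.
titles <- xpathSApply(doc, "//x:entry/x:title", xmlValue, namespaces = ns)

length(no_ns)
length(entries)
titles
```

The prefix in the query does not have to match anything written in the feed itself; only the namespace URI it maps to matters.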


