Hi James.
Yes, you need to identify the namespace in the query, e.g.
getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))
The prefix x is arbitrary; it only has to be mapped to the feed's namespace URI.
This yields 40 matching nodes.
(getNodeSet() is more convenient than xpathApply() when you don't need to
apply a function to each node. Also, you don't need xmlRoot(doc): a query
starting with "//" already searches the entire document.)
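A self-contained sketch of the namespace-aware query. It uses a small inline Atom document (hypothetical, standing in for the SEC feed so it runs without network access), binds the arbitrary prefix x to the Atom namespace URI, counts the entries with getNodeSet(), and then uses xpathSApply() to apply xmlValue to each matching title.

```r
library(XML)

# Hypothetical two-entry Atom feed; the default namespace is the Atom URI
atom <- '<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First filing</title></entry>
  <entry><title>Second filing</title></entry>
</feed>'

doc <- xmlParse(atom, asText = TRUE)

# Bind an arbitrary prefix (x) to the Atom namespace and use it in every step
ns <- c(x = "http://www.w3.org/2005/Atom")
entries <- getNodeSet(doc, "//x:entry", ns)
length(entries)  # 2

# When a function should be applied to each match, xpathSApply() does it
# and simplifies the result to a character vector of titles
titles <- xpathSApply(doc, "//x:entry/x:title", xmlValue, namespaces = ns)
titles
```

Note that the prefix must appear on every element name in the path (x:entry/x:title), because each unprefixed step would again look for elements in no namespace.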
BTW, you want to use xmlParse() rather than xmlTreeParse(): xmlParse()
returns the internal document that the XPath functions query directly.
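A quick way to see the difference, again on a hypothetical inline snippet: by default xmlTreeParse() builds an R-level tree, while xmlParse() (equivalent to xmlTreeParse(..., useInternalNodes = TRUE)) keeps libxml2's internal document, which is what getNodeSet() and xpathApply() are built to query.

```r
library(XML)

# Hypothetical one-entry Atom snippet, parsed from a string instead of a URL
atom <- '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>A</title></entry></feed>'

# Default xmlTreeParse(): an R-level tree of class "XMLDocument"
tree <- xmlTreeParse(atom, asText = TRUE)
inherits(tree, "XMLDocument")

# xmlParse(): the internal document, class "XMLInternalDocument"
doc <- xmlParse(atom, asText = TRUE)
inherits(doc, "XMLInternalDocument")

# The internal document can be queried directly with a namespace prefix
n <- length(getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom")))
n
```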
D.
On 5/16/12 6:40 PM, J Toll wrote:
> Hi,
>
> I'm trying to use the XML package to read an RSS feed. To get
> started, I was trying to use this post as an example:
>
> r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page
>
> I can replicate the beginning section of the post, but when I try to
> use another RSS feed I have an issue. The RSS feed I would like to
> use is:
>
>> URL <- "sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"
>
>> library(XML)
>> doc <- xmlTreeParse(URL)
>
>> src <- xpathApply(xmlRoot(doc), "//entry")
>
> I get an empty list rather than a list of each of the "entry":
>
>> src
> list()
> attr(,"class")
> [1] "XMLNodeSet"
>
> I'm not sure how to fix this. Any suggestions? Do I need to provide
> a namespace, or is the RSS malformed?
>
> Thanks,
>
>
> James
>
> ______________________________________________
> R-help at r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.