Adam Cooper
2011-Apr-06 18:32 UTC
[R] Treatment of xml-stylesheet processing instructions in XML module
Hello again,

Another stumble here that is defeating me.

I try:

    a <- readLines(url("http://feeds.feedburner.com/grokin"))
    t <- XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE, asText=TRUE)
    elem <- XML::getNodeSet(XML::xmlRoot(t), "/rss/channel/item")[[1]]

And I get:

    Start tag expected, '<' not found
    Error: 1: Start tag expected, '<' not found

When I modify the second line in "a" to remove the following (just leaving the <rss> tag with its attributes), I do not get the error. I removed:

    <?xml-stylesheet type=\"text/xsl\" media=\"screen\" href=\"/~d/styles/rss2full.xsl\"?><?xml-stylesheet type=\"text/css\" media=\"screen\" href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\"?>

I would have expected the PI to be totally ignored by default. Have I missed something?

Thanks in advance...

Cheers, Adam
Duncan Temple Lang
2011-Apr-06 23:06 UTC
[R] Treatment of xml-stylesheet processing instructions in XML module
Hi Adam

To use XPath and getNodeSet() on an XML document, you will want to use xmlParse() and not xmlTreeParse() to parse the XML content. So

    t = xmlParse(I(a))   # or asText = TRUE
    elem = getNodeSet(t, "/rss/channel/item")[[1]]

works fine. You don't need to pass the root node to getNodeSet(); pass the document itself.

Also, if you have the package loaded, you don't need the XML:: prefix before the function names.

HTH

D.
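For readers who want to try the suggestion without fetching the live feed, here is a minimal sketch in R. The feed text is a made-up stand-in, not the real feedburner content: it keeps only an xml-stylesheet processing instruction and two hypothetical items, and the names `feed`, `doc`, `items` and `titles` are illustrative. It assumes the XML package is installed.

```r
library(XML)

# Hypothetical stand-in for the lines returned by readLines(url(...)).
# It includes an xml-stylesheet processing instruction ahead of <rss>.
feed <- c(
  '<?xml version="1.0"?>',
  '<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>',
  '<rss version="2.0">',
  '  <channel>',
  '    <title>Example feed</title>',
  '    <item><title>First post</title></item>',
  '    <item><title>Second post</title></item>',
  '  </channel>',
  '</rss>'
)

# Parse the text into an internal (C-level) document with xmlParse(),
# collapsing the lines into a single string first.
doc <- xmlParse(paste(feed, collapse = "\n"), asText = TRUE)

# Run XPath against the document itself, not against xmlRoot(doc).
items <- getNodeSet(doc, "/rss/channel/item")

# Pull the text of each item's <title> child.
titles <- xpathSApply(doc, "/rss/channel/item/title", xmlValue)
```

With this input, `items[[1]]` corresponds to `elem` in the thread, and `titles` holds one string per item.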