Adam Cooper
2011-Apr-06 18:32 UTC
[R] Treatment of xml-stylesheet processing instructions in XML module
Hello again,
Another stumble here that is defeating me.
I try:
a<-readLines(url("http://feeds.feedburner.com/grokin"))
t<-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE,
asText=TRUE)
elem<- XML::getNodeSet(XML::xmlRoot(t),"/rss/channel/item")[[1]]
And I get:
Start tag expected, '<' not found
Error: 1: Start tag expected, '<' not found
When I modify the second line in "a" to remove the following (just
leaving the <rss> tag with its attributes), I do not get the error.
I removed:
<?xml-stylesheet type=\"text/xsl\" media=\"screen\"
href=\"/~d/styles/rss2full.xsl\"?><?xml-stylesheet
type=\"text/css\" media=\"screen\"
href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\"?>
I would have expected the PI to be totally ignored by default.
Have I missed something??
Thanks in advance...
Cheers, Adam
Duncan Temple Lang
2011-Apr-06 23:06 UTC
[R] Treatment of xml-stylesheet processing instructions in XML module
Hi Adam
To use XPath and getNodeSet on an XML document,
you will want to use xmlParse() and not xmlTreeParse()
to parse the XML content. So
t = xmlParse(I(a)) # or asText = TRUE
elem = getNodeSet(t, "/rss/channel/item")[[1]]
works fine.
You don't need to pass the root node to getNodeSet(); pass the
document itself.
Also, if you have the package loaded, you don't need the XML::
prefix before the function names.
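For anyone who wants to try this without fetching the feed, here is a
minimal sketch of the same approach. The inline document is a made-up
stand-in for the FeedBurner feed (its titles and item count are
assumptions, not the real content); only xmlParse(), getNodeSet() and
xpathSApply() come from the XML package.

```r
library(XML)

# Hypothetical stand-in for readLines(url(...)): a small RSS document that
# begins with an xml-stylesheet processing instruction, one line per element.
feed <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>',
  '<rss version="2.0">',
  '  <channel>',
  '    <title>Example feed</title>',
  '    <item><title>First post</title></item>',
  '    <item><title>Second post</title></item>',
  '  </channel>',
  '</rss>'
)

# Collapse the lines into one string and parse it into an internal document.
doc <- xmlParse(paste(feed, collapse = "\n"), asText = TRUE)

# Run the XPath query against the document, not against xmlRoot(doc).
items <- getNodeSet(doc, "/rss/channel/item")

# Pull the text of each item's title in a single step.
titles <- xpathSApply(doc, "/rss/channel/item/title", xmlValue)
```

In this sketch the processing instruction is left in place and the
XPath query still finds the item nodes, so there is no need to edit
the downloaded text by hand.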
HTH
D.