thr3ads.net - R help - [R] Remove superscripts from HTML objects [Apr 2012]

Chris Stubben

2012-Apr-12 01:56 UTC

[R] Remove superscripts from HTML objects

Is there some way to remove superscripts from objects returned by
html/xmlParse (XML package)?

library(XML)
h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>"
doc <- htmlParse(h)
xpathSApply(doc, "//p", xmlValue)
[1] "Cata" "Dog"

I could probably remove the <sup> tags from the "h" object above, but I'd
rather just work with the results from htmlParse if possible (and not use
readLines to load raw HTML first).

Thanks,
Chris Stubben
 


--
View this message in context:
http://r.789695.n4.nabble.com/Remove-superscripts-from-HTML-objects-tp4550738p4550738.html
Sent from the R help mailing list archive at Nabble.com.

mlell08

2012-Apr-12 14:31 UTC

[R] Remove superscripts from HTML objects

Hi,

h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>"
sub("<sup.*sup>", "", h)

see http://en.wikibooks.org/wiki/R_Programming/Text_Processing for more
information.

Regards!

S Ellison

2012-Apr-13 12:42 UTC

[R] Remove superscripts from HTML objects

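> h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>"
> sub("<sup.*sup>","",h)

Probably safer to do

gsub("<sup.*?sup>", "", h)

so that each superscript is matched and removed on its own, rather than one
greedy match spanning several superscripts and the text between them.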

e.g.

h2 <- "<html><p>Cat<sup>a</sup></p><p>Dog</p><p>Mouse<sup>a</sup></p><p>Raccoon</p></html>"
sub("<sup.*sup>", "", h2)    # drops everything between first <sup and last sup>
gsub("<sup.*?sup>", "", h2)  # drops each <sup>xxx</sup>


Chris Stubben

2012-Apr-13 16:39 UTC

head link

[R] Remove superscripts from HTML objects

Sorry if I was not clear.  I wanted to remove the superscripts using xpath
queries if possible.  For example this will get p nodes with superscripts,
but how do I remove the superscripts if there are many matching nodes and
different superscripts?

xpathSApply(doc, "//p[sup]", xmlValue) 
[1] "Cata"


Chris
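
For readers landing on this thread: the question above can probably be answered without any regular expressions. The sketch below is not from the original posters. It assumes the XML package's removeNodes() and getNodeSet() functions and uses a slightly extended sample string (the third paragraph and the "b" superscript are made up for illustration). It shows two options: selecting only the text nodes directly under each <p>, or deleting every <sup> node from the parsed document before extracting values.

```r
library(XML)

# Illustrative input (assumption): two paragraphs with different superscripts
h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p><p>Mouse<sup>b</sup></p></html>"
doc <- htmlParse(h, asText = TRUE)

# Option 1: take only the text nodes that are direct children of <p>,
# so text inside <sup> is never selected. The document is left unchanged.
text_only <- xpathSApply(doc, "//p/text()", xmlValue)

# Option 2: remove every <sup> node from the parsed tree, then read the
# paragraph values as usual. Note that this modifies 'doc' in place.
removeNodes(getNodeSet(doc, "//sup"))
without_sup <- xpathSApply(doc, "//p", xmlValue)

print(text_only)
print(without_sup)
```

Option 1 returns one element per text node, so a paragraph with text on both sides of a superscript would yield two elements. Option 2 keeps one element per paragraph, at the cost of altering the parsed document.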
