Tony Breyal
2008-Nov-04 12:37 UTC
[R] How to suppress errors from htmlTreeParse() function in XML package?
Dear R-help,
The following code downloads an html document into variable 'doc' and
then stores an internal representation into variable 'html.tree'. Even
if the html code is malformed, this still works which is fantastic.
However, as in the example below, I do get some output from R in the
console which I would like to suppress somehow, so I can keep my
window a bit cleaner.
I understand that the output is just letting me know that the html
code is malformed, but for my purposes I can ignore that output. Is
there a way to achieve this?
### Example:
library(RCurl); library(XML)
doc <- getURL('http://www.google.co.uk/search?q=%22R%20Project%22&as_qdr=d1&num=100')
html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE)
### Output - this is what I would like to suppress
Tag nobr invalid
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
### etc.
I attempted to use try(expr, silent=TRUE) but that didn't work for me:
> try(htmlTreeParse(doc, useInternalNodes = TRUE), silent=TRUE)
Many thanks in advance for any help,
Tony Breyal
### O/S = Windows Vista Ultimate ###
> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_1.98-1 RCurl_0.91-0
Martin Morgan
2008-Nov-04 17:21 UTC
[R] How to suppress errors from htmlTreeParse() function in XML package?
Hi Tony --

Tony Breyal <tony.breyal at googlemail.com> writes:

> ### Example:
> library(RCurl); library(XML)
> doc <- getURL('http://www.google.co.uk/search?q=%22R%20Project%22&as_qdr=d1&num=100')
> html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE)

How about capture.output

res <- capture.output(html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE))

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
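
The capture.output() idiom suggested above can be tried without a network connection or the XML package. The following is only a minimal sketch: noisy_parse() is a hypothetical stand-in that prints diagnostics to standard output and returns a value. It is not part of any package, and it assumes the real parser's messages also go to standard output, as they appear to in this thread.

```r
# Hypothetical stand-in for a parser that prints diagnostics to the console
# (standard output) and then returns a result.
noisy_parse <- function(text) {
  cat("Tag nobr invalid\n")
  cat("htmlParseEntityRef: expecting ';'\n")
  toupper(text)
}

# The assignment is evaluated inside capture.output(), so 'parsed' still
# receives the return value while the printed lines end up in 'log_lines'.
log_lines <- capture.output(parsed <- noisy_parse("<nobr>R Project</nobr>"))

# If the diagnostics are not needed at all, wrap the call in invisible()
# so the captured lines are simply dropped.
invisible(capture.output(parsed2 <- noisy_parse("<b>quiet</b>")))
```

Keeping the captured lines in a variable, as Martin's 'res' does, means the diagnostics can still be inspected later if a parse result looks odd. Note that capture.output() collects standard output by default; text emitted through message() or warning() is not captured this way.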
Tony Breyal
2008-Nov-05 10:26 UTC
[R] How to suppress errors from htmlTreeParse() function in XML package?
Thank you both Martin and Duncan, each suggestion works beautifully!
re: capture.output() - I don't remember ever coming across this little
function before, which is a shame because I can think of several
places where it would have been rather useful. R has so many lovely
little functions, I just wish I could remember them all (though I have
found the cheat sheet to be a great time-saving resource:
http://cran.r-project.org/doc/contrib/Short-refcard.pdf)
re: error = function(...){} - I'm not sure how I missed the
'error=xmlErrorCumulator()' parameter in the '?htmlTreeParse' help
file, but I am grateful to you for supplying this form of the parameter
because I had no idea you could use an empty function in this way;
brilliant!
Cheers,
Tony Breyal
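
For readers who want to try the error = function(...){} form mentioned above, here is a small sketch. It assumes the XML package is installed; the malformed HTML string is made up for illustration, and asText = TRUE is used so that no download is needed.

```r
library(XML)

# Made-up, deliberately sloppy HTML: an entity written without its ';'
doc <- "<html><body><nobr>R &amp Project</nobr><p>Hello</p></body></html>"

# The empty function accepts whatever arguments the parser passes to the
# error handler and does nothing with them, so no diagnostics are printed.
html.tree <- htmlTreeParse(doc, asText = TRUE, useInternalNodes = TRUE,
                           error = function(...) {})

# The tree is still usable as normal, e.g. with an XPath query.
paragraphs <- xpathSApply(html.tree, "//p", xmlValue)
```

Compared with capture.output(), this silences the messages at their source rather than intercepting console output, so anything else printed during the call remains visible.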