Tony Breyal
2008-Nov-04 12:37 UTC
[R] How to suppress errors from htmlTreeParse() function in XML package?
Dear R-help,

The following code downloads an HTML document into the variable 'doc' and then stores an internal representation in the variable 'html.tree'. This works even if the HTML is malformed, which is fantastic. However, as the example below shows, R prints some output to the console that I would like to suppress, so I can keep my window a bit cleaner.

I understand that the output is just telling me that the HTML is malformed, but for my purposes I can ignore it. Is there a way to suppress it?

### Example:
library(RCurl); library(XML)
doc <- getURL('http://www.google.co.uk/search?q=%22R%20Project%22&as_qdr=d1&num=100')
html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE)

### Output - this is what I would like to suppress
Tag nobr invalid
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
### etc.

I tried try(expr, silent=TRUE), but that didn't work for me:

> try(htmlTreeParse(doc, useInternalNodes = TRUE), silent=TRUE)

Many thanks in advance for any help,
Tony Breyal

### O/S = Windows Vista Ultimate ###
> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] XML_1.98-1 RCurl_0.91-0
Martin Morgan
2008-Nov-04 17:21 UTC
[R] How to suppress errors from htmlTreeParse() function in XML package?
Hi Tony --

Tony Breyal <tony.breyal at googlemail.com> writes:

> html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE)

How about capture.output():

res <- capture.output(html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE))

Martin

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
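The difference between try(silent=TRUE) and capture.output() can be sketched in base R without any network access. This is only an illustration: noisy_parse() is a hypothetical stand-in for a function that prints diagnostics, not part of the XML package, and it assumes the diagnostics go to standard output, as the report in this thread suggests for htmlTreeParse().

```r
# Hypothetical stand-in for a chatty parser (not part of the XML package):
# it writes diagnostics to standard output and returns a value.
noisy_parse <- function(x) {
  cat("Tag nobr invalid\n")
  cat("htmlParseEntityRef: expecting ';'\n")
  toupper(x)
}

# try(silent = TRUE) only hides the report of an R error. Nothing fails
# here, so the two diagnostic lines are still printed to the console.
r1 <- try(noisy_parse("<html></html>"), silent = TRUE)

# capture.output() diverts standard output into a character vector.
# The expression is evaluated in the calling environment, so 'result'
# is assigned as usual while the printed lines end up in 'log_lines'.
log_lines <- capture.output(result <- noisy_parse("<html></html>"))
```

By default capture.output() collects standard output only. Text produced with message() or warning() goes elsewhere and would need suppressMessages() or suppressWarnings() instead.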
Tony Breyal
2008-Nov-05 10:26 UTC
[R] How to suppress errors from htmlTreeParse() function in XML package?
Thank you both, Martin and Duncan; each suggestion works beautifully!

re: capture.output() - I don't remember ever coming across this little function before, which is a shame because I can think of several places where it would have been rather useful. R has so many lovely little functions; I just wish I could remember them all (though I have found the cheat sheet to be a great time-saving resource: http://cran.r-project.org/doc/contrib/Short-refcard.pdf).

re: error = function(...){} - I'm not sure how I missed the 'error=xmlErrorCumulator()' parameter in the '?htmlTreeParse' help file, but I am grateful to you for supplying this form of the parameter because I had no idea you could use an empty function in this way; brilliant!

Cheers,
Tony Breyal
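The empty-function idea can be shown on its own in base R. The sketch below is an assumption-laden illustration: parse_with_handler() is a hypothetical function that reports problems through an 'error' callback, and it is not how the XML package is implemented.

```r
# Hypothetical parser with an 'error' callback argument (illustration only;
# this is not the XML package's implementation).
parse_with_handler <- function(x, error = function(msg) cat(msg, "\n")) {
  if (grepl("<nobr>", x, fixed = TRUE)) error("Tag nobr invalid")
  if (grepl("&[a-z]+[^a-z;]", x)) error("htmlParseEntityRef: expecting ';'")
  nchar(x)
}

input <- "<nobr>R &amp Project</nobr>"

# Default handler: each diagnostic is printed to the console.
n1 <- parse_with_handler(input)

# Empty function: it accepts any arguments via '...' and does nothing,
# so the diagnostics are silently discarded.
n2 <- parse_with_handler(input, error = function(...) {})

# A collecting handler keeps the messages for later inspection instead.
seen <- character()
n3 <- parse_with_handler(input, error = function(msg) seen <<- c(seen, msg))
```

In the thread, the same pattern is applied by passing error = function(...){} to htmlTreeParse(). The collecting variant is useful when the diagnostics should stay off the console but remain available for checking.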