thr3ads.net - R help - [R] reading tables from multiple HTML pages [Aug 2011]

s1oliver

2011-Aug-29 16:04 UTC

[R] reading tables from multiple HTML pages

Hi, I'm a beginner to R and am having some problems scraping data from HTML
tables using the XML package. I have included some code below.

I am trying to loop through a series of HTML pages, each of which contains a
single table from which I want to scrape data. However, some of the pages
are blank, so htmlParse() throws an error when it reaches one of them. The
loop then stops and I get the error message below:

Error in htmlParse(url) : 
  error in creating parser for
http://www.szrd.gov.cn/viewcommondbfc.do?id=728

What is the best way to keep the loop running so I can parse the rest of
the pages?

****************************************************

library(XML)

url_root<-"http://www.szrd.gov.cn/viewcommondbfc.do?id="

for(i in 700:750){
	url = paste(url_root, i, sep="")
	doc = htmlParse(url)
	
	tableNodes = getNodeSet(doc, "//table")
	tbl = readHTMLTable(tableNodes[[3]])
}
****************************************************

Steve Oliver
Department of Political Science
University of California at San Diego
9500 Gilman Dr.
La Jolla, CA 92092

--
View this message in context:
http://r.789695.n4.nabble.com/reading-tables-from-multiple-HTML-pages-tp3776605p3776605.html
Sent from the R help mailing list archive at Nabble.com.

Dennis Murphy

2011-Aug-29 18:39 UTC

[R] reading tables from multiple HTML pages

?tryCatch

HTH,
Dennis
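
To expand on the pointer above, here is a minimal sketch of how tryCatch()
might be wrapped around the parsing step so that a failed page is skipped
instead of ending the loop. The helper names fetch_table(), safe_fetch() and
scrape_ids() are hypothetical and not part of the XML package. The sketch
assumes the XML package is installed and that the third table on each page is
the one wanted, as in the original loop.

```r
url_root <- "http://www.szrd.gov.cn/viewcommondbfc.do?id="

# Hypothetical helper: read the third table on one page, as in the original loop.
# Requires the XML package; any failure here is raised as an ordinary R error.
fetch_table <- function(url) {
  doc <- XML::htmlParse(url)
  tableNodes <- XML::getNodeSet(doc, "//table")
  XML::readHTMLTable(tableNodes[[3]])
}

# Hypothetical helper: run a fetcher, returning NULL on error instead of stopping.
safe_fetch <- function(url, fetcher = fetch_table) {
  tryCatch(
    fetcher(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical helper: try every id and keep only the pages that parsed.
# lapply() is used instead of assigning into a list inside a for loop,
# because assigning NULL to a list element would delete that element.
scrape_ids <- function(ids, fetcher = fetch_table) {
  urls <- paste0(url_root, ids)
  results <- lapply(urls, safe_fetch, fetcher = fetcher)
  names(results) <- ids
  Filter(Negate(is.null), results)
}
```

Calling scrape_ids(700:750) should then return a named list with one data
frame per page that parsed. Ids that fail, such as 728, are reported with
message() and left out. Because the whole fetch is wrapped, a page with fewer
than three tables is skipped as well, since tableNodes[[3]] raises a
subscript error.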

