Hello all,

I want to read a table from a given web page.

If I do something like

> str="http://www...."          # this is the web address
> aux1 <- url(str,open="rt")    # open connection
> aux2 <- readLines(aux1)       # read web page

aux2 contains the html file.

I want to extract the table from the html file.
Is there a function html2R, the opposite of R2html?
How should I do this?

Thanks,
Adrian
Duncan Murdoch
2003-May-06 15:17 UTC
[R] how to read a web page and extract an html table?
On Tue, 6 May 2003 07:31:29 -0700 (PDT), you wrote in message <20030506143129.33487.qmail at web12105.mail.yahoo.com>:

>I want to extract the table from the html file.
>Is there a function html2R, the opposite of R2html?
>How should I do this?

I don't think there is anything that does that, but the XML package (from CRAN) contains a function called htmlTreeParse that should get you partway there.

Duncan Murdoch
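A minimal sketch of the htmlTreeParse route, assuming the XML package is installed from CRAN. The sample HTML, the column names, and the choice to treat the first row as a header are all illustrative assumptions, not part of the original question; real pages with nested tables, rowspan/colspan, or missing cells would need more care.

```r
# Sketch only: requires the XML package from CRAN (install.packages("XML")).
library(XML)

# Hypothetical stand-in for the page text returned by readLines().
html <- c(
  "<html><body><table>",
  "<tr><th>name</th><th>value</th></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table></body></html>"
)

# Parse the HTML text into an internal document tree.
doc <- htmlTreeParse(paste(html, collapse = "\n"), asText = TRUE,
                     useInternalNodes = TRUE)

# Build one character vector of cell text per table row.
rows <- lapply(getNodeSet(doc, "//table//tr"), function(tr) {
  xpathSApply(tr, "./th | ./td", xmlValue)
})

# Assumption: the first row is the header and the rest are data rows.
cells <- do.call(rbind, rows[-1])
tab <- as.data.frame(cells, stringsAsFactors = FALSE)
names(tab) <- rows[[1]]
tab$value <- as.numeric(tab$value)

print(tab)
```

Here the parsed tree is queried with XPath rather than walked by hand, which keeps the row and cell extraction short; the conversion of the value column to numeric is specific to this made-up table.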
Pikounis, Bill
2003-May-06 15:24 UTC
[R] how to read a web page and extract an html table?
Adrian,

> I want to extract the table from the html file.
> Is there a function html2R, the opposite of R2html?
> How should I do this?

Parsing arbitrary HTML is generally a nontrivial task. I would recommend using something like Perl to convert the HTML to delimited ASCII and then reading the result with read.table(), for example. There are Perl modules that can help with the "HTML-2-ASCII" step, if not do it entirely. I have never used one myself, but I am sure CPAN can be searched for one.

Hope that helps,
Bill

----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900 USA
v_bill_pikounis at merck.com
Phone: 732 594 3913
Fax: 732 594 1565
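For very simple, well-formed tables, the "convert to delimited text, then read.table()" idea can also be sketched in base R itself, without Perl. This is only an illustration under strong assumptions: the sample lines standing in for aux2 are made up, the page is assumed to contain one flat table with a header row, and the regular expressions will break on nested tables, cells containing markup that matters, or attributes containing ">".

```r
# Hypothetical stand-in for aux2, the lines returned by readLines().
aux2 <- c(
  "<html><body><p>Some text</p>",
  "<table>",
  "<tr><th>name</th><th>value</th></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table>",
  "</body></html>"
)

# Join the page into one string and keep only the first <table>...</table>.
page <- paste(aux2, collapse = " ")
tbl <- regmatches(page, regexpr("<table[^>]*>.*?</table>", page,
                                ignore.case = TRUE, perl = TRUE))

# Split into rows at each closing </tr>, then drop fragments with no cells.
rows <- strsplit(tbl, "(?i)</tr>", perl = TRUE)[[1]]
rows <- rows[grepl("<t[dh][ >]", rows, ignore.case = TRUE)]

# Turn each boundary between two cells into a tab, then strip remaining tags.
rows <- gsub("</t[dh]>\\s*<t[dh][^>]*>", "\t", rows,
             ignore.case = TRUE, perl = TRUE)
rows <- gsub("<[^>]+>", "", rows)
rows <- gsub("^ +| +$", "", rows)  # trim spaces only, so tabs survive

# Hand the tab-delimited text to read.table().
tab <- read.table(text = rows, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE)

print(tab)
```

The intermediate rows vector is exactly the delimited ASCII Bill describes, so it could equally be written to a file and read back later. For anything beyond toy input, a real HTML parser is the safer choice.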
Hi!

On 06-May-2003 Adi Humbert wrote:
> I want to extract the table from the html file.
> Is there a function html2R, the opposite of R2html?
> How should I do this?

I think the easiest way is to use Perl as a preprocessor:

http://www.devshed.com/Server_Side/Perl/DataMining/page3.html

hope this helps,
dst

"There is no way to peace, peace is the way." -- Gandhi
Detlef Steuer --- http://fawn.unibw-hamburg.de/steuer.html
***** Encrypted mail preferred *****