Dear R users,

I need a function that downloads the source of a web page as HTML, so that I can parse it further.

Something like:

>site="http://www.R-project.com"
>code=function(site)
>code

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>The R Project for Statistical Computing</title>
<link rel="icon" href="favicon.ico" type="image/x-icon">
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon">
<link rel="stylesheet" type="text/css" href="R.css">
</head>

<FRAMESET cols="1*, 4*" border=0>
<FRAMESET rows="120, 1*">
<FRAME src="logo.html" name="logo" frameborder=0>
<FRAME src="navbar.html" name="contents" frameborder=0>
</FRAMESET>
<FRAME src="main.shtml" name="banner" frameborder=0>
<noframes>
<h1>The R Project for Statistical Computing</h1>

Your browser seems not to support frames,
here is the <A href="navbar.html">contents page</A> of the R Project's
website.

</noframes>
</FRAMESET>

Is there any function that can perform a similar task? Thank you in advance.

John
Try this:

    code <- readLines(site)

On 13/01/2008, John Lande <john.lande77 at gmail.com> wrote:
> Dear R users,
>
> I need a function that downloads the source of a web page as HTML, so
> that I can parse it further.
>
> Is there any function that can perform a similar task? Thank you in advance.
>
> John
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
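The readLines() suggestion above returns one character element per line of the page. As a minimal sketch of how that might be wrapped for later parsing, the helper below joins the lines into a single string. The name fetch_html is hypothetical and not from the thread, and a local temporary file stands in for a web page so that the example runs without network access.

```r
# Hypothetical helper (not from the thread): read a page's source and
# return it as one string. readLines() accepts a file path or a URL.
fetch_html <- function(site) {
  lines <- readLines(site, warn = FALSE)  # one element per line of source
  paste(lines, collapse = "\n")           # join into a single string
}

# Offline demonstration: a local file stands in for the web page.
tmp <- tempfile(fileext = ".html")
writeLines(c("<html>", "<title>Demo</title>", "</html>"), tmp)
code <- fetch_html(tmp)
cat(code, "\n")

# With network access the same call should work on a URL, e.g.
# code <- fetch_html("http://www.r-project.org")
```

Keeping the result as a vector of lines (skipping the paste() step) may be more convenient if the parsing is done line by line, for example with grep().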
See

    ?download.file
    ?url
    Omegahat package RCurl

You do realize that the 'code of web page' is just what you download?
E.g. (working example)

    readLines(url("http://www.r-project.org"))

On Sun, 13 Jan 2008, John Lande wrote:

> Dear R users,
>
> I need a function that downloads the source of a web page as HTML, so
> that I can parse it further.
>
> Something like:
>
> >site="http://www.R-project.com"

Sic!  Not a valid address.

> >code=function(site)
> >code
>
> Is there any function that can perform a similar task? Thank you in advance.
>
> John

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
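The two base R routes mentioned above, download.file() and url(), might be used roughly as follows. This is only a sketch: save_page and read_page are hypothetical names, RCurl is left out to avoid an extra dependency, and a file:// URL built from a temporary file is assumed as a stand-in for a real address so that the example runs offline.

```r
# Hypothetical helper: save the page to a file with download.file()
# and return the path of the saved copy.
save_page <- function(site, destfile = tempfile(fileext = ".html")) {
  status <- download.file(site, destfile, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  destfile
}

# Hypothetical helper: read the page through an explicit url() connection,
# making sure the connection is closed afterwards.
read_page <- function(site) {
  con <- url(site)
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

# Offline demonstration: a local file served through a file:// URL.
src <- tempfile(fileext = ".html")
writeLines(c("<html>", "<h1>Demo</h1>", "</html>"), src)
local_url <- paste0("file://", normalizePath(src, winslash = "/"))

saved <- save_page(local_url)   # copy on disk
page  <- read_page(local_url)   # character vector, one element per line
print(page)
```

download.file() is probably the better fit when the page should be kept on disk and parsed later; reading from the connection avoids the intermediate file.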