Dear useR,

I am just wondering whether there is any function in R that would let me get the text contents of a certain website.

Thanks a lot!

Best,

Leon
Yes, there are. (Please see and follow the posting guide if you wish to obtain something more specific.)

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Am Stat
Sent: Wednesday, August 01, 2007 2:19 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Extracting a website text content using R

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
work with it as text. For text mining, use:

1- http://wwwpeople.unil.ch/jean-pierre.mueller/
2- tm by Ingo F.
Is this what you had in mind?

> foo <- scan(url("http://cran.r-project.org/"), what = "character")
Read 69 items
> paste(unlist(foo), collapse = " ")
[1] "<!DOCTYPE HTML PUBLIC -//IETF//DTD HTML//EN > <html> <head> <title>The Comprehensive R Archive Network</title> <link rel=\"icon\" href=\"favicon.ico\" type=\"image/x-icon\"> <link rel=\"shortcut icon\" href=\"favicon.ico\" type=\"image/x-icon\"> <link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\"> </head> <FRAMESET cols=\"1*, 4*\" border=0> <FRAMESET rows=\"120, 1*\"> <FRAME src=\"logo.html\" name=\"logo\" frameborder=0> <FRAME src=\"navbar.html\" name=\"contents\" frameborder=0> </FRAMESET> <FRAME src=\"banner.shtml\" name=\"banner\" frameborder=0> <noframes> <h1>The Comprehensive R Archive Network</h1> Your browser seems not to support frames, here is the <A href=\"navbar.html\">contents page</A> of CRAN. </noframes> </FRAMESET>"

Try the search phrase

cran scan url

in Google for more hits on info about R functions that can deal with URLs.

In R try

> apropos("URL")
 [1] "contourLines" "URLdecode" "URLencode" "browseURL" "contrib.url" "main.help.url" "url.show"
 [8] "loadURL" "read.table.url" "scan.url" "source.url" "url"

SteveM
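The scan() call above returns the raw HTML markup, not the readable text. As a rough sketch only (not something proposed in this thread), the tags can be stripped with base R regular expressions. The helper name html_to_text and the sample page are hypothetical, and a regex approach like this is fragile on real-world HTML; it is meant to show the idea, not to replace a proper parser.

```r
# Hypothetical helper: collapse raw HTML lines into plain text (base R only).
html_to_text <- function(html_lines) {
  html <- paste(html_lines, collapse = " ")
  # Drop <script> and <style> blocks together with their contents.
  html <- gsub("(?is)<(script|style)\\b.*?</\\1\\s*>", " ", html, perl = TRUE)
  # Replace every remaining tag with a space.
  html <- gsub("<[^>]+>", " ", html)
  # Decode a few common entities (far from a complete list).
  html <- gsub("&nbsp;", " ", html, fixed = TRUE)
  html <- gsub("&lt;", "<", html, fixed = TRUE)
  html <- gsub("&gt;", ">", html, fixed = TRUE)
  html <- gsub("&amp;", "&", html, fixed = TRUE)
  # Squeeze runs of whitespace and trim both ends.
  trimws(gsub("[[:space:]]+", " ", html))
}

# Made-up sample input, so the sketch runs without network access.
sample_lines <- c(
  "<html><head><title>The Comprehensive R Archive Network</title></head>",
  "<body><h1>The Comprehensive R Archive Network</h1>",
  "<p>Here is the <a href=\"navbar.html\">contents page</a> of CRAN.</p></body></html>"
)
page_text <- html_to_text(sample_lines)
print(page_text)

# For a live page, one could presumably pass the result of readLines():
# html_to_text(readLines("http://cran.r-project.org/", warn = FALSE))
```

For anything beyond a quick look at a page, a real HTML parser is likely the safer choice.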
Perhaps more fun is

> library(XML)
> res = htmlTreeParse("http://www.omegahat.org/RSXML/", useInternalNodes=TRUE)
> xpathApply(res, "//h1", xmlValue)[[1]]
[1] "An XML package for the S language"

Martin

Quoting Steven McKinney <smckinney at bccrc.ca>:

> Is this what you had in mind?
>
> > foo <- scan(url("http://cran.r-project.org/"), what = "character")
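The XPath approach above pulls out a single heading. As a sketch under some assumptions (the XML package is installed, and the inline page below is made up for illustration), the same idea can be extended to collect all visible text by selecting the text nodes under <body> and skipping script and style content.

```r
library(XML)

# Made-up inline page, so the sketch runs without network access.
html <- paste0(
  "<html><head><title>Demo</title><script>var x = 1;</script></head>",
  "<body><h1>An XML package</h1><p>Some <b>bold</b> text.</p></body></html>"
)

# Parse the string as HTML into an internal document.
doc <- htmlParse(html, asText = TRUE)

# Select text nodes under <body>, excluding <script> and <style> content.
node_text <- xpathSApply(
  doc,
  "//body//text()[not(ancestor::script) and not(ancestor::style)]",
  xmlValue
)

# Trim each piece, drop empty ones, and join with single spaces.
pieces <- trimws(node_text)
page_text <- paste(pieces[nzchar(pieces)], collapse = " ")
print(page_text)
```

Passing a URL instead of a string (and dropping asText = TRUE) should work the same way as the htmlTreeParse() call above.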