similar to: Extracting text from html code using the RCurl package.

Displaying 20 results from an estimated 1000 matches similar to: "Extracting text from html code using the RCurl package."

2007 Nov 18
4
Read HTML table
You can use htmlTreeParse and xpathApply from the XML package. Something like: xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td", function(x) xmlValue(x)) should do it. Gamma wrote: > > anyone care to explain how to read a html table, it's streaming data > (updated every second) and i am looking for a suitable function. > > The imported html
2008 Nov 04
2
How to suppress errors from htmlTreeParse() function in XML package?
Dear R-help, The following code downloads an html document into variable 'doc' and then stores an internal representation into variable 'html.tree'. Even if the HTML code is malformed, this still works, which is fantastic. However, as in the example below, I do get some output from R in the console which I would like to suppress somehow, so I can keep my window a bit cleaner. I
2009 Jan 26
2
RCurl unable to download a particular web page -- what is so special about this web page?
Dear R-help, There seems to be a web page I am unable to download using RCurl. I don't understand why it won't download: > library(RCurl) > my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2" > getURL(my.url) [1] "" Other web pages are ok to download but this is the first time I have been unable to download a
2008 Dec 01
1
[BioC] Rcurl 0.8-1 update for bioconductor 2.7
Hi Patrick, Greetings from !(sunny) Pittsburgh. What's the scoop on RCurl on windows (XP)? I've tried to install RCurl_0.92-0.zip and RCurl_0.9-3.zip, with both R 2.7.2 and R 2.8.0 from the RGUI (utils:::menuInstallLocal), and get the error "Windows binary packages in zipfiles are not supported". which (according to google's one and only hit) comes from a perl script.
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding. I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
2013 Aug 25
2
RCurl cookiejar
R-helpers, When I use cURL in the Terminal: curl --cookie-jar cookie.txt --url "http://corpusdelespanol.org/x.asp" --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0" --location --include a cookie file "cookie.txt" is saved to my working directory. However, when I try what I think is the equivalent command in R with RCurl:
2008 Oct 13
1
Running R at a specific time - alternative to Sys.sleep() ?
Dear R-Help, Is it possible to set R up to run a particular script at specific times of the day? trivial example: If the time is now 8:59:55am and I wish to run a function at 9am, I do the following: my.function <- function(x) { p1 <- proc.time() Sys.sleep(x) print('Hello R-Help!') proc.time() - p1 } my.function (5) [1] "Hello R-Help!" user system
2008 Dec 09
1
RCurl::postForm() -- how does one determine what the names are of each form element in an online html form?
Dear R-Help, I am looking into using the Open Calais web service (http://sws.clearforest.com/calaisViewer/) for text mining purposes. I would like to use R to post text into one of the forms on their website. In package RCurl, there is a function called postForm(). This sounds like it would do the job. Unfortunately the URL used in the example is no longer valid (I have emailed the maintainer
2008 Jul 25
1
Installation error for RCurl in Redhat Enterprise 5
I am getting the following error while trying to install the RCurl library. I have checked that curl and libcurl.so.3 are already installed in /usr/bin > install.packages("RCurl") --- Please select a CRAN mirror for use in this session --- Loading Tcl/Tk interface ... done trying URL 'http://cran.hostingzero.net/src/contrib/RCurl_0.9-3.tar.gz' Content type
2016 Jan 18
3
Extracting data from a website
Good afternoon, I want to extract data from a website that relates each week to the score obtained by a player. Right now I can get the node in which the week and the score obtained are related, but I am unable to extract that information into a two-column table (week, score), bearing in mind that there may be weeks in which the player did not score (in the example,
2008 Oct 01
1
changing 'https' to 'http' when using download.file(), any side effects or just use RCurl?
Dear R-Help, From reading the help file, it is my understanding that the download.file() function does not support HTTPS connections. So therefore, understandably, the following produces an error: ### R Code > url <- "https://stat.ethz.ch/pipermail/r-help/2008-October/thread.html" > destfile <- "//PFO-SBS001/Redirected/tonyb/Desktop/R_web_test/tmp.txt" >
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |
2009 Jun 02
1
Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host
Dear all, I am having a problem downloading webpages through R when I run it in the DOS window under Windows Vista Basic. I have downloaded the batchfiles from http://code.google.com/p/batchfiles/ and have successfully set the PATH. I open up 'Command Prompt' in Vista and type (after the C:\...> stuff): ### START ### C:\Users\Karen>Rscript -e "library(RCurl);
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it. # THE CODE WORKS ON THIS PAGE library(RCurl)
2012 Apr 21
1
how to write html output (webscraped using RCurl package) into file?
I want the information shown on the webpage "http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html" to be written to a .txt file as it is (I don't want any HTML tags). I am using the "RCurl" package >marathi<-htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html") >marathi
2008 Sep 17
2
RCurl compilation error on ubuntu hardy
Dear list members, I encountered this problem and the solution pointed out in a previous thread did not work for me (e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R")). I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get. I really need RCurl in order to use biomaRt ... any help would be greatly appreciated. Best wishes, Emmanuel
2012 Feb 29
2
Using a FOR LOOP to name objects
Hello, I am trying to use a for loop to name objects in each iteration. As in the following example (which doesn't work quite well) my_list<-c("A","B","C","D","E","F") for(i in c(1:length(my_list))){ url<- "http://finance.yahoo.com" doc = htmlTreeParse(url, useInternalNodes = T) tab_nodes = xpathApply(doc,
2007 Dec 14
6
Analyzing Publications from Pubmed via XML
I would like to track in which journals articles about a particular disease are being published. Creating a pubmed search is trivial. The search provides data but obviously not as an R dataframe. I can get the search to export the data as an xml feed and the xml package seems to be able to read it. xmlTreeParse("
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <-
2009 Nov 16
2
parsing Google search results
Hi, how can I parse Google search results? The following code returns "integer(0)" instead of "1" although the results of the query clearly contain the regex "cran". #### address <- url("http://www.google.com/search?q=cran") open(address) lines <- readLines(address) grep("cran", lines[3]) #### Thanks Philip -- Philip Leifeld Max