thr3ads.net - similar to: "Extracting text from html code using the RCurl package."

Displaying 20 results from an estimated 1000 matches similar to: "Extracting text from html code using the RCurl package."

Read HTML table

2007 Nov 18

You can use htmlTreeParse and xpathApply from the XML library. Something like: xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td", function(x) xmlValue(x)) should do it. Gamma wrote: > anyone care to explain how to read an HTML table? It's streaming data (updated every second) and I am looking for a suitable function. > The imported html

How to suppress errors from htmlTreeParse() function in XML package?

2008 Nov 04

Dear R-help, The following code downloads an html document into variable 'doc' and then stores an internal representation into variable 'html.tree'. Even if the html code is malformed, this still works, which is fantastic. However, as in the example below, I do get some output from R in the console which I would like to suppress somehow, so I can keep my window a bit cleaner. I

RCurl unable to download a particular web page -- what is so special about this web page?

2009 Jan 26

Dear R-help, There seems to be a web page I am unable to download using RCurl. I don't understand why it won't download: > library(RCurl) > my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2" > getURL(my.url) [1] "" Other web pages are ok to download but this is the first time I have been unable to download a

[BioC] RCurl 0.8-1 update for Bioconductor 2.7

2008 Dec 01

Hi Patrick, Greetings from !(sunny) Pittsburgh. What's the scoop on RCurl on Windows (XP)? I've tried to install RCurl_0.92-0.zip and RCurl_0.9-3.zip, with both R 2.7.2 and R 2.8.0 from the RGUI (utils:::menuInstallLocal), and get the error "Windows binary packages in zipfiles are not supported", which (according to Google's one and only hit) comes from a Perl script.

XML and RCurl: problem with encoding (htmlTreeParse)

2009 Dec 31

Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below: > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,

RCurl cookiejar

2013 Aug 25

R-helpers, When I use cURL in the Terminal: curl --cookie-jar cookie.txt --url "http://corpusdelespanol.org/x.asp" --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0" --location --include a cookie file "cookie.txt" is saved to my working directory. However, when I try what I think is the equivalent command in R with RCurl:

Running R at a specific time - alternative to Sys.sleep() ?

2008 Oct 13

Dear R-Help, Is it possible to set R up to run a particular script at specific times of the day? trivial example: If the time is now 8:59:55am and I wish to run a function at 9am, I do the following: my.function <- function(x) { p1 <- proc.time() Sys.sleep(x) print('Hello R-Help!') proc.time() - p1 } my.function (5) [1] "Hello R-Help!" user system

RCurl::postForm() -- how does one determine what the names are of each form element in an online html form?

2008 Dec 09

Dear R-Help, I am looking into using the Open Calais web service (http://sws.clearforest.com/calaisViewer/) for text mining purposes. I would like to use R to post text into one of the forms on their website. In package RCurl, there is a function called postForm(). This sounds like it would do the job. Unfortunately the URL used in the example is no longer valid (I have emailed the maintainer

Installation error for RCurl in Red Hat Enterprise 5

2008 Jul 25

I am getting the following error while trying to install the RCurl library. I have checked that curl and libcurl.so.3 are already installed in /usr/bin > install.packages("RCurl") --- Please select a CRAN mirror for use in this session --- Loading Tcl/Tk interface ... done trying URL 'http://cran.hostingzero.net/src/contrib/RCurl_0.9-3.tar.gz' Content type

Extracting data from a website

2016 Jan 18

Good afternoon, I want to extract data from a website that relates each week to the score obtained by a player. Right now I can get the node in which the week is related to the score obtained, but I am not able to extract that information into a two-column table (week, score), bearing in mind that there may be weeks with no score (in the example,

changing 'https' to 'http' when using download.file(), any side effects or just use RCurl?

2008 Oct 01

Dear R-Help, From reading the help file, it is my understanding that the download.file() function does not support HTTPS connections. So, understandably, the following produces an error: ### R Code > url <- "https://stat.ethz.ch/pipermail/r-help/2008-October/thread.html" > destfile <- "//PFO-SBS001/Redirected/tonyb/Desktop/R_web_test/tmp.txt" >

XML and RCurl: problem with encoding (htmlTreeParse)

2010 Jul 03

Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |

Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host

2009 Jun 02

Dear all, I am having a problem downloading webpages through R when I run it in the DOS window under Windows Vista Basic. I have downloaded the batchfiles from http://code.google.com/p/batchfiles/ and have successfully set the PATH. I open up 'Command Prompt' in Vista and type (after the C:\...> stuff): ### START ### C:\Users\Karen>Rscript -e "library(RCurl);

Removing Embedded Null characters from text/html

2009 Oct 15

Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it. # THE CODE WORKS ON THIS PAGE library(RCurl)

how to write html output (webscraped using RCurl package) into file?

2012 Apr 21

I want the information shown on the webpage "http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html" to be written to a .txt file as it is (I don't want any HTML tags). I am using the "RCurl" package >marathi<-htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html") >marathi

RCurl compilation error on ubuntu hardy

2008 Sep 17

Dear list members, I encountered this problem and the solution pointed out in a previous thread did not work for me (e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R")). I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get. I really need RCurl in order to use biomaRt ... any help would be greatly appreciated. Best wishes, Emmanuel

Using a FOR LOOP to name objects

2012 Feb 29

Hello, I am trying to use a for loop to name objects in each iteration. As in the following example (which doesn't work quite well) my_list<-c("A","B","C","D","E","F") for(i in c(1:length(my_list))){ url<- "http://finance.yahoo.com" doc = htmlTreeParse(url, useInternalNodes = T) tab_nodes = xpathApply(doc,

Analyzing Publications from Pubmed via XML

2007 Dec 14

I would like to track in which journals articles about a particular disease are being published. Creating a PubMed search is trivial. The search provides data, but obviously not as an R data frame. I can get the search to export the data as an XML feed, and the XML package seems to be able to read it. xmlTreeParse("

Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?

2011 Oct 26

Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <-

parsing Google search results

2009 Nov 16

Hi, how can I parse Google search results? The following code returns "integer(0)" instead of "1" although the results of the query clearly contain the regex "cran". #### address <- url("http://www.google.com/search?q=cran") open(address) lines <- readLines(address) grep("cran", lines[3]) #### Thanks Philip -- Philip Leifeld Max
