similar to: extracting tables from web pages?

Displaying 20 results from an estimated 9000 matches similar to: "extracting tables from web pages?"

2012 Sep 19
1
scraping with session cookies
Hi, I am just starting to code in R, and one of the things I want to do is scrape some data from the web. The problem I am having is that I cannot get past the disclaimer page (which sets a session cookie). I have collected some ideas and combined them in the code below, but I still don't get past the disclaimer page. I am trying to accept the disclaimer with postForm and write
2010 Aug 04
2
Finding the right url for RCurl
Hi all, I am using RCurl to try to download data from a website, but I'm having trouble working out which URL to use. Here is the site: http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX In the upper right, above the displayed sheet, there's a link to download the data as a .csv file. When I hit "copy url" and paste it into getURL in R, it doesn't
2009 Jan 26
2
RCurl unable to download a particular web page -- what is so special about this web page?
Dear R-help, There seems to be a web page I am unable to download using RCurl. I don't understand why it won't download:

    > library(RCurl)
    > my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
    > getURL(my.url)
    [1] ""

Other web pages are ok to download, but this is the first time I have been unable to download a
2016 Jun 21
2
Problems with accents and other characters in R and RStudio
Hello. I'm having some kind of problem with accented characters when working in R or RStudio, and I don't know how to solve it. While trying to reproduce, on two different PCs (both running Windows 7), one of the latest exercises Carlos Gil Bellosta has published on his blog (https://www.datanalytics.com/2016/06/20/6602-767-km-alrededor-de-espana-para-visitar-todas-sus-capitales-de-provincia/), I find that when
2013 Jan 15
1
readHTMLTable (XML package)
Hi, I am using XML::readHTMLTable and getting the error below. Does anyone know why? Does this function not work with https? I didn't see anything in the help about that.

    > library(XML)
    > wampage <- readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html', 1)
    Error in htmlParse(doc) : File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist

Dan
2009 Jan 19
3
download/retain text file structure with RCurl/getURL()
Dear list, I'm trying to download a text file directly from the internet using the RCurl package and the command getURL. Duncan Lang graciously helped me solve the first step in this problem using the following command:

    #################
    txtfile <- getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt', ftp.use.epsv = FALSE)
    #################
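getURL() returns the whole file as one character string with the line breaks embedded in it. A minimal sketch of recovering the line and column structure with base R, assuming the contents are held in `txtfile` as above; the short literal string below is a hypothetical stand-in for the real download, and the whitespace-delimited layout is an assumption about the snow-course file.

```r
# Hypothetical stand-in for the string returned by getURL(); real layout assumed
txtfile <- "site year depth\r\nA 1990 34\r\nB 1991 41\r\n"

# Split the single string back into lines (handles both \n and \r\n endings)
lines <- strsplit(txtfile, "\r?\n")[[1]]

# Parse the lines into a data frame; read.table() accepts them via text=
snow <- read.table(text = lines, header = TRUE)
print(lines)
print(snow)
```

writeLines(lines, "13e19.txt") would save the file with its original line structure.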
2016 Jun 21
2
Problems with accents and other characters in R and RStudio
Hi, Carlos. I find it strange too, because it doesn't always happen to me; it's somewhat random. I imagine there must be a reason, and that the page's code is involved, of course, but I can't work out what causes it. On top of that, the tests I run leave me even more puzzled. This: Encoding(capitales) tells me the encoding is "unknown", but then this: validUTF8(capitales) tells me
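The two results need not contradict each other: Encoding() reports only how a string is *marked*, and unmarked strings show "unknown", while validUTF8() inspects the bytes themselves. A minimal sketch illustrating this, building the string from raw bytes so the example does not depend on the encoding of the source file; the city name is illustrative.

```r
# "Cádiz" as raw UTF-8 bytes (á = 0xC3 0xA1)
bytes <- as.raw(c(0x43, 0xc3, 0xa1, 0x64, 0x69, 0x7a))
x <- rawToChar(bytes)

Encoding(x)   # "unknown": rawToChar() does not mark the string
validUTF8(x)  # TRUE: the bytes themselves are valid UTF-8

# Declare the encoding without changing the bytes
Encoding(x) <- "UTF-8"
Encoding(x)   # now "UTF-8"
nchar(x)      # 5 characters, although it occupies 6 bytes
```

If the scraped text behaves like this, marking it as UTF-8 may be enough; converting it with iconv() would only be needed if the bytes were in another encoding.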
2013 Jul 23
2
downloading web content
Hello, I am trying to use R to download a bunch of .csv files such as: http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia I have tried the following, and neither works:

    a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
    Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : embedded nul in string:

and
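An "embedded nul in string" error suggests that the response is binary rather than plain text (possibly a zip archive, though that is an assumption). A minimal sketch that keeps the content as raw bytes and writes it to disk unchanged; `save_download` is a hypothetical helper, and its `fetch` argument defaults to RCurl::getBinaryURL(), which returns a raw vector, so it can be swapped for a stub when testing offline.

```r
# Hypothetical helper: fetch a URL as raw bytes and write them to `path` unchanged.
# Raw vectors may contain nul bytes, so binary content survives intact.
save_download <- function(url, path, fetch = function(u) RCurl::getBinaryURL(u)) {
  bytes <- fetch(url)
  writeBin(bytes, path)
  invisible(path)
}
```

Base R's download.file(url, destfile, mode = "wb") is an alternative that also avoids converting the response to a string. If the file turns out to be a zip archive, unzip() can then extract the .csv.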
2009 Sep 17
1
RCurl and Google Scholar's EndNote references
Hi! I've performed a Google Scholar Search using a query, let's say "Frank Harrell", and parsed the links to the EndNote references from the resulting HTML code. Now I'd like to download all the references automatically. For this, I have tried to use RCurl, but I can't seem to get it working: I always get error code "403 Forbidden" from the web server.
2011 Jun 06
1
RCurl and kerberos
Dear list, I would like to call a Kerberos-authenticated web-service from within R. Curl can do it: $ curl --negotiate -u : "http://my.web.service/" so I would expect that RCurl also has the capability, but I have not been able to find the correct options to set. listCurlOptions() does not return anything with negotiate, and searching the source of RCurl, the only thing I found was
2010 Nov 14
1
RCurl and cookies in POST requests
Hello. I know that it's usually possible to write cookies to a cookie file by removing the curl handle and calling gc(). I can do this with getURL(), but I just can't get the same result with postForm(). If I use:

    curlHandle <- getCurlHandle(cookiefile = FILE, cookiejar = FILE)

and then do:

    getURL("http://example.com/script.cgi", curl = curlHandle)
    rm(curlHandle)
    gc()

it's
2016 Jun 21
2
Problems with accents and other characters in R and RStudio
Hi, Carlos. Yes, it did help. Regarding using the geocode function with accented city names, Carlos Gil Bellosta had earlier suggested using iconv to convert the search string to UTF-8, and I used it to try to convert the output of html_table, without success: capitales <- read_html("
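iconv() operates on character vectors, so applying it to the data frame returned by html_table() will not convert its columns. A minimal sketch that converts each character column separately, assuming the text arrives as latin1 bytes (on Spanish-language Windows it might instead be "CP1252", or already UTF-8 and only unmarked). The data frame and the helper `to_utf8` are hypothetical stand-ins.

```r
# Hypothetical stand-in for the html_table() output: latin1 bytes for
# "Cádiz" and "León", never converted to UTF-8
capitales <- data.frame(
  ciudad = c(rawToChar(as.raw(c(0x43, 0xe1, 0x64, 0x69, 0x7a))),
             rawToChar(as.raw(c(0x4c, 0x65, 0xf3, 0x6e)))),
  id = 1:2,
  stringsAsFactors = FALSE
)

# Convert only the character columns; other columns are left untouched
to_utf8 <- function(df, from = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "UTF-8")
  df
}

capitales <- to_utf8(capitales)
```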
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page, so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it.

    # THE CODE WORKS ON THIS PAGE
    library(RCurl)
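The offending character is the nul byte (0x00), which an R character string cannot contain. A minimal sketch, assuming the page is fetched as raw bytes (for example with RCurl::getBinaryURL()) instead of as a string: the nul bytes can then be dropped before converting to text. The byte vector below is a hypothetical stand-in for a real page.

```r
# Hypothetical stand-in for raw page bytes containing two nul (0x00) bytes
page <- c(charToRaw("<p>price"), as.raw(0), charToRaw(": 10</p>"), as.raw(0))

# rawToChar() refuses input with embedded nuls; capture the error message
res <- tryCatch(rawToChar(page), error = function(e) conditionMessage(e))

# Drop the nul bytes, then convert the remaining bytes to a string
clean <- rawToChar(page[page != as.raw(0)])
clean
```

To keep the text's spacing instead of deleting the bytes, they can be replaced first, e.g. `page[page == as.raw(0)] <- as.raw(0x20)`.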
2008 Aug 27
1
RCurl: using netrc with curlPerform
Hello, I am having trouble getting the curlPerform function to authenticate using the .netrc file. From the documentation I've read, it certainly seems as though this function should be able to authenticate via the .netrc file. The example I am using here comes from the "R as a Web Client - the RCurl package" paper and demonstrates using the .netrc file to access the
2011 Mar 03
6
Developing a web crawler
Hi, I wish to develop a web crawler in R. I have been using the functionality in the RCurl package. I am able to extract the HTML content of the site, but I don't know how to go about analyzing the HTML-formatted document. I want to know the frequency of a word in the document. I am only acquainted with analyzing data sets, so how should I go about analyzing data that is not
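One way to count words is to treat the HTML as text: strip the tags, split on non-alphanumeric characters, and tabulate. A minimal base-R sketch, assuming the page has already been fetched into a string (e.g. with RCurl::getURL()); the regex tag stripping is crude and ignores script/style blocks, for which a real HTML parser such as the XML package would be more robust. The HTML string and `word_freq` are hypothetical.

```r
# Hypothetical stand-in for HTML fetched with RCurl::getURL()
html <- "<html><body><h1>R web crawler</h1><p>R makes crawling easy; crawling in R is fun.</p></body></html>"

word_freq <- function(html) {
  text <- gsub("<[^>]+>", " ", html)                          # crude tag stripping
  words <- tolower(unlist(strsplit(text, "[^[:alnum:]]+")))   # split into words
  words <- words[nzchar(words)]                               # drop empty tokens
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq(html)
freq["crawling"]
```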
2010 Jul 21
1
Command that is conditional upon file retrieval: is it possible?
Hi all, I'm currently working on an R program where I have to access an FTP server to download some of the data I need. However, the people who post the files I access are at times inconsistent about when they post, if they post at all, etc. Here's some of the code I use:

    library(RCurl)
    url1 = paste("ftp://user:password@a.great.website.com/",
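Making later commands conditional on a successful retrieval can be done by catching the error from the download and branching on the result. A minimal sketch using a hypothetical helper `fetch_or_null`, assuming the fetch function signals an R error when the file is missing; `fetch` defaults to RCurl::getURL() but is a parameter so the logic can be exercised without a network connection. RCurl's url.exists() is another way to test for the file first.

```r
# Hypothetical helper: return the downloaded contents, or NULL if retrieval fails
fetch_or_null <- function(url, fetch = function(u) RCurl::getURL(u)) {
  tryCatch(fetch(url), error = function(e) {
    message("Could not retrieve ", url, ": ", conditionMessage(e))
    NULL
  })
}

# Simulate a missing file with a fetch function that always fails
data1 <- fetch_or_null("ftp://example.org/missing.txt",
                       fetch = function(u) stop("file not found"))
if (is.null(data1)) {
  # file not available: skip it, retry later, or fall back to older data
} else {
  # file retrieved: process data1 here
}
```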
2013 Feb 21
4
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list. Looks like the same issue in Russian:

    library(RCurl)
    library(XML)
    u = "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
    a = getURL(u)
    a # Here - the Russian is fine.
    a2 <- htmlParse(a)
    a2 # Here it is a mess...

None of these seem to fix it:

    htmlParse(a, encoding = "windows-1251")
    htmlParse(a, encoding =
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:

    > library(XML)
    > theurl <- "http://home.sina.com"
    > download.file(theurl, "tmp.html")
    > txt <- readLines("tmp.html")
    > txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
    > g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
    > head(grep(" ", g, value=T))
    [1] " |
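readLines() marks its output as being in the native encoding unless told otherwise, which on Windows is not UTF-8; the page may then be misinterpreted further down the chain. A minimal sketch, assuming the downloaded page is UTF-8: passing `encoding = "UTF-8"` marks the lines correctly. Here a small file written from raw bytes stands in for "tmp.html". XML's htmlTreeParse() also takes an `encoding` argument that may help in the same situation.

```r
# Write a small UTF-8 file from raw bytes, standing in for the downloaded "tmp.html"
tmp <- tempfile(fileext = ".html")
# "<p>中文</p>" encoded as UTF-8 (中 = E4 B8 AD, 文 = E6 96 87)
writeBin(c(charToRaw("<p>"), as.raw(c(0xe4, 0xb8, 0xad, 0xe6, 0x96, 0x87)),
           charToRaw("</p>\n")), tmp)

# Tell readLines() the file is UTF-8 so the strings are marked accordingly
txt <- readLines(tmp, encoding = "UTF-8")
Encoding(txt)   # "UTF-8"
nchar(txt)      # 9 characters
```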
2013 Aug 25
2
RCurl cookiejar
R-helpers, When I use cURL in the Terminal:

    curl --cookie-jar cookie.txt --url "http://corpusdelespanol.org/x.asp" --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0" --location --include

a cookie file "cookie.txt" is saved to my working directory. However, when I try what I think is the equivalent command in R with RCurl:
2009 Jun 02
1
Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host
Dear all, I am having a problem downloading web pages through R when I run it in the DOS window under Windows Vista Basic. I have downloaded the batchfiles from http://code.google.com/p/batchfiles/ and have successfully set the PATH. I open up 'Command Prompt' in Vista and type (after the C:\...> stuff):

    ### START ###
    C:\Users\Karen>Rscript -e "library(RCurl);