Displaying 20 results from an estimated 9000 matches similar to: "extracting tables from web pages?"
2012 Sep 19
1
scraping with session cookies
Hi, I am starting to code in R and one of the things I want to do is to
scrape some data from the web.
The problem I am having is that I cannot get past the disclaimer
page (which produces a session cookie). I have been able to collect some
ideas and combine them in the code below, but I still don't get past the
disclaimer page.
I am trying to accept the disclaimer with postForm and write
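A minimal sketch of the usual pattern with RCurl (the URLs and the form-field name below are assumptions, not taken from the post): keep a single curl handle so the session cookie set when the disclaimer is accepted is reused on later requests.

library(RCurl)
# cookiefile = "" switches on in-memory cookie handling for this handle
curl <- getCurlHandle(cookiefile = "", followlocation = TRUE)
# submit the disclaimer form; "accept" is a hypothetical field name
postForm("http://example.com/disclaimer", accept = "yes",
         style = "POST", curl = curl)
# the same handle now carries the session cookie
page <- getURL("http://example.com/data", curl = curl)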
2010 Aug 04
2
Finding the right url for RCurl
Hi all,
I am using RCurl to try and download data from a website, but I'm having
trouble finding out what URL to use. Here is the site:
http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX
See how in the upper right, above the displayed sheet, there's a link to
download the data as a .csv file? When I hit "copy url" and paste the result
into getURL in R, it doesn't
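The visible link is generated by JavaScript, so one common approach is to copy the request the browser actually sends (from its developer tools) and replay it with getURL; a rough sketch, with the CSV endpoint left as a placeholder to be filled in:

library(RCurl)
# placeholder: substitute the address found in the browser's network panel
csv_url <- "http://www.invescopowershares.com/..."
txt <- getURL(csv_url,
              referer = "http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX")
holdings <- read.csv(textConnection(txt))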
2009 Jan 26
2
RCurl unable to download a particular web page -- what is so special about this web page?
Dear R-help,
There seems to be a web page I am unable to download using RCurl. I
don't understand why it won't download:
> library(RCurl)
> my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
> getURL(my.url)
[1] ""
Other web pages download fine, but this is the first time I have
been unable to download a
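An empty string from getURL() often means a redirect or cookie check was not followed; a sketch of options worth trying, purely as a guess about this particular page:

library(RCurl)
my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
txt <- getURL(my.url,
              followlocation = TRUE,   # follow HTTP redirects
              cookiefile = "",         # accept cookies in memory
              useragent = "Mozilla/5.0")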
2016 Jun 21
2
Problems with accents and other characters in R and RStudio
Hello.
I have some sort of problem with accented characters, when working in R or
RStudio, that I can't manage to solve.
Trying to reproduce, on two different PCs, both running Windows 7, one of the
latest exercises that Carlos Gil Bellosta published on his blog (
https://www.datanalytics.com/2016/06/20/6602-767-km-alrededor-de-espana-para-visitar-todas-sus-capitales-de-provincia/),
I find that when I
2013 Jan 15
1
readHTMLTable (XML package)
Hi,
I am using XML::readHTMLTable and getting the error below. Does anyone know why? Does this function not work with https? I didn't see anything about that in the help.
> library(XML)
> wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1)
Error in htmlParse(doc) :
File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist
Dan
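htmlParse(), which readHTMLTable() calls when it is given a URL, typically cannot fetch https itself; one workaround is to download the page with RCurl first and parse the text. A sketch (ssl.verifypeer = FALSE skips certificate verification, which is a trade-off):

library(RCurl)
library(XML)
u <- "https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"
txt <- getURL(u, ssl.verifypeer = FALSE)
tables <- readHTMLTable(txt)
tables[[1]]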
2009 Jan 19
3
download/retain text file structure with RCurl/getURL()
Dear list,
I'm trying to download a text file directly from the internet using the RCurl package and the command getURL. Duncan Lang graciously helped me solve the first step in this problem using the following command:
#################
txtfile <- getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',
ftp.use.epsv = FALSE)
#################
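getURL() returns the whole file as one long string with embedded newlines, so one way to retain the original structure is to split on those newlines before writing the file back out; a minimal sketch:

lines <- strsplit(txtfile, "\r?\n")[[1]]   # split on the embedded line breaks
writeLines(lines, "13e19.txt")             # one record per line, as on the server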
2016 Jun 21
2
Problems with accents and other characters in R and RStudio
Hi, Carlos.
It seems strange to me too, because it doesn't happen every time; it's somewhat
random. I imagine there is a reason, and that the encoding of the page is
surely involved, but I can't work out what causes it.
What's more, the tests I run leave me even more puzzled.
This
Encoding(capitales)
tells me the encoding is "unknown", but then this
validUTF8(capitales)
tells me
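A small sketch of the pattern under discussion (the value of capitales here is made up): when validUTF8() reports the bytes are valid UTF-8 but Encoding() still says "unknown", declaring the encoding explicitly usually fixes the display on Windows.

capitales <- "Málaga"              # hypothetical value
if (all(validUTF8(capitales))) {
  Encoding(capitales) <- "UTF-8"   # declare what the bytes already are
}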
2013 Jul 23
2
downloading web content
Hello,
I am trying to use R to download a bunch of .csv files such as:
http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia
I have tried the following and neither works:
a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
and
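The "embedded nul in string" error suggests the response is binary rather than plain text (the ALA download service appears to return a zip archive), so fetching raw bytes and writing them to disk is one way around it; a sketch, assuming the response really is a zip:

library(RCurl)
u <- "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia"
raw_bytes <- getBinaryURL(u)       # fetch as raw, avoiding the nul problem
writeBin(raw_bytes, "banksia.zip")
unzip("banksia.zip")               # the CSV files are inside the archive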
2009 Sep 17
1
RCurl and Google Scholar's EndNote references
Hi!
I've performed a Google Scholar search using a query, let's say "Frank
Harrell", and parsed the links to the EndNote references from the resulting
HTML code. Now I'd like to download all the references automatically. For
this, I have tried to use RCurl, but I can't seem to get it working: I
always get error code "403 Forbidden" from the web server.
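A 403 is often the server refusing the default libcurl user agent, and Scholar may block automated clients regardless; a sketch of sending browser-like headers with RCurl (no guarantee Google accepts them):

library(RCurl)
ref_url <- "http://scholar.google.com/..."   # placeholder for one parsed EndNote link
txt <- getURL(ref_url,
              useragent = "Mozilla/5.0",
              followlocation = TRUE,
              cookiefile = "")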
2011 Jun 06
1
RCurl and kerberos
Dear list,
I would like to call a Kerberos-authenticated web-service from within R.
Curl can do it:
$ curl --negotiate -u : "http://my.web.service/"
so I would expect that RCurl also has the capability, but I have not been able to find the correct options to set.
listCurlOptions() does not return anything with negotiate, and searching the source of RCurl, the only thing I found was
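The curl flags --negotiate and -u : correspond to the libcurl options CURLOPT_HTTPAUTH (with CURLAUTH_GSSNEGOTIATE) and CURLOPT_USERPWD, so a sketch of setting them through RCurl would be the following (the numeric value is used on the assumption that no named constant is exported):

library(RCurl)
txt <- getURL("http://my.web.service/",
              httpauth = 4L,    # 4 = CURLAUTH_GSSNEGOTIATE in libcurl
              userpwd  = ":")   # empty user:password, as in `curl -u :`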
2010 Nov 14
1
RCurl and cookies in POST requests
Hello.
I know that it's usually possible to write cookies to a cookie
file by removing the curl handle and doing a gc() call. I can do
this with getURL(), but I just can't obtain the same results with
postForm().
If I use:
curlHandle <- getCurlHandle(cookiefile=FILE, cookiejar=FILE)
and then do:
getURL("http://example.com/script.cgi", curl=curlHandle)
rm(curlHandle)
gc()
it's
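A sketch of the same handle-cleanup trick applied to postForm() (the URL and form fields are hypothetical); the point is to pass the handle that owns the cookie jar and then let its finalizer run:

library(RCurl)
FILE <- "cookies.txt"
curlHandle <- getCurlHandle(cookiefile = FILE, cookiejar = FILE)
postForm("http://example.com/script.cgi",
         user = "me", pass = "secret",   # hypothetical form fields
         style = "POST", curl = curlHandle)
rm(curlHandle)
gc()   # the handle's finalizer flushes the cookies to cookies.txt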
2016 Jun 21
2
Problems with accents and other characters in R and RStudio
Hi, Carlos.
Well, it did indeed help.
Regarding the use of the geocode function with city names that contain accents,
Carlos Gil Bellosta had earlier given me the idea of using iconv to convert the
search string to UTF-8, and I used it to try to convert the output of
html_table, without success:
capitales <- read_html("
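For reference, the conversion being described is roughly the following (the source encoding is an assumption and has to match what html_table actually returned):

capitales <- iconv(capitales, from = "latin1", to = "UTF-8")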
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi,
I'm trying to download some data from the web and am running into
problems with 'embedded null' characters. These seem to indicate to R
that it should stop processing the page so I'd like to remove them.
I've been looking around and can't seem to identify exactly what the
character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
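One way to sidestep the embedded nuls is to download the page as raw bytes, drop the nul bytes, and only then convert to character; a minimal sketch (the URL is a placeholder):

library(RCurl)
raw_bytes <- getBinaryURL("http://example.com/page.html")
raw_bytes <- raw_bytes[raw_bytes != as.raw(0)]   # remove the embedded nuls
html <- rawToChar(raw_bytes)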
2008 Aug 27
1
RCurl: using netrc with curlPerform
Hello,
I am having trouble getting the curlPerform function to authenticate
using the .netrc file. From the documentation I've read it
certainly seems as though this function should be able to authenticate
via the .netrc file.
The example I am using here comes from the "R as a Web Client- the RCurl
package" paper and demonstrates using the .netrc file to access the
2011 Mar 03
6
Developing a web crawler
Hi,
I wish to develop a web crawler in R. I have been using the functionality
available in the RCurl package.
I am able to extract the HTML content of the site, but I don't know how to go
about analyzing the HTML-formatted document.
I wish to know the frequency of a word in the document. I am only acquainted
with analyzing data sets.
So how should I go about analyzing data that is not
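A sketch of one way to go from raw HTML to word counts using the XML package (the URL is a placeholder):

library(RCurl)
library(XML)
html <- getURL("http://example.com/")
doc  <- htmlParse(html, asText = TRUE)
txt  <- xpathSApply(doc, "//body//text()", xmlValue)   # visible text nodes
words <- tolower(unlist(strsplit(paste(txt, collapse = " "), "[^[:alpha:]]+")))
head(sort(table(words[words != ""]), decreasing = TRUE), 10)   # most frequent words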
2010 Jul 21
1
Command that is conditional upon file retrieval: is it possible?
Hi all,
I'm currently working on an R program where I have to access an FTP server
to download some of the data I need. However, the people who post the files
I need are at times inconsistent about when they post, if they post at all.
Here's some of the code I use:
library(RCurl)
url1 = paste("ftp://user:password at a.great.website.com/",
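RCurl's url.exists(), or a tryCatch() around the download, gives a way to make the rest of the script conditional on the file actually being there; a sketch with placeholder credentials and file name:

library(RCurl)
url1 <- "ftp://user:password@a.great.website.com/some_file.txt"   # placeholder
if (url.exists(url1, ftp.use.epsv = FALSE)) {
  dat <- getURL(url1, ftp.use.epsv = FALSE)
  # ... process dat ...
} else {
  message("File not posted yet; skipping this run.")
}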
2013 Feb 21
4
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list.
Looks like the same issue in Russian:
library(RCurl)
library(XML)
u = " http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
a = getURL(u)
a # Here - the Russian is fine.
a2 <- htmlParse(a)
a2 # Here it is a mess...
None of these seem to fix it:
htmlParse(a, encoding = "windows-1251")
htmlParse(a, encoding =
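One approach that often helps, assuming the page really is served as windows-1251 (as the attempts above suggest), is to fetch the raw bytes, convert them to UTF-8 explicitly, and tell htmlParse() what it is getting; a sketch:

library(RCurl)
library(XML)
u <- "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
raw_bytes <- getBinaryURL(u)                             # the bytes, untouched
a <- iconv(rawToChar(raw_bytes), from = "windows-1251", to = "UTF-8")
a2 <- htmlParse(a, asText = TRUE, encoding = "UTF-8")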
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes =
TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value=T))
[1] " |
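If the garbling comes from the parser assuming the wrong charset, passing the encoding explicitly and letting htmlTreeParse() fetch the page itself (instead of going through readLines) is the usual first thing to try; a sketch, where "GB2312" is an assumption to be checked against the page's meta tag:

library(XML)
theurl <- "http://home.sina.com"
doc <- htmlTreeParse(theurl, error = function(...) {},
                     useInternalNodes = TRUE, encoding = "GB2312")
g <- xpathSApply(doc, "//p", xmlValue)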
2013 Aug 25
2
RCurl cookiejar
R-helpers,
When I use cURL in the Terminal:
curl --cookie-jar cookie.txt --url "http://corpusdelespanol.org/x.asp" --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0" --location --include
a cookie file "cookie.txt" is saved to my working directory. However, when I try what I think is the equivalent command R with RCurl:
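A sketch of what appears to be the closest RCurl equivalent: set the same options on a handle, perform the request, then drop the handle and call gc() so its finalizer writes the cookie file:

library(RCurl)
ch <- getCurlHandle(cookiejar = "cookie.txt",
                    useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0",
                    followlocation = TRUE)
x <- getURL("http://corpusdelespanol.org/x.asp", curl = ch)
rm(ch); gc()   # the handle's finalizer flushes cookies to cookie.txt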
2009 Jun 02
1
Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host
Dear all,
I am having a problem downloading web pages through R when I run it in
the DOS window under Windows Vista Basic. I have downloaded the
batchfiles from http://code.google.com/p/batchfiles/ and have
successfully set the PATH.
I open up 'Command Prompt' in Vista and type (after the C:\...>
stuff):
### START ###
C:\Users\Karen>Rscript -e "library(RCurl);