Displaying 20 results from an estimated 4000 matches similar to: "R as a web scraping tool using RCurl"
2012 May 14
3
Scraping a web page.
Folks,
I want to scrape a series of web-page sources for strings like the following:
"/en/Ships/A-8605507.html"
"/en/Ships/Aalborg-8122830.html"
which appear in an href inside an <a> tag inside a <div> tag inside a table.
In fact all I want is the (exactly) 7-digit number before ".html".
The good news is that as far as I can tell the <a>
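A minimal base-R sketch of just the extraction step. It assumes the href strings have already been pulled out of the page (for example with the XML package). The vector `hrefs` is hypothetical and simply holds the two example paths quoted in the post. A Perl-style lookbehind and lookahead keep only a run of exactly seven digits that sits immediately before `.html`.

```r
# Hypothetical input: href values already extracted from the <a> tags
hrefs <- c("/en/Ships/A-8605507.html", "/en/Ships/Aalborg-8122830.html")

# (?<![0-9]) : the run must not be preceded by another digit (exactly seven)
# [0-9]{7}   : the seven digits we want to keep
# (?=\.html$): they must be followed by ".html" at the end, which is not captured
pattern <- "(?<![0-9])[0-9]{7}(?=\\.html$)"

ids <- regmatches(hrefs, regexpr(pattern, hrefs, perl = TRUE))
print(ids)
```

Note that `regmatches()` combined with `regexpr()` silently drops strings that do not match, so the result can be shorter than the input when some hrefs lack a seven-digit ID.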
2012 Sep 19
1
scraping with session cookies
Hi, I am starting to code in R, and one of the things I want to do is to
scrape some data from the web.
The problem I am having is that I cannot get past the disclaimer
page (which produces a session cookie). I have been able to collect some
ideas and combine them in the code below, but I don't get past the
disclaimer page.
I am trying to accept the disclaimer with postForm() and write
2012 May 28
1
RCurl, postForm()
Dear colleagues,
Could I get some assistance using postForm() to scrape the business names and addresses at this website:
http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx
I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
2013 May 08
1
Dependencies of Imports not attached?
Encountered an error in scripting, which can be reproduced using Rscript as
follows:
$ Rscript -e "library(httr); handle('http://cran.r-project.org')"
Error in getCurlHandle(cookiefile = cookie_path, .defaults = list()) :
could not find function "getClass"
Calls: handle -> getCurlHandle
or by starting R without the methods package attached:
$
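The error says that `getClass()` cannot be found, and `getClass()` is exported by the `methods` package. Older versions of `Rscript` started sessions without `methods` attached, so one workaround (a sketch, not a fix to httr or RCurl themselves) is to attach it explicitly before loading httr, e.g. `Rscript -e "library(methods); library(httr); handle('http://cran.r-project.org')"`. The same check inside a script:

```r
# getClass() lives in the 'methods' package. If this session was started
# without it attached (as older Rscript versions did), attach it explicitly
# before loading packages such as httr that end up calling it.
if (!"package:methods" %in% search()) {
  library(methods)
}

methods_attached <- "package:methods" %in% search()
print(methods_attached)
```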
2011 May 06
0
My First Attempt at Screen Scraping with R
Hello Folks,
I'm working on trying to scrape my first web site and ran into an issue
because I really don't know anything about regular expressions in R.
library(XML)
library(RCurl)
site <- "http://thisorthat.com/leader/month"
site.doc <- htmlParse(site, ?, xmlValue)
At the ?, I realize that I need to insert a regex command which will
decipher the contents of the
2012 Apr 16
1
grep and XML
Hi all:
I struggle a lot scraping web data. I still haven't got a handle on the XML package.
I'd like to get particular exchange rates from this table:
https://raw.github.com/currencybot/open-exchange-rates/master/latest.json
This is the code that I'm working with:
library(RCurl)
library(XML)
2007 Mar 21
2
problem with RCurl install on Unix
I am having trouble getting an install of RCurl to work properly on a
Unix server. The steps I have taken are:
1. installed cURL from source without difficulty
2. installed RCurl from source using the command
   ~/R_HOME/R-devel/bin/R CMD INSTALL -l ~/R_HOME/R-devel/library ~/RCurl_0.8-0.tar.gz
   I received no errors during this install.
3. when I go back to R and require(RCurl), I get
>
2008 Dec 01
1
[BioC] Rcurl 0.8-1 update for bioconductor 2.7
Hi Patrick,
Greetings from !(sunny) Pittsburgh.
What's the scoop on RCurl on windows (XP)?
I've tried to install RCurl_0.92-0.zip and RCurl_0.9-3.zip,
with both R 2.7.2 and R 2.8.0 from the RGUI (utils:::menuInstallLocal),
and get the error
"Windows binary packages in zipfiles are not supported".
which (according to Google's one and only hit) comes from a Perl script.
2006 Jan 27
1
Caching from screen scraping
Hi all,
I need to do some screen scraping from my Rails app. Given an Ethernet
(MAC) address, I scrape results from an internal web page that returns
location and hostname. How can I cache the result from that screen
scraping so as to be polite to the scrapee? I would like to expire the
results daily. In Perl, I would use Cache::File. Can I use Rails caching
for this? What's the best
2008 Jul 25
1
Installation error for RCurl in Red Hat Enterprise Linux 5
I am getting the following error while trying to install the RCurl library. I have checked that curl and libcurl.so.3
are already installed in /usr/bin.
> install.packages("RCurl")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.hostingzero.net/src/contrib/RCurl_0.9-3.tar.gz'
Content type
2011 Apr 03
1
problem in install RCurl in R (Ubuntu Linux)
I have a problem running the CRAN package demography.
The hmd.mx function needs RCurl. I tried to install RCurl, but got the following error:
*********************************************************************
...
* installing *source* package ‘RCurl’ ...
checking for curl-config... no
Cannot find curl-config
ERROR: configuration failed for package ‘RCurl’
* removing
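"Cannot find curl-config" means RCurl's configure script could not find the `curl-config` program on the PATH. That program ships with the libcurl development files (on Debian/Ubuntu, packages such as `libcurl4-openssl-dev`), not with the curl command-line tool alone. A small sketch for checking this from R before calling `install.packages("RCurl")`:

```r
# Sys.which() returns the full path of a program found on the PATH, or ""
# if the program is absent.
curl_config <- Sys.which("curl-config")

if (nzchar(curl_config)) {
  message("curl-config found at: ", curl_config)
} else {
  message("curl-config not found on the PATH; install the libcurl ",
          "development package before running install.packages(\"RCurl\").")
}
```

If the development package is installed but in a non-standard prefix, its `bin` directory still has to be on the PATH that R sees for this check (and RCurl's configure step) to succeed.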
2011 Jun 06
1
RCurl and kerberos
Dear list,
I would like to call a Kerberos-authenticated web-service from within R.
Curl can do it:
$ curl --negotiate -u : "http://my.web.service/"
so I would expect that RCurl also has the capability, but I have not been able to find the correct options to set.
listCurlOptions() does not return anything with negotiate, and searching the source of RCurl, the only thing I found was
2014 Jan 02
2
Installing RCurl -
Dear all,
I am trying to install RCurl (because I want to install devtools), and to do so I've been informed that I must install one of the packages:
libcurl4-openssl-dev
libcurl4-nss-dev
No matter which one I install I get the following error from R:
* installing *source* package ‘RCurl’ ...
** package ‘RCurl’ successfully unpacked and MD5 sums checked
checking for curl-config...
2012 Jun 07
1
How to set cookies in RCurl
Hi,
I am trying to access a website and read its content. The website is a
restricted access website that I access through a proxy server (which
therefore requires me to enable cookies). I have problems in allowing Rcurl
to receive and send cookies.
The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
2010 Dec 03
1
Problem installing RCurl
I have 64-bit R 2.12.0 installed on Solaris 10 on Sun SPARC. When I tried to install RCurl, it failed with the following lines:
...............
Version has CURLOPT_SSL_SESSIONID_CACHE
libcurl version: libcurl 7.19.6
configure: creating ./config.status
config.status: creating src/Makevars
** libs
cc -xc99 -m64 -xarch=sparcvis2 -I/apps/sparcv9/R-2.12.0/lib/R/include -I/opt/csw/include
2008 Sep 17
2
RCurl compilation error on ubuntu hardy
Dear list members,
I encountered this problem and the solution pointed out in a previous
thread did not work for me.
(e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R"))
I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get.
I really need RCurl in order to use biomaRt ... any help would be
greatly appreciated.
Best wishes,
Emmanuel
2008 Aug 27
1
RCurl: using netrc with curlPerform
Hello,
I am having trouble getting the curlPerform function to authenticate
using the .netrc file. From the documentation I've read it
certainly seems as though this function should be able to authenticate
via the .netrc file.
The example I am using here comes from the "R as a Web Client - the RCurl
package" paper and demonstrates using the .netrc file to access the
2009 Feb 26
2
ftp fetch using RCurl?
Hi everyone,
I have to fetch about 300 to 500 zipped archives from a remote ftp server.
Each archive is about 1 MB. I know I can get it done by using
download.file() in R, but I am curious whether there is a faster way to do this
using RCurl. For example, are there some parameters that I can set so that
the connection does not need to be rebuilt for each file, etc.?
An even simpler question is, how can I
2007 Oct 16
1
problem with RCurl 0.8-1 installation on Debian Etch
Dear R-Users,
I am having some trouble getting an installation of RCurl 0.8-1 to work
properly on a Debian (Etch) machine.
The command 'R CMD INSTALL RCurl_0.8-1.tar.gz' yields the following error:
Installing *source* package 'RCurl' ...
checking for curl-config... no
Cannot find curl-config
ERROR: configuration failed for package 'RCurl'
I do know that a file is