Displaying 15 results from an estimated 15 matches for "geturlcont".
2009 Feb 26
2
ftp fetch using RCurl?
Hi everyone,
I have to fetch about 300 to 500 zipped archives from a remote FTP server.
Each archive is about 1 MB. I know I can get it done by using
download.file() in R, but I am curious whether there is a faster way to do this
using RCurl. For example, are there parameters I can set so that
the connection does not need to be rebuilt, etc.?
An even simpler question is, how can I
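For what it's worth, a minimal sketch of connection reuse (the server URL and file names below are placeholders, not from the post): passing the same curl handle to every request lets libcurl keep the FTP connection alive instead of reconnecting for each file.

```r
library(RCurl)

base  <- "ftp://example.org/archives/"      # hypothetical server
files <- sprintf("arch%03d.zip", 1:300)     # hypothetical file names

curl <- getCurlHandle()                     # one handle, reused for every fetch,
for (f in files) {                          # so the connection can persist
  bin <- getBinaryURL(paste0(base, f), curl = curl)
  writeBin(bin, f)
}
```

getBinaryURL() is used rather than getURL() because the archives are binary; the speedup from handle reuse depends on whether the server allows persistent connections.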
2012 May 28
1
Rcurl, postForm()
...his is probably a pretty basic question, but I need some help regardless. Yours, Simon Kiss
library(XML)
library(RCurl)
library(scrapeR)
library(RHTMLForms)
#Set URL
bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx')
#Scrape URL
orig<-getURLContent(url=bus)
#Parse doc
doc<-htmlParse(orig[[1]], asText=TRUE)
#Get The forms
forms<-getNodeSet(doc, "//form")
forms[[1]]
#These are the input nodes
getNodeSet(forms[[1]], ".//input")
#These are the select nodes
getNodeSet(forms[[1]], ".//select")
**************...
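One likely sticking point with a SharePoint/ASP.NET page like this: the POST usually has to echo back the hidden fields (__VIEWSTATE, __EVENTVALIDATION, and friends) or the server rejects it. A hedged sketch building on the `forms` object above — the exact field set and form action are assumptions about this particular page:

```r
# Collect name/value pairs from the form's <input> nodes so the hidden
# ASP.NET fields are included in the POST along with the search terms.
inputs <- getNodeSet(forms[[1]], ".//input")
vals   <- sapply(inputs, xmlGetAttr, "value", default = "")
names(vals) <- sapply(inputs, xmlGetAttr, "name", default = "")

# Post everything back to the page; real use would overwrite the
# search-field entries in `vals` before posting.
ans <- postForm(bus, .params = as.list(vals), style = "POST")
```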
2012 Oct 18
1
Keep loop running after webpage times out?
Hi
I have created a loop to obtain data from several webpages
but the loop keeps crashing with the error
"Error in function (type, msg, asError = TRUE) :
Operation timed out after 5000 milliseconds with 9196 bytes received"
Page = getURLContent(page[i], followlocation = TRUE, curl = curl,
                     .opts = list(verbose = TRUE, timeout = 5))
I am not sure how to keep the loop running after that error, so any help
would be appreciated
ps: I have played with the timeout option, but it eventually crashes.
Thanks
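One way to keep such a loop alive is to wrap the fetch in tryCatch(), so a timeout produces a placeholder instead of stopping the loop. A sketch based on the call above (the `page` vector and processing step are from the post; the handler is mine):

```r
library(RCurl)
curl <- getCurlHandle()

for (i in seq_along(page)) {
  Page <- tryCatch(
    getURLContent(page[i], followlocation = TRUE, curl = curl,
                  .opts = list(verbose = TRUE, timeout = 5)),
    error = function(e) {
      message("Skipping ", page[i], ": ", conditionMessage(e))
      NULL                          # placeholder so the loop keeps going
    }
  )
  if (is.null(Page)) next
  # ... process Page ...
}
```

Timed-out pages can be recorded and retried in a second pass rather than aborting the whole run.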
2013 Jul 23
2
downloading web content
Hello,
I am trying to use R to download a bunch of .csv files such as:
http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia
I have tried the following and neither work:
a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
and
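The "embedded nul in string" error usually means the response body is binary, so coercing it to an R string fails. Fetching raw bytes sidesteps the conversion; a sketch (the output file name is mine, and whether this service returns a zip or plain text is an assumption):

```r
library(RCurl)
u   <- "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia"
raw <- getBinaryURL(u)                  # raw bytes: no NUL-in-string error
writeBin(raw, "banksia_download.bin")   # save as-is (possibly a zip archive)

# If the payload really is text, strip embedded NULs before converting:
txt <- rawToChar(raw[raw != as.raw(0)])
```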
2009 Sep 17
1
RCurl and Google Scholar's EndNote references
...error code "403 Forbidden" from the web server.
Initially I tried to do this without using cookies:
library(RCurl)
getURL("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0")
or
getURLContent("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0")
Error: Forbidden
and then with cookies:
getURL("
http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&...
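A common way to carry cookies in RCurl is to enable libcurl's cookie engine on a reused handle; a sketch (the user-agent string is a hypothetical browser-like value, and Google may refuse the request regardless):

```r
library(RCurl)
u <- "http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0"

curl <- getCurlHandle(cookiefile = "",     # "" switches on the in-memory cookie engine
                      followlocation = TRUE,
                      useragent = "Mozilla/5.0 (compatible)")  # hypothetical UA
ref <- getURLContent(u, curl = curl)       # cookies set by earlier requests on
                                           # `curl` are sent automatically
```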
2010 Nov 04
3
postForm() in RCurl and library RHTMLForms
Hi RUsers,
Suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
for the index "S&P CNX NIFTY" for
dates "FromDate"="01-11-2010", "ToDate"="02-11-2010",
then read the HTML table from the page using readHTMLTable().
I am using this code
webpage <-
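A hedged sketch of the whole round trip: the field names FromDate/ToDate are taken from the post, but the form on the real page may name them differently or require additional hidden fields.

```r
library(RCurl)
library(XML)

url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"

# Post the date range (field names assumed from the post) and parse the
# HTML table out of whatever the server returns.
res <- postForm(url, FromDate = "01-11-2010", ToDate = "02-11-2010",
                .opts = list(useragent = "R"))
tbl <- readHTMLTable(res, stringsAsFactors = FALSE)
```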
2012 Oct 17
0
postForm() in RCurl and library RHTMLForms
...r="2012" ,month="August"
> I am not getting the table values.
>
> thanking you
> veepsirtt
> options(RCurlOptions = list(useragent = "R"))
> library(RCurl)
> url <- "http://www.bseindia.com/histdata/categorywise_turnover.asp"
> wp = getURLContent(url)
>
> library(RHTMLForms)
> library(XML)
> doc = htmlParse(wp, asText = TRUE)
> form = getHTMLFormDescription(doc)[[1]]
> fun = createFunction(form)
> o = fun(mmm = "9", yyy = "2012",url="
> http://www.bseindia.com/histdata/categorywise_turnov...
2010 Nov 10
3
RGoogleDocs stopped working
...or 0 to exit
1: getGoogleDocsConnection(login = gd.login, password = gd.password, service
= "wise", error = FALSE)
2: getGoogleAuth(..., error = error)
3: getForm("https://www.google.com/accounts/ClientLogin", accountType =
"HOSTED_OR_GOOGLE", Email = login, Passw
4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary,
curl = curl)
5: stop.if.HTTP.error(http.header)
Selection: 4
Called from: eval(expr, envir, enclos)
Browse[1]> http.header
Content-Type
Cache-control Pragma
"text/p...
2011 Jan 25
1
Using open calais in R
I am using the Calais API in R for text analysis,
but I'm facing a problem when fetching the RDF from the server.
I'm using the getToHost() method for the API call, but I get just a null
string.
The same URL in a browser returns an RDF document.
>getToHost("www.api.opencalais.com", "/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")
>[1]
2012 Jun 07
1
How to set cookies in RCurl
Hi,
I am trying to access a website and read its content. The website is a
restricted-access website that I access through a proxy server (which
therefore requires me to enable cookies). I have problems allowing RCurl
to receive and send cookies.
The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
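The snag with calling readHTMLTable(url) directly is that the fetch happens outside RCurl, so no cookies are exchanged. A sketch of fetching with a cookie-enabled handle first and parsing the returned text afterwards:

```r
library(RCurl)
library(XML)

url  <- "http://www.theurl.com"
curl <- getCurlHandle(cookiefile = "",         # enable the cookie engine
                      followlocation = TRUE)

page    <- getURLContent(url, curl = curl)     # cookies from the proxy are
                                               # kept on `curl` for later calls
content <- readHTMLTable(page, stringsAsFactors = FALSE)
```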
2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code fails, sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is
http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0
Sometimes the following code works
n<-readHTMLTable(htmlParse(url))
But most of the
2012 Sep 19
1
scraping with session cookies
Hi, I am starting to code in R, and one of the things that I want to do is to
scrape some data from the web.
The problem that I am having is that I cannot get past the disclaimer
page (which produces a session cookie). I have been able to collect some
ideas and combine them in the code below, but I don't get past the
disclaimer page.
I am trying to accept the disclaimer with postForm and write
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi,
I'm trying to download some data from the web and am running into
problems with 'embedded null' characters. These seem to indicate to R
that it should stop processing the page so I'd like to remove them.
I've been looking around and can't seem to identify exactly what the
character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
2012 May 14
3
Scraping a web page.
Folks,
I want to scrape a series of web-page sources for strings like the following:
"/en/Ships/A-8605507.html"
"/en/Ships/Aalborg-8122830.html"
which appear in an href inside an <a> tag inside a <div> tag inside a table.
In fact all I want is the (exactly) 7-digit number before ".html".
The good news is that as far as I can tell the <a>
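Once a page source is in hand, the extraction itself is a one-liner; a sketch (the `page_source` variable is assumed to hold one fetched page):

```r
library(XML)

doc   <- htmlParse(page_source)              # page_source: fetched HTML (assumed)
hrefs <- xpathSApply(doc, "//a/@href")       # all hrefs; narrowing to the right
                                             # <div>/<table> is left out here
ship  <- grep("^/en/Ships/.*\\.html$", hrefs, value = TRUE)
ids   <- sub(".*-(\\d{7})\\.html$", "\\1", ship)
```

For example, sub(".*-(\\d{7})\\.html$", "\\1", "/en/Ships/Aalborg-8122830.html") yields "8122830".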
2010 Nov 24
0
4. Rexcel (Luis Felipe Parra)-how to run a code from excel
...ot;wise", error = FALSE)
>> >>> 2: getGoogleAuth(..., error = error)
>> >>> 3: getForm("https://www.google.com/accounts/ClientLogin", accountType
>> =
>> >>> "HOSTED_OR_GOOGLE", Email = login, Passw
>> >>> 4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary =
>> >>> binary, curl = curl)
>> >>> 5: stop.if.HTTP.error(http.header)
>> >>>
>> >>> Selection: 4
>> >>> Called from: eval(expr, envir, enclos)
>> >>> Brow...