search for: geturlcont

Displaying 15 results from an estimated 15 matches for "geturlcont".

2009 Feb 26
2
ftp fetch using RCurl?
Hi everyone, I have to fetch about 300 to 500 zipped archives from a remote FTP server. Each archive is about 1 MB. I know I can get it done using download.file() in R, but I am curious whether there is a faster way to do this using RCurl. For example, are there parameters I can set so that the connection does not need to be rebuilt, etc.? An even simpler question is, how can I
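One answer to the connection-reuse question: RCurl lets you pass the same curl handle to every request, so libcurl can keep the FTP connection open across downloads. A minimal sketch, assuming a hypothetical host and file names (ftp.example.com and the archive names are placeholders):

library(RCurl)
# Reuse one handle so the FTP connection is not rebuilt for each file.
h <- getCurlHandle(ftp.use.epsv = FALSE)
files <- sprintf("archive%03d.zip", 1:300)   # hypothetical file names
for (f in files) {
  bin <- getBinaryURL(paste0("ftp://ftp.example.com/pub/", f), curl = h)
  writeBin(bin, f)                           # save the zipped archive locally
}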
2012 May 28
1
Rcurl, postForm()
...his is probably a pretty basic question, but I need some help regardless. Yours, Simon Kiss
library(XML)
library(RCurl)
library(scrapeR)
library(RHTMLForms)
# Set URL
bus <- c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx')
# Scrape URL
orig <- getURLContent(url = bus)
# Parse doc
doc <- htmlParse(orig[[1]], asText = TRUE)
# Get the forms
forms <- getNodeSet(doc, "//form")
forms[[1]]
# These are the input nodes
getNodeSet(forms[[1]], ".//input")
# These are the select nodes
getNodeSet(forms[[1]], ".//select")
...
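Once the input and select nodes have been inspected, the usual next step is to submit the form with postForm(). A hedged sketch; the field name SearchField and its value are hypothetical and must be replaced by the name= attributes actually found in the inputs:

library(RCurl)
# Submit the search form; field names must match the <input>/<select>
# name attributes discovered with getNodeSet() above.
resp <- postForm(
  "http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx",
  SearchField = "restaurant",   # hypothetical field name and value
  style = "POST"
)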
2012 Oct 18
1
Keep loop running after webpage times out?
Hi, I have created a loop to obtain data from several webpages, but the loop keeps crashing with the error "Error in function (type, msg, asError = TRUE) : Operation timed out after 5000 milliseconds with 9196 bytes received". The call is:
Page = getURLContent(page[i], followlocation = TRUE, curl = curl, .opts = list(verbose = TRUE, timeout = 5))
I am not sure how to keep the loop running after that error, so any help would be appreciated. PS: I have played with the timeout option, but it eventually crashes.
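The standard way to keep a loop alive past such an error is to wrap the request in tryCatch(), so a timeout becomes a value you can test rather than an abort. A minimal sketch, assuming page is the vector of URLs from the post:

library(RCurl)
for (i in seq_along(page)) {
  Page <- tryCatch(
    getURLContent(page[i], followlocation = TRUE,
                  .opts = list(timeout = 5)),
    error = function(e) {
      message("Skipping ", page[i], ": ", conditionMessage(e))
      NULL                     # marker value for a failed fetch
    }
  )
  if (is.null(Page)) next      # move on to the next page
  # ... process Page here ...
}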
2013 Jul 23
2
downloading web content
Hello, I am trying to use R to download a bunch of .csv files such as: http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia I have tried the following and neither works:
a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : embedded nul in string: and
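The "embedded nul" error comes from forcing a response that contains 0x00 bytes into an R character string. One workaround is to request the body as raw bytes and write it straight to disk. A sketch using the URL from the post (the output file name is a placeholder):

library(RCurl)
# binary = TRUE returns a raw vector, so no nul-byte conversion occurs.
body <- getURLContent(
  "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia",
  binary = TRUE)
writeBin(as.vector(body), "banksia.csv")     # placeholder file name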
2009 Sep 17
1
RCurl and Google Scholar's EndNote references
...error code "403 Forbidden" from the web server. Initially I tried to do this without using cookies:
library(RCurl)
getURL("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0")
or
getURLContent("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0")
Error: Forbidden
and then with cookies:
getURL("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&amp...
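A 403 from Google Scholar usually means the request is being recognized as non-browser traffic. A hedged sketch of the usual remedies, sending a browser-like User-Agent and enabling RCurl's in-memory cookie engine; whether Scholar still refuses such requests is outside RCurl's control:

library(RCurl)
h <- getCurlHandle(cookiefile = "",          # "" switches on the cookie engine
                   useragent = "Mozilla/5.0",
                   followlocation = TRUE)
ref <- getURLContent(
  "http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0",
  curl = h)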
2010 Nov 04
3
postForm() in RCurl and library RHTMLForms
Hi R users, suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
for the index "S&P CNX NIFTY" for dates "FromDate"="01-11-2010", "ToDate"="02-11-2010", then read the HTML table from the page using readHTMLTable(). I am using this code:
webpage <-
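A hedged sketch of one way to proceed: POST the two dates and parse the returned table. The field names FromDate/ToDate come from the post, but the action URL is assumed to be the page itself, which may not hold; RHTMLForms (as in the other threads on this page) can recover the true target:

library(RCurl)
library(XML)
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
# Assumption: the form posts back to the same URL with these two fields.
wp <- postForm(url, FromDate = "01-11-2010", ToDate = "02-11-2010")
tables <- readHTMLTable(htmlParse(wp, asText = TRUE))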
2012 Oct 17
0
postForm() in RCurl and library RHTMLForms
...r="2012" ,month="August" > I am not getting the table values. > > thanking you > veepsirtt > options(RCurlOptions = list(useragent = "R")) > library(RCurl) > url <- "http://www.bseindia.com/histdata/categorywise_turnover.asp" > wp = getURLContent(url) > > library(RHTMLForms) > library(XML) > doc = htmlParse(wp, asText = TRUE) > form = getHTMLFormDescription(doc)[[1]] > fun = createFunction(form) > o = fun(mmm = "9", yyy = "2012",url=" > http://www.bseindia.com/histdata/categorywise_turnov...
2010 Nov 10
3
RGoogleDocs stopped working
...or 0 to exit
1: getGoogleDocsConnection(login = gd.login, password = gd.password, service = "wise", error = FALSE)
2: getGoogleAuth(..., error = error)
3: getForm("https://www.google.com/accounts/ClientLogin", accountType = "HOSTED_OR_GOOGLE", Email = login, Passw
4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary, curl = curl)
5: stop.if.HTTP.error(http.header)
Selection: 4
Called from: eval(expr, envir, enclos)
Browse[1]> http.header
Content-Type Cache-control Pragma "text/p...
2011 Jan 25
1
Using open calais in R
I am using the Calais API in R for text analysis, but I am facing a problem when fetching the RDF from the server. I am using the getToHost() method for the API call, but I get just a null string; the same URL in a browser returns an RDF document.
> getToHost("www.api.opencalais.com", "/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")
> [1]
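A hedged alternative that avoids getToHost() entirely: assemble the full URL and fetch it with RCurl. The host, path, and license key are copied verbatim from the post; the exact shape of the REST endpoint is as given there, not independently verified:

library(RCurl)
rdf <- getURLContent(
  "http://www.api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")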
2012 Jun 07
1
How to set cookies in RCurl
Hi, I am trying to access a website and read its content. The website is a restricted-access site that I reach through a proxy server (which therefore requires me to enable cookies). I am having problems getting RCurl to receive and send cookies. The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
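readHTMLTable(url) downloads the page itself and never touches RCurl's cookie machinery. A minimal sketch that routes the download through one curl handle with the cookie engine enabled, then parses the result; the URL is the placeholder from the post:

library(RCurl)
library(XML)
h <- getCurlHandle(cookiefile = "",          # enable in-memory cookies
                   followlocation = TRUE)
url <- "http://www.theurl.com"
invisible(getURLContent(url, curl = h))      # first request receives the cookie
html <- getURLContent(url, curl = h)         # second request sends it back
content <- readHTMLTable(htmlParse(html, asText = TRUE))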
2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code works, but most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0 Sometimes the following code works:
n <- readHTMLTable(htmlParse(url))
But most of the
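When failures are intermittent like this, one pragmatic approach is to download explicitly with getURLContent() and retry a few times before parsing, so a transient network hiccup does not kill the run. A minimal sketch:

library(RCurl)
library(XML)
url <- "http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0"
n <- NULL
for (attempt in 1:3) {
  res <- tryCatch(getURLContent(url, followlocation = TRUE),
                  error = function(e) NULL)
  if (!is.null(res)) {
    n <- readHTMLTable(htmlParse(res, asText = TRUE))
    break
  }
  Sys.sleep(2)                               # brief pause before retrying
}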
2012 Sep 19
1
scraping with session cookies
Hi, I am starting to code in R, and one of the things I want to do is scrape some data from the web. The problem I am having is that I cannot get past the disclaimer page (which produces a session cookie). I have been able to collect some ideas and combine them in the code below, but I don't get past the disclaimer page. I am trying to accept the disclaimer with postForm and write
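The essential point is that the disclaimer POST and the data request must share one curl handle, so the session cookie issued by the first is sent with the second. A hedged sketch with hypothetical URLs and a hypothetical field name ("agree"):

library(RCurl)
h <- getCurlHandle(cookiefile = "", followlocation = TRUE)
# Accept the disclaimer; the response sets the session cookie on h.
postForm("http://example.com/disclaimer", agree = "yes",
         style = "POST", curl = h)
# The same handle now carries the cookie past the disclaimer page.
page <- getURLContent("http://example.com/data", curl = h)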
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
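The character in question is the nul byte, 0x00, which R refuses to embed in character strings. Fetching the body as raw bytes makes it easy to locate and drop the offenders. A minimal sketch with a placeholder URL:

library(RCurl)
raw_body <- getBinaryURL("http://example.com/page-with-nuls")  # placeholder URL
which(raw_body == as.raw(0))                       # positions of the nul bytes
txt <- rawToChar(raw_body[raw_body != as.raw(0)])  # text with nuls removed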
2012 May 14
3
Scraping a web page.
Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact, all I want is the (exactly) 7-digit number before ".html". The good news is that, as far as I can tell, the <a>
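Extracting just the 7-digit numbers does not require walking the tag hierarchy; a regular expression over the page source is enough. A minimal sketch, assuming the source is already in a character vector src:

# Match hrefs of the form /en/Ships/<name>-<7 digits>.html, then keep
# only the digits. src is assumed to hold the page source.
hits <- regmatches(src, gregexpr("/en/Ships/[^\"]*-[0-9]{7}\\.html", src))[[1]]
ids  <- sub(".*-([0-9]{7})\\.html$", "\\1", hits)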
2010 Nov 24
0
4. Rexcel (Luis Felipe Parra)-how to run a code from excel
...ot;wise", error = FALSE) >> >>> 2: getGoogleAuth(..., error = error) >> >>> 3: getForm("https://www.google.com/accounts/ClientLogin", accountType >> = >> >>> "HOSTED_OR_GOOGLE", Email = login, Passw >> >>> 4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = >> >>> binary, curl = curl) >> >>> 5: stop.if.HTTP.error(http.header) >> >>> >> >>> Selection: 4 >> >>> Called from: eval(expr, envir, enclos) >> >>> Brow...