search for: geturlcont

Displaying 15 results from an estimated 15 matches for "geturlcont".

2009 Feb 26
2
ftp fetch using RCurl?
Hi everyone, I have to fetch about 300 to 500 zipped archives from a remote FTP server. Each archive is about 1 MB. I know I can get it done using download.file() in R, but I am curious whether there is a faster way to do this using RCurl. For example, are there parameters I can set so that the connection does not need to be rebuilt, etc.? An even simpler question is, how can I
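One answer to the connection-reuse question: RCurl lets you pass the same curl handle to every request, so libcurl can keep the FTP connection open across downloads. A minimal sketch, assuming a hypothetical host and file names (ftp.example.com and the archive names are placeholders):

library(RCurl)
# Reuse one handle so the FTP connection is not rebuilt for each file.
h <- getCurlHandle(ftp.use.epsv = FALSE)
files <- sprintf("archive%03d.zip", 1:300)   # hypothetical file names
for (f in files) {
  bin <- getBinaryURL(paste0("ftp://ftp.example.com/pub/", f), curl = h)
  writeBin(bin, f)                           # save the zipped archive locally
}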
2012 May 28
1
Rcurl, postForm()
...his is probably a pretty basic question, but I need some help regardless. Yours, Simon Kiss
library(XML)
library(RCurl)
library(scrapeR)
library(RHTMLForms)
# Set URL
bus <- c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx')
# Scrape URL
orig <- getURLContent(url = bus)
# Parse doc
doc <- htmlParse(orig[[1]], asText = TRUE)
# Get the forms
forms <- getNodeSet(doc, "//form")
forms[[1]]
# These are the input nodes
getNodeSet(forms[[1]], ".//input")
# These are the select nodes
getNodeSet(forms[[1]], ".//select")
...
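Once the input and select nodes have been inspected, the usual next step is to submit the form with postForm(). A hedged sketch; the field name SearchField and its value are hypothetical and must be replaced by the name= attributes actually found in the inputs:

library(RCurl)
# Submit the search form; field names must match the <input>/<select>
# name attributes discovered with getNodeSet() above.
resp <- postForm(
  "http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx",
  SearchField = "restaurant",   # hypothetical field name and value
  style = "POST"
)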
2012 Oct 18
1
Keep loop running after webpage times out?
Hi, I have created a loop to obtain data from several webpages, but the loop keeps crashing with the error "Error in function (type, msg, asError = TRUE) : Operation timed out after 5000 milliseconds with 9196 bytes received". The call is:
Page = getURLContent(page[i], followlocation = TRUE, curl = curl, .opts = list(verbose = TRUE, timeout = 5))
I am not sure how to keep the loop running after that error, so any help would be appreciated. PS: I have played with the timeout option, but it eventually crashes.
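The standard way to keep a loop alive past such an error is to wrap the request in tryCatch(), so a timeout becomes a value you can test rather than an abort. A minimal sketch, assuming page is the vector of URLs from the post:

library(RCurl)
for (i in seq_along(page)) {
  Page <- tryCatch(
    getURLContent(page[i], followlocation = TRUE,
                  .opts = list(timeout = 5)),
    error = function(e) {
      message("Skipping ", page[i], ": ", conditionMessage(e))
      NULL                     # marker value for a failed fetch
    }
  )
  if (is.null(Page)) next      # move on to the next page
  # ... process Page here ...
}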
2013 Jul 23
2
downloading web content
Hello, I am trying to use R to download a bunch of .csv files such as: http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia I have tried the following and neither works:
a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : embedded nul in string: and
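The "embedded nul" error comes from forcing a response that contains 0x00 bytes into an R character string. One workaround is to request the body as raw bytes and write it straight to disk. A sketch using the URL from the post (the output file name is a placeholder):

library(RCurl)
# binary = TRUE returns a raw vector, so no nul-byte conversion occurs.
body <- getURLContent(
  "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia",
  binary = TRUE)
writeBin(as.vector(body), "banksia.csv")     # placeholder file name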
2009 Sep 17
1
RCurl and Google Scholar's EndNote references
...error code "403 Forbidden" from the web server. Initially I tried to do this without using cookies:
library(RCurl)
getURL("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0")
or
getURLContent("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0")
Error: Forbidden
and then with cookies:
getURL("http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&amp...
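A 403 from Google Scholar usually means the request is being recognized as non-browser traffic. A hedged sketch of the usual remedies, sending a browser-like User-Agent and enabling RCurl's in-memory cookie engine; whether Scholar still refuses such requests is outside RCurl's control:

library(RCurl)
h <- getCurlHandle(cookiefile = "",          # "" switches on the cookie engine
                   useragent = "Mozilla/5.0",
                   followlocation = TRUE)
ref <- getURLContent(
  "http://scholar.google.fi/scholar.enw?q=info:U6Gfb4QPVFMJ:scholar.google.com/&output=citation&hl=fi&oe=ASCII&ct=citation&cd=0",
  curl = h)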
2010 Nov 04
3
postForm() in RCurl and library RHTMLForms
Hi R users, suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
for the index "S&P CNX NIFTY" for dates "FromDate"="01-11-2010", "ToDate"="02-11-2010", then read the HTML table from the page using readHTMLTable(). I am using this code:
webpage <-
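A hedged sketch of one way to proceed: POST the two dates and parse the returned table. The field names FromDate/ToDate come from the post, but the action URL is assumed to be the page itself, which may not hold; RHTMLForms (as in the other threads on this page) can recover the true target:

library(RCurl)
library(XML)
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
# Assumption: the form posts back to the same URL with these two fields.
wp <- postForm(url, FromDate = "01-11-2010", ToDate = "02-11-2010")
tables <- readHTMLTable(htmlParse(wp, asText = TRUE))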
2012 Oct 17
0
postForm() in RCurl and library RHTMLForms
...r="2012" ,month="August" > I am not getting the table values. > > thanking you > veepsirtt > options(RCurlOptions = list(useragent = "R")) > library(RCurl) > url <- "http://www.bseindia.com/histdata/categorywise_turnover.asp" > wp = getURLContent(url) > > library(RHTMLForms) > library(XML) > doc = htmlParse(wp, asText = TRUE) > form = getHTMLFormDescription(doc)[[1]] > fun = createFunction(form) > o = fun(mmm = "9", yyy = "2012",url=" > http://www.bseindia.com/histdata/categorywise_turnov...
2010 Nov 10
3
RGoogleDocs stopped working
...or 0 to exit
1: getGoogleDocsConnection(login = gd.login, password = gd.password, service = "wise", error = FALSE)
2: getGoogleAuth(..., error = error)
3: getForm("https://www.google.com/accounts/ClientLogin", accountType = "HOSTED_OR_GOOGLE", Email = login, Passw
4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary, curl = curl)
5: stop.if.HTTP.error(http.header)
Selection: 4
Called from: eval(expr, envir, enclos)
Browse[1]> http.header
Content-Type Cache-control Pragma "text/p...
2011 Jan 25
1
Using open calais in R
I am using the Calais API in R for text analysis, but I am facing a problem when fetching the RDF from the server. I am using the getToHost() method for the API call, but I get just a null string; the same URL in a browser returns an RDF document.
> getToHost("www.api.opencalais.com", "/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")
> [1]
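A hedged alternative that avoids getToHost() entirely: assemble the full URL and fetch it with RCurl. The host, path, and license key are copied verbatim from the post; the exact shape of the REST endpoint is as given there, not independently verified:

library(RCurl)
rdf <- getURLContent(
  "http://www.api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")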
2012 Jun 07
1
How to set cookies in RCurl
Hi, I am trying to access a website and read its content. The website is a restricted-access site that I reach through a proxy server (which therefore requires me to enable cookies). I am having problems getting RCurl to receive and send cookies. The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
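readHTMLTable(url) downloads the page itself and never touches RCurl's cookie machinery. A minimal sketch that routes the download through one curl handle with the cookie engine enabled, then parses the result; the URL is the placeholder from the post:

library(RCurl)
library(XML)
h <- getCurlHandle(cookiefile = "",          # enable in-memory cookies
                   followlocation = TRUE)
url <- "http://www.theurl.com"
invisible(getURLContent(url, curl = h))      # first request receives the cookie
html <- getURLContent(url, curl = h)         # second request sends it back
content <- readHTMLTable(htmlParse(html, asText = TRUE))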
2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code works, but most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0 Sometimes the following code works:
n <- readHTMLTable(htmlParse(url))
But most of the
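When failures are intermittent like this, one pragmatic approach is to download explicitly with getURLContent() and retry a few times before parsing, so a transient network hiccup does not kill the run. A minimal sketch:

library(RCurl)
library(XML)
url <- "http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0"
n <- NULL
for (attempt in 1:3) {
  res <- tryCatch(getURLContent(url, followlocation = TRUE),
                  error = function(e) NULL)
  if (!is.null(res)) {
    n <- readHTMLTable(htmlParse(res, asText = TRUE))
    break
  }
  Sys.sleep(2)                               # brief pause before retrying
}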
2012 Sep 19
1
scraping with session cookies
Hi, I am starting to code in R, and one of the things I want to do is scrape some data from the web. The problem I am having is that I cannot get past the disclaimer page (which produces a session cookie). I have been able to collect some ideas and combine them in the code below, but I don't get past the disclaimer page. I am trying to accept the disclaimer with postForm and write
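The essential point is that the disclaimer POST and the data request must share one curl handle, so the session cookie issued by the first is sent with the second. A hedged sketch with hypothetical URLs and a hypothetical field name ("agree"):

library(RCurl)
h <- getCurlHandle(cookiefile = "", followlocation = TRUE)
# Accept the disclaimer; the response sets the session cookie on h.
postForm("http://example.com/disclaimer", agree = "yes",
         style = "POST", curl = h)
# The same handle now carries the cookie past the disclaimer page.
page <- getURLContent("http://example.com/data", curl = h)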
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
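The character in question is the nul byte, 0x00, which R refuses to embed in character strings. Fetching the body as raw bytes makes it easy to locate and drop the offenders. A minimal sketch with a placeholder URL:

library(RCurl)
raw_body <- getBinaryURL("http://example.com/page-with-nuls")  # placeholder URL
which(raw_body == as.raw(0))                       # positions of the nul bytes
txt <- rawToChar(raw_body[raw_body != as.raw(0)])  # text with nuls removed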
2012 May 14
3
Scraping a web page.
Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact, all I want is the (exactly) 7-digit number before ".html". The good news is that, as far as I can tell, the <a>
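Extracting just the 7-digit numbers does not require walking the tag hierarchy; a regular expression over the page source is enough. A minimal sketch, assuming the source is already in a character vector src:

# Match hrefs of the form /en/Ships/<name>-<7 digits>.html, then keep
# only the digits. src is assumed to hold the page source.
hits <- regmatches(src, gregexpr("/en/Ships/[^\"]*-[0-9]{7}\\.html", src))[[1]]
ids  <- sub(".*-([0-9]{7})\\.html$", "\\1", hits)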
2010 Nov 24
0
4. Rexcel (Luis Felipe Parra)-how to run a code from excel
...ot;wise", error = FALSE) >> >>> 2: getGoogleAuth(..., error = error) >> >>> 3: getForm("https://www.google.com/accounts/ClientLogin", accountType >> = >> >>> "HOSTED_OR_GOOGLE", Email = login, Passw >> >>> 4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = >> >>> binary, curl = curl) >> >>> 5: stop.if.HTTP.error(http.header) >> >>> >> >>> Selection: 4 >> >>> Called from: eval(expr, envir, enclos) >> >>> Brow...