Hi RUsers,
Suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
for the index "S&P CNX NIFTY" for the dates
"FromDate"="01-11-2010","ToDate"="02-11-2010"
and then read the HTML table from the page using readHTMLTable().
I am using this code:
webpage <- postForm(url, .params = list(
                        "FromDate" = "01-11-2010",
                        "ToDate" = "02-11-2010",
                        "IndexType" = "S&P CNX NIFTY",
                        "Indicesdata" = "Get Details"),
                    .opts = list(useragent = getOption("HTTPUserAgent")))
But it doesn't give me the desired result.
I was also trying to use the function getHTMLFormDescription from the
package RHTMLForms, but there we can't use the argument
.opts=list(useragent = getOption("HTTPUserAgent")), which is needed for
this particular website.
Thanks and Regards
Sayan Dasgupta
I don't have the implementation in the way you want it, sorry, but
someone here will definitely know.
The group showed me how to do it this way, though.
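library(zoo)
library("RCurl")
# CSV export of the index history (date range is encoded in the file name)
sNiftyURL <- "http://nseindia.com/content/indices/histdata/S&P%20CNX%20NIFTY01-01-2000-02-11-2010.csv"
# The site needs a user agent, so pass one explicitly
Nifty_Dat <- getURLContent(sNiftyURL, verbose = TRUE,
                           useragent = getOption("HTTPUserAgent"))
# Parse the downloaded text as CSV and keep only Date and Close
tblNifty <- read.csv(textConnection(Nifty_Dat))
tblNifty <- subset(tblNifty, select = c(Date, Close))
tblNifty$Date <- as.Date(tblNifty$Date, format = "%d-%b-%Y")
# Convert to a zoo series indexed by Date
tblNifty <- read.zoo(tblNifty)
closeAllConnections()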
HTH.
S
From: sayan dasgupta [mailto:kittudg at gmail.com]
Sent: 04 November 2010 15:09
To: r-help at r-project.org
Cc: duncan at wald.ucdavis.edu; santosh.srinivas at gmail.com
Subject: postForm() in RCurl and library RHTMLForms
On 11/4/10 2:39 AM, sayan dasgupta wrote:
> Hi RUsers,
>
> Suppose I want to see the data on the website
> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>
> for the index "S&P CNX NIFTY" for
> dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
>
> then read the html table from the page using readHTMLtable()
>
> I am using this code
> webpage <- postForm(url,.params=list(
>                        "FromDate"="01-11-2010",
>                        "ToDate"="02-11-2010",
>                        "IndexType"="S&P CNX NIFTY",
>                        "Indicesdata"="Get Details"),
>                  .opts=list(useragent = getOption("HTTPUserAgent")))
>
> But it doesn't give me desired result

You need to be more specific about how it fails to give the desired result.
You are in fact posting to the wrong URL. The form is submitted to a different URL -
http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp

> Also I was trying to use the function getHTMLFormDescription from the
> package RHTMLForms but there we can't use the argument
> .opts=list(useragent = getOption("HTTPUserAgent")) which is needed for this
> particular website

That's not the case. The function RHTMLForms will generate for you does
support the .opts parameter.

What you want is something along the lines:

# Set default options for RCurl requests
options(RCurlOptions = list(useragent = "R"))
library(RCurl)

# Read the HTML page since we cannot use htmlParse() directly
# as it does not specify the user agent or an
# Accept: */*
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
wp = getURLContent(url)

# Now that we have the page, parse it and use the RHTMLForms
# package to create an R function that will act as an interface
# to the form.
library(RHTMLForms)
library(XML)
doc = htmlParse(wp, asText = TRUE)
# need to set the URL for this document since we read it from
# text, rather than from the URL directly
docName(doc) = url

# Create the form description and generate the R
# function that calls the form
form = getHTMLFormDescription(doc)[[1]]
fun = createFunction(form)

# now we can invoke the form from R. We only need 2
# inputs - FromDate and ToDate
o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")

# Having looked at the tables, I think we want the 3rd
# one.
table = readHTMLTable(htmlParse(o, asText = TRUE),
                      which = 3,
                      header = TRUE,
                      stringsAsFactors = FALSE)
table

Yes, it is marginally involved. But that is because we cannot simply read
the HTML document directly with htmlParse(), because of the lack of the
Accept (and useragent) HTTP headers.

> Thanks and Regards
> Sayan Dasgupta
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
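
For comparison, the point about the wrong URL can be sketched as a direct postForm() call against the submission URL named in the reply. This is only a sketch under assumptions: the field names are copied from the original post and have not been checked against the live form, style = "POST" assumes an ordinary url-encoded form, and fetch_index_page is a hypothetical wrapper name. It needs the RCurl package when the function is actually called.

```r
# URL the form actually submits to, as given in the reply
action_url <- "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"

# Field names taken from the original post (assumption: the form still uses them)
params <- list(
  FromDate    = "01-11-2010",
  ToDate      = "02-11-2010",
  IndexType   = "S&P CNX NIFTY",
  Indicesdata = "Get Details"
)

# Hypothetical wrapper: post the fields and return the response body.
# style = "POST" sends application/x-www-form-urlencoded data (assumed
# to be what the server expects).
fetch_index_page <- function(url = action_url, fields = params) {
  RCurl::postForm(url,
                  .params = fields,
                  .opts = list(useragent = getOption("HTTPUserAgent")),
                  style = "POST")
}
```

Calling fetch_index_page() and passing the result to readHTMLTable(htmlParse(..., asText = TRUE)) would then follow the same pattern as in the reply; whether the server accepts the request in this form is not confirmed here.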
Why am I getting this error?

Error in getHTMLFormDescription(docNifty)[[1]] : subscript out of bounds
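
In R, "subscript out of bounds" from [[1]] means the object being indexed has no first element, so one possible reading (not confirmed in this thread) is that getHTMLFormDescription(docNifty) returned an empty list, i.e. no form was found in the parsed document. A minimal sketch of a guard, where first_form is a hypothetical helper name and the empty list stands in for the real return value:

```r
# Hypothetical helper: return the first form description, or stop with
# a clearer message when no forms were found.
first_form <- function(forms) {
  if (length(forms) == 0L) {
    stop("no <form> elements found; inspect the downloaded page content")
  }
  forms[[1]]
}

# Indexing an empty list directly reproduces the reported message
msg <- tryCatch(list()[[1]], error = function(e) conditionMessage(e))
```

If the guard fires, it may be worth printing the text that was passed to htmlParse() to check that it is the expected page and still contains a form.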