On 11/4/10 2:39 AM, sayan dasgupta wrote:
> Hi RUsers,
>
> Suppose I want to see the data on the website
> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>
> for the index "S&P CNX NIFTY" for dates
> "FromDate"="01-11-2010","ToDate"="02-11-2010"
>
> then read the html table from the page using readHTMLTable()
>
> I am using this code
> webpage <- postForm(url,.params=list(
> "FromDate"="01-11-2010",
> "ToDate"="02-11-2010",
> "IndexType"="S&P CNX NIFTY",
> "Indicesdata"="Get Details"),
> .opts=list(useragent = getOption("HTTPUserAgent")))
>
> But it doesn't give me desired result
You need to be more specific about how it fails to give the desired result.
You are in fact posting to the wrong URL. The form is submitted to a different
URL - http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp
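One way to check where a form is really submitted is to read the action
attribute of its <form> element. A minimal sketch, assuming the XML package
is installed; the HTML string below is a made-up stand-in, not the actual
page source, and the real markup may differ:

```r
library(XML)

# Made-up stand-in for the page source; the real page's markup may differ.
html <- paste0(
  '<html><body>',
  '<form method="post" ',
  'action="/marketinfo/indices/histdata/historicalindices.jsp">',
  '<input name="FromDate"/><input name="ToDate"/>',
  '</form></body></html>'
)

doc <- htmlParse(html, asText = TRUE)

# Collect the action attribute of every <form> in the document.
actions <- xpathSApply(doc, "//form", xmlGetAttr, "action")
print(actions)
```

With the real page you would parse the text returned by getURLContent(url)
instead of the inline string, and post to the resulting URL (resolved
against the page URL if it is relative).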
>
> Also I was trying to use the function getHTMLFormDescription from the
> package RHTMLForms but there we can't use the argument
> .opts=list(useragent = getOption("HTTPUserAgent")) which is needed for
> this particular website
That's not the case. The function that RHTMLForms generates for you does
support the .opts parameter.
What you want is something along these lines:
# Set default options for RCurl requests
options(RCurlOptions = list(useragent = "R"))
library(RCurl)
# Fetch the HTML page with RCurl, since we cannot use htmlParse()
# directly on the URL: it does not send a user agent or an
# Accept: */* header.
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
wp = getURLContent(url)
# Now that we have the page, parse it and use the RHTMLForms
# package to create an R function that will act as an interface
# to the form.
library(RHTMLForms)
library(XML)
doc = htmlParse(wp, asText = TRUE)
# need to set the URL for this document since we read it from
# text, rather than from the URL directly
docName(doc) = url
# Create the form description and generate the R function
# that submits the form
form = getHTMLFormDescription(doc)[[1]]
fun = createFunction(form)
# now we can invoke the form from R. We only need 2
# inputs - FromDate and ToDate
o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")
# Having looked at the tables, I think we want the 3rd one.
table = readHTMLTable(htmlParse(o, asText = TRUE),
                      which = 3,
                      header = TRUE,
                      stringsAsFactors = FALSE)
table
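If you are not sure which table index to use, you can read all of the
tables first and inspect them. A small offline sketch of that idea, assuming
the XML package; the two-table HTML and its values are invented purely for
illustration and are not data from the site:

```r
library(XML)

# Made-up stand-in for a results page that contains two tables.
html <- paste0(
  '<html><body>',
  '<table><tr><td>Home</td><td>Help</td></tr>',
  '<tr><td>About</td><td>Contact</td></tr></table>',
  '<table>',
  '<tr><th>Date</th><th>Close</th></tr>',
  '<tr><td>01-Nov-2010</td><td>100.5</td></tr>',
  '</table>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)

# Read every table to see how many there are and what each one holds.
all_tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
n_tables <- length(all_tables)

# Then pull out just the one we want by position.
prices <- readHTMLTable(doc, which = 2, header = TRUE,
                        stringsAsFactors = FALSE)
print(n_tables)
print(prices)
```

The same two steps apply to the real response: look at the full list once,
then fix the which argument to the position of the table you need.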
Yes, it is marginally involved, but that is because we cannot simply read
the HTML document directly with htmlParse(): it does not send the Accept
(and user agent) HTTP headers.
>
>
> Thanks and Regards
> Sayan Dasgupta