Hi all,
I am trying to download some tab-separated data from the internet. The data
is not available directly at a URL that can be known a priori. There is
an intermediate form where start and end dates have to be given to get to
the required page.
For example, I want to download data for a station 03015795. The form for
this station is at:
http://ida.water.usgs.gov/ida/available_records.cfm?sn=03015795
I could get the start date and end date from this form using:
#
# Specifying station and reading from the opening form
stn<-"03015795"
myurl<-paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=",stn,sep="")
mypage1 = readLines(myurl)
# Getting the start and end dates
mypattern = '<td align="center">([^<]*)</td>'
datalines = grep(mypattern, mypage1[124], value=TRUE)
getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
gg = gregexpr(mypattern, datalines)
matches = mapply(getexpr,datalines,gg)
result = gsub(mypattern,'\\1',matches)
names(result)=NULL
mydates<-result[1:2]
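
The extraction above depends on the dates sitting on line 124 of the page, which breaks if the page layout shifts. A minimal sketch of a variation that scans every line instead is shown here. The sample `page` vector and its dates are hypothetical stand-ins for `readLines(myurl)`, and `extract_cells` is an illustrative helper name; the real page's date format may differ.

```r
# Hypothetical stand-in for readLines(myurl); the real page layout and
# date format may differ.
page <- c(
  "<table>",
  '<tr><td align="center">1990-10-01</td><td align="center">2007-09-30</td></tr>',
  "</table>"
)

# Illustrative helper: return the text of every matching table cell.
extract_cells <- function(lines, pattern = '<td align="center">([^<]*)</td>') {
  # Collect every match of the pattern across all lines (none for most lines)
  hits <- unlist(regmatches(lines, gregexpr(pattern, lines)))
  # Keep only the captured cell text
  sub(pattern, "\\1", hits)
}

# As in the original code, this assumes the first two matching cells
# hold the start and end dates.
mydates <- extract_cells(page)[1:2]
print(mydates)
```

If other centered cells precede the dates on the real page, the `[1:2]` index would need adjusting.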
I want to know how I can feed these start and end dates to the form,
submit it to get to the data page, and then download the data,
either as displayed in the browser or saved as a file.
Any help on this is most appreciated.
Thanks.
HC
--
View this message in context:
http://r.789695.n4.nabble.com/Downloading-tab-separated-data-from-internet-tp4152318p4152318.html
Sent from the R help mailing list archive at Nabble.com.
AFAICS what you mean is 'how can I fill in an HTML form using R'.
Answer: use package RCurl.

Do study the posting guide: none of the 'at a minimum' information was
given here.

On 03/12/2011 04:47, HC wrote:
> [...]
> I want to know how I can feed these start and end dates to the form and
> execute the button to go to the data page and then to download the data,
> either as displayed in the browser or by saving as a file.
>
> Any help on this is most appreciated.
>
> Thanks.
> HC

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Thanks for your reply.
I tried to use the postForm function of RCurl as below, but I am not sure
what it returns or how to go further with the result.
library(RCurl)
stn<-"03015795"
myurl<-paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=",stn,sep="")
mypage1 = readLines(myurl)
# Getting the start and end dates
mypattern = '<td align="center">([^<]*)</td>'
datalines = grep(mypattern, mypage1[124], value=TRUE)
getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
gg = gregexpr(mypattern, datalines)
matches = mapply(getexpr,datalines,gg)
result = gsub(mypattern,'\\1',matches)
names(result)=NULL
mydates<-result[1:2]
result = postForm(myurl, fromdate=mydates[1], todate=mydates[2],
                  rtype="Save to File", submit1="Retrieve Data")
I tried to read RCurl's documentation but do not know which functions I
should be using. Are there any examples available that could be helpful?
Could you point me to those, please?
Thanks for the help.
HC
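
For reference, a hedged sketch of how the submission step might be structured with RCurl. The field names (`fromdate`, `todate`, `rtype`, `submit1`) are taken from the call above and should be checked against the `<input>` names in the page source. `action_url` is an assumption: it has to be the target in the form's `action` attribute, which may not be the same as the page that displays the form. `build_form_fields` and `fetch_station_data` are illustrative names, not part of RCurl.

```r
# Build the named list of form fields. Field names come from the postForm()
# call in the post above; verify them against the form's HTML.
build_form_fields <- function(dates) {
  stopifnot(length(dates) == 2)
  list(fromdate = dates[1], todate = dates[2],
       rtype = "Save to File", submit1 = "Retrieve Data")
}

# Submit the form and return the response body as text.
# action_url is an assumption: the form's action target, which may differ
# from the URL of the page showing the form. Requires the RCurl package.
fetch_station_data <- function(action_url, dates) {
  RCurl::postForm(action_url,
                  .params = build_form_fields(dates),
                  style = "POST")  # send as application/x-www-form-urlencoded
}

# Hypothetical dates, only to show the shape of the field list.
fields <- build_form_fields(c("1990-10-01", "2007-09-30"))
str(fields)
```

If the server does return tab-separated text, something like `writeLines(txt, "station.txt")` would save it and `read.delim(text = txt)` would parse it, where `txt` is the value returned by `fetch_station_data`. Whether the response is the data itself or another intermediate page has to be checked by inspecting it.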