Hi all,

I am trying to download some tab-separated data from the internet. The data is not available directly at a URL that can be known a priori. There is an intermediate form where start and end dates have to be given to get to the required page.

For example, I want to download data for station 03015795. The form for this station is at:

http://ida.water.usgs.gov/ida/available_records.cfm?sn=03015795

I could get the start date and end date from this form using:

# Specifying station and reading from the opening form
stn <- "03015795"
myurl <- paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=", stn, sep="")
mypage1 = readLines(myurl)

# Getting the start and end dates
mypattern = '<td align="center">([^<]*)</td>'
datalines = grep(mypattern, mypage1[124], value=TRUE)
getexpr = function(s,g) substring(s, g, g + attr(g,'match.length') - 1)
gg = gregexpr(mypattern, datalines)
matches = mapply(getexpr, datalines, gg)
result = gsub(mypattern, '\\1', matches)
names(result) = NULL
mydates <- result[1:2]

I want to know how I can feed these start and end dates to the form and submit it (the equivalent of pressing the button) to get to the data page, and then download the data, either as displayed in the browser or by saving it as a file.

Any help on this is most appreciated.

Thanks.
HC

--
View this message in context: http://r.789695.n4.nabble.com/Downloading-tab-separated-data-from-internet-tp4152318p4152318.html
Sent from the R help mailing list archive at Nabble.com.
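The extraction step in the message above can be written a little more compactly with base R's regmatches(). The following is only a minimal sketch: sample_line is a made-up stand-in for the line of the page that holds the dates, and the date format in it is an assumption, not the site's actual markup.

```r
# Hypothetical sample line; the real page's markup and date format may differ.
sample_line <- paste0('<tr><td align="center">1990-10-01</td>',
                      '<td align="center">2007-09-30</td>',
                      '<td align="center">12345</td></tr>')

# Same pattern as in the message above: one centred table cell.
mypattern <- '<td align="center">([^<]*)</td>'

# gregexpr() locates every match; regmatches() pulls out the matched text.
cells <- regmatches(sample_line, gregexpr(mypattern, sample_line))[[1]]

# Keep only the captured cell contents (the part inside the parentheses).
values <- sub(mypattern, "\\1", cells)

# The first two cells are taken to be the start and end dates.
mydates <- values[1:2]
print(mydates)
```

This avoids the helper function and the mapply() call, but it still depends on the dates sitting in the first two centred cells of the line that is searched.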
AFAICS what you mean is 'how can I fill in an HTML form using R'. Answer: use package RCurl.

Do study the posting guide: none of the 'at a minimum' information was given here.

On 03/12/2011 04:47, HC wrote:
> [...]
> I want to know how I can feed these start and end dates to the form and
> execute the button to go to the data page and then to download the data,
> either as displayed in the browser or by saving as a file.
>
> Any help on this is most appreciated.
>
> Thanks.
> HC

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Thanks for your reply.

I tried to use the postForm function of RCurl as below, but I do not have much of a clue about how to go further with what it returns.

library(RCurl)

stn <- "03015795"
myurl <- paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=", stn, sep="")
mypage1 = readLines(myurl)

# Getting the start and end dates
mypattern = '<td align="center">([^<]*)</td>'
datalines = grep(mypattern, mypage1[124], value=TRUE)
getexpr = function(s,g) substring(s, g, g + attr(g,'match.length') - 1)
gg = gregexpr(mypattern, datalines)
matches = mapply(getexpr, datalines, gg)
result = gsub(mypattern, '\\1', matches)
names(result) = NULL
mydates <- result[1:2]

# Submitting the form fields
result = postForm(myurl, fromdate=mydates[1], todate=mydates[2],
                  rtype="Save to File", submit1="Retrieve Data")

I tried to read RCurl's documentation but do not know which functions I should be using. Are there any examples available that could be helpful? Could you point me to those, please?

Thanks for the help.
HC
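One possible way to go further from the postForm() call above, offered as a sketch under stated assumptions rather than a tested recipe for this site: for a text response, postForm() usually returns the body as a character string, and such a string can be handed to read.delim() through its text argument. The helper fetch_station_data(), its form_url argument and the sample text are hypothetical. The field names are the ones used in the message above, and the right target is whatever URL the form's action attribute names in the page source, which may differ from myurl.

```r
# Hypothetical helper: submit the form fields from the message above.
# form_url is an assumption; it should be the URL named in the form's
# 'action' attribute, which may not be the page the form is displayed on.
fetch_station_data <- function(form_url, from, to) {
  RCurl::postForm(form_url,
                  fromdate = from, todate = to,
                  rtype = "Save to File", submit1 = "Retrieve Data",
                  style = "POST")  # urlencoded POST instead of the multipart default
}

# Made-up response text standing in for what the server might return.
sample_txt <- paste("# hypothetical comment line",
                    "datetime\tvalue",
                    "1990-10-01 00:15\t1.2",
                    "1990-10-01 00:30\t1.3",
                    sep = "\n")

# Parse tab-separated text held in a string; lines starting with '#' are skipped.
dat <- read.delim(text = sample_txt, comment.char = "#",
                  stringsAsFactors = FALSE)
print(dat)

# Optionally keep a copy of the raw text on disk.
out_file <- tempfile(fileext = ".txt")
writeLines(sample_txt, out_file)
```

With real data, something like txt <- fetch_station_data(form_url, mydates[1], mydates[2]) would take the place of sample_txt, provided the server accepts these fields and returns plain text. Whether comment lines start with '#' in the real output is also an assumption to check against an actual download.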