zack holden
2009-Jan-19 18:26 UTC
[R] download/retain text file structure with RCurl/getURL()
Dear list,

I'm trying to download a text file directly from the internet using the RCurl package and the function getURL(). Duncan Temple Lang graciously helped me solve the first step in this problem with the following command:

#################
library(RCurl)
txtfile <- getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',
                  ftp.use.epsv = FALSE)
#################

This brings the text file into R as a single long character string. I've spent many hours now trying to get this text into a sensible form. I've tried every variant of the options in the getURL help file, as well as various strsplit() commands to break this character string into sensible rows and columns, to no avail.

Can anyone suggest a solution? I suspect there is a getURL option I'm missing. Alternatively, do I really have to break this long character string into rows and columns that I can then assemble into a table?

I'd be grateful for any advice.

Thanks in advance,

Zack
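The "single long string" still contains the file's rows; they are just separated by newline characters, which print() shows as escapes while cat() renders them. A minimal sketch, using a made-up stand-in for the downloaded text; the optional "\r" in the split pattern is an assumption in case the server sends Windows-style line endings.

```r
# Hypothetical stand-in for the one long string returned by getURL()
txtfile <- "HEADER LINE\r\n61 1 E/ST\r\n62 1 K/31  15\r\n"

print(txtfile)  # shows the line breaks as escape sequences
cat(txtfile)    # renders the same string as separate lines

# Split into one element per line; the optional "\r" handles CRLF endings
# (an assumption about the file, harmless if the endings are plain "\n")
lines <- strsplit(txtfile, "\r?\n")[[1]]
lines
```

Once the text is a character vector of lines, it can be handed to read.fwf() through textConnection(), as the replies below do.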
Gabor Grothendieck
2009-Jan-19 18:38 UTC
[R] download/retain text file structure with RCurl/getURL()
If you are having problems with the default download.file method you can try method = "wget" (this requires the wget program to be installed):

f <- "ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt"
download.file(f, basename(f), method = "wget")

This saves the file as 13e19.txt in the current working directory with its line structure intact, so it can then be read with read.fwf() or readLines().
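Two points from this reply can be sketched without network access: checking whether wget is actually available before asking for it, and confirming that a saved file reads back line by line. The helper name choose_method() and the file contents below are hypothetical; the download itself is left commented out.

```r
# Hypothetical helper: prefer wget when it is on the PATH, otherwise let
# download.file() pick its own method ("auto")
choose_method <- function() {
  if (nzchar(Sys.which("wget"))) "wget" else "auto"
}

# Stand-in for the file download.file(f, basename(f), ...) would save;
# these three lines are made up for illustration
local_file <- tempfile(fileext = ".txt")
writeLines(c("HEADER", "61 1 E/ST", "62 1 K/31  15"), local_file)

# With network access the real call would be:
# download.file(f, basename(f), method = choose_method())

# Reading the saved file gives one element per line, not one long string
lines <- readLines(local_file)
length(lines)
```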
David Winsemius
2009-Jan-19 19:52 UTC
[R] download/retain text file structure with RCurl/getURL()
It's a fixed-width format with irregular entries, so perhaps something along the lines of:

read.fwf(textConnection(txtfile),
         skip = 8,                         # skips the header
         widths = <column widths vector>,
         col.names = <column names>,
         nrows = 48)                       # drops the trailing summary text

perhaps:

widths = c(2, -1, 1, -1, 4, -1, 3, .... the rest     # negative widths skip the white space
col.names = c("year", "card", "Jan.date", "Jan.dep", ..... the rest

Just the first few columns seem to come in acceptably, although the lines with all NAs will need to be deleted:

> read.fwf(textConnection(txtfile), skip = 8,  # skips the header
+          widths = c(2, -1, 1, -1, 4, -1, 3),  # negative widths skip the white space
+          col.names = c("year", "card", "Jan.date", "Jan.dep"), nrows = 48)
   year card Jan.date Jan.dep
1    61    1     E/ST      NA
2    62    1     E/ST      NA
3    63    1     K/31      15
4    64    1     K/30      12
5    NA   NA     <NA>      NA
6    65    1     E/ST      NA
7    66    1     1/07      17
8    67    1     E/ST      NA
9    68    1     K/28      12
10   69    1     K/31      22
11   NA   NA     <NA>      NA
12   70    1     K/30      16
13   71    1     K/29      28
14   72    1     K/28      32
15   73    1     1/02      16
snip

-- 
David Winsemius
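The negative-width trick and the clean-up of all-NA rows can be tried on a few made-up lines in the same layout (2-character year, 1-character card, 4-character date, 3-character depth, single-space separators). This is a sketch on hypothetical data, not the real file; dropping rows by a missing year is one simple way to remove the rows produced by blank input lines.

```r
# Made-up lines in the layout described above; "" mimics a blank line
sample_lines <- c("61 1 E/ST", "62 1 K/31  15", "", "63 1 1/07  17")

res <- read.fwf(textConnection(sample_lines),
                widths = c(2, -1, 1, -1, 4, -1, 3),  # negative widths skip the spaces
                col.names = c("year", "card", "Jan.date", "Jan.dep"),
                stringsAsFactors = FALSE)            # passed on to read.table()

# Blank input lines come back as rows of NA; drop them using the year column
res <- res[!is.na(res$year), ]
res
```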
zack holden
2009-Jan-21 15:53 UTC
[R] download/retain text file structure with RCurl/getURL(): Solution
Dear list,

I'm posting the solution to my problem in case others find it useful. This code was sent to me by Phil Spector; with a bit of cleaning, the result can easily be converted to a usable format. Thanks also to Gabor Grothendieck, David Winsemius and Martin Morgan for sending possible solutions. Thank you all for taking the time to help; I would not have solved this on my own.

###############################################
library(RCurl)
# Fetch the file as one string; ftp.use.epsv = FALSE turns off extended passive mode
txtfile <- getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt', ftp.use.epsv = FALSE)
# Split the string into one element per line
txtvec <- strsplit(txtfile, '\n')[[1]]
# Field widths: one 4-character field, then six groups of 5, 4 and 6 characters
widths <- c(4, rep(c(5, 4, 6), 6))
# Lines 9 to 65 of this particular file hold the data rows
res <- read.fwf(textConnection(txtvec[9:65]), widths = widths, stringsAsFactors = FALSE)
# Convert the 2nd and 3rd field of each group to numbers
nums <- c(3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19)
res[nums] <- lapply(res[nums], as.numeric)
################################################

Best,
Zack

----------------------------------------
> Date: Mon, 19 Jan 2009 11:08:48 -0800
> From: spector at stat.berkeley.edu
> To: zack_holden at hotmail.com
> Subject: Re: [R] download/retain text file structure with RCurl/getURL()
>
> Zack -
> Here's a start:
>
> txtfile = getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',ftp.use.epsv = FALSE)
> txtvec = strsplit(txtfile,'\n')[[1]]
> widths = c(4,rep(c(5,4,6),6))
> res = read.fwf(textConnection(txtvec[9:65]),widths=widths,stringsAsFactors=FALSE)
> nums = c(3,4,6,7,9,10,12,13,15,16,18,19)
> res[,nums] = sapply(res[,nums],as.numeric)
>
> Hope this helps.
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector at stat.berkeley.edu
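The hand-typed nums vector follows directly from the widths vector: after the first field there are six three-field groups, and the numeric fields are the 2nd and 3rd of each group. The sketch below derives nums from widths. It also shows one hypothetical alternative to the hard-coded txtvec[9:65], keeping only lines that start with a two-digit year; that pattern is an assumption about the file format, and the sample lines are made up.

```r
# Column layout from Phil Spector's code
widths <- c(4, rep(c(5, 4, 6), 6))

# First column of each three-field group: 2, 5, 8, 11, 14, 17
group_start <- seq(2, length(widths), by = 3)

# The 2nd and 3rd field of each group are the numeric ones
nums <- sort(c(group_start + 1, group_start + 2))
nums

# Hypothetical: select data rows by pattern instead of by line number,
# assuming each data row begins with a two-digit year
txtvec <- c("SNOW COURSE HEADER", "", "61 1 ...", "62 1 ...", "Summary text")
data_lines <- txtvec[grepl("^[[:space:]]*[0-9]{2}[[:space:]]", txtvec)]
data_lines
```

Deriving nums this way keeps it in step with widths if the layout changes; the pattern-based selection should be checked against the actual file before relying on it.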