zack holden
2009-Jan-19 18:26 UTC
[R] download/retain text file structure with RCurl/getURL()
Dear list,
I'm trying to download a text file directly from the internet using the
RCurl package and the command getURL. Duncan Lang graciously helped me solve the
first step in this problem using the following command:
#################
txtfile <-
getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',
ftp.use.epsv = FALSE)
#################
This brings the text file into R as a single long character string. I've
spent many hours now trying to get this text file into R in a sensible form.
I've tried every variant of the different options in the getURL help file, as
well as different strsplit() commands to try to break this character string
into sensible rows and columns, to no avail.
Can anyone suggest a solution for doing this? I suspect there is a getURL
option I'm missing. Alternatively, do I really have to break this long
character string into rows and columns that I can then assemble into a table?
I'd be grateful for any advice.
Thanks in advance,
Zack
Gabor Grothendieck
2009-Jan-19 18:38 UTC
[R] download/retain text file structure with RCurl/getURL()
If you are having problems with the default download.file method you can try
method = "wget":
f <- "ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt"
download.file(f, basename(f), method = "wget")
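As a rough sketch of what could follow the download (not part of the original reply): once the file is on disk, readLines() returns one element per line, which avoids the single long string. The temporary file and its contents below are made up so the example runs without network access.

```r
# Stand-in for the downloaded file: a temporary file with three lines.
# (The real file would come from download.file(); this content is made up.)
tmp <- tempfile(fileext = ".txt")
writeLines(c("header line", "61 1 E/ST", "62 1 E/ST"), tmp)

# readLines() returns a character vector with one element per line
lines <- readLines(tmp)
length(lines)
```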
David Winsemius
2009-Jan-19 19:52 UTC
[R] download/retain text file structure with RCurl/getURL()
It's a fixed-width format with irregular entries, so perhaps something
along the lines of:
read.fwf(textConnection(txtfile),
         skip = 8,                          # skips the header
         widths = <column widths vector>,
         col.names = <colnames>,
         nrows = 48)                        # drops the trailing summary text
perhaps with:
widths = c(2, -1, 1, -1, 4, -1, 3 .... the rest
           # the negative entries skip the white-space columns
col.names = c("year", "card", "Jan.date", "Jan.dep" ..... the rest
Just the first few columns seem to come in acceptably, although the
lines with all NA's will need to be deleted:
> read.fwf(textConnection(txtfile), skip = 8,   # skips the header
+          widths = c(2, -1, 1, -1, 4, -1, 3),  # negative entries skip the white-space
+          col.names = c("year", "card", "Jan.date", "Jan.dep"),
+          nrows = 48)
year card Jan.date Jan.dep
1 61 1 E/ST NA
2 62 1 E/ST NA
3 63 1 K/31 15
4 64 1 K/30 12
5 NA NA <NA> NA
6 65 1 E/ST NA
7 66 1 1/07 17
8 67 1 E/ST NA
9 68 1 K/28 12
10 69 1 K/31 22
11 NA NA <NA> NA
12 70 1 K/30 16
13 71 1 K/29 28
14 72 1 K/28 32
15 73 1 1/02 16
snip
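One possible way to delete the all-NA lines mentioned above, as a minimal sketch: the data frame here is made up to mimic the shape of the output shown, and the names dat, all_na and cleaned are only illustrative.

```r
# Made-up data frame mimicking the shape of the read.fwf() output above
dat <- data.frame(year     = c(61, 62, NA, 65),
                  card     = c(1, 1, NA, 1),
                  Jan.date = c("E/ST", "K/31", NA, "1/07"),
                  Jan.dep  = c(NA, 15, NA, 17),
                  stringsAsFactors = FALSE)

# A row is dropped only if every one of its values is missing
all_na  <- rowSums(!is.na(dat)) == 0
cleaned <- dat[!all_na, ]
nrow(cleaned)
```

Rows that are only partly missing (such as the "E/ST" entries with no depth) are kept, since only fully empty lines are separators.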
--
David Winsemius
zack holden
2009-Jan-21 15:53 UTC
[R] download/retain text file structure with RCurl/getURL(): Solution
Dear list,
I'm posting the solution to my problem in case others find it useful.
This code was sent to me by Phil Spector. With a bit of cleaning, the result
can easily be converted to a usable format. Thanks to Gabor Grothendieck,
David Winsemius and Martin Morgan for also sending possible solutions. Thank
you all for taking the time to help; I would not have solved this on my own.
###############################################
library(RCurl)
# Download the file as one long character string
txtfile = getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',
                 ftp.use.epsv = FALSE)
# Split the string into one element per line
txtvec = strsplit(txtfile,'\n')[[1]]
# Field widths: a 4-character first field, then six groups of (5, 4, 6)
widths = c(4,rep(c(5,4,6),6))
# Parse lines 9 to 65 as fixed-width fields
res = read.fwf(textConnection(txtvec[9:65]),widths=widths,stringsAsFactors=FALSE)
# Convert the columns listed in nums to numeric
nums = c(3,4,6,7,9,10,12,13,15,16,18,19)
res[,nums] = sapply(res[,nums],as.numeric)
################################################
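For anyone who wants to try the same steps without the FTP download, here is a small self-contained sketch. The text, the widths c(4, 5, 4) and the column names are all made up for illustration and are not the layout of the real file; reading everything as character first and then converting is one option, not necessarily what the real data needs.

```r
# Made-up stand-in for the string returned by getURL(): two header lines
# followed by three fixed-width records (year: 4 chars, date: 5, depth: 4)
txt <- paste("Snow course (illustrative data only)",
             "Year Date Depth",
             "  61 K/31  15",
             "  62 1/07  17",
             "  63 K/28  12",
             sep = "\n")

# One element per line
lines <- strsplit(txt, "\n")[[1]]

# Parse only the data lines, reading every field as character first
con <- textConnection(lines[3:5])
res <- read.fwf(con,
                widths = c(4, 5, 4),
                col.names = c("year", "date", "depth"),
                colClasses = "character")
close(con)

# Convert the numeric columns; as.numeric() tolerates the padding blanks
num_cols <- c("year", "depth")
res[num_cols] <- lapply(res[num_cols], as.numeric)
str(res)
```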
Best,
Zack
----------------------------------------
> Date: Mon, 19 Jan 2009 11:08:48 -0800
> From: spector at stat.berkeley.edu
> To: zack_holden at hotmail.com
> Subject: Re: [R] download/retain text file structure with RCurl/getURL()
>
> Zack -
> Here's a start:
>
> txtfile = getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',
>                  ftp.use.epsv = FALSE)
> txtvec = strsplit(txtfile,'\n')[[1]]
> widths = c(4,rep(c(5,4,6),6))
> res = read.fwf(textConnection(txtvec[9:65]),widths=widths,stringsAsFactors=FALSE)
> nums = c(3,4,6,7,9,10,12,13,15,16,18,19)
> res[,nums] = sapply(res[,nums],as.numeric)
>
> Hope this helps.
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector at stat.berkeley.edu