Does anyone else have any insights into this issue:
Henrik, thank you for your very quick response. I've examined the readBin
help file with respect to endian and I'm still not sure I'm getting this
correct.
Here is what I'm coding:
con <- file(file.choose(), open="rb")
Year66 <- readBin(con, what=integer(), signed=TRUE, size=2,
                  endian="little", n=40374840)  # define endian="little"
length(Year66)
close(con)
# convert millimeters to inches
Year66.in <- Year66 * 0.039370
library(Hmisc)  # assuming describe() is the Hmisc version, as the output format suggests
describe(Year66.in)
Year66.in
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
8185584       0   65511  -21.56  -650.1  -650.1  -162.2     0.0     0.0   636.5   639.1

lowest : -1290 -1290 -1290 -1290 -1290, highest: 1290 1290 1290 1290 1290
# establish cut points using inches
bins <- cut(Year66.in, breaks=30)
barplot(table(bins))
length(Year66.in)  # returns 8185584, the number of records read, which is
                   # only 20.2% (see next line) of the records that I'm expecting
length(Year66.in) / (419*264*365)  # proportion of the records expected in one year
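As a rough check on the 20.2% figure, the size of the file on disk can be compared with what one year of 2-byte values should occupy. The sketch below assumes 2 bytes per record and 365 days per year, as described in this thread; `years_in_file` is a hypothetical helper name, and a small temporary file stands in for the real data.

```r
# Records expected in one year: rows * columns * days
records_per_year <- 419 * 264 * 365   # 40374840
bytes_per_record <- 2                 # a "short" is assumed to be 2 bytes

# Hypothetical helper: how many years of data does the file size imply?
years_in_file <- function(path) {
  size_bytes <- file.info(path)$size
  size_bytes / (records_per_year * bytes_per_record)
}

# Stand-in file: 1000 two-byte integers, i.e. 2000 bytes on disk
tmp <- tempfile(fileext = ".bin")
writeBin(rep(1L, 1000), tmp, size = 2)
demo_bytes <- file.info(tmp)$size
demo_years <- years_in_file(tmp)
unlink(tmp)

demo_bytes
demo_years
```

If the value returned for the real file is well below 37, the file itself holds fewer records than expected, and no choice of readBin arguments would recover the rest.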
#### Here I will introduce code to classify the summary statistics using
#### both a clustering and a non-metric scaling function. These procedures
#### will hopefully enable differentiation of cluster groupings, associating
#### the initial input annual year values with a separate (not shown)
#### calculated index.
What I eventually want to accomplish is a statistical summary for each of
the 37 years in the binary file. Reading in the file on a year-to-year
basis (n=40374840) should give me all of the records for just the first
year, not all of the records in the binary file. I therefore also need to
better understand how to read the set of records for years 2, 3, 4, ... 37.
Any ideas?
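One possible approach, sketched here under assumptions rather than tested on the real file: keep a single connection open and call readBin() once per year, since each call continues from where the previous one stopped. The example uses small stand-in dimensions and a temporary test file so that it runs quickly; the names `n_per_year` and `year_means` are illustrative only.

```r
# Stand-in dimensions; for the real file these would be 419, 264, 365 and 37
n_rows <- 4; n_cols <- 3; n_days <- 5; n_years <- 3
n_per_year <- n_rows * n_cols * n_days

# Build a hypothetical test file of little-endian signed 2-byte integers
tmp <- tempfile(fileext = ".bin")
writeBin(as.integer(seq_len(n_per_year * n_years)), tmp,
         size = 2, endian = "little")

con <- file(tmp, open = "rb")
year_means <- numeric(n_years)
for (yr in seq_len(n_years)) {
  # Each call picks up at the current position in the connection
  x <- readBin(con, what = integer(), size = 2, signed = TRUE,
               endian = "little", n = n_per_year)
  if (length(x) < n_per_year)
    warning("year ", yr, ": only ", length(x), " values read")
  year_means[yr] <- mean(x) * 0.039370   # convert millimeters to inches
}
close(con)
unlink(tmp)

year_means
```

Processing one year per iteration keeps only 40374840 values in memory at a time. seek() could instead jump straight to a given year's byte offset, though its help page cautions against relying on it on Windows.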
Thanks for your assistance
Steve
Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
From:    Henrik Bengtsson <hb at stat.berkeley.edu>
Sent by: henrik.bengtsson@gmail.com
To:      Steve_Friedman at nps.gov
cc:      r-help at r-project.org
Date:    02/11/2009 09:20 AM PST
Subject: Re: [R] Reading Binary Files
Argument 'size' is what you are looking for, cf. help(readBin).
Whenever reading binary files this way, I strongly recommend that you
are explicit about all arguments of readBin(), e.g.
readBin(con, what=integer(), size=2, signed=TRUE, endian="little",
n=n);
For instance, you probably do not want 'endian' to be dependent on the
platform (see help) you run on, but instead be specific to the file
format you are reading.
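To illustrate why the explicit arguments matter, here is a small sketch that writes four known 2-byte values to a temporary file and reads them back three ways. The values and variable names are made up for the illustration.

```r
tmp <- tempfile(fileext = ".bin")
vals <- c(-1290L, 0L, 1L, 1290L)
writeBin(vals, tmp, size = 2, endian = "little")   # 8 bytes on disk

# Explicit arguments matching the file format: recovers the four values
ok <- readBin(tmp, what = integer(), size = 2, signed = TRUE,
              endian = "little", n = 4)

# Default size for integer() is 4 bytes, so 8 bytes yield only two values
wrong_size <- readBin(tmp, what = integer(), n = 4)

# Wrong byte order: same number of values, but different numbers
# (for example, the stored 1 is read back as 256)
wrong_endian <- readBin(tmp, what = integer(), size = 2, signed = TRUE,
                        endian = "big", n = 4)
unlink(tmp)

ok
length(wrong_size)
wrong_endian
```

Reading with the wrong size halves the number of values returned, while reading with the wrong endian keeps the count but silently changes the values.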
/Henrik
On Wed, Feb 11, 2009 at 8:04 AM, <Steve_Friedman at nps.gov> wrote:
>
> Hello
>
> I'm encountering some difficulty correctly reading binary files. The binary
> files store data as "short" rather than "double", "int", or any of the
> other modes of the vector being read.
>
> The data represents a regular grid of size 419 rows by 264 columns, to make
> it more interesting, the data are daily records, for a total of 37 years.
> The file size is therefore 419(rows) * 264(columns) * 365(days) * 37(years)
> long.
>
> The product of these dimensions is 1493869080 records.
>
> I'm using the following code to read these into R (windows 2.8.1 )
>
> con <- file(file.choose(), open="rb")
> Year66 <- readBin(con, integer, signed=TRUE, n = 40374840)
> close(con)
>
> length(Year66)
>
> returns 2046396
>
> I'm betting that I'm defining the "what" incorrectly, but after numerous
> attempts with different choices I'm wondering if readBin can handle "short"
> values?
>
> Any help is greatly appreciated.
>
> Steve
>
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> Steve_Friedman at nps.gov
> Office (305) 224 - 4282
> Fax (305) 224 - 4147
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>