Try readGDAL in the rgdal package - best to use the GeoTIFF version of
ETOPO1, though - the GDAL functions are optimized for binary versions
of large rasters like this, and with arguments to readGDAL you can
read sub-windows or decimated versions.
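For example, something along these lines should pull out just the Tasmania
window (a rough, untested sketch - the file name and the 21600 x 10800
cell-registered layout are assumptions, so check GDALinfo(fname) before
trusting the offsets):

library(rgdal)

lon.min <- 140; lon.max <- 150           # window of interest (Tasmania)
lat.min <- -45; lat.max <- -40

fname <- "ETOPO1_Bed_c_geotiff.tif"      # assumed name of a local GeoTIFF copy
GDALinfo(fname)                          # check rows/cols and georeferencing

## assuming the grid starts at 180W / 90N with 60 cells per degree:
off  <- c((90 - lat.max) * 60, (lon.min + 180) * 60)           # (row, col) offset
rdim <- c((lat.max - lat.min) * 60, (lon.max - lon.min) * 60)  # (rows, cols) to read

topo <- readGDAL(fname, offset = off, region.dim = rdim)       # full-resolution window

## or let GDAL decimate on read (every 10th cell), so the coarse grid is
## never built at full resolution in R:
topo.coarse <- readGDAL(fname, offset = off, region.dim = rdim,
                        output.dim = rdim / 10)

image(topo, col = topo.colors(20))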
Cheers, Mike.
On Tue, Aug 3, 2010 at 5:33 AM, R Heberto Ghezzo, Dr
<heberto.ghezzo at mcgill.ca> wrote:
> Hello,
> The other day Justin Peter presented a mini program to plot a topographic
> map with an overlay of worldHires. It seemed interesting, so I checked the
> ETOPO5 site and found that there is a new file, ETOPO1, with a 1-minute grid.
> I downloaded it and tried a similar procedure. Now ETOPO1.gz is 1 GB and the
> uncompressed file is 5 GB. Neither fits on my laptop. I tried the following
> program, which works, but it takes too long to read the file.
> Now the question: is there a faster way to read a piece of a file from the
> middle of it?
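A rough, untested base-R sketch of one way to do that (it still has to
decompress and scan past everything above the window, so the GeoTIFF/readGDAL
route above is the real fix, and the row/column bookkeeping would need
checking against the file): let scan() skip the top block and read the whole
band in one call, then drop the unwanted longitudes, instead of looping over
readLines() and growing a vector with c():

lon.min <- 140; lon.max <- 150
lat.min <- -45; lat.max <- -40

ncl <- 360 * 60                           # records per latitude row
la  <- (90 - lat.max) * 60                # full rows above the window
nla <- (lat.max - lat.min) * 60           # rows in the window
lo  <- (180 + lon.min) * 60               # records left of the window in each row
nlo <- (lon.max - lon.min) * 60           # records wanted in each row

con <- gzfile("c:/temp/Etopo/ETOPO1_Bed_c_int.xyz.gz", "r")
rec <- scan(con, what = numeric(), sep = ",", skip = la * ncl,
            n = 3 * ncl * nla, quiet = TRUE)      # skip the top, read the band
close(con)

m       <- matrix(rec, ncol = 3, byrow = TRUE)    # columns: lon, lat, z
lon.idx <- ((seq_len(nrow(m)) - 1) %% ncl) + 1    # longitude index within each row
bbstum  <- m[lon.idx > lo & lon.idx <= lo + nlo, ]  # keep only the wanted longitudes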
> My program:
> #
> lon.min <-  140.0
> lon.max <-  150.0
> lat.min <- -45.0
> lat.max <- -40.0
> #
> library(maps)
> library(mapdata)
> map('worldHires', xlim = c(lon.min, lon.max), ylim = c(lat.min, lat.max), lwd = 2)
> # to check it is OK
> #
> con <- file("c:/temp/Etopo/ETOPO1_Bed_c_int.xyz.gz", "r")
> la  <- (90 - lat.max) * 60              # rows to skip from the top
> nla <- (90 - lat.min) * 60 - la         # length in latitude (rows to read)
> lo  <- (180 + lon.min) * 60             # first longitude (records to skip in a row)
> nlo <- (180 + lon.max) * 60 - lo        # length in longitude (records to read)
> #
> cat("from Long ", lo/60 - 180, " to ", (lo + nlo)/60 - 180,
>     " and Lat ", lat.min, " to ", lat.max, " degrees\n")
> cat(" top: discard ", la*360*60 + lo, " records; there are ", la*60,
>     " segments of ", nlo, " useful, discard ", 360*60 - nlo, "\n")
> ti0 <- proc.time()
> for(i in 1:360) {
>   b <- readLines(con, n = la*60)        # discard the top in chunks
>   # without this loop the vector b does not fit in RAM
>   rm(b)
> }
> b <- readLines(con, n = lo - 1)         # discard up to the first longitude
> rm(b)
> ti1 <- proc.time()
> b <- NULL
> for(i in 1:nla) {
>   b <- c(b, readLines(con, n = nlo + 2))     # read one row's window and accumulate
>   readLines(con, n = 360*60 - nlo - 2)       # discard the rest of the row
> }
> #
> ti2 <- proc.time()
> bbst <- strsplit(b, ",")   # b is a character vector of records; convert to numeric
> bbstu <- unlist(bbst)
> bbstun <- as.numeric(bbstu)
> bbstum <- matrix(bbstun,ncol=3,byrow=TRUE)
> dim(bbstum)
> z <- matrix(bbstum[,3],ncol=nlo+2 ,byrow=TRUE)
> x <- rev(unique(bbstum[,2]))
> y <- unique(bbstum[,1])
> #
> sz <- apply(z,2,rev)
> image(y,x,t(sz),col=topo.colors(20))
> contour(y,x,t(sz),nlevels=20,add=TRUE)
> map('worldHires',xlim=range(y),ylim=range(x),add=TRUE,lwd=2)
> #
> ti3 <- proc.time()
> ti1 - ti0
> ti2 - ti1
> ti3 - ti2
> and the timings are:
>>   tasmania
>  ti1 - ti0
>    user  system elapsed
>  563.06    1.40  578.02
>> ti2 - ti1
>    user  system elapsed
>   26.74    0.08   26.97
>> ti3 - ti2
>    user  system elapsed
>    2.76    1.42    4.21
>> #
> So it takes me 10 minutes to read and discard the top of the file, and half a
> minute to read the useful latitudes and discard the unwanted longitudes.
> Thanks for any help to speed up / improve this program.
> Heberto Ghezzo
> Montreal
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsumner at gmail.com