Hemant Chowdhary
2016-Jul-02 22:43 UTC
[R] Extracting matrix from netCDF file using ncdf4 package
I am working with a 3-dimensional netCDF file with dimensions X=100, Y=200, T=365. My objective is to extract the time vectors of a few specific grid cells that may not be contiguous in X and/or Y.

For example, I want to extract a 5x365 matrix in which each of the 5 rows is a vector of length 365 for one specific X,Y combination.

For this, I am currently using the following:

    reqX = c(35,35,40,65,95);
    reqY = c(2,5,10,112,120);
    nD = length(reqX)
    for(i in 1:nD){
      idX = ncvar_get(nc=myNC, varid="myVar", start=c(reqX[i],reqY[i]), count=c(1,1))
      if(i==1){dX = idX} else {dX = rbind(dX,idX)}
    }

Is there a more elegant/faster way than using a for loop like this? It seems very slow when I may have to get a much larger matrix, where nD can be more than 1000.

Thank you
HC
Bert Gunter
2016-Jul-02 23:00 UTC
[R] Extracting matrix from netCDF file using ncdf4 package
I know nothing about netCDF files, but if you can download the file and make it an array, extraction via indexing takes no time at all:

    > ex <- array(rnorm(2*1e4*365, mean = 10), dim = c(100,200,365))
    > system.time(test <- ex[35,2,])
       user  system elapsed
          0       0       0
    > length(test)
    [1] 365

If this can't be done, sorry for the noise.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
Roy Mendelssohn - NOAA Federal
2016-Jul-02 23:26 UTC
[R] Extracting matrix from netCDF file using ncdf4 package
Sending this to Hemant a second time as I forgot to reply to the list.

Hi Hemant:

Well, technically the code you posted shouldn't work, because 'start' and 'count' are supposed to have one entry per dimension of the variable. I guess Pierce's code must be very forgiving if that is working.

One thing you can do to speed things up is to pre-allocate the array you want to create, say

    > dX <- array(NA_real_, dim=c(5,365))

and then have the ncvar_get call write directly to the array:

    > dX[i,] <- ncvar_get(nc=myNC, varid="myVar", start=c(reqX[i],reqY[i],1), count=c(1,1,-1))

The second thing you can do is to use 'lapply' instead of the 'for' loop, but I don't know how much faster that will make your code.

The fastest, however, if you have the memory, is to just read the whole array into memory:

    > dX <- ncvar_get(nc=myNC, varid="myVar")

and then use R's subsetting abilities. You can do fancier subsetting of arrays in memory than you can of arrays on disk.

HTH,
-Roy

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn at noaa.gov
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
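The pre-allocation approach Roy describes can be sketched as follows. This is only an illustration under stated assumptions: the temporary file, the reduced dimension sizes (10 x 20 x 30), the requested indices, and the choice of prec = "double" are stand-ins so the example runs on its own, and are not taken from Hemant's data.

```r
library(ncdf4)

# Build a small stand-in netCDF file (sizes and values are illustrative assumptions).
nX <- 10; nY <- 20; nT <- 30
dimX <- ncdim_def("X", units = "index", vals = 1:nX)
dimY <- ncdim_def("Y", units = "index", vals = 1:nY)
dimT <- ncdim_def("T", units = "days", vals = 1:nT)
varDef <- ncvar_def("myVar", units = "1", dim = list(dimX, dimY, dimT),
                    missval = -9999, prec = "double")

ncFile <- tempfile(fileext = ".nc")
ncOut <- nc_create(ncFile, varDef)
full <- array(rnorm(nX * nY * nT, mean = 10), dim = c(nX, nY, nT))
ncvar_put(ncOut, varDef, full)
nc_close(ncOut)

# Requested grid cells: one (X, Y) pair per output row (hypothetical indices).
reqX <- c(3, 3, 4, 6, 9)
reqY <- c(2, 5, 10, 12, 20)
nD <- length(reqX)

myNC <- nc_open(ncFile)
dX <- matrix(NA_real_, nrow = nD, ncol = nT)  # pre-allocated result
for (i in seq_len(nD)) {
  # start and count carry one entry per dimension; count = -1 reads to the end
  dX[i, ] <- ncvar_get(myNC, "myVar",
                       start = c(reqX[i], reqY[i], 1),
                       count = c(1, 1, -1))
}
nc_close(myNC)
dim(dX)
```

Each call still goes to disk once per grid cell, so for very many cells reading the whole variable once (if it fits in memory) is likely to be faster, as Roy notes.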
Hemant Chowdhary
2016-Jul-03 14:27 UTC
[R] Extracting matrix from netCDF file using ncdf4 package
Thank you both. Yes, this is basically an issue of being able to subset an array rather than of extracting from the netCDF file. The command

    dX = ncvar_get(nc=myNC, varid="myVar")

already returns the array, and one can subset that array using indices. So the problem can be restated as follows.

Let us say dX is a 3D array with dimensions 100x200x365. The objective is to extract five specific vectors of length 365 each, corresponding to

    reqX = c(35,35,40,65,95);
    reqY = c(2,5,10,112,120);

However,

    dX2 = dX[reqX, reqY,]

results in an array of 5x5x365, i.e., one corresponding to all 25 combinations of reqX and reqY.

Somehow, I was expecting that there would be a subsetting function that returns a 5x365 matrix directly. If there is none, then one can extract one grid cell at a time and fill the pre-defined matrix, as you have suggested.

Thank you again
HC
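For what it is worth, base R can return the 5x365 matrix in a single subsetting step if the array is first viewed as an (X*Y) x T matrix. The sketch below is not something proposed in the thread; it relies only on R storing arrays in column-major order, and it uses a random array as a stand-in for the one returned by ncvar_get.

```r
# Stand-in for the array returned by ncvar_get(); random values are an assumption.
dX <- array(rnorm(100 * 200 * 365, mean = 10), dim = c(100, 200, 365))
reqX <- c(35, 35, 40, 65, 95)
reqY <- c(2, 5, 10, 112, 120)

nX <- dim(dX)[1]; nY <- dim(dX)[2]; nT <- dim(dX)[3]

# View the array as an (X*Y) x T matrix. Because R stores arrays
# column-major, grid cell (x, y) lands in row x + (y - 1) * nX.
flat <- matrix(dX, nrow = nX * nY, ncol = nT)

# One row per requested (X, Y) pair, paired element-wise rather than crossed.
dX2 <- flat[reqX + (reqY - 1) * nX, , drop = FALSE]
dim(dX2)  # 5 rows (grid cells) by 365 columns (time steps)
```

Note that matrix() copies the data, so this roughly doubles memory use for a moment; assigning dim(dX) <- c(nX * nY, nT) in place would avoid the copy at the cost of altering dX.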
Bert Gunter
2016-Jul-03 15:38 UTC
[R] Extracting matrix from netCDF file using ncdf4 package
Well, yes, ... but no: there is no need to pre-define the matrix. The following is still an (interpreted) loop, but it is fast and short.

    ## ex is the downloaded array, here filled with random numbers
    reqX = c(35,35,40,65,95)
    reqY = c(2,5,10,112,120)
    out <- sapply(seq_along(reqX), function(i) ex[reqX[i],reqY[i],])
    > dim(out)
    [1] 365   5

You might find it useful to go through a (web) tutorial or two to learn more about such R functionality. Useful suggestions can be found here: https://www.rstudio.com/online-learning/#R

Cheers,
Bert
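Bert's result has one column per grid cell (365 x 5), whereas the original question asked for one row per grid cell (5 x 365). A minimal sketch of getting that orientation follows; the use of vapply in place of sapply and the final transpose are additions for illustration, and the array is again random stand-in data.

```r
# Stand-in for the downloaded array, filled with random numbers (an assumption).
ex <- array(rnorm(100 * 200 * 365, mean = 10), dim = c(100, 200, 365))
reqX <- c(35, 35, 40, 65, 95)
reqY <- c(2, 5, 10, 112, 120)

# vapply behaves like sapply here but checks that every result is a
# numeric vector of the expected length (the size of the time dimension).
out <- vapply(seq_along(reqX),
              function(i) ex[reqX[i], reqY[i], ],
              numeric(dim(ex)[3]))

# Transpose so that each row is the time series of one requested grid cell.
dX <- t(out)
dim(dX)  # 5 rows (grid cells) by 365 columns (time steps)
```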
Hemant Chowdhary
2016-Jul-04 12:29 UTC
[R] Extracting matrix from netCDF file using ncdf4 package
Thank you Bert. Yes, your suggestion is correct: there is no need to pre-define the matrix, and the sapply function works quite fast. This resolves my issue.

Thank you both again
HC