Roy Mendelssohn - NOAA Federal
2016-Mar-18 21:28 UTC
[R] Reshaping an array - how does it work in R
Hi All:

I am working with a very large array. If noLat is the number of latitudes, noLon the number of longitudes and noTime the number of time periods, the array is of the form:

myData[noLat, noLon, noTime]

It is read in this way because that is how it is stored in a (series of) netCDF files. For the analysis I need to do, I instead need the array:

myData[noLat*noLon, noTime]

Normally this would be easy:

myData <- array(myData, dim = c(noLat*noLon, noTime))

My question is how this command works in R: does it make a copy of the existing array, with different indices for the dimensions, or does it just redo the indices and leave the given array as is? The reason for this question is that my array is 30GB in memory, and I don't have enough space to hold a copy of it in memory. If the former, I will have to figure out a workaround to bring in only part of the data at a time and put it into the proper locations.

Thanks,

-Roy

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn at noaa.gov www: pfeg.noaa.gov

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
Hi Roy,

R (usually) makes a copy if the dimensionality of an array is modified, even if you use this syntax:

x <- array(1:24, c(2, 3, 4))
dim(x) <- c(6, 4)

See also ?tracemem, ?data.table::address, ?pryr::address and other tools to trace if an internal copy is done.

Workaround: use data.table::setattr or bit::setattr to modify the dimensions in place (i.e., without making a copy). Risk: if you modify an object by reference, all other objects which point to the same memory address will be modified silently, too.

HTH,
Denes
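A minimal base-R sketch (the names x and y are illustrative) of the copy-on-modify rule behind Denes's "(usually)": when a second name refers to the same vector, changing dim() through one name must leave the other untouched, so R duplicates the data at that point. This shared situation is also the one the "Risk" warning above describes: an in-place setattr() would change y as well. The sketch uses only base R, not data.table or bit.

```r
# Two names bound to one array (illustrative names).
x <- array(1:24, dim = c(2, 3, 4))
y <- x                  # no copy yet: x and y share the same vector

# Changing the dimensions through x: because the vector is shared,
# R duplicates it first (copy-on-modify), so y keeps its original shape.
dim(x) <- c(6, 4)

print(dim(x))                                  # 6 4
print(dim(y))                                  # 2 3 4
print(identical(as.vector(x), as.vector(y)))   # TRUE: same values, new layout
```

When x is the only name bound to the data, R can skip this duplication, which is why the copy happens only "usually".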
Arrays are vectors stored in column-major order. So the answer is: reindexing. Does this make it clear?

> v <- array(1:24,dim=2:4)
> as.vector(v)
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> v
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

> w <- array(as.vector(v),dim=c(6,4)) ## you would use v instead of w for the assignment
> w
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    2    8   14   20
[3,]    3    9   15   21
[4,]    4   10   16   22
[5,]    5   11   17   23
[6,]    6   12   18   24
> identical(as.vector(w), as.vector(v))
[1] TRUE

However, copying may occur anyway as part of R's semantics. Others will have to help you on that, as the details here are beyond me.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
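To make the column-major point concrete for the shapes in Roy's question: element [i, j, k] of a noLat x noLon x noTime array ends up at row i + (j - 1) * noLat, column k, of the (noLat*noLon) x noTime matrix. The sketch below checks this on small, made-up sizes; the helpers row_of(), lat_of() and lon_of() are hypothetical names for illustration, not from any package.

```r
# Illustrative sizes; the real data are far larger.
noLat <- 3; noLon <- 4; noTime <- 5
A <- array(seq_len(noLat * noLon * noTime), dim = c(noLat, noLon, noTime))

M <- A
dim(M) <- c(noLat * noLon, noTime)   # same values, new indexing

# Hypothetical helpers mapping between (lat, lon) and the row of M.
row_of <- function(i, j) i + (j - 1) * noLat
lat_of <- function(r) (r - 1) %% noLat + 1
lon_of <- function(r) (r - 1) %/% noLat + 1

# Check every element of A against its position in M.
ok <- TRUE
for (i in seq_len(noLat)) {
  for (j in seq_len(noLon)) {
    for (k in seq_len(noTime)) {
      ok <- ok && A[i, j, k] == M[row_of(i, j), k]
    }
  }
}
print(ok)                        # TRUE
print(c(lat_of(8), lon_of(8)))   # 2 3
```

The inverse helpers are handy after the analysis, to map a row of the result back to a latitude/longitude pair.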
Roy Mendelssohn - NOAA Federal
2016-Mar-18 22:11 UTC
[R] Reshaping an array - how does it work in R
> On Mar 18, 2016, at 2:56 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> However copying may occur anyway as part of R's semantics. Others will
> have to help you on that, as the details here are beyond me.
>
> Cheers,
> Bert

Hi Bert:

Thanks for your response. The only part I was concerned with is whether a copy was made, that is, what my memory usage would be. Sorry if that wasn't clear in the original.

-Roy
R always makes a copy for this kind of operation. There are some operations that don't make copies, but I don't think this one qualifies.
--
Sent from my phone. Please excuse my brevity.
Roy Mendelssohn - NOAA Federal
2016-Mar-18 22:15 UTC
[R] Reshaping an array - how does it work in R
Thanks. That is what I needed to know. I don't want to play around with some of the other suggestions, as I don't totally understand what they do, and I don't want to risk messing something up without being aware of it. There is a way to read in the data a chunk at a time, reshape each chunk, and put it into the (reshaped) larger array. It is harder to program, but probably worth the pain to be certain of what I am doing. I had a feeling a copy was made; I just wanted to make certain of it.

Thanks again,

-Roy
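A minimal sketch of the chunk-at-a-time plan Roy describes, under stated assumptions: read_chunk() is a hypothetical stand-in (here it just slices an in-memory array) for whatever actually reads a [noLat, noLon, time-block] slab from the netCDF files, for example ncdf4::ncvar_get() with its start and count arguments. The sizes and chunkSize are made up.

```r
set.seed(1)
noLat <- 3; noLon <- 4; noTime <- 10   # illustrative sizes
chunkSize <- 4                          # time steps per read (assumption)

# Stand-in for the netCDF files: in practice this array would NOT be in memory.
full <- array(rnorm(noLat * noLon * noTime), dim = c(noLat, noLon, noTime))

# Hypothetical reader returning a [noLat, noLon, length(tIdx)] slab.
read_chunk <- function(tIdx) full[, , tIdx, drop = FALSE]

# Preallocate the reshaped result once.
out <- matrix(NA_real_, nrow = noLat * noLon, ncol = noTime)

for (start in seq(1, noTime, by = chunkSize)) {
  tIdx  <- start:min(start + chunkSize - 1, noTime)
  chunk <- read_chunk(tIdx)
  dim(chunk) <- c(noLat * noLon, length(tIdx))  # column-major reindexing
  out[, tIdx] <- chunk                          # fill those time columns
}
```

With this layout, peak memory should be roughly the final matrix plus one chunk rather than two full copies; a larger chunkSize trades memory for fewer reads.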
>>>>> Dénes Tóth <toth.denes at ttk.mta.hu>
>>>>>     on Fri, 18 Mar 2016 22:56:23 +0100 writes:

    > Hi Roy,
    > R (usually) makes a copy if the dimensionality of an array is modified,
    > even if you use this syntax:

    > x <- array(1:24, c(2, 3, 4))
    > dim(x) <- c(6, 4)

    > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
    > to trace if an internal copy is done.

Well, without using strange (;-) packages, standard R's tracemem(), and notably its help page, is indeed a good pointer.
According to the help page, memory tracing is enabled in the default R binaries for Windows and OS X. For Linux (where I, as an R developer, compile R myself anyway), one needs to configure with --enable-memory-profiling.

Now, let's try:

> x <- array(rnorm(47), dim = c(1000,50, 40))
> tracemem(x)
[1] "<0x7f79a498a010>"
> dim(x) <- c(1000* 50, 40)
> x[5] <- pi
> tracemem(x)
[1] "<0x7f79a498a010>"
>

So, *BOTH* the re-dimensioning *AND* the sub-assignment did *NOT* make a copy.

Indeed, R has become much smarter in these things in recent years ... not thanks to me, but very much thanks to Luke Tierney (from R-core), and also thanks to contributions from "outside", notably Tomas Kalibera.

And hence: *NO* such strange workarounds are needed in this specific case:

    > Workaround: use data.table::setattr or bit::setattr to modify the
    > dimensions in place (i.e., without making a copy). Risk: if you modify
    > an object by reference, all other objects which point to the same memory
    > address will be modified silently, too.

Martin Maechler, ETH Zurich (and R-core)

    > HTH,
    > Denes

(generally, your contributions help indeed, Denes, thank you!)
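A sketch, assuming a build with memory profiling (checked with capabilities("profmem")), that contrasts the two idioms in this thread. Martin's session uses dim(x) <- ..., which can relabel an unshared object in place. Roy's original array(myData, dim = ...) is a different call: as far as we can tell, base R's array() always allocates a new result vector and copies the data into it, so both objects exist at peak. tracemem() may not report that allocation, since it is not a duplication of the traced object. The sizes follow Martin's example.

```r
x <- array(rnorm(47), dim = c(1000, 50, 40))
s <- sum(x)                        # checksum only; does not share x's memory

traced <- capabilities("profmem")  # tracemem() needs memory profiling support
if (traced) print(tracemem(x))

# Idiom 1: change the dim attribute. No "tracemem[... -> ...]" line is
# expected here if x is not shared with any other name.
dim(x) <- c(1000 * 50, 40)

# Idiom 2: array() allocates a separate 50000 x 40 result and copies x
# into it, so x and y occupy memory side by side.
y <- array(x, dim = c(1000 * 50, 40))

if (traced) untracemem(x)
print(dim(x))
print(identical(x, y))
```

For a 30GB object, the practical difference is that idiom 1 may need no extra memory while idiom 2 needs room for a second full copy.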