>>>>> Dénes Tóth <toth.denes at ttk.mta.hu>
>>>>>     on Fri, 18 Mar 2016 22:56:23 +0100 writes:

    > Hi Roy,
    > R (usually) makes a copy if the dimensionality of an array is modified,
    > even if you use this syntax:

    > x <- array(1:24, c(2, 3, 4))
    > dim(x) <- c(6, 4)

    > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
    > to trace if an internal copy is done.

Well, even without using strange (;-) packages, standard R's
tracemem(), and notably its help page, is a good pointer.

According to the help page, memory tracing is enabled in the
default R binaries for Windows and OS X.
For Linux (where I, as R developer, compile R myself anyway),
one needs to configure with --enable-memory-profiling .

Now, let's try:

   > x <- array(rnorm(47), dim = c(1000,50, 40))
   > tracemem(x)
   [1] "<0x7f79a498a010>"
   > dim(x) <- c(1000* 50, 40)
   > x[5] <- pi
   > tracemem(x)
   [1] "<0x7f79a498a010>"
   >

So, *BOTH* the re-dimensioning *AND* the sub-assignment did
*NOT* make a copy.

Indeed, R has become much smarter in these things in recent
years ... not thanks to me, but very much thanks to
Luke Tierney (from R-core), and also thanks to contributions from "outside",
notably Tomas Kalibera.

And hence: *NO* such strange workarounds are needed in this specific case:

    > Workaround: use data.table::setattr or bit::setattr to modify the
    > dimensions in place (i.e., without making a copy). Risk: if you modify
    > an object by reference, all other objects which point to the same memory
    > address will be modified silently, too.

Martin Maechler, ETH Zurich (and R-core)

    > HTH,
    > Denes

(generally, your contributions help indeed, Denes, thank you!)

    > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
    >> Hi All:
    >>
    >> I am working with a very large array. If noLat is the number of
    >> latitudes, noLon the number of longitudes and noTime the number of
    >> time periods, the array is of the form:
    >>
    >> myData[noLat, noLon, noTime].
    >>
    >> It is read in this way because that is how it is stored in a series
    >> of netCDF files. For the analysis I need to do, I need instead the
    >> array:
    >>
    >> myData[noLat*noLon, noTime].
    >>
    >> Normally this would be easy:
    >>
    >> myData<- array(myData,dim=c(noLat*noLon,noTime))
    >>
    >> My question is how does this command work in R - does it make a copy
    >> of the existing array, with different indices for the dimensions, or
    >> does it just redo the indices and leave the given array as is? The
    >> reason for this question is that my array is 30GB in memory, and I
    >> don't have enough space to have a copy of the array in memory. If it
    >> does make a copy, I will have to figure out a workaround to bring in
    >> only part of the data at a time and put it into the proper locations.
    >>
    >> Thanks,
    >>
    >> -Roy
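Independently of whether a copy is made, the reshape Roy asks about only changes the dim attribute. R stores arrays in column-major order, so element [i, j, k] of the 3-d array ends up in row i + noLat*(j-1), column k of the matrix. A minimal sketch on a toy array follows; the sizes are made up for illustration, and `orig` is kept only so that values can be compared, which itself forces a copy and is not something to do with a 30GB array.

```r
noLat <- 2; noLon <- 3; noTime <- 4           # toy sizes, illustration only
myData <- array(seq_len(noLat * noLon * noTime), dim = c(noLat, noLon, noTime))
orig <- myData                                # second reference, for checking only

dim(myData) <- c(noLat * noLon, noTime)       # reshape by resetting the dim attribute

# element [i, j, k] of the 3-d array is now at row i + noLat*(j-1), column k
i <- 2; j <- 3; k <- 4
same_value <- identical(myData[i + noLat * (j - 1), k], orig[i, j, k])
same_value
```

Checking the index mapping like this on a small array is a cheap way to make sure the flattened rows really correspond to the intended latitude/longitude pairs.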
Hi Martin,

On 03/22/2016 10:20 AM, Martin Maechler wrote:
> [...]
> So, *BOTH* the re-dimensioning *AND* the sub-assignment did
> *NOT* make a copy.

This is interesting. At first I wanted to demonstrate to Roy that recent R
versions are smart enough not to make any copy when reshaping an array.
Then I put together an example (similar to yours) and found that after
several reshapes, R starts to copy the array, so I had to modify my
suggestion. Now I have realized that this was an RStudio issue. At least
on Linux, a standard R terminal behaves as you described; RStudio
(version 0.99.862, which is not the very latest), however, tends to create
copies (quite randomly, at least to me). If I have time, I will test this
more thoroughly and file a report with RStudio if it turns out to be a bug.

Denes
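The thread does not establish why RStudio copied. One general mechanism that does force a copy, and that is easy to check, is a second reference to the same array: R then has to duplicate the object before changing its dim so that the other name is left untouched. The sketch below uses toy sizes and illustrative names; whether this is what happened inside RStudio is an assumption, not something confirmed here.

```r
x <- array(1:24, dim = c(2, 3, 4))
y <- x                       # y and x now refer to the same data

dim(x) <- c(6, 4)            # x is duplicated first, so y is unaffected

dim(x)                       # 6 4
dim(y)                       # still 2 3 4
```

This is also the flip side of the risk mentioned for the setattr workaround: modifying by reference would have changed `y` as well.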
Roy Mendelssohn - NOAA Federal
2016-Mar-22 14:21 UTC
[R] Reshaping an array - how does it work in R
Thanks all. This is interesting, and for what I am doing it is worthwhile and helpful. I have to check for each operation whether a copy is made or not, and knowing this allows me to test on small examples what any command will do before I use it.

Thanks again, I appreciate all the help. I will have a related question, but will put it under a different heading.

-Roy

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn at noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
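If a full-size copy ever turns out to be unavoidable, the fallback Roy mentions in his original question (bringing in part of the data at a time) could look roughly like the following. This is only a sketch: `read_slab()` is a hypothetical stand-in for whatever reads one time step from the netCDF files, here it just simulates data, and the sizes are toy values.

```r
noLat <- 4; noLon <- 5; noTime <- 3            # toy sizes, illustration only

# Hypothetical reader: returns the noLat x noLon field for time step k.
# A real version would read from the netCDF files instead.
read_slab <- function(k) matrix(k * 1000 + seq_len(noLat * noLon), noLat, noLon)

# Allocate the target once, in its final (noLat*noLon) x noTime shape.
flat <- matrix(NA_real_, nrow = noLat * noLon, ncol = noTime)

for (k in seq_len(noTime)) {
  flat[, k] <- read_slab(k)                    # column-major order matches the reshape
}
```

Only one time slab is held in memory in addition to the target matrix, and the row ordering is the same as the one produced by resetting dim on the full 3-d array.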
>>>>> Dénes Tóth <toth.denes at ttk.mta.hu>
>>>>>     on Tue, 22 Mar 2016 10:55:58 +0100 writes:

    > [...]
    > And now, I realized that this was an RStudio-issue. At least on
    > Linux, a standard R terminal behaves as you described, however,
    > RStudio (version 0.99.862, which is not the very latest) tends to
    > create copies (quite randomly, at least to me). If I have time I
    > will test this more thoroughly and file a report to RStudio if it
    > turns out to be a bug.

Interesting, indeed. I can confirm the buggy RStudio behavior using
the latest version of RStudio (64-bit Linux, Fedora 22):

   RStudio Version 0.99.891 - © 2009-2016 RStudio, Inc.

The attached small R script is very transparent in demonstrating the
problem. If you have a tracemem-enabled version of R, the output is
even more revealing. Inside RStudio it gives

   > showAdr <- function(x) {
   +     if(capabilities("profmem")) {
   +         tracemem(x)
   +     } else {
   +         cat("R version not configured for memory tracing\n")
   +         .Internal(inspect(x))# also works w/o tracemem
   +     }
   + }
   > x <- array(rnorm(47), dim = c(1000, 50, 40))
   > showAdr(x)
   [1] "<0x7fad78b37010>"
   > dim(x) <- c(1000*50, 40) # *no* copying
   tracemem[0x7fad78b37010 -> 0x7fad77bf4010]:
   > showAdr(x) # Rstudio "fails" and has copied x
   [1] "<0x7fad77bf4010>"
   > x[3] <- pi
   tracemem[0x7fad77bf4010 -> 0x1ad05f50]:
   > showAdr(x)
   [1] "<0x1ad05f50>"
   > ## in R, R CMD BATCH, also from ESS: there is *no* copying
   > ## However, in Rstudio copying has happened!
   >

Martin
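The by-eye comparison of addresses in the transcript can be wrapped in a small helper so that it is easy to rerun in different front ends (terminal, R CMD BATCH, RStudio). This is a sketch and not Martin's attached script: the function name `reshape_keeps_address` is made up for illustration, and it returns NA when R was built without memory profiling. Because `x` is local to the function, the result may differ from what happens to a variable held in the global environment.

```r
reshape_keeps_address <- function(n = c(10, 5, 4)) {
  if (!capabilities("profmem")) return(NA)   # tracemem() not available in this build
  x <- array(rnorm(prod(n)), dim = n)
  before <- tracemem(x)                      # address string of the traced object
  dim(x) <- c(n[1] * n[2], n[3])             # the reshape under test
  after <- tracemem(x)                       # address string after the reshape
  untracemem(x)
  identical(before, after)                   # TRUE means no copy was made
}

res <- reshape_keeps_address()
res
```

If a copy is made, tracemem also prints a line of the form shown in the transcript, so the console output and the returned value can be cross-checked.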