abelikoff at gmail.com
2009-Dec-09 07:35 UTC
[Rd] reshape() makes R run out of memory (PR#14121)
Full_Name: Alexander L. Belikoff
Version: 2.8.1
OS: Ubuntu 9.04 (x86_64)
Submission from: (NULL) (67.244.71.200)


I'm trying to reshape the following data frame:

ID          DATE1       DATE2       VALUE_TYPE  VALUE
'abcd1233'  2009-11-12  2009-12-23  'TYPE1'     123.45
...

VALUE_TYPE is a string and is a factor with only 2 values (say TYPE1 and TYPE2).
I need to transform it into the following data frame ("wide" transpose) based on
common ID and DATEs:

ID          DATE1       DATE2       VALUE.TYPE1  VALUE.TYPE2
'abcd1233'  2009-11-12  2009-12-23  123.45       NA
...

Using stock reshape() as follows:

  tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"),
                  timevar = "VALUE_TYPE");

On a toy data frame this works fine. On a real one with 4.7 million entries
(although about 70% of VALUEs are NA) it runs out of memory:

  Error: cannot allocate vector of size 4.8 Gb

When the real data frame is loaded the R process takes about 200Mb of virtual
memory. The machine has 4 Gb of RAM.

I've posted a .Rdata file with the data frame in question at
http://belikoff.net/stuff/other/reshape_test.Rdata.gz


P.S. Just checked R 2.10.0 using an Intel PC with 2Gb RAM running Xp Pro (32 bit):

> tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"),
                  timevar = "VALUE_TYPE");
Error: cannot allocate vector of size 53.9 Mb
In addition: Warning messages:
1: In format.POSIXlt(as.POSIXlt(x), ...) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In format.POSIXlt(as.POSIXlt(x), ...) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In format.POSIXlt(as.POSIXlt(x), ...) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In format.POSIXlt(as.POSIXlt(x), ...) :
  Reached total allocation of 1535Mb: see help(memory.size)
5: In format.POSIXlt(as.POSIXlt(x), ...) :
  Reached total allocation of 1535Mb: see help(memory.size)
6: In format.POSIXlt(as.POSIXlt(x), ...) :
  Reached total allocation of 1535Mb: see help(memory.size)
>
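A minimal, self-contained sketch of the reshape() call from the report, for readers without the posted .Rdata file. The three-row data frame is hypothetical: only the column names and the first row come from the report. At this size the wide reshape succeeds and gives the layout the report asks for.

```r
# Hypothetical toy data; only the column names and the first row
# come from the original report.
tbl <- data.frame(
  ID         = c("abcd1233", "abcd1233", "efgh5678"),
  DATE1      = as.Date(c("2009-11-12", "2009-11-12", "2009-11-13")),
  DATE2      = as.Date(c("2009-12-23", "2009-12-23", "2009-12-24")),
  VALUE_TYPE = factor(c("TYPE1", "TYPE2", "TYPE1")),
  VALUE      = c(123.45, NA, 67.89),
  stringsAsFactors = FALSE
)

# Same call as in the report: three id columns, VALUE_TYPE as the "time".
# Each distinct (ID, DATE1, DATE2) triple becomes one row, and VALUE is
# spread into VALUE.TYPE1 / VALUE.TYPE2 (NA where a type is missing).
tbl2 <- reshape(tbl, direction = "wide",
                idvar = c("ID", "DATE1", "DATE2"),
                timevar = "VALUE_TYPE")
print(tbl2)
```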
abelikoff at gmail.com wrote:
> P.S. Just checked R 2.10.0 using an Intel PC with 2Gb RAM running Xp Pro (32
> bit):
>
>> tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"),
> timevar = "VALUE_TYPE");
> Error: cannot allocate vector of size 53.9 Mb
> In addition: Warning messages:....

Yes. The culprit would seem to be interaction(), as in

> x <- y <- z <- 1:999
> i <- interaction(x,y,z, drop=TRUE)
Error: cannot allocate vector of size 3.7 Gb

which is happening due to the occurrence of three idvar variables. This
works basically as interaction(x,y,z)[,drop=TRUE], i.e. it first creates a
factor with 999^3 levels, and removes the empty levels afterward.

In the absence of a better interaction(), you might try making your own
single idvar as

do.call("paste", tbl[,c("ID", "DATE1", "DATE2")])

or so.
-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
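A sketch of the single-idvar workaround suggested in the reply, on a hypothetical toy data frame (only the column names and the first row come from the report). The helper column name KEY is hypothetical. The first two lines redo the rough arithmetic behind the 3.7 Gb figure: 999^3 is about 10^9 combinations, and at 4 bytes per integer that comes to roughly 3.7 GiB, which is consistent with the error message. With only one idvar, reshape() treats every other non-time column as a varying column, so v.names = "VALUE" is passed to keep ID, DATE1 and DATE2 as plain carried-through columns.

```r
# Rough arithmetic behind the error in the reply: three 999-level
# factors give 999^3 possible combinations before empty ones are dropped.
n_levels   <- 999^3                  # 997002999 combinations
approx_gib <- n_levels * 4 / 1024^3  # about 3.71 if each costs one 4-byte integer

# Hypothetical toy data; only the column names and the first row
# come from the original report.
tbl <- data.frame(
  ID         = c("abcd1233", "abcd1233", "efgh5678"),
  DATE1      = as.Date(c("2009-11-12", "2009-11-12", "2009-11-13")),
  DATE2      = as.Date(c("2009-12-23", "2009-12-23", "2009-12-24")),
  VALUE_TYPE = factor(c("TYPE1", "TYPE2", "TYPE1")),
  VALUE      = c(123.45, NA, 67.89),
  stringsAsFactors = FALSE
)

# Collapse the three id columns into one string key, as suggested in the
# reply. KEY is a hypothetical helper column name.
tbl$KEY <- do.call("paste", tbl[, c("ID", "DATE1", "DATE2")])

# Single idvar, so the multi-column interaction() step is avoided.
# v.names limits the widened columns to VALUE; ID, DATE1 and DATE2 are
# constant within each KEY and are carried through unchanged.
tbl2 <- reshape(tbl, direction = "wide",
                idvar = "KEY", timevar = "VALUE_TYPE",
                v.names = "VALUE")
tbl2$KEY <- NULL  # drop the helper key again
print(tbl2)
```

paste() joins with a single space by default, so if an ID could itself contain spaces, two different (ID, DATE1, DATE2) triples might collapse to the same key; choosing a sep that cannot occur in the data avoids that.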