Hi,

In the example below, why is d 10 times bigger than m, according to object.size()? It also takes around 10 times as long to create, which fits with object.size() being truthful. gcinfo(TRUE) also indicates a great deal more garbage collector activity caused by data.frame() than by matrix().

$ R --vanilla
....
> nr = 1000000
> system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
[1] 0.22 0.01 0.23 0.00 0.00
> system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
[1] 2.81 0.20 3.01 0.00 0.00          # 10 times longer

> dim(m)
[1] 1000000       2
> dim(d)
[1] 1000000       2                   # same dimensions

> storage.mode(m)
[1] "integer"
> sapply(d, storage.mode)
        a         b
"integer" "integer"                   # same storage.mode

> object.size(m)/1024^2
[1] 7.629616
> object.size(d)/1024^2
[1] 76.29482                          # but 10 times bigger

> sum(sapply(d, object.size))/1024^2
[1] 7.629501                          # or is it? If it's not really 10 times bigger, why 10 times longer above?

> version
platform x86_64-unknown-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    1.1
year     2005
month    06
day      20
language R

Many thanks in advance!
Matthew

Matthew Dowle <mdowle at concordiafunds.com> writes:

> Hi,
>
> In the example below why is d 10 times bigger than m, according to
> object.size ? It also takes around 10 times as long to create, which fits
> with object.size() being truthful. gcinfo(TRUE) also indicates a great deal
> more garbage collector activity caused by data.frame() than matrix().
>
> $ R --vanilla
> ....
> > nr = 1000000
> > system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
> [1] 0.22 0.01 0.23 0.00 0.00
> > system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
> [1] 2.81 0.20 3.01 0.00 0.00          # 10 times longer
>
> > dim(m)
> [1] 1000000       2
> > dim(d)
> [1] 1000000       2                   # same dimensions
>
> > storage.mode(m)
> [1] "integer"
> > sapply(d, storage.mode)
>         a         b
> "integer" "integer"                   # same storage.mode
>
> > object.size(m)/1024^2
> [1] 7.629616
> > object.size(d)/1024^2
> [1] 76.29482                          # but 10 times bigger
>
> > sum(sapply(d, object.size))/1024^2
> [1] 7.629501                          # or is it ? If its not
> really 10 times bigger, why 10 times longer above ?

Row names!!

> r <- as.character(1:1e6)
> object.size(r)
[1] 72000056
> object.size(r)/1024^2
[1] 68.6646

'nuff said?

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
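
The point about row names can be checked by splitting a data frame's footprint into its parts. The following is a minimal sketch, not taken from the thread: the names col_bytes, labels and d_named are illustrative, and nr is reduced from the thread's 1e6 so that it runs quickly. It attaches the labels explicitly as character row names, because the R version used in this thread stored them that way, whereas current R releases store automatic row names compactly. Exact byte counts depend on the R version and platform, so none are shown.

```r
# Sketch: how much of a data frame's size is columns, and how much is
# a character vector of row labels? (Names here are illustrative.)
nr <- 1e5  # smaller than the thread's 1e6, only to keep the run short

d <- data.frame(a = integer(nr), b = integer(nr))

# Size of the column data alone, as computed in the original post
col_bytes <- sum(sapply(d, object.size))

# One label per row ("1", "2", ...), as in the reply above
labels <- as.character(seq_len(nr))
label_bytes <- as.numeric(object.size(labels))

# Attach the labels as explicit character row names, which is the
# storage layout the thread's R version used by default (assumption
# made explicit here; current R would otherwise store them compactly)
d_named <- d
rownames(d_named) <- labels
total_bytes <- as.numeric(object.size(d_named))

# Report each part in megabytes
print(c(columns   = col_bytes,
        row_names = label_bytes,
        total     = total_bytes) / 1024^2)
```

If the explanation holds, the row_names entry should dominate the total, with the columns contributing only a small share.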

That explains it. Thanks.

I don't need row names though, as I'll only ever use integer subscripts. Is there any way to drop them, or even better, not create them in the first place? The memory saved (90%) by not having them, and the 10 times speed-up, would be very useful. I think I need a data.frame rather than a matrix because I have columns of different types in real life.

> rownames(d) = NULL
Error in "dimnames<-.data.frame"(`*tmp*`, value = list(NULL, c("a", "b" :
        invalid 'dimnames' given for data frame

-----Original Message-----
From: pd at pubhealth.ku.dk [mailto:pd at pubhealth.ku.dk] On Behalf Of Peter Dalgaard
Sent: 08 December 2005 18:57
To: Matthew Dowle
Cc: 'r-help at stat.math.ethz.ch'
Subject: Re: [R] data.frame() size

Row names!!

> r <- as.character(1:1e6)
> object.size(r)
[1] 72000056
> object.size(r)/1024^2
[1] 68.6646

'nuff said?
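
On the question of dropping row names, here is a minimal sketch that assumes a current R release rather than the R 2.1.1 used in this thread. In current releases, assigning NULL to rownames() is accepted and resets the data frame to automatic row names, which are stored compactly instead of as a character vector. .row_names_info() is a base R helper; the variable names and the reduced row count n are illustrative.

```r
# Sketch, assuming a current R release (not the R 2.1.1 in this thread).
n <- 1e5  # illustrative size, smaller than the thread's 1e6

d <- data.frame(a = integer(n), b = integer(n))

# Give the data frame explicit character row names first
rownames(d) <- paste0("row", seq_len(n))
size_named <- as.numeric(object.size(d))
info_named <- .row_names_info(d)  # positive count: explicit row names

# Drop them: this resets to compact "automatic" row names
rownames(d) <- NULL
size_auto <- as.numeric(object.size(d))
info_auto <- .row_names_info(d)   # negative count: automatic row names

# Integer subscripts keep working without explicit row names
first_last <- d[c(1, n), ]

print(c(named = size_named, automatic = size_auto) / 1024^2)
```

A negative value from .row_names_info() indicates that no explicit row-name vector is being carried around, which is the situation asked for above.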