Tim Hesterberg
2009-Jul-14 22:51 UTC
[Rd] Faster as.data.frame & save copy by doing names(x) <- NULL only if needed
A number of as.data.frame methods do
    names(x) <- NULL
Replacing that with
    if(!is.null(names(x)))
      names(x) <- NULL
appears to save making one copy of the data
(based on tracemem and Rprofmem in a copy of R compiled
with --enable-memory-profiling)
and gives a modest but consistent boost in speed, e.g.:
#                   old                      new
#           user  system elapsed     user  system elapsed
# integer  3.412   0.060   3.472    2.788   0.020   2.809
# numeric  6.212   0.160   6.374    4.852   0.080   5.132
# logical  3.484   0.052   3.699    2.808   0.028   2.834
# factor   4.433   0.020   4.547    2.929   0.020   2.964
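
As a minimal sketch of the guard itself, outside of any as.data.frame method: the two helper functions below are hypothetical (they are not part of base R) and only illustrate that the guarded form returns the same value as the unconditional one, for inputs with and without names. Whether a copy is actually avoided depends on the R version and build, so that part is not checked here.

```r
# Hypothetical helpers for illustration only; not part of base R.

# Current behaviour: always assign NULL to names(x).
strip_names_always <- function(x) {
  names(x) <- NULL
  x
}

# Proposed behaviour: assign only when names are present.
strip_names_if_needed <- function(x) {
  if (!is.null(names(x)))
    names(x) <- NULL
  x
}

unnamed <- 1:5
named <- c(a = 1L, b = 2L, c = 3L)

# Both forms should agree for unnamed and named inputs.
same_unnamed <- identical(strip_names_always(unnamed),
                          strip_names_if_needed(unnamed))
same_named <- identical(strip_names_always(named),
                        strip_names_if_needed(named))
```

In a build with memory profiling, calling tracemem(unnamed) before each helper is one way to see whether a duplication is reported.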
These visible methods can be modified as noted above:
as.data.frame.Date
as.data.frame.POSIXct
as.data.frame.complex
as.data.frame.difftime
as.data.frame.factor
as.data.frame.integer
as.data.frame.logical
as.data.frame.numeric
as.data.frame.numeric_version
as.data.frame.ordered
as.data.frame.raw
as.data.frame.vector
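
One rough way to check which of these methods mention the assignment in a given R installation is to search the deparsed source of each method. This is only a sketch: the variable names are made up for illustration, the result depends on the R version, and a match does not distinguish the unconditional form from the guarded one.

```r
# Classes taken from the list of methods above.
classes <- c("Date", "POSIXct", "complex", "difftime", "factor",
             "integer", "logical", "numeric", "numeric_version",
             "ordered", "raw", "vector")

# For each class, look up the as.data.frame method (if it exists in this
# version of R) and test whether its source contains the assignment.
mentions_assignment <- sapply(classes, function(cl) {
  f <- getS3method("as.data.frame", cl, optional = TRUE)
  if (is.null(f))
    return(NA)                       # method not present in this R
  any(grepl("names(x) <- NULL", deparse(f), fixed = TRUE))
})

print(mentions_assignment)
```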
Here's the timing code (run in a copy of R without memory profiling):
x <- 1:10^4                                     # integer
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
x <- x + 0.0                                    # numeric
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
x <- rep(c(TRUE,FALSE), length.out = 10^4)      # logical
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
x <- factor(rep(letters[1:10], length.out=10^4)) # factor
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
I have not done timings where the inputs have names;
that is rare in my experience.
