Tim Hesterberg
2009-Jul-14 22:51 UTC
[Rd] Faster as.data.frame & save copy by doing names(x) <- NULL only if needed
A number of as.data.frame methods do

    names(x) <- NULL

Replacing that with

    if(!is.null(names(x))) names(x) <- NULL

appears to save making one copy of the data (based on tracemem and Rprofmem in a copy of R compiled with --enable-memory-profiling) and gives a modest but consistent boost in speed, e.g.:

    #                  old                      new
    #          user system elapsed     user system elapsed
    # integer 3.412  0.060   3.472    2.788  0.020   2.809
    # numeric 6.212  0.160   6.374    4.852  0.080   5.132
    # logical 3.484  0.052   3.699    2.808  0.028   2.834
    # factor  4.433  0.020   4.547    2.929  0.020   2.964

These visible methods can be modified as noted above:

    as.data.frame.Date
    as.data.frame.POSIXct
    as.data.frame.complex
    as.data.frame.difftime
    as.data.frame.factor
    as.data.frame.integer
    as.data.frame.logical
    as.data.frame.numeric
    as.data.frame.numeric_version
    as.data.frame.ordered
    as.data.frame.raw
    as.data.frame.vector

Here's the timing code (run in a copy of R without memory profiling):

    x <- 1:10^4                                   # integer
    system.time(for(i in 1:10^4) y <- as.data.frame(x), gc=TRUE)

    x <- x + 0.0                                  # numeric
    system.time(for(i in 1:10^4) y <- as.data.frame(x), gc=TRUE)

    x <- rep(c(TRUE,FALSE), length = 10^4)        # logical
    system.time(for(i in 1:10^4) y <- as.data.frame(x), gc=TRUE)

    x <- factor(rep(letters[1:10], length=10^4))  # factor
    system.time(for(i in 1:10^4) y <- as.data.frame(x), gc=TRUE)

I have not done timings where the inputs have names; that is rare in my experience.
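A minimal sketch of the two variants in isolation may help anyone who wants to repeat the comparison. The helpers strip_always and strip_if_needed are hypothetical names used only for illustration, not functions in base R; each mimics just the name-stripping step of an as.data.frame method. The sketch checks that both give the same result, and calls tracemem only if the running R reports memory-profiling support. Whether a copy is actually reported will depend on the R version and build.

```r
# Hypothetical helpers (not part of base R): each reproduces only the
# name-stripping step discussed above.
strip_always <- function(x) {
    names(x) <- NULL                          # unconditional assignment
    x
}

strip_if_needed <- function(x) {
    if (!is.null(names(x))) names(x) <- NULL  # assign only when names exist
    x
}

unnamed <- c(1.5, 2.5, 3.5)
named   <- c(a = 1, b = 2, c = 3)

# Both variants should return the same object for unnamed and named input.
same_unnamed <- identical(strip_always(unnamed), strip_if_needed(unnamed))
same_named   <- identical(strip_always(named),   strip_if_needed(named))

# tracemem() is usable only when R was built with memory profiling.
if (capabilities("profmem")) {
    invisible(tracemem(unnamed))
    invisible(strip_always(unnamed))      # a copy, if any, is reported here
    invisible(strip_if_needed(unnamed))   # compare with the call above
    untracemem(unnamed)
}
```

With named input both variants still perform the assignment, so any saving would apply only to the unnamed case.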