Peng Yu
2009-Dec-09 04:28 UTC
[R] Significant performance difference between split of a data.frame and split of vectors
I have the following code, which tests split on a data.frame and split on each column (as a vector) separately. The runtimes differ by about a factor of 10. When m and k increase, the difference becomes even bigger.

I'm wondering why the performance on a data.frame is so bad. Is it a bug in R? Can it be improved?

> system.time(split(as.data.frame(x),f))
   user  system elapsed
  1.700   0.010   1.786
>
> system.time(lapply(
+ 1:dim(x)[[2]]
+ , function(i) {
+ split(x[,i],f)
+ }
+ )
+ )
   user  system elapsed
  0.170   0.000   0.167

###########
m=30000
n=6
k=3000

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=TRUE)

system.time(split(as.data.frame(x),f))

system.time(lapply(
1:dim(x)[[2]]
, function(i) {
split(x[,i],f)
}
)
)
David Winsemius
2009-Dec-09 04:37 UTC
[R] Significant performance difference between split of a data.frame and split of vectors
On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:

> I have the following code, which tests the split on a data.frame and
> the split on each column (as vector) separately. The runtimes are of
> 10 time difference. When m and k increase, the difference become even
> bigger.
>
> I'm wondering why the performance on data.frame is so bad. Is it a bug
> in R? Can it be improved?

You might want to look at the data.table package. The author claims significant speed improvements over data.frames.

--
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
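
For anyone wanting to experiment with the timing gap from the original post, here is a minimal base R sketch. It rests on an assumption the thread does not confirm: that much of the cost comes from subsetting the data.frame once per group, which is comparatively expensive. The sketch splits the row indices once and then subsets the matrix directly. The names idx, by_rows, ref and same are made up for illustration, and no timings are claimed here; run the system.time() calls yourself to compare.

```r
# Same setup as in the original post.
m <- 30000
n <- 6
k <- 3000

set.seed(0)
x <- replicate(n, rnorm(m))                 # m x n numeric matrix
f <- sample(1:k, size = m, replace = TRUE)  # grouping vector

# Split the row indices once, then subset the matrix per group.
# Assumption: matrix subsetting is cheaper than data.frame subsetting.
idx <- split(seq_len(nrow(x)), f)
print(system.time(
  by_rows <- lapply(idx, function(i) x[i, , drop = FALSE])
))

# Reference result from split() on a data.frame, for comparison.
print(system.time(
  ref <- split(as.data.frame(x), f)
))

# Check that both approaches hold the same values group by group.
same <- all(mapply(function(a, b) all(a == as.matrix(b)), by_rows, ref))
print(same)
```

Each element of by_rows is a matrix rather than a data.frame, so this only helps when all columns share one type, as they do in the example from the original post.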