Hi,

As far as I can tell data.frame class adds two features to those of lists:
* matrix structure via [,] and [,]<- operators (well, I know these are actually "["(i, j, ...), not "[,]").
* row names attribute.

It seems that the overhead of the support for the row names, both computational and RAM-wise, is rather non-trivial. I frequently subscript from a data.frame, i.e. use [,] on data frames, and my timing shows that the equivalent list operation is about 7 times faster, see below.

On the other hand, at least in my usage pattern, I really rarely benefit from the row names attribute, so as far as I am concerned row names is just an overhead. (Of course the speed difference may be due to other factors, the only thing I can tell is that subscripting is very slow in data frames relative to in lists).

I thought of writing a new class, say lightweight.data.frame, that would be polymorphic with the existing data.frame class. The class would inherit from "list" and implement [,], [,]<- operators. It would also implement the "rownames" function that would return seq(nrow(x)), etc. It should also implement as.data.frame to avoid the overhead of conversion to a full-blown data.frame in calls like lm(y ~ x, data=myLightweightDataframe).

Has anyone thought of this? Can you see any potential problems?

Thanks,
Vadim

P.S. These are the timing results comparing data.frame operations to those of lists

# make a 1e6 * 5 list
> system.time(x <- lapply(seq(5), function(x) rnorm(1e6)))
[1] 4.46 0.10 4.57 0.00 0.00

# convert it to a data.frame
> system.time(y <- as.data.frame(x))
[1] 49.17 1.25 50.61 0.00 0.00

# do an equivalent of x[-1,] on the list
> i <- seq(2, nrow(y)); system.time(x.sub <- lapply(x, function(x)x[i]))
[1] 0.19 0.15 0.35 0.00 0.00

# do an equivalent of x[-1,] on the data.frame
> i <- seq(2, nrow(y)); system.time(y.sub <- y[i,])
[1] 2.08 0.56 2.64 0.00 0.00
> 2.64/0.35
[1] 7.542857
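To make the proposal above concrete, here is a minimal S3 sketch of what such a class might look like. It is only an illustration under stated assumptions: the constructor and all method bodies are hypothetical, columns are assumed to be named vectors of equal length, only read access via [i, j] is covered (no [,]<- method), and no claim is made about how much faster it would be than data.frame.

```r
# Hypothetical sketch of the proposed "lightweight.data.frame": a named list
# of equal-length columns with matrix-style indexing and no stored row names.
# None of these functions exist in base R; they are illustrative only.

lightweight.data.frame <- function(...) {
  cols <- list(...)
  stopifnot(length(cols) > 0,
            !is.null(names(cols)),
            length(unique(lengths(cols))) == 1)
  structure(cols, class = c("lightweight.data.frame", "list"))
}

# dim() method, so that nrow() and ncol() work
dim.lightweight.data.frame <- function(x) {
  cols <- unclass(x)
  c(length(cols[[1]]), length(cols))
}

# dimnames() method: row names are computed on demand, never stored
dimnames.lightweight.data.frame <- function(x) {
  list(as.character(seq_len(nrow(x))), names(x))
}

# x[j] selects columns (list-style); x[i, j] selects rows i and columns j
"[.lightweight.data.frame" <- function(x, i, j) {
  cols <- unclass(x)
  if (nargs() < 3L) {
    if (!missing(i)) cols <- cols[i]
  } else {
    if (!missing(j)) cols <- cols[j]
    if (!missing(i)) cols <- lapply(cols, function(col) col[i])
  }
  structure(cols, class = c("lightweight.data.frame", "list"))
}

# Conversion to a real data.frame by setting attributes directly,
# using R's compact representation of automatic row names
as.data.frame.lightweight.data.frame <- function(x, ...) {
  cols <- unclass(x)
  attr(cols, "row.names") <- .set_row_names(length(cols[[1]]))
  class(cols) <- "data.frame"
  cols
}

# Usage: the equivalent of x[-1, ] from the timings above
x <- lightweight.data.frame(a = 1:5, b = letters[1:5])
x.sub <- x[-1, ]
nrow(x.sub)
rownames(x.sub)
```

The sketch defines dim() and dimnames() methods rather than a rownames() method because rownames() is an ordinary function built on dimnames(), not a generic.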
Vadim Ogranovich <vograno <at> evafunds.com> writes:

: I thought of writing a new class, say lightweight.data.frame, that would
: be polymorphic with the existing data.frame class. The class would
: inherit from "list" and implement [,], [,]<- operators. It would also
: implement the "rownames" function that would return seq(nrow(x)), etc.
: It should also implement as.data.frame to avoid the overhead of
: conversion to a full-blown data.frame in calls like lm(y ~ x,
: data=myLightweightDataframe).

The next version of zoo (currently in test) supports lists in the data argument of lm and can also merge zoo series into a list (or to another zoo series, as it does now). Would that be a sufficient alternative?
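As a small illustration of passing a list as the data argument: in current R, lm() documents data as "an optional data frame, list or environment", so a plain named list of equal-length vectors can be used directly. The sketch below uses made-up data and does not involve zoo; whether this behaved the same way in the R version discussed in this thread is an assumption, not something the thread states.

```r
# Illustrative only: fit a linear model with a plain list as `data`.
# The variable names and simulated values are made up for this example.
set.seed(1)
d <- list(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100)

fit <- lm(y ~ x, data = d)   # no data.frame is constructed by the caller
coef(fit)
```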
Don't know whether it will suffice. lm() was just an example. Are you going to re-write lm(), e.g. lm.zoo(), to accept lists? I am thinking more of a general-purpose class that would pass wherever a data.frame is expected.

Probably I need to wait until the new version of zoo comes out. At the very least it could be a good prototype for what I have in mind.

Thanks for the info,
Vadim

> -----Original Message-----
> From: r-devel-bounces@stat.math.ethz.ch
> [mailto:r-devel-bounces@stat.math.ethz.ch] On Behalf Of Gabor Grothendieck
> Sent: Thursday, November 25, 2004 7:42 PM
> To: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] Lightweight data frame class
>
> The next version of zoo (currently in test) supports lists in the data
> argument of lm and can also merge zoo series into a list (or to another
> zoo series, as it does now).
> Would that be a sufficient alternative?
>
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel