Nick Sabbe
2010-Dec-07 09:03 UTC
[R] Dataframe from list of similar lists: not _a_ way, but _the best_ way
Hi All.

I often find myself in this situation:

- Based on some vector (or list) of values, I need to calculate a few new values for each of them, where some of the new values are numbers, but some are of a more descriptive nature (so: character strings).
- So I use e.g. sapply, passing a custom function that returns a list with all the calculated values.
- The result of this is a list (= the return value of sapply) of lists that all have the same kind of named values.

A silly example:

    list.of.lists <- sapply(1:10, function(nr){list(org=nr, chr=as.character(nr))})

It seems rather obvious that the result would be better structured as a dataframe.

Now I know a few ways to do this (using do.call), but I fear most of these perform rather badly: I suspect all the data is being copied repeatedly, which may be slow.

So, my questions to the specialists:

- Is the above way of working reasonable for this kind of problem? Or would you suggest otherwise?
- What would be the best (as in: quickest) way of transforming this list of lists to a dataframe? The answer to this is probably based upon knowledge of the inner workings of R? Or does it depend on the specifics of my function (for nontrivial functions and list sizes)?

Thanks!

Nick Sabbe

--
ping: nick.sabbe@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
--
Do Not Disapprove
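For readers following along, here is a minimal sketch of one do.call-based conversion of the kind the question alludes to. It is an illustration rather than the poster's own code: it assumes the per-element function is applied with lapply (so the result really is a plain list of lists), and the name calc is hypothetical.

```r
# Hypothetical per-element function: one numeric and one character value.
calc <- function(nr) list(org = nr, chr = as.character(nr))

# lapply never simplifies its result, so this is a plain list of 10 lists.
list.of.lists <- lapply(1:10, calc)

# Turn each inner list into a one-row data frame ...
rows <- lapply(list.of.lists, function(x) {
  as.data.frame(x, stringsAsFactors = FALSE)
})

# ... then bind all rows in a single rbind call instead of growing
# the data frame one row at a time in a loop.
DF <- do.call(rbind, rows)

str(DF)
```

Passing stringsAsFactors = FALSE keeps chr as character regardless of the R version in use. Whether this is faster than the alternatives for a given function and list size is something to measure, not assume.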
Brian Diggs
2010-Dec-07 19:18 UTC
[R] Dataframe from list of similar lists: not _a_ way, but _the best_ way
On 12/7/2010 1:03 AM, Nick Sabbe wrote:
> Hi All.
>
> I often find myself in this situation:
>
> - Based on some vector (or list) of values, I need to calculate a
> few new values for each of them, where some of the new values are numbers,
> but some are more of descriptive nature (so: character strings)
>
> - So I use e.g. sapply, passing a custom function that returns a
> list with all the calculated values
>
> - The result of this is: a list (=the return value of sapply) of
> lists, that all have the same kind of named values
>
> A silly example:
>
> list.of.lists<-sapply(1:10, function(nr){list(org=nr,
> chr=as.character(nr))})

Actually, this is not a list of lists, but rather a list with dimensions: sapply simplifies the result into a 2 x 10 matrix whose cells are list elements. I didn't know such a thing existed, but obviously it does.

> It seems rather obvious that the result would be better structured as a
> dataframe.
>
> Now I know a few ways to do this (using do.call), but I fear most of these
> are rather bad in performance: I suspect all the data is being repetitively
> copied which may be slow.
>
> So, my question to the specialists:
>
> - Is the above way of working reasonable for this kind of problem?
> Or would you suggest otherwise?
>
> - What would be the best (as in: quickest) way of transforming this
> list of lists to a dataframe? The answer to this is probably based upon
> knowledge of the inner workings of R? Or is there any way in which this
> depends on the specifics of my function (for nontrivial functions and list
> sizes)?

I don't know that this is best (in terms of speed and/or memory usage), but to me the following is "best" in that it hands the problem off to a package designed to handle such problems, which presumably does a better job than any one-off approach.

    library("plyr")
    DF <- ldply(1:10, function(nr){data.frame(org=nr, chr=as.character(nr))})

Note that the internal function returns a data.frame rather than a list, and the *dply functions automatically stitch the individual data.frames together. Check out the documentation for the plyr package.

> Thanks!
>
> Nick Sabbe
>
> --
> ping: nick.sabbe at ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
> --
> Do Not Disapprove

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
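As a footnote to the remark about what the sapply call in the question returns, the sketch below inspects that result and then builds the data frame column by column in base R. This is only one possible approach under the assumptions of the silly example (every element yields the same two scalar values, named org and chr); it is not presented as the fastest one.

```r
# The example from the question: sapply simplifies the per-element
# lists into a matrix whose storage mode is still "list".
list.of.lists <- sapply(1:10, function(nr) {
  list(org = nr, chr = as.character(nr))
})

is.list(list.of.lists)  # TRUE
dim(list.of.lists)      # 2 10: one row per name, one column per input

# Column-wise conversion: extract each named row (a list of scalars)
# and flatten it into an atomic vector, one vector per column.
DF <- data.frame(
  org = unlist(list.of.lists["org", ]),
  chr = unlist(list.of.lists["chr", ]),
  stringsAsFactors = FALSE
)

str(DF)
```

Building whole columns at once sidesteps creating one small data frame per input value, which is the repeated copying the question worries about. It only works when each value is a scalar, so that unlist yields exactly one entry per row.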