Frederic F
2011-Aug-10 00:58 UTC
[R] How to quickly convert a data.frame into a structure of lists
Hello,

This is my first project in R, so I'm trying to work 'the R way', but it still feels awkward sometimes.

The problem that I'm facing right now is that I need to convert a data.frame into a structure of lists. The data.frame has on the order of tens of columns (I need to focus on only three of them) and on the order of millions of rows, so it's quite a big dataset.

Let's say that the columns of interest are A, B and C. I need to take the data.frame and construct a structure of lists where I have a list for every level of A, each of those lists contains a list for every level of B, and the 'b-lists' contain all the values of C that match the corresponding levels of A and B. So, I should be able to write something like this:

    > MyData@list_structure$x_level_of_A$y_level_of_B

and get a vector of the values of C that were on rows where A=x_level_of_A and B=y_level_of_B.

My first attempt was to use two nested "lapply" calls, running something like this:

    list_structure <- lapply(levels(A), function(x) {
      as.character(x) = lapply(levels(B), function(y) {
        as.character(y) = C[A==x & B==y]
      })
    })

The real code was not quite as simple, but I managed to get it to work, and it worked well on my first dataset (where A and B had only a few levels). I was quite happy... but the nested loops killed me on a second dataset where A had several thousand levels. So I tried something else.

My second attempt was to go through every row of the data.frame and append the value to the appropriate vector. I first initialized a structure of lists ending with NULL vectors, then I did something like this:

    for (i in 1:nrow(DF)) {
      eval(
        substitute(
          append(MyData@list_structure$a_value$b_value, c_value),
          list(a_value=as.character(DF$A[i]), b_value=as.character(DF$B[i]),
               c_value=as.character(DF$C[i]))
        )
      )
    }

This works... but way too slowly for my purpose.

I would like to know if there is a better road to take for this transformation, or if there is a way of speeding up one of the two solutions that I have tried.

Thank you very much for your help!

(And in your replies, please remember that this is my first project in R, so don't hesitate to state the obvious if it seems like I am missing it!)

Frederic

--
View this message in context: http://r.789695.n4.nabble.com/How-to-quickly-convert-a-data-frame-into-a-structure-of-lists-tp3731746p3731746.html
Sent from the R help mailing list archive at Nabble.com.
Duncan Mackay
2011-Aug-10 06:15 UTC
[R] How to quickly convert a data.frame into a structure of lists
Hi

Something to get you started: see ?as.list. A data.frame can be regarded as a two-dimensional array made of list vectors (one per column), so as.list() returns its columns as a list:

    df = data.frame(a=1:2, b=2:1, c=4:5, d=9:10)
    as.list(df[,1:3])
    $a
    [1] 1 2

    $b
    [1] 2 1

    $c
    [1] 4 5

See also http://cran.ms.unimelb.edu.au/doc/contrib/Burns-unwilling_S.pdf

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mackay at northnet.com.au
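For the nested structure asked about in the original post, as.list() alone only gets as far as the columns. One possible way to go further is base R's split(), which was not suggested in the thread and is offered here only as a sketch. The toy data frame and the name `nested` below are made up for illustration, and this has not been timed on millions of rows; the expectation is simply that grouping once per level of A should scale better than testing A==x & B==y for every pair of levels.

```r
# Toy data standing in for the three columns of interest (made-up values).
df <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2"),
  C = c(10, 20, 30, 40, 50),
  stringsAsFactors = TRUE
)

# First split the rows by the levels of A, then, within each piece,
# split the C values by the levels of B.
nested <- lapply(split(df, df$A), function(d) split(d$C, d$B))

nested$a1$b1   # 10 20
nested$a2$b2   # 50
```

Because B is a factor, every level of B appears under every level of A; combinations with no rows come back as empty vectors rather than being dropped.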
Duncan Murdoch
2011-Aug-10 11:08 UTC
[R] How to quickly convert a data.frame into a structure of lists
I would use the tapply function (which is designed for the case in which data exists for most pairs of the levels of A and B) or the reshape::sparseby function, or something else in the reshape package. These won't give you exactly the structure you were asking for, but they will separate the data properly.

By the way, it's a good idea when posting a question to post a simple example; then other solutions can be illustrated on the same example. It doesn't need to contain millions of rows.

Duncan Murdoch
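As a rough illustration of the tapply suggestion, here is a minimal sketch on a made-up five-row data frame (the data and the name `cells` are assumptions, not taken from the thread). With simplify = FALSE, tapply returns a two-dimensional array of mode list, indexed by the levels of A and B, which is not the nested list originally requested but keeps the C values separated by cell.

```r
# Toy data standing in for the three columns of interest (made-up values).
df <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2"),
  C = c(10, 20, 30, 40, 50),
  stringsAsFactors = TRUE
)

# One cell per (level of A, level of B); each cell holds the matching C values.
# The identity-like function just returns the values unchanged.
cells <- tapply(df$C, list(df$A, df$B), function(v) v, simplify = FALSE)

dim(cells)             # 2 2
cells[["a1", "b1"]]    # 10 20
cells[["a2", "b1"]]    # 40
```

Lookup is by a pair of level names, `cells[["x_level_of_A", "y_level_of_B"]]`, instead of the `$a$b` chain in the original question.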
Dennis Murphy
2011-Aug-10 13:36 UTC
[R] How to quickly convert a data.frame into a structure of lists
To borrow shamelessly from one of the prominent helpers on this list: "What is the problem you're trying to solve?" (attribution: Jim Holtman)

I have the sense you want to do something over many subsets of your data frame. If so, breaking things up into lists of lists of lists is not necessarily productive, nor may it be necessary to use loops explicitly, depending on the nature of what you want to do. If you're more explicit about the nature of your task, it's entirely possible that there may be a nice 'R way' to do it.

Read the posting guide and, if at all possible, provide a small, reproducible example that demonstrates what you want to accomplish. (See ?dput to learn how to transmit data by e-mail.)

HTH,
Dennis
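To make the two points above concrete, here is a small sketch on made-up data. dput() prints a text representation of an object that can be pasted into an e-mail and evaluated by the reader. The per-cell mean is purely an assumed example of "something done over many subsets", since the actual task was never stated in the thread; aggregate() is one base R way to do that without building nested lists first.

```r
# Toy data standing in for the three columns of interest (made-up values).
df <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2"),
  C = c(10, 20, 30, 40, 50),
  stringsAsFactors = TRUE
)

# Print a copy-and-paste representation of the data for a mailing-list post.
dput(df)

# Hypothetical task: the mean of C for every combination of A and B,
# computed directly on the data frame, with no explicit loop.
cell_means <- aggregate(C ~ A + B, data = df, FUN = mean)
cell_means
```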