Hello,

my data is contained in nested lists (which is not necessarily the best approach). What I need is a fast way to get subsets from the data.

An example:

test <- list(list(a = 1, b = 2, c = 3), list(a = 4, b = 5, c = 6),
list(a = 7, b = 8, c = 9))

Now I would like to have all values in the named variables "a", that is the vector c(1, 4, 7). The best I could come up with is:

val <- sapply(1:3, function (i) {test[[i]]$a})

which is unfortunately not very fast. According to R-inferno this is due to the fact that apply and its derivatives do looping in R rather than relying on C subroutines, as the common [-operator does.

Does someone know a trick to do the same as above with the faster built-in subsetting? Something like:

test[<somesubsettingmagic>]

Thank you for your advice

Alex
On Tue, Dec 7, 2010 at 9:47 AM, Alexander Senger <senger at physik.hu-berlin.de> wrote:
> Does someone know a trick to do the same as above with the faster
> built-in subsetting? Something like:
>
> test[<somesubsettingmagic>]

This does not involve apply. You could time it to see if it's any faster:

> test.un <- unlist(test)
> unname(test.un[names(test.un) == "a"])
[1] 1 4 7

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
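The suggestion to time both approaches can be sketched as follows. This is only a rough illustration: the list `big`, its size `n`, and the timing variables are made up for the example, and results on real data (deeper nesting, mixed types) may look quite different.

```r
# Synthetic data in the same shape as 'test', just longer.
# The size n is an arbitrary assumption, not taken from the thread.
n <- 1e4
big <- lapply(as.numeric(seq_len(n)),
              function(i) list(a = i, b = i + 1, c = i + 2))

# Original approach: loop over the indices with sapply().
t_sapply <- system.time(
  val1 <- sapply(seq_along(big), function(i) big[[i]]$a)
)

# unlist() approach: flatten once, then keep the elements named "a".
t_unlist <- system.time({
  big.un <- unlist(big)
  val2 <- unname(big.un[names(big.un) == "a"])
})

# Both approaches should return the same vector.
stopifnot(identical(val1, val2))

# Elapsed seconds for each approach.
print(c(sapply = t_sapply[["elapsed"]], unlist = t_unlist[["elapsed"]]))
```

Note that selecting by `names(big.un) == "a"` relies on every "a" being a scalar; for longer vectors unlist() appends a running number to the name ("a1", "a2", ...), so the comparison would miss them.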
Hello, Alexander,

does

utest <- unlist(test)
utest[ names( utest) == "a"]

come close to what you need?

Hth,

Gerrit

On Tue, 7 Dec 2010, Alexander Senger wrote:
> [...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello Gerrit, Gabor,

thank you for your suggestions.

Unfortunately unlist seems to be rather expensive. A short test with one of my datasets gives 0.01s for an extraction based on my approach and 5.6s for unlist alone. The reason seems to be that unlist relies on lapply internally and does so recursively?

Maybe there is still another way to go?

Alex

On 07.12.2010 at 15:59, Gerrit Eichner wrote:
> [...]
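One way to probe where the time in unlist() goes is to time it with and without name construction. The sketch below is a diagnostic only, run on made-up data (`big` and `n` are assumptions); it does not establish the cause of the 5.6s reported above, which may depend on the real dataset's structure.

```r
# Synthetic stand-in for the real data (shape and size are assumptions).
n <- 1e4
big <- lapply(seq_len(n), function(i) list(a = i, b = i + 1, c = i + 2))

# Flatten with and without building a name for every leaf.
t_named   <- system.time(u_named   <- unlist(big))
t_unnamed <- system.time(u_unnamed <- unlist(big, use.names = FALSE))

# The values are the same either way; only the names differ.
stopifnot(identical(unname(u_named), u_unnamed))

# Elapsed seconds for each variant.
print(c(named = t_named[["elapsed"]], unnamed = t_unnamed[["elapsed"]]))
```

If the unnamed variant turns out to be much cheaper, the names are the likely cost, but without names the flattened vector can no longer be filtered by `names(...) == "a"`, so the extraction has to happen before flattening.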
I tried to hide the gory details as the structure of my datasets is rather complicated. Basically it's a long list of lists which in turn contain character vectors, dates, numerics and data frames, all named. While the hierarchy is fixed, neither the number of elements nor their ordering is. But if I try to access a certain element, then I know it is there and contains sensible data. For a typical day of measurements the whole package weighs around 1 GiB. How often and what I need to extract varies, as the analysis is rather dynamic.

As far as I can see, a thorough refactoring of the datasets so that everything is contained in one large data frame might be a solution. But I wouldn't be too unhappy if I could avoid this rather tedious work.

Alex

On 07.12. 18:26, William Dunlap wrote:
> To find the fastest method you need to tell more
> about the constraints on your problem.
> Do you always have a list of lists of scalars
> or are the lists buried at various depths
> or do the numeric vectors at the leaves have
> various lengths?
> If you always have a list of lists of scalars,
> do the names always come in the same order?
> (It may be faster to select by numeric position
> than by name).
> Do all the lists of numeric vectors contain an
> element by the given name?
> What is a typical size for the problem? How
> many times do you typically need to repeat
> the solution?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Alexander Senger
>> Sent: Tuesday, December 07, 2010 9:12 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] fast subsetting of lists in lists
>>
>> [...]
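For the simple case of a list of lists of scalars, the two ideas raised here (selecting by position, and refactoring into one data frame) might look like the sketch below. It uses the toy `test` list from the first message and assumes every inner list holds the same names with scalar values; the real data described above (dates, data frames, varying order) would need more work.

```r
test <- list(list(a = 1, b = 2, c = 3),
             list(a = 4, b = 5, c = 6),
             list(a = 7, b = 8, c = 9))

# Selecting by numeric position instead of by name.
# Only valid if "a" is always the first element, which is an assumption here.
by_pos <- sapply(test, "[[", 1)

# One-off refactoring: each inner list becomes one row of a data frame.
# rbind() on data frames matches columns by name, so a different ordering
# of the names inside an inner list is tolerated.
df <- do.call(rbind, lapply(test, as.data.frame))

df$a   # 1 4 7
```

After the conversion, extracting a variable is a plain column lookup, so the conversion cost is paid once rather than on every extraction.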
First, subset 'test' once, e.g.

testT <- test[1:3];

and then use sapply() on that, e.g.

val <- sapply(testT, FUN=function (x) { x$a })

Then you can avoid one level of function calls, by

val <- sapply(testT, FUN="[[", "a")

Second, there is some overhead in "[[", "$" etc. You can use .subset2() to avoid this, e.g.

val <- sapply(testT, FUN=.subset2, "a")

Third, it may be that using sapply() to structure your results is a bit overkill. If you know that the 'a' element is always of the same dimension, you can do it yourself, e.g.

val <- lapply(testT, FUN=.subset2, "a")
val <- unlist(val, use.names=FALSE) # use.names=FALSE is much faster than TRUE

See what that does

/Henrik

On Tue, Dec 7, 2010 at 6:47 AM, Alexander Senger <senger at physik.hu-berlin.de> wrote:
> [...]
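The variants above can be checked against each other on the toy example from the first message. This is just a consistency check with made-up variable names (`v1` to `v4`); it says nothing about relative speed, which would need timing on realistically sized data.

```r
test <- list(list(a = 1, b = 2, c = 3),
             list(a = 4, b = 5, c = 6),
             list(a = 7, b = 8, c = 9))
testT <- test[1:3]

# The four variants from the message, in order.
v1 <- sapply(testT, FUN = function(x) x$a)
v2 <- sapply(testT, FUN = "[[", "a")
v3 <- sapply(testT, FUN = .subset2, "a")
v4 <- unlist(lapply(testT, FUN = .subset2, "a"), use.names = FALSE)

# All four should yield the same numeric vector.
stopifnot(identical(v1, v2), identical(v2, v3), identical(v3, v4))

v4   # 1 4 7
```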