thr3ads.net - R help - [R] fast subsetting of lists in lists [Dec 2010]


Alexander Senger

2010-Dec-07 14:47 UTC

[R] fast subsetting of lists in lists

Hello,


my data is contained in nested lists (which is not necessarily the best
approach). What I need is a fast way to get subsets from the data.

An example:

test <- list(list(a = 1, b = 2, c = 3), list(a = 4, b = 5, c = 6),
             list(a = 7, b = 8, c = 9))

Now I would like to have all values of the named elements "a", that is,
the vector c(1, 4, 7). The best I could come up with is:

val <- sapply(1:3, function (i) {test[[i]]$a})

which is unfortunately not very fast. According to the R Inferno this is
because apply and its derivatives loop in R rather than relying on C
subroutines, as the common [ operator does.

Does someone know a trick to do the same as above with the faster
built-in subsetting? Something like:

test[<somesubsettingmagic>]


Thank you for your advice


Alex

Gabor Grothendieck

2010-Dec-07 14:54 UTC


[R] fast subsetting of lists in lists

This does not involve apply.  You could time it to see if it's any faster:

> test.un <- unlist(test)
> unname(test.un[names(test.un) == "a"])
[1] 1 4 7

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
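
A caveat on the unlist() approach, sketched under the assumption that some leaves hold more than one value (the `rec` object below is made up for illustration and is not from the original post): unlist() appends an index to the names of vector elements, so an exact comparison with "a" can silently miss values.

```r
# Hypothetical records; the first "a" has length 2 (an assumption,
# the example in the thread only contains scalars).
rec <- list(list(a = c(1, 2), b = 3), list(a = 4, b = 5))

flat <- unlist(rec)
names(flat)   # "a1" "a2" "b" "a" "b"

# The exact-name match only picks up the scalar "a" of the second record
picked <- unname(flat[names(flat) == "a"])
picked
```

With purely scalar leaves, as in the example from the thread, the names stay unchanged and the comparison works as shown.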

Gerrit Eichner

2010-Dec-07 14:59 UTC


[R] fast subsetting of lists in lists

Hello, Alexander,

does

utest <- unlist(test)
utest[ names( utest) == "a"]

come close to what you need?

Hth,

Gerrit



Alexander Senger

2010-Dec-07 17:12 UTC


[R] fast subsetting of lists in lists

Hello Gerrit, Gabor,


thank you for your suggestion.

Unfortunately unlist seems to be rather expensive. A short test with one
of my datasets gives 0.01s for an extraction based on my approach and
5.6s for unlist alone. The reason seems to be that unlist relies on
lapply internally and does so recursively?

Maybe there is still another way to go?

Alex
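
A comparison like the one described above could be timed roughly as follows. This is only a sketch on synthetic data (the object `big` and its size are assumptions); the real dataset also contains data frames and character vectors, which unlist() would have to flatten and name as well, so the gap there may be much larger than in this simple case.

```r
# Synthetic stand-in for the real data: n records of three scalars each.
n <- 1e5
big <- lapply(seq_len(n), function(i) {
  list(a = as.numeric(i), b = i + 1, c = i + 2)
})

# Original approach: loop over indices with sapply()
t_sapply <- system.time(
  v1 <- sapply(seq_along(big), function(i) big[[i]]$a)
)

# unlist() approach: flatten everything, then filter by name
t_unlist <- system.time({
  flat <- unlist(big)
  v2 <- unname(flat[names(flat) == "a"])
})

# Elapsed seconds for each approach (values depend on the machine)
print(c(sapply = t_sapply[["elapsed"]], unlist = t_unlist[["elapsed"]]))
```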


Alexander Senger

2010-Dec-07 17:47 UTC


[R] fast subsetting of lists in lists

I tried to hide the gory details as the structure of my datasets is
rather complicated. Basically it's a long list of lists which in turn
contain character vectors, dates, numerics and dataframes, all named.
While the hierarchy is fixed, neither the number of elements nor their
ordering is. But if I try to access a certain element, then I know it is
there and contains sensible data.
For a typical day of measurements the whole package weighs around 1
GiB. How often and what I need to extract varies as the analysis is
rather dynamic.

As far as I can see a thorough refactoring of the datasets so that
everything is contained in one large dataframe might be a solution. But
I wouldn't be too unhappy if I could avoid this rather tedious work.

Alex
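
The structure described above might look roughly like this. The element names (`id`, `when`, `obs`) and values are invented for illustration; only the general shape (named elements of mixed type, in varying order) is taken from the description.

```r
# Hypothetical records: same named elements, different order and content.
recs <- list(
  list(id = "run1", when = as.Date("2010-12-07"), a = 1,
       obs = data.frame(t = 1:2, y = c(0.1, 0.2))),
  list(obs = data.frame(t = 1:3, y = c(0.3, 0.4, 0.5)),
       a = 4, id = "run2", when = as.Date("2010-12-07"))
)

# Scalars: extract by name, so the position of "a" does not matter
a_vals <- vapply(recs, function(r) r[["a"]], numeric(1))

# Non-scalar elements (here data frames) are kept in a list
obs_list <- lapply(recs, `[[`, "obs")
n_rows <- vapply(obs_list, nrow, integer(1))
```

Extraction by name works regardless of ordering, which matters here because the ordering of elements is not fixed.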


On 07.12. 18:26, William Dunlap wrote:
> To find the fastest method you need to tell more
> about the constraints on your problem.
>    Do you always have a list of lists of scalars
>       or are the lists buried at various depths
>       or do the numeric vectors at the leaves have
>       various lengths?
>    If you always have a list of lists of scalars,
>       do the names always come in the same order?
>       (It may be faster to select by numeric position
>       than by name).
>    Do all the lists of numeric vectors contain an
>       element by the given name?
>    What is a typical size for the problem?  How
>       many times do you typically need to repeat
>       the solution?
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
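
Two of the questions above (same order of names, presence of the element) can be checked directly in R. A minimal sketch using the example list from the original post; whether selection by position is actually faster would have to be timed on the real data.

```r
test <- list(list(a = 1, b = 2, c = 3), list(a = 4, b = 5, c = 6),
             list(a = 7, b = 8, c = 9))

# Does every record contain an element named "a"?
has_a <- vapply(test, function(x) "a" %in% names(x), logical(1))

# Do the names always come in the same order as in the first record?
ref_names <- names(test[[1]])
same_order <- all(vapply(test, function(x) identical(names(x), ref_names),
                         logical(1)))

# If both hold, selecting by position gives the same result as by name
by_pos  <- vapply(test, function(x) x[[1]], numeric(1))
by_name <- vapply(test, function(x) x[["a"]], numeric(1))
```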

Henrik Bengtsson

2010-Dec-07 18:11 UTC


[R] fast subsetting of lists in lists

First, subset 'test' once, e.g.

testT <- test[1:3];

and then use sapply() on that, e.g.

val <- sapply(testT, FUN=function (x) { x$a })

Then you can avoid one level of function calls, by

val <- sapply(testT, FUN="[[", "a")

Second, there is some overhead in "[[", "$" etc.  You can use
.subset2() to avoid this, e.g.

val <- sapply(testT, FUN=.subset2, "a")

Third, it may be that using sapply() to structure your results is a bit
overkill.  If you know that the 'a' element is always of the same
dimension, you can do it yourself, e.g.

val <- lapply(testT, FUN=.subset2, "a")
val <- unlist(val, use.names=FALSE)   # use.names=FALSE is much faster than TRUE

See what that does

/Henrik
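
The variants above can be checked against each other on the example list from the original post. The vapply() line is an addition not mentioned in the thread; it is included only as a type-checked alternative, with no claim about its speed relative to the others.

```r
test <- list(list(a = 1, b = 2, c = 3), list(a = 4, b = 5, c = 6),
             list(a = 7, b = 8, c = 9))

# Anonymous function calling `$`
v_fun     <- sapply(test, function(x) x$a)
# Passing "[[" directly saves one level of function calls
v_bracket <- sapply(test, "[[", "a")
# .subset2() skips method dispatch
v_subset2 <- sapply(test, .subset2, "a")
# lapply() plus unlist() without names
v_unlist  <- unlist(lapply(test, .subset2, "a"), use.names = FALSE)
# Not from the thread: vapply() checks that every result is one numeric
v_vapply  <- vapply(test, `[[`, numeric(1), "a")

results <- list(v_fun, v_bracket, v_subset2, v_unlist, v_vapply)
all(vapply(results, identical, logical(1), c(1, 4, 7)))
```

Wrapping each line in system.time() on the real data would show which of these is fastest there.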

