thr3ads.net - R help - [R] subscripts in lists [Aug 2003]


Chris Knight

2003-Aug-11 12:21 UTC

[R] subscripts in lists

I am tying myself in knots over subscripts when applied to lists.

I have a list along the lines of:

lis<-list(c("a","b","next","want1","c"),c("d","next","want2","a"))

From which I want to extract the values following "next" in each
member of the list, i.e. something along the lines of
answer<-c("want1", "want2"). Is this possible without using loops?
The elements of lis are of different lengths and "next" occurs once
per element somewhere in the middle.

The thought process behind this is:

It's easy enough to do it for an individual element of the list:
lis[[1]][match("next",lis[[1]])+1]

but how to do that to all elements of the list? I can get their 
indices e.g. as a list using lapply:

lapply(lapply(lis,match,x="next"),"+",y=1)

or return a particular subscript using:
lapply(lis,"[", i=3)

but don't see how one could combine the two to get
answer<-c("want1", "want2") without resorting to:

answer<-character()
for(s in 1:length(lis)){
answer<-c(answer,lis[[s]][match("next",lis[[s]])+1])
}

Am I missing something obvious (or non-obvious)? I suppose the 
secondary question is 'should I care?'. I am intending to use this on 
hundreds of lists sometimes with tens of thousands of elements, with 
more than one version of "next" in each, so felt that the lower 
efficiency of looping was likely to matter.
Any help much appreciated,

Chris
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Christopher G. Knight               Tel:+44 (0)1865 275 111
Department of Plant Sciences              +44 (0)1865 275 790
South Parks Road
Oxford          OX1 3RB
UK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Liaw, Andy

2003-Aug-11 12:33 UTC

[R] subscripts in lists

> sapply(lis, function(x) x[which(x == "next") + 1])
[1] "want1" "want2"
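
If an element can hold more than one marker (the original post mentions "more than one version of "next""), the same idea might be extended with %in%. This is only a minimal sketch: the second marker "then", the data, and the names lis2, markers and followers are all made up for illustration. lapply() is used instead of sapply() because the number of hits can differ from element to element.

```r
# Hypothetical data: "then" is an assumed second marker, not from the original post
lis2 <- list(c("a", "next", "want1", "then", "want2"),
             c("d", "then", "want3"))
markers <- c("next", "then")

# One character vector of followers per list element
followers <- lapply(lis2, function(x) x[which(x %in% markers) + 1])

# Flatten if a single vector is wanted
unlist(followers)
```

Keeping the per-element list around until the last step preserves which element each value came from.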

HTH,
Andy

Richard A. O'Keefe

2003-Aug-11 22:24 UTC

[R] subscripts in lists

Chris Knight <christopher.knight at plant-sciences.oxford.ac.uk> has
	
	lis<-list(c("a","b","next","want1","c"),c("d","next","want2","a"))
	
and wants c("want1","want2")


Step 1:
    inx <- sapply(lis, function(x) which(x == "next")) + 1
==> 4 3

Step 2:
    sapply(1:length(lis), function(i) lis[[i]][inx[i]])
==> "want1" "want2"

Think about this for a bit and restructure it:

    sapply(1:length(lis), function (i) {v <- lis[[i]]; v[which(v=="next")+1]})

Wrap it up:

    after <- function(lis, what="next") {
	sapply(1:length(lis), function (i) {
	    v <- lis[[i]]
	    v[which(v == what)+1]
	})
    }
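
A quick usage sketch of the wrapper, assuming the example list from the original post. The function body follows the definition above, except that seq_along() stands in for 1:length() (my substitution, so that an empty list is handled); the second call uses made-up data and a made-up marker "stop".

```r
lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))

# Same logic as the wrapper above; seq_along() replaces 1:length()
after <- function(lis, what = "next") {
  sapply(seq_along(lis), function(i) {
    v <- lis[[i]]
    v[which(v == what) + 1]
  })
}

after(lis)                                      # "want1" "want2"
after(list(c("x", "stop", "y")), what = "stop") # "y"
```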

Of course, from my point of view, a call to sapply() *is* a loop, just
packaged slightly differently.  I think this is reasonably clear.

Richard A. O'Keefe

2003-Aug-11 23:01 UTC

[R] subscripts in lists

I suggested
	    sapply(1:length(lis), function (i) {v <- lis[[i]]; v[which(v=="next")+1]})
	
Of course that was really dumb.  It can be simplified, because the index i
is only used to select a list element, which sapply() wants to do for me
anyway.  It should be

    sapply(lis, function(v) v[which(v=="next")+1])

Perhaps the interesting thing is how one gets there.
- The result should be a character vector, not a list, so use sapply()
- The index of a list element does not enter into the calculation of
  the result, so use sapply(a.list, function (an.element) some.calculation)
- For each list element, we want to find where something occurs, so use
  which(the.element == the.value.we.want.to.find)
- We want the element after that, so the.element[..... + 1]
and the code (NOT the code I first thought of) practically writes itself.

If I had used backwards reasoning like this, I'd have got there first thing;
what led me to produce an inferior version was using forwards reasoning,
and I *know* better than to do that.  *Sigh.*

The other approach is not to focus on the list structure at all,
but to flatten it into a single sequence:

    {u <- unlist(lis); u[which(u=="next")+1]}

Of course, if some list element should not contain "next" exactly once,
these two versions would give different results.
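
To make the difference concrete, here is a small sketch with made-up data (the list edge is hypothetical, not from the thread) in which "next" is the last item of the first element, so nothing follows it there:

```r
# Made-up data: the first element ends with "next", so nothing follows it
edge <- list(c("a", "next"),
             c("b", "next", "want2"))

# Per-element version
per_element <- sapply(edge, function(v) v[which(v == "next") + 1])

# Flattened version
u <- unlist(edge)
flattened <- u[which(u == "next") + 1]

per_element  # NA "want2": indexing past the end of an element gives NA
flattened    # "b" "want2": the index runs on into the following element
```

So the flattened version silently picks up the first item of the next element, while the per-element version at least leaves an NA behind.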

We can also expect some kind of performance difference.  My expectation
was that as the "unlist" version has to build a data structure (the
flattened list) which is not part of the result, the "unlist" version
would be inferior.  But one must not trust to intuition; this is an
empirical question deserving an empirical answer.  I did this:

lis <- list(c("a","b","next","want1","c"),
            c("d","next","want2","a"))
f1 <- function(lis) sapply(lis, function(v) v[which(v=="next")+1])
f2 <- function(lis) {lis<-unlist(lis); lis[which(lis=="next")+1]}

system.time(for(i in 1:10000) f1(lis))
[1] 22.03  7.56 30.97  0.00  0.00
system.time(for(i in 1:10000) f2(lis))
[1] 5.38 1.65 7.44 0.00 0.00

Hmm, unlist is about 4 times faster.  Is that still true with bigger lists?
lis <- list(lis[[1]],lis[[2]],lis[[1]],lis[[2]],lis[[1]],lis[[2]],
            lis[[1]],lis[[2]],lis[[1]],lis[[2]],lis[[1]],lis[[2]])

system.time(for(i in 1:4000) f1(lis))
[1] 30.91  9.66 42.06  0.00  0.00
system.time(for(i in 1:4000) f2(lis))
[1] 2.96 0.65 3.67 0.00 0.00

Yep, it holds up.

This is by no means an exhaustive study, but it certainly suggests that
the "unlist" version may be faster than the "sapply" version.

Here's why my intuition was wrong:  the "sapply" version calls a
user-defined function once for each element of the result, while the
"unlist" version uses nothing but built in operations.  Calling
user-defined functions is currently slow in R.
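
For anyone who wants to repeat the comparison, a minimal sketch follows. It assumes the f1 and f2 definitions above; the larger list big and the vapply() variant f3 are my additions, not part of the measurements reported here. Timings depend on the machine and the R version, so no particular ratio should be expected.

```r
lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))
big <- rep(lis, 1000)   # 2000 elements built from the thread's example

f1 <- function(lis) sapply(lis, function(v) v[which(v == "next") + 1])
f2 <- function(lis) { u <- unlist(lis); u[which(u == "next") + 1] }
# vapply() variant: errors unless every element yields exactly one string
f3 <- function(lis) vapply(lis, function(v) v[which(v == "next") + 1],
                           character(1))

# Check that all three agree before timing anything
stopifnot(identical(f1(big), f2(big)), identical(f2(big), f3(big)))

# Timings vary by machine and R version
system.time(for (i in 1:20) f1(big))
system.time(for (i in 1:20) f2(big))
system.time(for (i in 1:20) f3(big))
```

The vapply() form gives up nothing in clarity and refuses to return a ragged result if some element does not contain "next" exactly once, which the flattened version cannot detect.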

M.Kondrin

2003-Aug-12 00:10 UTC

[R] subscripts in lists

> unlist(lis)[which(unlist(lis)=="next")+1]
[1] "want1" "want2"
