I am tying myself in knots over subscripts when applied to lists
I have a list along the lines of:
lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))
From which I want to extract the values following "next" in each
member of the list, i.e. something along the lines of
answer <- c("want1", "want2"). Is this possible without using loops? The
elements of lis are of different lengths and "next" occurs once per
element somewhere in the middle.
The thought process behind this is:
It's easy enough to do it for an individual element of the list:
lis[[1]][match("next",lis[[1]])+1]
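The one-element version above uses match(), while the replies below use which(). The difference matters once an element can contain "next" more than once: match() reports only the first position, which() reports all of them. A small illustration (the vector v is made up for the example):

```r
# Hypothetical vector with two "next" markers
v <- c("next", "a", "next", "b")

first_pos <- match("next", v)    # position of the FIRST match only: 1
all_pos   <- which(v == "next")  # EVERY position where the test is TRUE: 1 3

v[first_pos + 1]  # "a"
v[all_pos + 1]    # "a" "b"
```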
but how to do that to all elements of the list? I can get their
indices e.g. as a list using lapply:
lapply(lapply(lis,match,x="next"),"+",y=1)
or return a particular subscript using:
lapply(lis,"[", i=3)
but don't see how one could combine the two to get
answer <- c("want1", "want2") without resorting to:
answer <- character(0)
for (s in 1:length(lis)) {
  answer <- c(answer, lis[[s]][match("next", lis[[s]]) + 1])
}
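The two pieces can in fact be combined directly: mapply() walks a list and a vector in parallel, so "[" can be applied to each element with its own index. A sketch (it keeps match(), so it picks up only the first "next" in each element):

```r
lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))

# Step 1: position just after "next" in each element -> c(4, 3)
idx <- sapply(lis, match, x = "next") + 1

# Step 2: call "[" on lis[[k]] with idx[k] for every k in parallel;
# mapply() simplifies the length-one results to a character vector.
answer <- mapply("[", lis, idx)
answer  # "want1" "want2"
```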
Am I missing something obvious (or non-obvious)? I suppose the
secondary question is 'should I care?'. I am intending to use this on
hundreds of lists sometimes with tens of thousands of elements, with
more than one version of "next" in each, so felt that the lower
efficiency of looping was likely to matter.
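On the "should I care?" question: part of the cost of the loop above is that answer <- c(answer, ...) builds a new, longer vector on every pass. If a loop is kept, preallocating the result avoids that. A minimal sketch, assuming exactly one "next" per element as stated:

```r
lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))

# One slot per list element, allocated once instead of grown with c()
answer <- character(length(lis))
for (s in seq_along(lis)) {
  v <- lis[[s]]
  # assumes exactly one "next" per element, not in the last position
  answer[s] <- v[match("next", v) + 1]
}
answer  # "want1" "want2"
```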
Any help much appreciated,
Chris
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Christopher G. Knight Tel:+44 (0)1865 275 111
Department of Plant Sciences +44 (0)1865 275 790
South Parks Road
Oxford OX1 3RB
UK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> sapply(lis, function(x) x[which(x == "next") + 1])
[1] "want1" "want2"

HTH,
Andy
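If the "exactly one "next" per element" assumption should be enforced rather than trusted, vapply() (available in current versions of R) can do it: it requires each call to return a value of the declared shape. A sketch, with get_next as a hypothetical helper name:

```r
lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))

# Hypothetical helper: the value(s) following "next" in one element
get_next <- function(x) x[which(x == "next") + 1]

# FUN.VALUE = character(1): every call must return exactly one string,
# so the result is always a plain character vector.
answer <- vapply(lis, get_next, FUN.VALUE = character(1))

# An element without "next" yields character(0); vapply() stops with an
# error instead of quietly returning a list as sapply() would.
bad <- list(c("a", "next", "b"), c("c", "d"))
bad_ok <- tryCatch({
  vapply(bad, get_next, FUN.VALUE = character(1))
  TRUE
}, error = function(e) FALSE)
```

Note that a "next" in the last position still passes, because subscripting past the end returns a single NA.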
Chris Knight <christopher.knight at plant-sciences.oxford.ac.uk> has
lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))
and wants c("want1","want2").
Step 1:
inx <- sapply(lis, function(x) which(x == "next")) + 1
==> 4 3
Step 2:
sapply(1:length(lis), function(i) lis[[i]][inx[i]])
==> "want1" "want2"
Think about this for a bit and restructure it:
sapply(1:length(lis), function (i) {v <- lis[[i]];
v[which(v=="next")+1]})
Wrap it up:
after <- function(lis, what = "next") {
  sapply(1:length(lis), function(i) {
    v <- lis[[i]]
    v[which(v == what) + 1]
  })
}
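A quick check of after(), repeated here so the snippet runs on its own. The second call asks for the value after "a", which is the last entry of the second element, so subscripting past the end gives NA:

```r
after <- function(lis, what = "next") {
  sapply(1:length(lis), function(i) {
    v <- lis[[i]]
    v[which(v == what) + 1]
  })
}

lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))

after(lis)       # "want1" "want2"
after(lis, "a")  # "b" NA
```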
Of course, from my point of view, a call to sapply() *is* a loop, just
packaged slightly differently. I think this is reasonably clear.
I suggested
sapply(1:length(lis), function (i) {v <- lis[[i]];
v[which(v=="next")+1]})
Of course that was really dumb. It can be simplified, because the index i
is only used to select a list element, which sapply() wants to do for me
anyway. It should be
sapply(lis, function(v) v[which(v=="next")+1])
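Chris also mentions more than one "next" per element. Then the anonymous function returns a vector per element and sapply() can no longer simplify to a character vector: it returns a list (or a matrix, if every element has the same number of hits). A sketch with made-up data (lis2 is hypothetical) showing why lapply(), optionally followed by unlist(), may be the more predictable choice:

```r
# Hypothetical data: the first element has two "next" markers
lis2 <- list(c("next", "x", "a", "next", "y"),
             c("b", "next", "z"))

# Results of lengths 2 and 1: sapply() gives up and returns a list
s <- sapply(lis2, function(v) v[which(v == "next") + 1])

# lapply() always returns a list, one entry per element ...
l <- lapply(lis2, function(v) v[which(v == "next") + 1])
# ... which unlist() flattens when per-element grouping is not needed
flat <- unlist(l)  # "x" "y" "z"
```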
Perhaps the interesting thing is how one gets there.
- The result should be a character vector, not a list, so use sapply()
- The index of a list element does not enter into the calculation of
the result, so use sapply(a.list, function (an.element) some.calculation)
- For each list element, we want to find where something occurs, so use
  which(the.element == the.value.we.want.to.find)
- We want the element after that, so the.element[..... + 1]
and the code (NOT the code I first thought of) practically writes itself.
If I had used backwards reasoning like this, I'd have got there first thing;
what led me to produce an inferior version was using forwards reasoning,
and I *know* better than to do that. *Sigh.*
The other approach is not to focus on the list structure at all,
but to flatten it into a single sequence:
{u <- unlist(lis); u[which(u=="next")+1]}
Of course, if some list element should not contain "next" exactly once,
these two versions would give different results.
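To make the difference concrete, here is a made-up list (odd is hypothetical) in which "next" is the last entry of one element and missing from another. The sapply version keeps one slot per element, while the unlist version silently picks up a value from the neighbouring element:

```r
f1 <- function(lis) sapply(lis, function(v) v[which(v == "next") + 1])
f2 <- function(lis) { u <- unlist(lis); u[which(u == "next") + 1] }

# Hypothetical input: "next" last in element 1, absent from element 2
odd <- list(c("a", "next"), c("b", "c"), c("next", "d"))

r1 <- f1(odd)  # list(NA, character(0), "d"): one slot per element
r2 <- f2(odd)  # c("b", "d"): "b" is borrowed from element 2
```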
We can also expect some kind of performance difference. My expectation
was that as the "unlist" version has to build a data structure (the
flattened list) which is not part of the result, the "unlist" version
would be inferior. But one must not trust to intuition; this is an
empirical question deserving an empirical answer. I did this:
lis <-
list(c("a","b","next","want1","c"),
c("d","next","want2","a"))
f1 <- function(lis) sapply(lis, function(v) v[which(v=="next")+1])
f2 <- function(lis) {lis<-unlist(lis);
lis[which(lis=="next")+1]}
system.time(for(i in 1:10000) f1(lis))
[1] 22.03 7.56 30.97 0.00 0.00
system.time(for(i in 1:10000) f2(lis))
[1] 5.38 1.65 7.44 0.00 0.00
Hmm, unlist is about 4 times faster. Is that still true with
bigger lists?
lis <- list(lis[[1]],lis[[2]],lis[[1]],lis[[2]],lis[[1]],lis[[2]],
lis[[1]],lis[[2]],lis[[1]],lis[[2]],lis[[1]],lis[[2]])
system.time(for(i in 1:4000) f1(lis))
[1] 30.91 9.66 42.06 0.00 0.00
system.time(for(i in 1:4000) f2(lis))
[1] 2.96 0.65 3.67 0.00 0.00
Yep, it holds up; in fact the gap widens, to roughly 11 times.
This is by no means an exhaustive study, but it certainly suggests that
the "unlist" version may be faster than the "sapply"
version.
Here's why my intuition was wrong: the "sapply" version calls a
user-defined function once for each element of the result, while the
"unlist" version uses nothing but built-in operations. Calling
user-defined functions is currently slow in R.
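If the speed of the unlist version is wanted without the risk of a trailing "next" reaching into the following element, one option is to record which element each flattened entry came from and discard hits whose follower belongs to a different element. A sketch only: after_flat is a hypothetical name, lengths() needs R 3.2.0 or later, and unlike the sapply version it drops (rather than marks) elements without a usable "next", so the result is no longer aligned one-to-one with lis:

```r
after_flat <- function(lis, what = "next") {
  u   <- unlist(lis, use.names = FALSE)
  # which list element each flattened entry came from
  grp <- rep(seq_along(lis), lengths(lis))
  hit <- which(u == what)
  nxt <- hit + 1L
  # the follower must exist ...
  ok  <- nxt <= length(u)
  # ... and must come from the same list element as the hit
  ok[ok] <- grp[nxt[ok]] == grp[hit[ok]]
  u[nxt[ok]]
}

lis <- list(c("a", "b", "next", "want1", "c"),
            c("d", "next", "want2", "a"))
after_flat(lis)  # "want1" "want2"

# The trailing "next" in element 1 no longer borrows "b" from element 2
after_flat(list(c("a", "next"), c("b", "c"), c("next", "d")))  # "d"
```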
> unlist(lis)[which(unlist(lis)=="next")+1]
[1] "want1" "want2"