I am tying myself in knots over subscripts when applied to lists.

I have a list along the lines of:

  lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))

from which I want to extract the value following "next" in each member of the list, i.e. something along the lines of answer <- c("want1","want2"). Is this possible without using loops? The elements of lis are of different lengths and "next" occurs once per element, somewhere in the middle.

The thought process behind this is:

It's easy enough to do it for an individual element of the list:

  lis[[1]][match("next", lis[[1]]) + 1]

but how do I do that for all elements of the list? I can get their indices, e.g. as a list, using lapply:

  lapply(lapply(lis, match, x = "next"), "+", y = 1)

or return a particular subscript using:

  lapply(lis, "[", i = 3)

but I don't see how to combine the two to get answer <- c("want1","want2") without resorting to:

  answer <- character(0)
  for (s in 1:length(lis)) {
    answer <- c(answer, lis[[s]][match("next", lis[[s]]) + 1])
  }

Am I missing something obvious (or non-obvious)? I suppose the secondary question is "should I care?". I intend to use this on hundreds of lists, sometimes with tens of thousands of elements, with more than one version of "next" in each, so I felt that the lower efficiency of looping was likely to matter.

Any help much appreciated,

Chris
--
Dr. Christopher G. Knight          Tel: +44 (0)1865 275 111
Department of Plant Sciences            +44 (0)1865 275 790
South Parks Road
Oxford OX1 3RB
UK
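For comparison, here is a sketch of the explicit loop written so that it does not grow `answer` with c() on every pass (each c() call builds a new vector). It preallocates one slot per list element and fills it by index; like the question, it assumes each element contains "next".

```r
lis <- list(c("a", "b", "next", "want1", "c"), c("d", "next", "want2", "a"))

# Preallocate one slot per list element instead of growing the result with c()
answer <- character(length(lis))
for (s in seq_along(lis)) {
  # match() gives the position of the first "next"; the wanted value follows it
  answer[s] <- lis[[s]][match("next", lis[[s]]) + 1]
}
answer
```

seq_along(lis) is used instead of 1:length(lis) so that an empty list gives zero iterations rather than looping over c(1, 0).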
  > sapply(lis, function(x) x[which(x == "next") + 1])
  [1] "want1" "want2"

HTH,
Andy

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
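In current R, a stricter variant of the sapply() answer is possible with vapply(), which is told the shape of each result up front. This sketch uses match() rather than which(), so each call returns exactly one value: the one after the first "next", or NA if "next" is missing or is the last value.

```r
lis <- list(c("a", "b", "next", "want1", "c"), c("d", "next", "want2", "a"))

# vapply() requires every call to return a single character value; if a call
# returned anything else it would stop with an error instead of silently
# returning a list the way sapply() does.
answer <- vapply(lis, function(x) x[match("next", x) + 1], character(1))
answer
```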
Chris Knight <christopher.knight at plant-sciences.oxford.ac.uk> has

  lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))

and wants c("want1","want2").

Step 1:
  inx <- sapply(lis, function(x) which(x == "next")) + 1
  ==> 4 3

Step 2:
  sapply(1:length(lis), function(i) lis[[i]][inx[i]])
  ==> "want1" "want2"

Think about this for a bit and restructure it:

  sapply(1:length(lis), function(i) {v <- lis[[i]]; v[which(v == "next") + 1]})

Wrap it up:

  after <- function(lis, what = "next") {
      sapply(1:length(lis), function(i) {
          v <- lis[[i]]
          v[which(v == what) + 1]
      })
  }

Of course, from my point of view, a call to sapply() *is* a loop, just packaged slightly differently. I think this is reasonably clear.
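The two pieces from the original question, the lapply() that finds the positions and the "[" that extracts a subscript, can also be combined directly with mapply(), which walks along the list and the index vector in parallel. A sketch, assuming exactly one "next" per element:

```r
lis <- list(c("a", "b", "next", "want1", "c"), c("d", "next", "want2", "a"))

# Position just after the first "next" in each element: 4 3
inx <- sapply(lis, match, x = "next") + 1

# Apply "[" pairwise: lis[[1]][inx[1]], lis[[2]][inx[2]], ...
answer <- mapply("[", lis, inx)
answer
```

Map("[", lis, inx) does the same without simplification and returns a list.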
I suggested

  sapply(1:length(lis), function(i) {v <- lis[[i]]; v[which(v == "next") + 1]})

Of course that was really dumb. It can be simplified, because the index i is only used to select a list element, which sapply() wants to do for me anyway. It should be

  sapply(lis, function(v) v[which(v == "next") + 1])

Perhaps the interesting thing is how one gets there.

 - The result should be a character vector, not a list, so use sapply().
 - The index of a list element does not enter into the calculation of the result, so use
     sapply(a.list, function(an.element) some.calculation)
 - For each list element, we want to find where something occurs, so use
     which(the.element == the.value.we.want.to.find)
 - We want the element after that, so
     the.element[..... + 1]

and the code (NOT the code I first thought of) practically writes itself. If I had used backwards reasoning like this, I'd have got there first thing; what led me to produce an inferior version was using forwards reasoning, and I *know* better than to do that. *Sigh.*

The other approach is not to focus on the list structure at all, but to flatten it into a single sequence:

  {u <- unlist(lis); u[which(u == "next") + 1]}

Of course, if some list element does not contain "next" exactly once, these two versions give different results. We can also expect some kind of performance difference. My expectation was that, because the "unlist" version has to build a data structure (the flattened list) that is not part of the result, it would be the slower of the two. But one must not trust to intuition; this is an empirical question deserving an empirical answer. I did this:

  lis <- list(c("a","b","next","want1","c"), c("d","next","want2","a"))
  f1 <- function(lis) sapply(lis, function(v) v[which(v == "next") + 1])
  f2 <- function(lis) {lis <- unlist(lis); lis[which(lis == "next") + 1]}

  > system.time(for (i in 1:10000) f1(lis))
  [1] 22.03 7.56 30.97 0.00 0.00
  > system.time(for (i in 1:10000) f2(lis))
  [1] 5.38 1.65 7.44 0.00 0.00

Hmm, unlist is about 4 times faster.
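The question mentions "more than one version of 'next'" in each list. Assuming that means several different marker strings (the second marker "then" and the input below are hypothetical), the flattened approach extends naturally with %in%:

```r
# Hypothetical input with two marker strings, "next" and "then"
lis <- list(c("a", "next", "want1", "then", "want3"), c("d", "then", "want2"))
markers <- c("next", "then")

u <- unlist(lis)
# %in% flags every position holding any of the markers; take the value after each
answer <- u[which(u %in% markers) + 1]
answer
```

Results come out in order of position in the flattened vector, not grouped by marker. Repeated occurrences of the same marker are handled the same way, since which() returns every matching position.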
Is the "unlist" version still faster with bigger lists?

  lis <- list(lis[[1]], lis[[2]], lis[[1]], lis[[2]], lis[[1]], lis[[2]],
              lis[[1]], lis[[2]], lis[[1]], lis[[2]], lis[[1]], lis[[2]])

  > system.time(for (i in 1:4000) f1(lis))
  [1] 30.91 9.66 42.06 0.00 0.00
  > system.time(for (i in 1:4000) f2(lis))
  [1] 2.96 0.65 3.67 0.00 0.00

Yep, it holds up; with 12 elements the gap actually widens to more than 10 times in elapsed time. This is by no means an exhaustive study, but it does suggest that the "unlist" version is faster than the "sapply" version.

Here's why my intuition was wrong: the "sapply" version calls a user-defined function once for each element of the result, while the "unlist" version uses nothing but built-in operations. Calling user-defined functions is currently slow in R.
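The remark that f1 and f2 differ when an element does not contain "next" exactly once can be checked directly. The input `odd` below is a hypothetical example: "next" is the last value of the first element, and the second element has no "next" at all.

```r
# f1 and f2 as defined earlier in this message
f1 <- function(lis) sapply(lis, function(v) v[which(v == "next") + 1])
f2 <- function(lis) {lis <- unlist(lis); lis[which(lis == "next") + 1]}

odd <- list(c("a", "next"), c("b", "c"))

r1 <- f1(odd)  # per element: list(NA, character(0)), so sapply cannot simplify
r2 <- f2(odd)  # flattened: picks up "b", which belongs to the *second* element
```

So f1 keeps one result slot per element (but may return a list), while f2 always returns a flat vector and can read across element boundaries.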
Chris Knight wrote:
> Is this possible without using loops?

  > unlist(lis)[which(unlist(lis) == "next") + 1]
  [1] "want1" "want2"
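Flattening with unlist() loses the boundaries between list elements, so a "next" that is the last value of one element picks up the first value of the following element. A minimal sketch of one way to keep the flattened, built-in-only approach while guarding against that; after_flat() is a hypothetical helper name, and lengths() requires R 3.2.0 or later.

```r
# Hypothetical helper: value after each `what`, never crossing element boundaries
after_flat <- function(lis, what = "next") {
  u    <- unlist(lis, use.names = FALSE)
  grp  <- rep(seq_along(lis), lengths(lis))  # which list element each value came from
  hit  <- which(u == what)
  nxt  <- hit + 1L
  # keep a hit only if a following value exists and belongs to the same element
  keep <- nxt <= length(u)
  keep[keep] <- grp[nxt[keep]] == grp[hit[keep]]
  u[nxt[keep]]
}

lis <- list(c("a", "b", "next", "want1", "c"), c("d", "next", "want2", "a"))
after_flat(lis)
```

A "next" with no following value in its own element is simply dropped, so the result can be shorter than the number of hits.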