Hello,

I have a list of character vectors like this:

    sequences <- list(
      c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
        "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
        "N","I","N","I","N","I","D","K","M","Y","I","H","*")
    )

and another list of subset ranges like this:

    indexes <- list(
      list(
        c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51)
      )
    )

What I now want to do is subset each entry in "sequences" (e.g. sequences[[1]]) with all ranges in the corresponding lower-level list in "indexes" (indexes[[1]]). Here is what I came up with:

    fragments <- list()
    for(iN in seq(length(sequences))){
      cat(paste(iN, "\n"))   # progress output, one line per entry
      tmpFragments <- sapply(
        indexes[[iN]],
        function(x){
          sequences[[iN]][seq.int(x[1], x[2])]
        }
      )
      fragments[[iN]] <- tmpFragments
    }

This works fine, but "sequences" contains thousands of entries and the corresponding "indexes" are sometimes hundreds of ranges long, so the whole process is EXTREMELY inefficient.

Would somebody out there take up the challenge and show me how to speed this up?

Thanks for any hints,

Joh
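One way to avoid cutting each range with separate R-level indexing is to collapse each vector into a single string and let base R's substring(), which is vectorized over its first/last arguments, cut all ranges of a sequence in one call. This is a minimal sketch that assumes every element is exactly one character (one-letter residue codes, as in the example); the names seqStrings and fragments are illustrative. The result holds strings rather than character vectors, and strsplit(x, "") turns them back if needed. No timings on real data are claimed here.

```r
# Example data from the original question
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
indexes <- list(
  list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))
)

# Assumption: each element is exactly one character, so positions in the
# vector equal character positions in the collapsed string.
seqStrings <- lapply(sequences, paste, collapse = "")

fragments <- Map(function(s, idx) {
  # Split the c(start, end) pairs into two parallel numeric vectors
  starts <- vapply(idx, `[`, numeric(1), 1)
  ends   <- vapply(idx, `[`, numeric(1), 2)
  # substring() recycles s and cuts every [start, end] range in one call
  substring(s, starts, ends)
}, seqStrings, indexes)

fragments[[1]][3]
# [1] "KMYIH*"
```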
Dear Johannes,

Try this:

    sequences <- c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T",
                   "Y","L","L","I","M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V",
                   "H","T","Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
    indexes <- matrix(c(1,22,22,46,46,51,1,46,22,51,1,51), ncol=2, byrow=TRUE)
    apply(indexes, 1, function(x){
      ind <- x[1]:x[2]
      sequences[ind]
    })

HTH,

Jorge

On Fri, Jan 16, 2009 at 8:06 AM, Johannes Graumann <johannes_graumann@web.de> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
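A caveat on the apply() version: apply() simplifies its result, so if every range happened to have the same length it would return a matrix instead of a list. A minimal sketch that keeps the list shape uses Map() over the start and end columns of Jorge's matrix. The do.call(rbind, ...) step that builds that matrix from Johannes's nested list, and the names nested and sameLength, are additions for illustration, not part of Jorge's post.

```r
# Jorge's flat vector and two-column range matrix
sequences <- c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T",
               "Y","L","L","I","M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V",
               "H","T","Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
indexes <- matrix(c(1,22,22,46,46,51,1,46,22,51,1,51), ncol = 2, byrow = TRUE)

# Hypothetical bridge: the same matrix built from a nested list of c(start, end) pairs
nested <- list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))
stopifnot(all(do.call(rbind, nested) == indexes))

# Map() walks the start and end columns in parallel and always returns a list
fragments <- Map(function(from, to) sequences[from:to], indexes[, 1], indexes[, 2])

# By contrast, apply() collapses equal-length results into a matrix
sameLength <- matrix(c(1,5, 6,10), ncol = 2, byrow = TRUE)
is.matrix(apply(sameLength, 1, function(x) sequences[x[1]:x[2]]))
# [1] TRUE
```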
Try this:

    lapply(indexes[[1]], function(g) sequences[[1]][do.call(seq, as.list(g))])

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

On Fri, Jan 16, 2009 at 11:06 AM, Johannes Graumann <johannes_graumann@web.de> wrote:
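Henrique's one-liner handles the first entry only. A hedged sketch of extending it to every entry pairs each sequence with its own range list via Map(). The second toy entry and its ranges are hypothetical, there only to show the pairing, and g[1]:g[2] stands in for do.call(seq, as.list(g)) to skip the do.call() step for each range.

```r
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*"),
  c("A","B","C","D","E")   # hypothetical second entry
)
indexes <- list(
  list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51)),
  list(c(1,3), c(2,5))     # hypothetical ranges for the second entry
)

# Pair each sequence with its list of ranges; the inner lapply() cuts every range
fragments <- Map(function(s, idx) lapply(idx, function(g) s[g[1]:g[2]]),
                 sequences, indexes)

fragments[[2]]
```

This keeps the original per-range indexing; it mainly drops the explicit loop, the cat() progress output and sapply()'s simplification step, so any speedup on the real data would need to be measured.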