thr3ads.net - R help - [R] Efficiency challenge: MANY subsets [Jan 2009]


Johannes Graumann

2009-Jan-16 13:06 UTC

[R] Efficiency challenge: MANY subsets

Hello,

I have a list of character vectors like this:

sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)

and another list of subset ranges like this:

indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)

What I now want to do is to subset each entry in "sequences" 
(sequences[[1]]) with all ranges in the corresponding low level list in 
"indexes" (indexes[[1]]). Here is what I came up with.

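fragments <- list()
for(iN in seq_along(sequences)){   # seq_along() is also safe for an empty list
  cat(paste(iN,"\n"))              # progress output
  # extract every range c(start, end) from the iN-th sequence
  tmpFragments <- sapply(
    indexes[[iN]],
    function(x){
      sequences[[iN]][seq.int(x[1],x[2])]
    }
  )
  fragments[[iN]] <- tmpFragments
}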

This works fine, but "sequences" contains thousands of entries and the
corresponding "indexes" are sometimes hundreds of ranges long, so this
whole process is EXTREMELY inefficient.

Would somebody out there take up the challenge and show me how to speed
this up?

Thanks for any hints,

Joh
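
To put a number on the slowdown, one could time the loop on synthetic data of roughly the shape described. The following is only a sketch: the sizes (200 sequences of 50 residues, 100 ranges each), the random data, and the helper names make_seq, make_ranges and baseline are made up for illustration and are not from the original post. It uses lapply instead of sapply so that the result is always a list, and it drops the progress output.

```r
set.seed(1)  # reproducible random data

# Hypothetical sizes, chosen only to keep the example quick.
n_seq      <- 200
seq_length <- 50
n_ranges   <- 100

# Random "sequence" of capital letters (illustrative stand-in for real data).
make_seq <- function(len) sample(LETTERS, len, replace = TRUE)

# k random ranges c(start, end) with start <= end.
make_ranges <- function(len, k) {
  lapply(seq_len(k), function(i) sort(sample.int(len, 2, replace = TRUE)))
}

sequences <- lapply(rep(seq_length, n_seq), make_seq)
indexes   <- lapply(sequences, function(s) make_ranges(length(s), n_ranges))

# The loop from the post, wrapped in a function and with a preallocated result.
baseline <- function(sequences, indexes) {
  fragments <- vector("list", length(sequences))
  for (iN in seq_along(sequences)) {
    fragments[[iN]] <- lapply(
      indexes[[iN]],
      function(x) sequences[[iN]][seq.int(x[1], x[2])]
    )
  }
  fragments
}

timing <- system.time(fragments <- baseline(sequences, indexes))
print(timing)
```

Any alternative can then be timed the same way on the same data and checked against the baseline with identical().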

Jorge Ivan Velez

2009-Jan-16 13:18 UTC


[R] Efficiency challenge: MANY subsets

Dear Johannes,
Try this:


sequences <-
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T",
    "Y","L","L","I","M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V",
    "H","T","Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")

indexes <- matrix(c(1,22,22,46,46,51,1,46,22,51,1,51),ncol=2,byrow=TRUE)

apply(indexes,1,function(x){
  ind <- x[1]:x[2]
  sequences[ind]
})


HTH,

Jorge
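
The example above flattens the data to a single vector and a single matrix. A minimal sketch of how the same idea might carry over to the nested layout from the question follows; the name index_mats is illustrative only. One caveat: apply() returns a matrix rather than a list when every range happens to have the same length, so this sketch iterates over row numbers with lapply to get a list in every case.

```r
# Data in the nested-list layout from the original question.
sequences <- list(
  strsplit(paste0("MGLWISFGTPPSYTYLLIM",
                  "NHKLLLINNNNLTEVHTYF",
                  "NININIDKMYIH*"), "")[[1]]
)
indexes <- list(
  list(c(1, 22), c(22, 46), c(46, 51), c(1, 46), c(22, 51), c(1, 51))
)

# Turn each inner list of c(start, end) pairs into a two-column matrix.
index_mats <- lapply(indexes, function(rng) do.call(rbind, rng))

# Pair every sequence with its own range matrix and extract row by row.
fragments <- Map(function(s, m) {
  lapply(seq_len(nrow(m)), function(i) s[m[i, 1]:m[i, 2]])
}, sequences, index_mats)

# Third fragment of the first sequence (positions 46 to 51).
print(fragments[[1]][[3]])
```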




Henrique Dallazuanna

2009-Jan-16 13:23 UTC


[R] Efficiency challenge: MANY subsets

Try this:

lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
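
The one-liner above handles the first entry only. If the fragments are acceptable as single strings rather than character vectors, a different technique is possible: collapse each sequence into one string and let substring(), which is vectorized over start and end positions, cut all ranges of a sequence in one call. This is a swapped-in alternative, not the method proposed in this thread, and whether it is actually faster on real data would need to be measured. The name seq_strings is illustrative.

```r
# Data in the nested-list layout from the original question.
sequences <- list(
  strsplit(paste0("MGLWISFGTPPSYTYLLIM",
                  "NHKLLLINNNNLTEVHTYF",
                  "NININIDKMYIH*"), "")[[1]]
)
indexes <- list(
  list(c(1, 22), c(22, 46), c(46, 51), c(1, 46), c(22, 51), c(1, 51))
)

# One string per sequence.
seq_strings <- vapply(sequences, paste, character(1), collapse = "")

# For each sequence, cut all of its ranges with a single substring() call.
fragments <- mapply(function(s, rng) {
  m <- do.call(rbind, rng)        # rows are c(start, end)
  substring(s, m[, 1], m[, 2])    # vectorized over starts and ends
}, seq_strings, indexes, SIMPLIFY = FALSE, USE.NAMES = FALSE)

print(fragments[[1]])
```

If character vectors are needed later, strsplit(fragments[[1]], "") converts the strings back, at the cost of an extra pass.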



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

