thr3ads.net - R help - [R] fast way to compare two matrices of combinations [Mar 2008]

Mark W Kimpel

2008-Mar-13 16:23 UTC

[R] fast way to compare two matrices of combinations

I have a list (length 750), each element containing a vector of unique 
strings (unique gene ids), with length up to ~40 (median 15). I want to 
compile a matrix of all possible triplets and their frequency within 
gene elements. Using combn and a lot of looping, I am accomplishing this 
but it is VERY slow.

I've tried to figure out a way to vectorize this, using "match" and
"%in%", but can't get my mind around it.

Below is my code. sig.tf.pairs is the list. Suggestions?

Mark


############################################################
M <- 3 # 3 for triplets, etc.
##########################################################
# count all triplets
all.triplets <- NULL
all.count.vec <- NULL
for (i in 1:length(sig.tf.pairs)){
   if (length(sig.tf.pairs[[i]] >= M)){
     triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
     for (j in 1:ncol(triplets)){
       o <- order(triplets[,j])
       triplets[,j] <- triplets[o,j]
       count.vec <- rep(1, ncol(triplets))
     }
     if (is.null(all.count.vec)){
       all.count.vec <- count.vec
       all.triplets <- triplets
     } else {
       redundant.vec <- NULL
       for (k in 1:ncol(all.triplets)){
         for (m in 1:ncol(triplets)){
           if (length(intersect(triplets[,m], all.triplets[,k] == M))){
             all.count.vec[k] <- all.count.vec[k] + 1
             redundant.vec <- c(redundant.vec, m)
           }
         }
       }
       if(!is.null(redundant.vec)){
         triplets <- triplets[,-redundant.vec]
         count.vec <- count.vec[,-redundant.vec]
       }
       all.triplets <- cbind(all.triplets, triplets)
       all.count.vec <- c(all.count.vec, count.vec)
     }
   }
}
###################################

-- 

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com

Erik Iverson

2008-Mar-13 16:27 UTC

Hello Mark -

It may help if you provide a (small) set of example input and what you'd 
like as your output.

Best,
Erik Iverson


Patrick Burns

2008-Mar-13 16:37 UTC

One thing that will probably speed things up enormously
is to not grow objects (all.triplets, etc.).  Instead, create
them at roughly the right size and, if they fill up, do
something like doubling their size.
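
The preallocate-and-double idea can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the names buf and n.used, the starting capacity of 4, and the "item" values are all made up for the example.

```r
# Minimal sketch of "preallocate, then double when full".
# buf, n.used and the item values are hypothetical, for illustration only.
buf <- character(4)   # initial capacity, deliberately small here
n.used <- 0           # number of slots actually filled

for (val in paste0("item", 1:10)) {
  if (n.used == length(buf)) {
    # Buffer is full: double its capacity (new slots are filled with NA).
    length(buf) <- 2 * length(buf)
  }
  n.used <- n.used + 1
  buf[n.used] <- val
}

# Trim the unused slots once all values have been stored.
buf <- buf[seq_len(n.used)]
```

Doubling means the vector is reallocated only a handful of times, instead of once per appended element as with repeated c() or cbind().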

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")


Charles C. Berry

2008-Mar-13 18:10 UTC

On Thu, 13 Mar 2008, Mark W Kimpel wrote:
> I have a list (length 750), each element containing a vector of unique
> strings (unique gene ids), with length up to ~40 (median 15). I want to
> compile a matrix of all possible triplets and their frequency within
> gene elements. Using combn and a lot of looping, I am accomplishing this
> but it is VERY slow.
>
> I've tried to figure out a way to vectorize this, using "match" and
> "%in%", but can't get my mind around it.
>
> Below is my code. sig.tf.pairs is the list. Suggestions?

First, be sure that your code does what you really intend for it to do.

Does this really do what you wanted?

       if (length(intersect(triplets[,m], all.triplets[,k] == M))){

If so, then why does the first line below never produce an error?

       count.vec <- count.vec[,-redundant.vec]

       is.null(dim(count.vec)) ## TRUE
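
To make the two questions above concrete, here is a small sketch of what the posted condition appears to evaluate. The vectors a and b are made-up gene ids standing in for triplets[,m] and all.triplets[,k]. It suggests that the misplaced parenthesis keeps the if() branch from ever running, which would explain why the invalid subsetting line is never reached.

```r
# Hypothetical gene ids standing in for triplets[,m] and all.triplets[,k].
a <- c("g1", "g2", "g3")
b <- c("g1", "g2", "g3")
M <- 3

# As posted, the comparison with M sits inside intersect():
# b == M compares strings with 3, so every element is FALSE.
posted <- length(intersect(a, b == M))

# Probably intended: compare the size of the intersection with M.
intended <- length(intersect(a, b)) == M

# count.vec is a plain vector (no dim attribute), so two-index
# subsetting would raise an error if that line were ever executed.
count.vec <- rep(1, 4)
res <- tryCatch(count.vec[, -1], error = function(e) "error")

print(posted)     # 0, so if (posted) never fires
print(intended)   # TRUE for identical triplets
print(res)        # "error"
```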

You are basically tabulating. Use the functions that are built for that.

It looks like what you want is along these lines:

    tab.combns <- function(x) apply(combn(sort(x), M), 2,
                                    function(x) paste(x, collapse = ''))

    tab.all <- table(unlist(lapply(sig.tf.pairs, tab.combns)))
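
A self-contained variant of the same tabulating idea is sketched below, with two changes that are assumptions on my part rather than anything from the thread: a "|" separator, so that different id triplets cannot paste into the same key, and a guard that skips elements shorter than M, where combn would otherwise stop with an error. The small sig.tf.pairs list is made-up stand-in data.

```r
M <- 3  # 3 for triplets, etc.

# Hypothetical stand-in for the real sig.tf.pairs list.
sig.tf.pairs <- list(
  c("geneA", "geneB", "geneC", "geneD"),
  c("geneC", "geneB", "geneA"),
  c("geneA", "geneB")              # shorter than M: contributes nothing
)

# Return one key per M-combination; sorting first makes the key
# independent of the order in which ids appear in each element.
tab.combns <- function(x, m = M, sep = "|") {
  if (length(x) < m) return(character(0))
  combn(sort(x), m, FUN = paste, collapse = sep)
}

# Tabulate the keys across all list elements.
tab.all <- table(unlist(lapply(sig.tf.pairs, tab.combns)))
print(tab.all)
```

Here the triplet geneA|geneB|geneC is counted twice, because it occurs in the first two elements, and the other three triplets are counted once each.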

Chuck

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
