Hi:

I have two matrices, A and B, where A is n x k and B is m x k, with n >> m >> k. Is there a computationally fast way to count the number of times each row (a k-vector) of B occurs in A?

Thanks for any suggestions.

Best,
Ravi.

What have you tried? Have you considered something like the following:

    n <- 4
    m <- 3
    k <- 2
    A <- array(1, dim=c(n, k))
    B <- array(1, dim=c(m, k))
    BinA <- rep(NA, m)
    tA <- t(A)
    for(i in 1:m){
      BinA[i] <- sum(apply(B[i,]==tA, 2, sum)==k)
    }

    > BinA
    [1] 4 4 4

Hope this helps.
spencer graves
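
As a sketch only, the loop above can be wrapped in a function so that it can be checked on matrices whose rows actually differ. The name count.rows.loop and the small test matrices are made up for illustration, colSums() is swapped in for apply(..., 2, sum), and the code assumes there are no NA entries.

```r
# Hypothetical helper (name invented here): for each row of B, count the
# rows of A that are identical to it. Assumes no NA values.
count.rows.loop <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  k <- ncol(A)
  tA <- t(A)  # k x n, so B[i, ] is recycled down each column of tA
  counts <- integer(nrow(B))
  for (i in seq_len(nrow(B))) {
    # a column of tA matches when all k of its entries agree with B[i, ]
    counts[i] <- sum(colSums(B[i, ] == tA) == k)
  }
  counts
}

A <- matrix(c(1, 2,
              3, 4,
              1, 2,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(1, 2,
              5, 6,
              7, 8), ncol = 2, byrow = TRUE)
count.rows.loop(A, B)  # 2 1 0
```

This still makes one full pass over A for every row of B, so the work grows roughly as n * m * k.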

On Sat, 2004-07-03 at 09:31, Ravi Varadhan wrote:
> Is there a computationally fast way to count the number
> of times each row (a k-vector) of B occurs in A?

How about something like this:

    row.match <- function(m1, m2)
    {
      if (ncol(m1) != (ncol(m2)))
        stop("Matrices must have the same number of columns")

      m1.l <- apply(m1, 1, list)
      m2.l <- apply(m2, 1, list)

      # return boolean for m1.l in m2.l
      m1.l %in% m2.l
    }

Example of use:

    m <- matrix(1:20, ncol = 4, byrow = TRUE)
    n <- matrix(1:40, ncol = 4, byrow = TRUE)

    > m
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    5    6    7    8
    [3,]    9   10   11   12
    [4,]   13   14   15   16
    [5,]   17   18   19   20

    > n
          [,1] [,2] [,3] [,4]
     [1,]    1    2    3    4
     [2,]    5    6    7    8
     [3,]    9   10   11   12
     [4,]   13   14   15   16
     [5,]   17   18   19   20
     [6,]   21   22   23   24
     [7,]   25   26   27   28
     [8,]   29   30   31   32
     [9,]   33   34   35   36
    [10,]   37   38   39   40

    > row.match(n, m)
     [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

If you want to know which rows from n are matches:

    > n[row.match(n, m), ]
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    5    6    7    8
    [3,]    9   10   11   12
    [4,]   13   14   15   16
    [5,]   17   18   19   20

and if you just want the indices from n:

    > which(row.match(n, m))
    [1] 1 2 3 4 5

For timing, if I create some large matrices:

    > m <- matrix(1:20000, ncol = 4, byrow = TRUE)
    > nrow(m)
    [1] 5000
    > n <- matrix(1:40000, ncol = 4, byrow = TRUE)
    > nrow(n)
    [1] 10000
    > system.time(row.match(n, m))
    [1] 0.39 0.01 0.41 0.00 0.00
    > length(row.match(n, m))
    [1] 10000

Does that get you what you want?

HTH,
Marc Schwartz
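
row.match() reports which rows of one matrix occur in the other, whereas the original question asks for counts. One possible way to go from membership to counts is sketched below. The function name row.count is invented here, and the string keys built with paste() are an assumption of this sketch, not part of row.match(); the ":" separator is only safe when the entries cannot themselves contain ":" (numeric data, for instance).

```r
# Hypothetical helper (not from the thread): for each row of B, count how
# many rows of A are identical to it.
row.count <- function(A, B) {
  if (ncol(A) != ncol(B))
    stop("Matrices must have the same number of columns")
  # one string key per row
  key.A <- apply(A, 1, paste, collapse = ":")
  key.B <- apply(B, 1, paste, collapse = ":")
  u <- unique(key.B)
  # match() gives NA for rows of A that are absent from B;
  # tabulate() ignores those NAs
  tally <- tabulate(match(key.A, u), nbins = length(u))
  # map the tallies back to the rows of B (duplicate rows share a count)
  tally[match(key.B, u)]
}

m <- matrix(1:20, ncol = 4, byrow = TRUE)
n <- matrix(1:40, ncol = 4, byrow = TRUE)
n2 <- rbind(n, n[1:2, ])  # repeat the first two rows of n
row.count(n2, m)          # 2 2 1 1 1
```

Only the unique keys of the smaller matrix are tabulated, so the larger matrix is scanned once by match() instead of once per row of B.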

Ravi Varadhan <rvaradha <at> jhsph.edu> writes:
> Is there a computationally fast way to count the number of
> times each row (a k-vector) of B occurs in A?

Here are two approaches. The first one is an order of magnitude faster than the second.

    R> # test matrices
    R> set.seed(1)
    R> a <- matrix(sample(3,1000,rep=T),nc=5)
    R> b <- matrix(sample(3,100,rep=T),nc=5)

    R> f1 <- function(a,b) {
    +    a2 <- apply(a, 1, paste, collapse=":")
    +    b2 <- apply(b, 1, paste, collapse=":")
    +    c(table(c(a2,unique(b2)))[b2] - 1)
    + }

    R> f2 <- function(a,b) {
    +    ta <- t(a)
    +    apply(b,1,function(x)sum(apply(ta == x,2,all)))
    + }

    R> gc(); system.time(ans1 <- f1(a,b))
             used (Mb) gc trigger (Mb)
    Ncells 458311 12.3     818163 21.9
    Vcells 124264  1.0     786432  6.0
    [1] 0.03 0.00 0.03   NA   NA

    R> gc(); system.time(ans2 <- f2(a,b))
             used (Mb) gc trigger (Mb)
    Ncells 458312 12.3     818163 21.9
    Vcells 124270  1.0     786432  6.0
    [1] 0.1 0.0 0.1  NA  NA

    R> all.equal(ans1, ans2)
    [1] TRUE
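
The one-liner inside f1 is compact, so here is a small worked example of what each piece does. The tiny matrices are made up for illustration; the steps are the same as in f1. Appending unique(b2) guarantees that every key of b appears in the table at least once, and subtracting 1 removes that padding again.

```r
# Made-up test data: row "1 2" occurs twice in a, row "2 2" never.
a <- matrix(c(1, 2,
              1, 2,
              3, 1), ncol = 2, byrow = TRUE)
b <- matrix(c(1, 2,
              2, 2,
              1, 2), ncol = 2, byrow = TRUE)

a2 <- apply(a, 1, paste, collapse = ":")  # "1:2" "1:2" "3:1"
b2 <- apply(b, 1, paste, collapse = ":")  # "1:2" "2:2" "1:2"

# Pad with one copy of each distinct key of b so that indexing by b2
# cannot hit a key that is missing from the table.
padded <- table(c(a2, unique(b2)))        # "1:2" = 3, "2:2" = 1, "3:1" = 1

# Look up each row of b and undo the padding.
counts <- c(padded[b2] - 1)               # 2 0 2
```

Without the padding, padded["2:2"] would be NA instead of a count of 0, because "2:2" never occurs among the keys of a.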

Thanks to Gabor, Marc, and Spencer for their elegant solutions. Gabor's first solution worked the best for me.

Best,
Ravi.

________________________________
From: r-help-bounces@stat.math.ethz.ch on behalf of Gabor Grothendieck
Sent: Sat 7/3/2004 12:12 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] counting the occurrences of vectors

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html