I am looking for a fast way to count the number of rows in a matrix that are identical to a pattern vector. For example, if I am interested in counting the number of row vectors in a matrix that are identical to (1,2,3), what would I do? I have tried identical() in a loop, but this is far too slow. I have a very large matrix and need to avoid loops at all costs.

Thanks for any help.

Todd Remund
Probably you could use the idea from unique.matrix, that is

1) form a string from each row and
2) call match() to see which strings match your pattern row.

On Sun, 14 Aug 2005, Todd Remund wrote:

> I am looking for a fast way to count the number of rows in a matrix that are
> identical to a pattern vector. For example, if I am interested in counting
> the number of row vectors in a matrix that are identical to (1,2,3) what
> would I do? I have tried the identical statement in a loop but this is far
> too slow. I have a very large matrix and need to avoid loops at all costs.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
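A minimal sketch of the two steps above, for illustration only. The matrix A, the vector pattern, and the "\r" separator are made-up choices, not part of the original suggestion. Note that the comparison is only as exact as R's number-to-text conversion, so this is probably best suited to integer-valued data.

```r
# Made-up example data: 4 rows, 2 of which equal the pattern.
A <- matrix(c(1, 2, 3,
              3, 2, 1,
              1, 2, 3,
              2, 2, 2), ncol = 3, byrow = TRUE)
pattern <- c(1, 2, 3)

# 1) form one string per row by pasting the columns together
row_keys <- do.call(paste, c(as.data.frame(A), sep = "\r"))
pattern_key <- paste(pattern, collapse = "\r")

# 2) match() each row string against the pattern string;
#    nomatch = 0 turns "no match" into 0 so the result can be tested with > 0
hits <- match(row_keys, pattern_key, nomatch = 0L) > 0L

n_matches <- sum(hits)  # number of rows identical to the pattern
n_matches
```

The separator only needs to be a string that cannot occur inside a formatted number, so that two different rows can never produce the same key.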
On Mon, 15 Aug 2005, Prof Brian Ripley wrote:

> Probably you use the idea from unique.matrix, that is
>
> 1) form a string from each row and
> 2) call match() to see which strings match your pattern row.

If your matrix A really does have short rows like c(1,2,3) and millions of them, another idea is to do

    target <- rep(c(1,2,3), each = nrow(A))
    rowSums(A != target) == 0

For wider rows my first suggestion is probably faster.

-- 
Brian D. Ripley
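To turn the rowSums() idea into an actual count, something like the following should work. This is a sketch on made-up data (A and pattern are illustrative), and it assumes A contains no missing values, since an NA in a row would make that row's result NA.

```r
# Made-up example data: 4 rows, 2 of which equal the pattern.
A <- matrix(c(1, 2, 3,
              3, 2, 1,
              1, 2, 3,
              2, 2, 2), ncol = 3, byrow = TRUE)
pattern <- c(1, 2, 3)

# Repeat each pattern element nrow(A) times so the vector lines up with
# A's column-major storage: column 1 is compared with pattern[1], and so on.
target <- rep(pattern, each = nrow(A))

# A row matches when none of its elements differ from the target.
is_match <- rowSums(A != target) == 0

n_matches <- sum(is_match)   # number of matching rows
match_rows <- which(is_match) # their row indices
n_matches
```

The each= argument is what makes this work: a plain rep(pattern, nrow(A)) would cycle the pattern down the columns rather than across the rows.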
Hi Todd,

Here is a function that was suggested to me by Gabor Grothendieck. This function counts the number of times each row of a matrix B occurs in another matrix A.

    rowmatch.count <- function(a,b) {
      f <- function(...) paste(..., sep=":")
      a2 <- do.call("f", as.data.frame(a))
      b2 <- do.call("f", as.data.frame(b))
      c(table(c(a2,unique(b2)))[b2] - 1)
    }

If you are interested in finding the number of occurrences of a vector "b" instead, you can call this function as follows:

    rowmatch.count(A,t(as.matrix(b)))

Hope this helps,
Ravi.

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Todd Remund
> Sent: Monday, August 15, 2005 1:13 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Vector comparison to matrix
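For illustration, here is the function applied to a small made-up matrix, once with a single pattern vector and once with several pattern rows. The data (A, b, B) are invented for this example; the function body is the one posted above.

```r
# Function as posted above (suggested by Gabor Grothendieck).
rowmatch.count <- function(a, b) {
  f <- function(...) paste(..., sep = ":")
  a2 <- do.call("f", as.data.frame(a))   # one string per row of a
  b2 <- do.call("f", as.data.frame(b))   # one string per row of b
  # appending unique(b2) guarantees every pattern appears in the table;
  # subtracting 1 removes that extra occurrence again
  c(table(c(a2, unique(b2)))[b2] - 1)
}

# Made-up example data.
A <- matrix(c(1, 2, 3,
              3, 2, 1,
              1, 2, 3,
              2, 2, 2), ncol = 3, byrow = TRUE)

# Single pattern vector: turn it into a 1-row matrix first.
b <- c(1, 2, 3)
count_b <- rowmatch.count(A, t(as.matrix(b)))

# Several pattern rows at once, including one that never occurs in A.
B <- rbind(c(1, 2, 3), c(2, 2, 2), c(9, 9, 9))
count_B <- rowmatch.count(A, B)

count_b
count_B
```

The result is named by the pasted row strings, so each count can be read off against the pattern it belongs to.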
Hi Todd and list,

I see you have received a few suggestions; here's another:

    # set up data: your vector and a 100000 x 3 matrix with a few
    # matching rows:
    target <- c(1,2,3)
    A <- matrix(sample(1:3,300000,replace=TRUE),ncol=3)

    # count matches:
    nMatches <- sum(apply(A,1,function(x,target) all.equal(x,target),target)=="TRUE")
    # by applying a simple function, which takes 'target' as an 'extra'
    # argument, to the rows of A. The function returns a vector of
    # differences and 'TRUE'-s, the latter of which can be counted.

This took 1-2 minutes on my >3 year old laptop.

Siggi

> version
         _
platform i686-redhat-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    2
minor    0.0
year     2004
month    10
day      04
language R

Yeah, I know, an update is (over)due.

-- 
Sigurður Þór Jónsson / Sigurdur Tor Jonsson
E-mail: <sigurdur at hafro.is>
Snail-mail: Marine Research Institute, P.O. Box 1390, 121 Reykjavik, Iceland
Telephone (direct line): +354 5752093
Telephone (switchboard): +354 5752000
Fax: +354 5752001
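A possible variant of the apply() approach, sketched here on a smaller made-up matrix: wrapping all.equal() in isTRUE() gives a plain logical per row, so the comparison with the string "TRUE" is not needed. The seed and matrix size are arbitrary choices for this example. Keep in mind that all.equal() tests equality up to a small numerical tolerance, which makes no difference for integer-valued data like this.

```r
set.seed(1)  # arbitrary seed, only so that the example is reproducible
target <- c(1, 2, 3)
A <- matrix(sample(1:3, 300, replace = TRUE), ncol = 3)  # 100 x 3, smaller than in the post

# all.equal() returns TRUE or a character description of the differences;
# isTRUE() maps that to TRUE/FALSE so the results can be summed directly.
n_apply <- sum(apply(A, 1, function(x) isTRUE(all.equal(x, target))))

# Cross-check with a vectorised comparison that avoids the per-row function call.
n_vec <- sum(rowSums(A != rep(target, each = nrow(A))) == 0)

c(n_apply, n_vec)  # the two counts should agree
```

On large matrices the vectorised form is likely to be much faster, because apply() still calls the function once per row.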