I am looking for a fast way to count the number of rows in a matrix that are identical to a pattern vector. For example, if I want to count the row vectors in a matrix that are identical to (1,2,3), what should I do? I have tried identical() in a loop, but this is far too slow. I have a very large matrix and need to avoid loops at all costs.

Thanks for any help.

Todd Remund
You could probably use the idea from unique.matrix, that is

1) form a string from each row and
2) call match() to see which strings match your pattern row.

On Sun, 14 Aug 2005, Todd Remund wrote:

> I am looking for a fast way to count the number of rows in a matrix are
> identical to a pattern vector. For example, if I am interested in counting
> the number of row vectors in a matrix that are identical to (1,2,3) what
> would I do? I have tried the identical statement in a loop but this is far
> too slow. I have a very large matrix and need to avoid loops at all costs.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
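A minimal sketch of the two steps above, not necessarily the exact code Prof Ripley had in mind. The toy matrix, the names A, pattern, row_keys and pattern_key, and the "\r" separator are assumptions made for illustration. It relies on the default number-to-string conversion, so it is best suited to integer-valued or otherwise exactly representable data.

```r
# Toy data (made up for illustration): 5 rows, 2 of which equal the pattern.
A <- matrix(c(1, 2, 3,
              3, 2, 1,
              1, 2, 3,
              2, 2, 2,
              1, 2, 4), ncol = 3, byrow = TRUE)
pattern <- c(1, 2, 3)

# 1) Form one string per row by pasting the columns together.
#    "\r" is an arbitrary separator that is unlikely to occur in the data.
row_keys <- do.call(paste, c(as.data.frame(A), sep = "\r"))
pattern_key <- paste(pattern, collapse = "\r")

# 2) match() gives NA for rows whose string differs from the pattern string,
#    so counting the non-NA entries counts the matching rows.
n_matches <- sum(!is.na(match(row_keys, pattern_key)))
print(n_matches)
```

The pasting is done once, column by column, so no explicit loop over rows is needed.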
On Mon, 15 Aug 2005, Prof Brian Ripley wrote:

> Probably you use the idea from unique.matrix, that is
>
> 1) form a string from each row and
> 2) call match() to see which strings match your pattern row.

If your matrix A really does have short rows like c(1,2,3) and millions of them, another idea is to do

    target <- rep(c(1,2,3), each = nrow(A))
    rowSums(A != target) == 0

For wider rows my first suggestion is probably faster.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
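As a small illustration of the rowSums() idea above: the expression yields a logical vector with one entry per row, so wrapping it in sum() gives the count that was asked for. The toy matrix and the names pattern and is_match are assumptions for this sketch, and it assumes A contains no NA values.

```r
# Toy data (made up for illustration): rows 1 and 4 equal the pattern.
A <- matrix(c(1, 2, 3,
              1, 2, 2,
              3, 2, 1,
              1, 2, 3), ncol = 3, byrow = TRUE)
pattern <- c(1, 2, 3)

# Matrices are stored column by column, so repeating each pattern element
# nrow(A) times lines the pattern up with the columns of A.
target <- rep(pattern, each = nrow(A))

# A row matches when none of its elements differs from the pattern.
is_match <- rowSums(A != target) == 0

n_matches <- sum(is_match)     # number of matching rows
match_rows <- which(is_match)  # their row indices
print(n_matches)
print(match_rows)
```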
Hi Todd,
Here is a function that was suggested to me by Gabor Grothendieck. This
function counts the number of times each row of a matrix B occurs in another
matrix A.
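rowmatch.count <- function(a, b) {
  # paste the columns together so that each row becomes a single string
  f <- function(...) paste(..., sep = ":")
  a2 <- do.call("f", as.data.frame(a))
  b2 <- do.call("f", as.data.frame(b))
  # adding unique(b2) ensures every row of b appears in the table;
  # the extra count is subtracted again afterwards
  c(table(c(a2, unique(b2)))[b2] - 1)
}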
If you are interested in finding the number of occurrences of a single vector "b" instead, you can call this function as follows:
rowmatch.count(A, t(as.matrix(b)))
Hope this helps,
Ravi.
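A short usage sketch for the function above, with its definition repeated so the example runs on its own. The matrices A and B and the vector b are toy data invented for illustration.

```r
# Copy of the function from the message above.
rowmatch.count <- function(a, b) {
  f <- function(...) paste(..., sep = ":")
  a2 <- do.call("f", as.data.frame(a))
  b2 <- do.call("f", as.data.frame(b))
  c(table(c(a2, unique(b2)))[b2] - 1)
}

# Toy data (made up for illustration).
A <- matrix(c(1, 2, 3,
              3, 2, 1,
              1, 2, 3,
              2, 2, 2), ncol = 3, byrow = TRUE)
B <- matrix(c(1, 2, 3,
              2, 2, 2,
              9, 9, 9), ncol = 3, byrow = TRUE)

# One count per row of B, named by the pasted row string.
counts <- rowmatch.count(A, B)
print(counts)

# Single pattern vector: turn it into a one-row matrix first.
b <- c(1, 2, 3)
count_b <- rowmatch.count(A, t(as.matrix(b)))
print(count_b)
```

Rows of B that never occur in A, such as (9,9,9) here, come back with a count of 0 rather than being dropped.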
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-
> bounces at stat.math.ethz.ch] On Behalf Of Todd Remund
> Sent: Monday, August 15, 2005 1:13 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Vector comparison to matrix
Hi Todd and list,

I see you have received a few suggestions; here's another:

# set up data: your vector and a 100000 x 3 matrix in which some
# rows match:
target <- c(1,2,3)
A <- matrix(sample(1:3, 300000, replace=TRUE), ncol=3)
# count matches:
nMatches <- sum(apply(A, 1, function(x, target) all.equal(x, target), target) == "TRUE")
# by applying a simple function, which takes 'target' as an 'extra'
# argument, to the rows of A. The function returns a vector of
# differences and 'TRUE'-s, the latter of which can be counted.

This took 1-2 minutes on my >3 year old laptop.

Siggi

> version
         _
platform i686-redhat-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    2
minor    0.0
year     2004
month    10
day      04
language R

Yeah, I know, an update is (over)due.

-- 
Sigurður Þór Jónsson / Sigurdur Tor Jonsson
E-mail: <sigurdur at hafro.is>
Snail-mail: Marine Research Institute, P.O. Box 1390, 121 Reykjavik, Iceland
Telephone (direct line): +354 5752093
Telephone (switchboard): +354 5752000
Fax: +354 5752001
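One point worth knowing about the all.equal() approach above is that it compares with a numerical tolerance, whereas == is exact. The sketch below is an illustration under assumed toy data, not part of Siggi's message. It wraps all.equal() in isTRUE(), which is the usual way to turn its "TRUE or description of differences" result into a plain logical, and contrasts that with an exact comparison.

```r
target <- c(1, 2, 3)

# Toy data (made up for illustration): row 2 differs from the target only
# by a tiny amount, of the size floating-point rounding might produce.
A <- rbind(c(1, 2, 3),
           c(1, 2, 3 + 1e-12),
           c(3, 2, 1))

# Tolerant comparison: isTRUE() maps anything other than TRUE to FALSE.
n_tolerant <- sum(apply(A, 1, function(x) isTRUE(all.equal(x, target))))

# Exact comparison: every element must be equal to the target element.
n_exact <- sum(apply(A, 1, function(x) all(x == target)))

print(c(tolerant = n_tolerant, exact = n_exact))
```

Both variants still call a function once per row through apply(), so for very large matrices the vectorised suggestions earlier in the thread are likely to be faster.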