Hi,

I have a rectangular matrix and I need to check whether any columns are identical. Currently I'm looping over the columns and checking each column against all the others with identical().

However, as experience has shown me, getting rid of loops is a good idea :) Would anybody have any suggestions as to how I could do this job more efficiently?

(It would be nice to know which columns are identical, but that's not a necessity.)

-------------------------------------------------------------------
Rajarshi Guha <rxg218 at psu.edu> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Entropy isn't what it used to be.

On Wed, 2003-12-03 at 13:18, J.R. Lockwood wrote:
> list will come up with something clever. the other issues is that you
> need to be careful when doing equality comparisons with floating point
> numbers. unless your matrix consists of characters or integers,
> you'll need to think about some level of numerical tolerance of your
> comparison.

Yes, the matrix will always be integer.

Rajarshi Guha

On Wed, 2003-12-03 at 12:06, Rajarshi Guha wrote:
> Hi,
> I have a rectangular matrix and I need to check whether any columns
> are identical or not. Currently I'm looping over the columns and
> checking each column with all the others with identical().
>
> However, as experience has shown me, getting rid of loops is a good idea
> :) Would anybody have any suggestions as to how I could do this job more
> efficiently.
>
> (It would be nice to know which columns are identical but thats not a
> necessity.)

If your matrix is 'x' and contains text and/or integer values (since float comparisons can be problematic) you can use:

    any(duplicated(x, MARGIN = 2))

to find out if any of the columns are duplicated and

    which(duplicated(x, MARGIN = 2))

to get the column numbers that are duplicates in the matrix.

If you want to extract the unique columns, you can use:

    unique(x, MARGIN = 2)

See ?duplicated and ?unique for more information.

Example:

> x <- matrix(c(1:3, 4:6, 1:3, 7:9), ncol = 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    1    7
[2,]    2    5    2    8
[3,]    3    6    3    9

> any(duplicated(x, MARGIN = 2))
[1] TRUE

> which(duplicated(x, MARGIN = 2))
[1] 3

> unique(x, MARGIN = 2)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

HTH,

Marc Schwartz
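
A possible follow-up, offered as a minimal sketch and not as part of Marc's answer: duplicated() flags only the later copies, so it does not say which earlier column each duplicate matches. Assuming an integer matrix (as in this thread), one way to recover the groups of identical columns is to collapse each column into a string key and split the column indices by that key. The names `keys`, `groups`, and `dup_groups` are made up for this illustration.

```r
# Example matrix from the thread: columns 1 and 3 are identical.
x <- matrix(c(1:3, 4:6, 1:3, 7:9), ncol = 4)

# Collapse each column into a single string key.
# Assumption: integer entries, so "," cannot appear inside a value.
keys <- apply(x, 2, paste, collapse = ",")

# Group column indices by key, then keep only groups with more than one member.
groups <- split(seq_len(ncol(x)), keys)
dup_groups <- unname(groups[lengths(groups) > 1])

dup_groups
```

For this matrix the only group is columns 1 and 3. With character data the separator would need to be a string that cannot occur in the values, and with floating point data exact string keys are probably not what you want.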

> From: Rajarshi Guha
>
> On Wed, 2003-12-03 at 13:18, J.R. Lockwood wrote:
>
> > list will come up with something clever. the other issues is that you
> > need to be careful when doing equality comparisons with floating point
> > numbers. unless your matrix consists of characters or integers,
> > you'll need to think about some level of numerical tolerance of your
> > comparison.
>
> Yes, the matrix will always be integer.

Other than what J.R. and Marc suggested, you could try

    dist(t(x), method="manhattan")

and see which entries are 0 (or close enough to 0).

HTH,
Andy

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
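
Andy's dist() suggestion could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested recipe from the thread: the threshold name `tol` and the result name `same` are hypothetical, and `tol = 0` is only appropriate because the matrix is integer. For floating point data a small positive tolerance would be needed, as J.R. pointed out.

```r
# Example matrix from the thread: columns 1 and 3 are identical.
x <- matrix(c(1:3, 4:6, 1:3, 7:9), ncol = 4)

# dist() works on rows, so transpose to compare columns.
# Manhattan distance is 0 exactly when two columns agree in every entry.
d <- as.matrix(dist(t(x), method = "manhattan"))

# Assumption: exact match is fine for integers; use e.g. 1e-8 for doubles.
tol <- 0

# Look only above the diagonal so each pair is reported once
# and a column is never compared with itself.
same <- which(d <= tol & upper.tri(d), arr.ind = TRUE)

same
```

Each row of `same` is one pair of matching columns; here it holds the single pair (1, 3). Note that dist() computes every pairwise distance, so the work and memory grow quadratically with the number of columns, which may matter for wide matrices.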