I was posed the following problem/teaser: given two matrices, come up with an "elegant" (= fast and short) function that returns a matrix containing all the columns of both matrices with duplicates removed; the column order does not matter. In essence, a matrix equivalent of union(x,y), where x and y are vectors. I could not come up with anything nice. Any ideas?

Giuseppe

--
Giuseppe A. Paleologo :: Email: paleologo@gmail.com :: AOL: gappy3000 :: Skype :: gappy3000 :: Gtalk: paleologo :: Mobile: 917.331.3497
fact: 2^32,582,657-1 is a prime
Ling, Gary (Electronic Trading)
2008-Aug-06 23:20 UTC
[R] Union of columns of two matrices
Here is my attempt. I'm not sure if that's the most efficient way to do it, because I'm "cheating" by using a nice feature of R, namely duplicated(). I assume the matrices have the same number of rows.

### example ###

### background setup: simulate 2 matrices with some common columns
(A <- cbind(1:4, matrix(rnorm(16),4), 101:104))
#      [,1]       [,2]       [,3]       [,4]      [,5] [,6]
# [1,]    1 -0.5305169 -1.7243920 -0.1722617 1.7343167  101
# [2,]    2 -0.3466017  0.3737072  0.5961296 1.4493053  102
# [3,]    3 -1.7812876 -1.5707614  1.4401485 0.9683144  103
# [4,]    4 -1.7219545  0.4762025 -0.2137656 0.7008253  104

(B <- cbind(matrix(rnorm(8),4),1:4,matrix(rnorm(12),4),101:104,c(1:3,5)))
#            [,1]       [,2] [,3]       [,4]       [,5]       [,6] [,7] [,8]
# [1,]  1.1182879  0.5340995    1  1.0434300 -0.5105291 -1.0994476  101    1
# [2,]  0.4031942  0.3156704    2 -0.4704723  0.8367561 -1.6163610  102    2
# [3,] -1.0317547 -0.5642614    3  1.0916636  1.0411857  0.1914676  103    3
# [4,] -0.6036328  3.2339688    4  1.8505135  2.0055947 -0.0359060  104    5

### some auxiliary abstractions
# turn a matrix into a list with one element per column
matrix2list <- function(M) lapply(split(M,col(M)), function(c) c)
# bind a list of equal-length vectors back into a matrix, one column each
list2matrix <- function(L) sapply(L, function(c) c)

### Then the problem can be solved in 2 lines
L <- c(matrix2list(A),matrix2list(B))
list2matrix(L[!duplicated(L)])

# Or even 1 line, but kind of confusing
(function(L) list2matrix(L[!duplicated(L)]))(c(matrix2list(A),matrix2list(B)))

# output; compare to above, the duplicated columns are gone
#      1          2          3          4         5   6          1          2
# [1,] 1 -0.5305169 -1.7243920 -0.1722617 1.7343167 101  1.1182879  0.5340995
# [2,] 2 -0.3466017  0.3737072  0.5961296 1.4493053 102  0.4031942  0.3156704
# [3,] 3 -1.7812876 -1.5707614  1.4401485 0.9683144 103 -1.0317547 -0.5642614
# [4,] 4 -1.7219545  0.4762025 -0.2137656 0.7008253 104 -0.6036328  3.2339688
#               4          5          6 8
# [1,]  1.0434300 -0.5105291 -1.0994476 1
# [2,] -0.4704723  0.8367561 -1.6163610 2
# [3,]  1.0916636  1.0411857  0.1914676 3
# [4,]  1.8505135  2.0055947 -0.0359060 5

##### end example #####

I'm not sure how "duplicated" is coded in R.
If the columns are sorted before comparing, then I guess the duplicates can be found in a single O(n) pass, on top of the O(n log n) cost of the sort itself. If not, comparing every pair of columns is O(n^2). [n = length(L), the total number of columns]

-gary
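The sort-then-compare idea mentioned above can be sketched as follows. This is only an illustration of that idea, not a description of how duplicated() is actually implemented; the function name union_columns_sorted is hypothetical, and the sketch assumes both matrices have the same number of rows and contain no NA values.

```r
# Hypothetical helper illustrating "sort first, then compare neighbours".
union_columns_sorted <- function(a, b) {
  u <- cbind(a, b)
  n <- ncol(u)
  if (n < 2) return(u)
  # Sort the columns lexicographically: each row of u is one sort key
  keys <- lapply(seq_len(nrow(u)), function(i) u[i, ])
  s <- u[, do.call(order, keys), drop = FALSE]
  # After sorting, equal columns sit next to each other, so a single
  # linear pass comparing each column with its left neighbour finds them
  n_diff <- colSums(s[, -1, drop = FALSE] != s[, -n, drop = FALSE])
  same_as_prev <- c(FALSE, n_diff == 0)
  s[, !same_as_prev, drop = FALSE]
}

a <- cbind(1:4, 101:104)
b <- cbind(5:8, 1:4, 101:104)
res <- union_columns_sorted(a, b)
```

The result has its columns in sorted order rather than in order of first appearance, which is acceptable here because the column order does not matter.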
If a and b are your matrices with the same number of rows, unique() can solve it:

unique(cbind(a,b), MARGIN=2)

Eric
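A small worked example may make the parallel with union() concrete: for vectors, union(x, y) is essentially unique(c(x, y)), and the matrix version replaces c() with cbind() and asks unique() to work on columns. The matrices a and b below are made up for illustration.

```r
# Made-up integer matrices with one column (1:4) in common
a <- cbind(1:4, 101:104)
b <- cbind(5:8, 1:4, c(1:3, 5L))

# MARGIN = 2 tells unique() to compare whole columns instead of rows
u <- unique(cbind(a, b), MARGIN = 2)

# The second column of b duplicates the first column of a and is dropped;
# the last column of b differs from 1:4 in its final entry and is kept
dim(u)
```

Columns are kept in order of first appearance, and a column repeated within a single matrix is also reduced to one copy.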
On Wed, Aug 06, 2008 at 06:32:43PM -0400, Giuseppe Paleologo wrote:
> I was posed the following problem/teaser:
>
> given two matrices, come up with an "elegant" (=fast & short) function that
> returns a matrix with all and only the non-duplicated columns of both
> matrices; the column order does not matter. In essence, a matrix equivalent
> of union(x,y), where x and y are vectors.

union.matrices <- function(a, b) {
    u <- cbind(a,b)
    # drop=FALSE keeps the result a matrix even if only one column survives
    u[,!duplicated(u, MARGIN=2), drop=FALSE]
}

?

(Obviously not attempting to deal with issues of identity of columns containing real numbers)

Dan
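To illustrate the caveat about real numbers: two columns that are equal mathematically can differ in their last bits, and whether they are then treated as duplicates is not something to rely on. One possible workaround, sketched below, is to detect duplicates on a rounded copy of the data. The helper name union_columns_rounded and the default of 8 digits are assumptions made for this example, not part of the thread.

```r
# Hypothetical helper: columns count as equal if they agree after rounding.
union_columns_rounded <- function(a, b, digits = 8) {
  u <- cbind(a, b)
  # Find duplicates on a rounded copy, but return the original values
  keep <- !duplicated(round(u, digits), MARGIN = 2)
  u[, keep, drop = FALSE]
}

# 0.1 + 0.2 is not exactly equal to 0.3 in floating point
a <- cbind(c(0.1 + 0.2, 1))
b <- cbind(c(0.3, 1), c(0.5, 2))

res <- union_columns_rounded(a, b)
ncol(res)  # the first column of b is treated as a duplicate of a
```

Rounding is a blunt tool: values that sit very close to a rounding boundary can still land on different sides of it, so the number of digits should be chosen with the scale of the data in mind.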
On Thu, 7 Aug 2008, Dan Davison wrote:

> union.matrices <- function(a, b) {
> [...]
> }

Or just

union.matrices <- function(a, b) unique( cbind( a , b ), MARGIN=2 )

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901