Dimitri Liakhovitski
2009-Sep-21 18:14 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Hello, dear R-ers!

I built a data frame, "index" (below), with 4 columns using expand.grid(). I want to exclude all rows that have equal values in ANY 2 columns. Here is how I am doing it:

  index<-expand.grid(1:4,1:4,1:4,1:4)
  dim(index)
  # Deleting rows that have identical values in any two columns (1 line of code):
  index<-index[!(index$Var1==index$Var2)&!(index$Var1==index$Var3)&!(index$Var1==index$Var4)&!(index$Var2==index$Var3)&!(index$Var2==index$Var4)&!(index$Var3==index$Var4),]
  dim(index)
  index

I was wondering if there is a more elegant way of doing it, because as the number of columns increases, the amount of code one would have to write increases A LOT: with k columns there are k(k-1)/2 pairs of columns to compare.

Thank you very much for any suggestion!

--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
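The pairwise comparisons in the question do not have to be typed out by hand: combn() can list every pair of columns, and Reduce() can AND the results together. A minimal sketch, assuming a data frame with at least 2 columns; all_pairs_differ() is a hypothetical helper name, not from any package.

```r
# Build the grid from the question: every combination of 1:4 in 4 columns.
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# Hypothetical helper: TRUE for rows whose columns are pairwise different.
# combn(ncol(df), 2, simplify = FALSE) returns a list of all column-index
# pairs, so the number of comparisons grows automatically with ncol(df).
# Assumes ncol(df) >= 2 (combn() errors otherwise).
all_pairs_differ <- function(df) {
  pairs <- combn(ncol(df), 2, simplify = FALSE)
  tests <- lapply(pairs, function(p) df[[p[1]]] != df[[p[2]]])
  Reduce(`&`, tests)
}

result <- index[all_pairs_differ(index), ]
dim(result)  # 24 4
```

This keeps the logic of the original one-liner (every pair must differ) while the code stays the same length for any number of columns.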
Dimitris Rizopoulos
2009-Sep-21 18:44 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
One way is the following:

  index <- expand.grid(1:4, 1:4, 1:4, 1:4)
  mat <- data.matrix(index)
  keep <- apply(mat, 1, function (x, d) length(unique(x)) == d, d = ncol(mat))
  index[keep, ]

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
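Because the unique()-count test compares against ncol(mat) rather than a hard-coded 4, it carries over to grids of any shape. A sketch wrapping it in a reusable function; keep_distinct() is a hypothetical name chosen for illustration.

```r
# Hypothetical wrapper around the apply()/unique() idea: keep a row when
# its number of unique values equals the number of columns.
keep_distinct <- function(df) {
  mat <- data.matrix(df)
  keep <- apply(mat, 1, function(x) length(unique(x)) == ncol(mat))
  df[keep, , drop = FALSE]
}

# Works unchanged for a different number of columns and values.
grid3 <- expand.grid(1:5, 1:5, 1:5)
res3 <- keep_distinct(grid3)
nrow(res3)  # 5 * 4 * 3 = 60 rows
```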
Erik Iverson
2009-Sep-21 18:55 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Hello,

Do you mean exactly any 2 columns? What if the value is equal in more than 2 columns?

If you mean that a value is equal in 2 or more columns, i.e., duplicated within a row, then the following should work, assuming index can be converted to a matrix for apply():

  t3 <- index[apply(index, 1, function(x) all(!duplicated(x))),]
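The duplicated() test answers Erik's question directly: a value that fills 3 or more columns is flagged too, since duplicated() marks every repeat after the first occurrence. A sketch of an equivalent variant using base R's anyDuplicated(), which returns 0 when a vector has no repeats; the names t4 and no_repeats are illustrative only.

```r
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# anyDuplicated() returns 0 for a vector without repeated values and the
# index of the first repeat otherwise, so "== 0" marks the rows to keep.
no_repeats <- apply(index, 1, anyDuplicated) == 0
t4 <- index[no_repeats, ]

# Same rows as the duplicated()-based version.
t3 <- index[apply(index, 1, function(x) all(!duplicated(x))), ]
identical(t3, t4)  # TRUE

# A value repeated in three columns is caught as well.
duplicated(c(1, 1, 1, 2))  # FALSE TRUE TRUE FALSE
```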
David Winsemius
2009-Sep-21 19:05 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
What "worked" for you seems longer than it needs to be, but here is where I ended up:

  index[sapply(apply(index, 1, unique), function(x) length(x)==4), ]

This keeps the same 24 rows (the 4! = 24 permutations of 1:4) that you also get from:

  library(e1071)
  permutations(4)

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
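One caveat with sapply(apply(index, 1, unique), ...): it relies on apply() returning a list, which happens here only because rows have different numbers of unique values. When every row yields the same number, apply() simplifies the result to a matrix and sapply() then iterates over single cells. A small illustration on a made-up data frame (small is a hypothetical example, not from the thread):

```r
# Every row of this toy data frame has 2 distinct values.
small <- data.frame(a = 1:3, b = 4:6)

u <- apply(small, 1, unique)
is.matrix(u)                                     # TRUE: 2 x 3, not a list
length(sapply(u, function(x) length(x) == 2))    # 6 values for 3 rows

# Counting inside apply() always returns one value per row.
keep <- apply(small, 1, function(x) length(unique(x)) == ncol(small))
small[keep, ]                                    # all 3 rows kept
```

Computing the count inside the apply() call, as in the other replies, avoids the problem.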
Jorge Ivan Velez
2009-Sep-21 19:16 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Hi Dimitri,

Try also either

  index[apply(index, 1, function( x ) length( unique( x ) ) == 4 ),]

or

  # install.packages('e1071')
  require(e1071)
  permutations(4)

HTH,
Jorge
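When the grid is exactly all combinations of 1:n in n columns, the filtered result is just the set of permutations of 1:n, so it can be generated directly. If one prefers not to depend on e1071, a recursive base-R sketch is below; perms() is a hypothetical helper, not part of any package, and it builds n! rows, so it is only practical for small n.

```r
# Hypothetical helper: all permutations of v as the rows of a matrix.
# For each position i, put v[i] first and append every permutation of
# the remaining elements.
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

p <- perms(1:4)
dim(p)  # 24 4
```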
William Dunlap
2009-Sep-21 19:26 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Assuming your real dataset isn't the one you showed (for which e1071::permutations(4) works well), you can sort each row and then quickly check for duplicates by comparing each column to the previous column. E.g.,

  f <- function(index){
    # Sort the values within each row: transpose so rows become columns,
    # order by column and then by value, and transpose back.
    rowSort <- function(x){
      x <- t(as.matrix(x))
      x[] <- x[order(col(x), x)]
      t(x)
    }
    tmp <- rowSort(index)
    # In a sorted row, any repeated value ends up in adjacent columns.
    keep <- rep(TRUE, nrow(tmp))
    if(ncol(tmp)>1) for(i in 2:ncol(tmp))
      keep <- keep & tmp[,i] != tmp[,i-1]
    index[keep,]
  }
  f(index)

Some package probably has a row sorting function, but the above works pretty well.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
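The for loop in f() can also be replaced by a single vectorised comparison of all adjacent column pairs. A minimal sketch, assuming numeric columns; row_sort() and distinct_rows() are hypothetical names. It uses the same order() trick as rowSort(), but orders by row() instead of col() and refills the matrix row-wise, so no transposes are needed.

```r
# Sort the values within each row of a numeric matrix or data frame:
# order by row index first, then by value, and refill row by row.
row_sort <- function(m) {
  m <- as.matrix(m)
  o <- order(row(m), m)
  matrix(m[o], nrow = nrow(m), ncol = ncol(m), byrow = TRUE)
}

distinct_rows <- function(df) {
  s <- row_sort(df)
  if (ncol(s) < 2) return(df)
  # In a sorted row, a repeat means two adjacent columns are equal.
  has_tie <- rowSums(s[, -1, drop = FALSE] == s[, -ncol(s), drop = FALSE]) > 0
  df[!has_tie, , drop = FALSE]
}

res <- distinct_rows(expand.grid(1:4, 1:4, 1:4, 1:4))
nrow(res)  # 24
```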