Dimitri Liakhovitski
2009-Sep-21 18:14 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Hello, dear R-ers!

I built a data frame, index (a "grid" made with expand.grid, below), with 4 columns. I want to exclude all rows that have equal values in ANY 2 columns. Here is how I am doing it:

index<-expand.grid(1:4,1:4,1:4,1:4)
dim(index)
# Deleting rows that have identical values in any two columns (1 line of code):
index<-index[!(index$Var1==index$Var2)&!(index$Var1==index$Var3)&!(index$Var1==index$Var4)&!(index$Var2==index$Var3)&!(index$Var2==index$Var4)&!(index$Var3==index$Var4),]
dim(index)
index

I was wondering if there is a more elegant way of doing it, because as the number of columns increases, the amount of code one would have to write increases A LOT (with k columns there are k(k-1)/2 pairs to compare).

Thank you very much for any suggestion!
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
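The six conditions above are simply every pair of columns, so they can be generated instead of typed. A minimal sketch, assuming every column can be compared with != (the names pairs, keep and result are illustrative, not from the thread):

```r
# Build the grid from the original post
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# All column pairs (i, j) with i < j, one pair per column of this 2-row matrix
pairs <- combn(ncol(index), 2)

# For each pair, TRUE where the two columns differ; AND all pairs together
keep <- Reduce(`&`, lapply(seq_len(ncol(pairs)), function(k)
  index[[pairs[1, k]]] != index[[pairs[2, k]]]))

result <- index[keep, ]
dim(result)  # 24 4
```

With k columns, combn(k, 2) yields choose(k, 2) pairs, the same number of conditions one would otherwise write by hand.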
Dimitris Rizopoulos
2009-Sep-21 18:44 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
One way is the following:
index <- expand.grid(1:4, 1:4, 1:4, 1:4)
mat <- data.matrix(index)
# keep a row only if it has as many distinct values as there are columns
keep <- apply(mat, 1, function (x, d)
    length(unique(x)) == d, d = ncol(mat))
index[keep, ]
I hope it helps.
Best,
Dimitris
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
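The same unique()-count test can be wrapped in a small function so that nothing depends on the number of columns. A sketch; distinct_rows is a hypothetical name, and the 5-column grid is used only to show that the column count is no longer hard-coded:

```r
# Hypothetical helper: keep rows whose values are all distinct
distinct_rows <- function(df) {
  mat <- data.matrix(df)
  keep <- apply(mat, 1, function(x) length(unique(x)) == ncol(mat))
  df[keep, , drop = FALSE]
}

# 5 columns, each taking values 1:5 (5^5 = 3125 rows)
idx5 <- expand.grid(rep(list(1:5), 5))
res5 <- distinct_rows(idx5)
nrow(res5)  # 120, i.e. 5!
```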
Erik Iverson
2009-Sep-21 18:55 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Hello,

Do you mean exactly any 2 columns? What if the value is equal in more than 2 columns?

If a value is equal in 2 or more columns of a row, i.e., duplicated, then the following should work, assuming index can be converted to a matrix for apply():

t3 <- index[apply(index, 1, function(x) all(!duplicated(x))),]
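For the "2 or more equal values" reading, base R's anyDuplicated() gives a slightly shorter test than all(!duplicated(x)), since it returns 0 when no value repeats. A sketch that checks it against the explicit pairwise condition from the original post (keep_dup and keep_pair are illustrative names):

```r
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# anyDuplicated() is 0 when the row has no repeated value
keep_dup <- apply(index, 1, function(x) anyDuplicated(x) == 0L)

# The original post's six pairwise comparisons, written with !=
keep_pair <- with(index, Var1 != Var2 & Var1 != Var3 & Var1 != Var4 &
                         Var2 != Var3 & Var2 != Var4 & Var3 != Var4)

identical(unname(keep_dup), keep_pair)  # TRUE
```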
David Winsemius
2009-Sep-21 19:05 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
What "worked" seems longer than it needs to be, but here is where I ended up:

index[sapply(apply(index, 1, unique), function(x) length(x)==4), ]

This gives the same 24 rows (as a matrix, and in a different row order) as:

library(e1071)
permutations(4)

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
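One caution about apply(index, 1, unique): apply() returns a list here only because the rows differ in their number of distinct values. If every row had the same number, apply() would simplify the result to a matrix, and sapply() would then loop over single cells rather than rows. A small illustration with a made-up two-row data frame (df is hypothetical):

```r
# Toy data: both rows already have 4 distinct values
df <- data.frame(a = c(1, 2), b = c(2, 3), c = c(3, 1), d = c(4, 4))

# Equal-length results are simplified into a 4 x 2 matrix (one column per row)
u <- apply(df, 1, unique)
dim(u)  # 4 2

# sapply() over that matrix visits its 8 cells, not its 2 rows
length(sapply(u, function(x) length(x) == 4))  # 8

# Returning one number per row from apply() avoids the problem
n_unique <- apply(df, 1, function(x) length(unique(x)))
df[n_unique == ncol(df), ]  # both rows kept
```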
Jorge Ivan Velez
2009-Sep-21 19:16 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Hi Dimitri,
Try also either
index[apply(index, 1, function( x ) length( unique( x ) ) == 4 ),]
or
# install.packages('e1071')
library(e1071)
permutations(4)
HTH,
Jorge
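If installing e1071 is not an option, the permutations can also be generated in base R. The sketch below uses a hypothetical recursive helper, perms(), which is not part of any package, and checks that it yields the same set of rows as the unique() filter; the row orders differ, so both matrices are sorted before comparing:

```r
# Hypothetical base-R helper: all permutations of v, one per row
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  # put each element first, followed by every permutation of the rest
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

p <- perms(1:4)
dim(p)  # 24 4

index <- expand.grid(1:4, 1:4, 1:4, 1:4)
kept <- as.matrix(index[apply(index, 1, function(x) length(unique(x)) == 4), ])

# Sort rows lexicographically so the two results can be compared as sets
ord <- function(m) m[do.call(order, as.data.frame(m)), , drop = FALSE]
all(unname(ord(p)) == unname(ord(kept)))  # TRUE
```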
William Dunlap
2009-Sep-21 19:26 UTC
[R] More elegant way of excluding rows with equal values in any 2 columns?
Assuming your real dataset isn't the one you showed
(for which e1071::permutations(4) works well), you can
sort each row and then quickly check for duplicates by
comparing each column to the previous column. E.g.,
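f <- function(index){
  # Sort the values within each row: transpose so rows become columns,
  # order by column and then by value, and transpose back
  rowSort <- function(x){
    x <- t(as.matrix(x))
    x[] <- x[order(col(x), x)]
    t(x)
  }
  tmp <- rowSort(index)
  # In a sorted row, a repeated value shows up as two equal adjacent columns
  keep <- rep(TRUE, nrow(tmp))
  if(ncol(tmp) > 1) for(i in 2:ncol(tmp))
    keep <- keep & tmp[, i] != tmp[, i-1]
  index[keep, ]
}
f(index)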
Some package probably has a row sorting function but
the above works pretty well.
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
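The sort-then-compare idea can also be written without the explicit loop, by sorting each row with apply() and comparing all adjacent columns at once. A sketch under the same assumptions (numeric data, at least two columns); it is a variant, not the code posted above:

```r
index <- expand.grid(1:4, 1:4, 1:4, 1:4)
m <- as.matrix(index)

# apply() returns one sorted row per column, so transpose back
sorted <- t(apply(m, 1, sort))

# Compare every column with its left neighbour in one vectorised step
k <- ncol(sorted)
same <- sorted[, -1, drop = FALSE] == sorted[, -k, drop = FALSE]

# Keep rows where no two adjacent sorted values are equal
keep <- rowSums(same) == 0
nrow(index[keep, ])  # 24
```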