thr3ads.net - R help - [R] More elegant way of excluding rows with equal values in any 2 columns? [Sep 2009]

Dimitri Liakhovitski

2009-Sep-21 18:14 UTC

[R] More elegant way of excluding rows with equal values in any 2 columns?

Hello, dear R-ers!

I built a data frame "grid" (below) with 4 columns. I want to exclude
all rows that have equal values in ANY 2 columns. Here is how I am
doing it:

index <- expand.grid(1:4, 1:4, 1:4, 1:4)
dim(index)
# Deleting rows that have identical values in any two columns:
index <- index[!(index$Var1 == index$Var2) & !(index$Var1 == index$Var3) &
               !(index$Var1 == index$Var4) & !(index$Var2 == index$Var3) &
               !(index$Var2 == index$Var4) & !(index$Var3 == index$Var4), ]
dim(index)
index


I was wondering if there is a more elegant way of doing it - because
as the number of columns increases, the amount of code one would have
to write increases A LOT.

Thank you very much for any suggestion!



-- 
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com

Dimitris Rizopoulos

2009-Sep-21 18:44 UTC

one way is the following:

index <- expand.grid(1:4, 1:4, 1:4, 1:4)

mat <- data.matrix(index)
keep <- apply(mat, 1, function (x, d)
     length(unique(x)) == d, d = ncol(mat))
index[keep, ]
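
The same idea can be wrapped in a small helper so the number of columns never has to be typed. This is only a sketch: the name rows_all_distinct is made up for illustration and does not come from any package.

```r
# Hypothetical helper (the name is an assumption, not from the thread):
# returns TRUE for each row whose values are all different from one another.
rows_all_distinct <- function(df) {
  mat <- data.matrix(df)
  apply(mat, 1, function(x) length(unique(x)) == length(x))
}

index <- expand.grid(1:4, 1:4, 1:4, 1:4)
kept <- index[rows_all_distinct(index), ]
dim(kept)  # 24 rows and 4 columns: the 4! orderings of 1:4
```

Comparing against length(x) rather than a fixed number means the helper should work unchanged for grids with more columns.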


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

Erik Iverson

2009-Sep-21 18:55 UTC

Hello, 

Do you mean exactly any 2 columns? What if the value is equal in more than 2
columns?

> I built a data frame "grid" (below) with 4 columns. I want to exclude
> all rows that have equal values in ANY 2 columns. Here is how I am
> doing it:
>
> index<-expand.grid(1:4,1:4,1:4,1:4)

If a value appears in 2 or more columns of a row, i.e., is duplicated within
that row, then the following should work, assuming index can be coerced to a
matrix for apply():

t3 <- index[apply(index, 1, function(x) all(!duplicated(x))),]
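
A possible variation on the line above, offered only as a sketch: base R's anyDuplicated() returns 0 when a vector has no duplicates, so it can stand in for all(!duplicated(x)). The names keep_dup and keep_any are made up for the comparison.

```r
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# Original test: no element of the row is flagged as a duplicate
keep_dup <- apply(index, 1, function(x) all(!duplicated(x)))

# Variation: anyDuplicated() gives 0 when the row has no repeated value
keep_any <- apply(index, 1, anyDuplicated) == 0

all(keep_dup == keep_any)  # the two tests select the same rows
t3 <- index[keep_any, ]
```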

David Winsemius

2009-Sep-21 19:05 UTC

On Sep 21, 2009, at 2:14 PM, Dimitri Liakhovitski wrote:

> Hello, dear R-ers!
>
> I built a data frame "grid" (below) with 4 columns. I want to exclude
> all rows that have equal values in ANY 2 columns. Here is how I am
> doing it:
>
> index<-expand.grid(1:4,1:4,1:4,1:4)
> dim(index)
> # Deleting rows that have identical values in any two columns (1 line of code):
> index<-index[!(index$Var1==index$Var2)&!(index$Var1==index$Var3)&!(index$Var1==index$Var4)&!(index$Var2==index$Var3)&!(index$Var2==index$Var4)&!(index$Var3==index$Var4),]

What "worked" seems longer than it needs to be, but here is where I
ended up:

index[sapply(apply(index, 1, unique), function(x) length(x)==4), ]

The same rows (returned as a matrix rather than a data frame) come from:
library(e1071)
permutations(4)
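
One caveat with nesting sapply() around apply(..., unique), shown here as a sketch: apply() returns a list only when the rows differ in their number of unique values. If every row has the same number (for example, on a grid that has already been filtered), it returns a matrix instead, and sapply() would then loop over single elements. Counting the unique values inside apply() avoids this. The names n_unique and kept are illustrative only.

```r
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# Count unique values per row directly; the result is always a plain vector
n_unique <- apply(index, 1, function(x) length(unique(x)))
kept <- index[n_unique == ncol(index), ]
nrow(kept) == factorial(4)  # one row per permutation of 1:4

# On the full grid the per-row results differ in length, so this is a list
is.list(apply(index, 1, unique))

# On the filtered grid every row has 4 unique values, so this is a 4 x 24 matrix
is.matrix(apply(kept, 1, unique))
```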


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Jorge Ivan Velez

2009-Sep-21 19:16 UTC

Hi Dimitri,
Try also either

index[apply(index, 1, function( x ) length( unique( x ) ) == 4 ),]

or

# install.packages('e1071')
require(e1071)
permutations(4)

HTH,
Jorge



William Dunlap

2009-Sep-21 19:26 UTC

Assuming your real dataset isn't the one you showed
(for which e1071::permutations(4) works well) you can
sort each row and then quickly check for duplicates by
comparing each column to the previous column.  E.g.,

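f <- function(index){
   # Sort the values within each row of a matrix or data frame
   rowSort <- function(x){
      x <- t(as.matrix(x))          # each original row becomes a column
      x[] <- x[order(col(x), x)]    # sort the values within each column
      t(x)                          # back to the original orientation
   }
   tmp <- rowSort(index)
   keep <- rep(TRUE, nrow(tmp))
   # After sorting, equal values within a row sit in adjacent columns
   if(ncol(tmp)>1) for(i in 2:ncol(tmp))
     keep <- keep & tmp[,i] != tmp[,i-1]
   index[keep, , drop = FALSE]
}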

f(index)

Some package probably has a row sorting function but
the above works pretty well.
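
For comparison, the pairwise comparisons from the original post can also be generated instead of typed out. This is a sketch that was not proposed in the thread; it uses base R's combn() to list every pair of columns, and the names pairs, differs, keep and kept are illustrative only.

```r
index <- expand.grid(1:4, 1:4, 1:4, 1:4)

# Each column of 'pairs' holds the positions of two columns to compare
pairs <- combn(ncol(index), 2)

# One logical column per pair: TRUE where the two columns differ in that row
differs <- apply(pairs, 2, function(p) index[[p[1]]] != index[[p[2]]])

# Keep a row only if every pair of columns differs
keep <- rowSums(differs) == ncol(differs)
kept <- index[keep, ]
dim(kept)  # 24 rows and 4 columns
```

The number of pairs grows quadratically with the number of columns, so for wide data the sorting approach above is likely the cheaper one.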

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  