Dear All,

I have a data frame with n columns: X1, X2, ..., Xn. Now I want to create a new column: if X1 = X2 = ... = Xn, the value is 1; otherwise, the value is 0.

How can I do that quickly, instead of doing (n choose 2) comparisons?

Thank you,
Frank

How about something like this?

x <- data.frame(X1 = c(1, 1, 2, 4),
                X2 = c(4, 1, 2, 5),
                X3 = c(2, 1, 2, 2))
nuniq <- function(x) length(unique(x))
as.numeric(apply(as.matrix(x), 1, nuniq) == 1)

--sundar

One idea is:

apply(apply(X, 2, "==", X[,1]), 1, all)

but there may be better solutions.

Uwe Ligges

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

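Below is a minimal sketch of the column-versus-first-column idea, run on the example data frame from earlier in the thread. It adds as.numeric() to produce the requested 1/0 coding. The column name new is only for illustration.

```r
x <- data.frame(X1 = c(1, 1, 2, 4),
                X2 = c(4, 1, 2, 5),
                X3 = c(2, 1, 2, 2))
X <- as.matrix(x)
# Compare every column with the first column: one logical column per Xi
eq <- apply(X, 2, "==", X[, 1])
# A row qualifies only if all of its comparisons are TRUE;
# as.numeric() turns TRUE/FALSE into the requested 1/0
x$new <- as.numeric(apply(eq, 1, all))
x$new  # [1] 0 1 1 0
```

This needs only n - 1 comparisons per row instead of (n choose 2), because equality with the first column is enough.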
Here's an alternative:

> x <- data.frame(X1 = c(1, 1, 2, 4),
+                 X2 = c(4, 1, 2, 5),
+                 X3 = c(2, 1, 2, 2))
> check <- paste(names(x), collapse=" == ")
> with(x, eval(parse(text=check)))
[1] FALSE  TRUE FALSE FALSE

Cheers,
Andy

How about:

X <- as.matrix(yourframe)
apply(X, 2, '==', X[,1]) %*% rep(1, ncol(X)) == ncol(X)

avoiding the rowwise apply overhead?

Cheers,

Bert Gunter
Non-Clinical Biostatistics
Genentech
MS: 240B
Phone: 650-467-7374

"The business of the statistician is to catalyze the scientific learning process."
 -- George E.P. Box

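Multiplying a matrix by a column of ones sums each of its rows. The product above therefore counts, for each row, how many entries equal the first one. The sketch below runs this on the thread's example data frame and checks that it agrees with rowSums(). The names eq, counts and result are only for illustration.

```r
x <- data.frame(X1 = c(1, 1, 2, 4),
                X2 = c(4, 1, 2, 5),
                X3 = c(2, 1, 2, 2))
X <- as.matrix(x)
# Logical matrix: TRUE where an entry equals the first entry of its row
eq <- apply(X, 2, "==", X[, 1])
# Logical values are treated as 0/1, so this is an n x 1 matrix of row counts
counts <- eq %*% rep(1, ncol(X))
# The same counts via rowSums(), without forming a matrix product
stopifnot(isTRUE(all.equal(as.vector(counts), unname(rowSums(eq)))))
# A row is constant when all ncol(X) entries match the first one
result <- as.numeric(counts == ncol(X))
result  # [1] 0 1 1 0
```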
Here is one possibility if your data frame is all numeric:

x$new <- (sd(t(x))==0)+0

x is the data frame and new is the new column. sd(t(x))==0 is TRUE for rows with zero standard deviation (which occurs iff the entries are all zero) and FALSE otherwise, and +0 converts that to 1 or 0.

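In current versions of R, sd() applied to a matrix appears to return a single standard deviation over all elements instead of one per column. If so, sd(t(x)) gives one number, not one per row. The sketch below applies the same zero-standard-deviation idea row by row with apply(). It assumes all columns are numeric. With only one column, sd() returns NA.

```r
x <- data.frame(X1 = c(1, 1, 2, 4),
                X2 = c(4, 1, 2, 5),
                X3 = c(2, 1, 2, 2))
# One standard deviation per row; it is 0 when all entries in the row are equal
row_sd <- apply(as.matrix(x), 1, sd)
# Logical + 0 gives 1/0, as in the original one-liner
x$new <- (row_sd == 0) + 0
x$new  # [1] 0 1 1 0
```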
Stupid me: fell into this trap:

> 0 == 0 == 0
[1] FALSE

Andy

> From: Sundar Dorai-Raj
>
> Liaw, Andy wrote:
> > >with(x, eval(parse(text=check)))
> > [1] FALSE  TRUE FALSE FALSE
>
> Oops. Should be
>
> [1] FALSE  TRUE  TRUE FALSE
>
> This is TRUE for the second case by accident since the second
> element is 1.
>
> > x$X4 <- (x$X1 == x$X2)
> > as.numeric(x$X4)
> [1] 0 1 1 0
> > x$X4 == x$X3
> [1] FALSE  TRUE FALSE FALSE

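As the transcripts above show, X1 == X2 == X3 was evaluated as (X1 == X2) == X3. That compares a 0/1 result with X3, not X1 with X3. Newer versions of R may refuse to parse a chained == at all. The sketch below keeps the one-comparison-per-column idea without parse(): compare every column with the first and combine the results with &. Using Reduce() and starting from a vector of TRUEs are choices made for this sketch; neither comes from the thread.

```r
x <- data.frame(X1 = c(1, 1, 2, 4),
                X2 = c(4, 1, 2, 5),
                X3 = c(2, 1, 2, 2))
# One logical vector per remaining column: TRUE where that column equals X1
comparisons <- lapply(x[-1], `==`, x[[1]])
# AND them together, starting from all-TRUE so a one-column data frame gives 1s
same <- Reduce(`&`, comparisons, rep(TRUE, nrow(x)))
result <- as.numeric(same)
result  # [1] 0 1 1 0
```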
Gabor Grothendieck <ggrothendieck <at> myway.com> writes:
:
: x$new <- (sd(t(x))==0)+0
:
: x is the data frame and new is the new column. sd(t(x))==0 is TRUE
: for rows with zero standard deviation (which occurs iff the entries
: are all zero) and FALSE otherwise, and +0 converts that to 1 or 0.
          ^^^^

"zero" should be replaced by "the same".

This isn't very elegant, and you may want to replace identical() with a function doing a more appropriate numerical floating-point comparison:

> midentical <- function(...) {
+   dots <- list(...)
+   n <- length(dots)
+   ans <- TRUE
+   # compare each argument with the next; [[ ]] extracts the values themselves,
+   # and seq_len() gives an empty loop (so TRUE) when there is only one argument
+   for (i in seq_len(n - 1)) {
+     ans <- ans && identical(dots[[i]], dots[[i + 1]])
+   }
+   return(ans)
+ }
> x1 <- c(1, 2, 3, 4, 5)
> x2 <- c(1, 3, 2, 4, 5)
> x3 <- c(1, 6, 7, 4, 5)
> mapply(function(...) ifelse(midentical(...), 1, 0), x1, x2, x3)
[1] 1 0 0 1 1

Kjetil Halvorsen

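Following the suggestion to use a floating-point-aware comparison, here is a vectorised sketch. It treats a value as equal to the row's first entry when the two differ by at most an absolute tolerance. The helper name rows_all_equal() and its default tol are assumptions of this sketch. The tol value borrows the size of all.equal()'s default tolerance, although all.equal() applies that tolerance as a relative difference.

```r
# Hypothetical helper: TRUE for rows whose entries are all within `tol`
# of that row's first entry (an absolute, not relative, tolerance)
rows_all_equal <- function(df, tol = sqrt(.Machine$double.eps)) {
  m <- as.matrix(df)
  # m - m[, 1] recycles the first column down every column (column-major order)
  rowSums(abs(m - m[, 1]) <= tol) == ncol(m)
}

x <- data.frame(X1 = c(0.1 + 0.2, 1, 2),
                X2 = c(0.3, 1, 2.5),
                X3 = c(0.3, 1, 2))
x$new <- as.numeric(rows_all_equal(x))
x$new  # [1] 1 1 0, even though 0.1 + 0.2 == 0.3 is FALSE
```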
Not so fast:

> x <- matrix(rep(c(2, 1, 3), 2), nr=2, byrow=TRUE)
> x
     [,1] [,2] [,3]
[1,]    2    1    3
[2,]    2    1    3
> rowSums(x) / ncol(x) == x[,1]
[1] TRUE TRUE

Andy

> From: Jim Brennan
>
> This similar method may be quicker
> x1$new <- 1*(rowSums(x1)/ncol(x1)==x1[,1])
> Learning lots from these type questions!
>
> Jim
>
> From: "Gabor Grothendieck"
>
> > Berton Gunter <gunter.berton <at> gene.com> writes:
> >
> > : How about:
> > :
> > : X<-as.matrix(yourframe)
> > : apply(X,2, '==',X[,1])%*%rep(1,ncol(X)) == ncol(x)
> > :
> > : avoiding the rowwise apply overhead?
> >
> > Following up on your idea we can use rowSums instead of matrix
> > multiplication to speed it up even more:
> >
> > R> x <- data.frame(X1 = c(1.5, 1.5, 2.5, 4.5),
> > + X2 = c(4.5, 1.5, 2.5, 5.5), X3 = c(2.5, 1.5, 2.5, 2.5))
> > R> set.seed(1)
> > R> x1 <- x2 <- x[sample(4,100000,rep=T),]
> >
> > R> gc();system.time({x1$new <- (rowSums(x1==x1[,1])==ncol(x))+0})
> >           used (Mb) gc trigger (Mb)
> > Ncells  634654 17.0    1590760 42.5
> > Vcells 1017322  7.8    3820120 29.2
> > [1] 0.48 0.00 0.48   NA   NA
> >
> > R> gc(); system.time({X <- as.matrix(x2); x2$new <- c(apply(X,2,
> > '==',X[,1])%*%rep(1,ncol(X)) == ncol(x))+0})
> >           used (Mb) gc trigger (Mb)
> > Ncells  634668 17.0    1590760 42.5
> > Vcells 1517333 11.6    3820120 29.2
> > [1] 1.39 0.03 1.50   NA   NA
> >
> > R> all.equal(x1,x2)
> > [1] TRUE

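The counterexample works because a row mean can equal the first entry even when the entries differ. Gabor's count-based test, rowSums(x == x[,1]) == ncol(x), avoids that problem. A small check on the same matrix:

```r
x <- matrix(rep(c(2, 1, 3), 2), nrow = 2, byrow = TRUE)
# Mean-based test: fooled because mean(c(2, 1, 3)) happens to equal the first entry
mean_test <- rowSums(x) / ncol(x) == x[, 1]
# Count-based test: every entry in the row must equal the row's first entry
count_test <- rowSums(x == x[, 1]) == ncol(x)
mean_test   # [1] TRUE TRUE
count_test  # [1] FALSE FALSE
```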
Assuming that the Xs are numeric, and your data frame is named data.df:

data.df$newcol <- as.numeric(apply(as.matrix(data.df), 1,
                                   function(x) return(length(unique(x)) == 1)))

Jim
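For all-numeric data, there is another way to avoid both the row-wise apply() and pairwise comparisons. A row is constant exactly when its maximum equals its minimum, and pmax()/pmin() compute both across columns in vectorised form. This alternative does not come from the thread. It is a sketch that assumes numeric columns without NAs; an NA in a row makes that row's result NA.

```r
x <- data.frame(X1 = c(1, 1, 2, 4),
                X2 = c(4, 1, 2, 5),
                X3 = c(2, 1, 2, 2))
# do.call() passes each column as a separate argument to pmax()/pmin(),
# giving the row-wise maximum and minimum without a row-wise loop
row_max <- do.call(pmax, x)
row_min <- do.call(pmin, x)
x$new <- as.numeric(row_max == row_min)
x$new  # [1] 0 1 1 0
```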