Tiago R Magalhaes
2005-Mar-18 02:11 UTC
[R] extract rows in dataframe with duplicated column values
Hi,

I want to extract all the rows in a data frame that have duplicates for a given column. I would expect this question to come up pretty often, but I have searched the archives and surprisingly couldn't find anything. The best I can come up with is:

  x <- data.frame(a=c(1,2,2,3,3,3), b=10)
  xdup1 <- duplicated(x[,1])
  xdup2 <- duplicated(x[,1][nrow(x):1])[nrow(x):1]
  xAllDups <- x[(xdup1+xdup2)!=0,]

This seems to work, but it's so convoluted that I'm sure there's a better method.

Thanks for any help and enlightenment
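The two-pass idea above (flag repeats scanning forward, then scanning the reversed vector) is sound. In current versions of R, duplicated() accepts fromLast = TRUE, which performs the reversed scan directly, so a minimal sketch of the same logic on the same example data is:

```r
# Example data from the original post
x <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = 10)

# duplicated() marks later copies scanning forward; fromLast = TRUE marks
# earlier copies scanning backward. OR-ing them marks every row whose
# value of a occurs more than once.
dup_any <- duplicated(x$a) | duplicated(x$a, fromLast = TRUE)
xAllDups <- x[dup_any, ]
xAllDups
```

This keeps rows 2 to 6, the same rows as the original xAllDups.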
Liaw, Andy
2005-Mar-18 03:14 UTC
Does this work for you?

  > x[table(x[,1]) > 1,]
    a  b
  2 2 10
  3 2 10
  5 3 10
  6 3 10

Andy
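The output above is missing row 4 (a == 3). The reason is that table(x[,1]) > 1 has one element per distinct value of a (three here), not one per row, so R recycles that length-3 logical vector across the six rows. A small sketch on the same data shows the recycled index and one way to get a per-row count instead, by looking up each row's own value in the table:

```r
x <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = 10)

cnt <- table(x[, 1])                 # counts for values 1, 2, 3
# The index actually used for the 6 rows: FALSE TRUE TRUE recycled,
# giving FALSE TRUE TRUE FALSE TRUE TRUE, which drops row 4
recycled <- rep_len(as.vector(cnt > 1), nrow(x))
recycled

# Look up the count for each row's own value: a length-6 vector 1 2 2 3 3 3
per_row <- as.vector(cnt[as.character(x[, 1])])
x[per_row > 1, ]
```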
Liaw, Andy
2005-Mar-18 03:25 UTC
OK, strike one... Here's my second try:

  > cnt <- table(x[,1])
  > v <- as.numeric(names(cnt[cnt > 1]))
  > v
  [1] 2 3
  > x[x[,1] %in% v, ]
    a  b
  2 2 10
  3 2 10
  4 3 10
  5 3 10
  6 3 10

Andy
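The second try converts names(cnt) back with as.numeric(), which assumes the column is numeric. A minimal sketch of the same table()/%in% idea, assuming one also wants it to work for character or factor columns, compares the values as character strings instead. The helper name rows_with_repeated and the example data frame y are hypothetical, made up for illustration:

```r
# Hypothetical helper: rows of df whose value in column col occurs more than once
rows_with_repeated <- function(df, col) {
  cnt <- table(df[[col]])                # counts per distinct value
  repeated <- names(cnt)[cnt > 1]        # repeated values, as character strings
  # Compare as character so numeric, character and factor columns all work
  df[as.character(df[[col]]) %in% repeated, , drop = FALSE]
}

x <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = 10)
y <- data.frame(id = c("p", "q", "q", "r", "r", "r"), b = 1:6)
rows_with_repeated(x, "a")
rows_with_repeated(y, "id")
```

Both calls keep rows 2 to 6, matching the output of the second try on x.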
Rob J Goedman
2005-Mar-18 03:35 UTC
Tiago,

Assuming the column in x is sorted:

  t = which(duplicated(x[, 1]))
  x[sort(union(t-1, t)),]

or, if not sorted:

  o = order(x[, 1])
  t = which(duplicated(x[o, 1]))
  x[sort(o[union(t-1, t)]),]

Rob
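The sorted-column trick works because, once equal values sit next to each other, position t - 1 holds the first copy of any value repeated at position t. If the duplicate positions are computed on a sorted copy of the column, though, they index the sorted vector rather than the rows of x, so they have to be mapped back through order(). A sketch on deliberately unsorted, made-up data:

```r
# Deliberately unsorted example column (made-up data for illustration)
x <- data.frame(a = c(3, 1, 2, 3, 2, 3), b = 1:6)

# Positions computed on sort(x$a) refer to the sorted vector, not to rows of x
naive <- which(duplicated(sort(x$a)))       # 3 5 6
wrong <- sort(union(naive - 1, naive))      # rows 2..6: includes row 2 (a == 1), misses row 1

# order() gives the row numbers of x in sorted order, so use it to map back
o    <- order(x$a)
pos  <- which(duplicated(x$a[o]))           # repeats within the sorted column
keep <- sort(o[union(pos - 1, pos)])        # pos - 1 adds the first copy of each value
x[keep, ]
```

Here keep is 1 3 4 5 6, i.e. every row except row 2, whose value 1 appears only once.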