Hi,

Is there an easy way to remove data frame rows that do not have a duplicated value in a specified column ('id')? E.g.,

dat <- data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 = c(1,4,3,3,4,3))
dat

  id value value2
1  1     5      1
2  1     6      4
3  1     7      3
4  2     4      3
5  3     5      4
6  3     4      3

This is sample data; the real data has hundreds of rows. In this case, only row 4 does not have a duplicated id, and I would like to remove it without using:

dat$id[4] <- NULL

Any help is appreciated!

AC
This is ugly, but it gets what you want.

dat[which(dat[, 1] %in% unique(dat[duplicated(dat[, 1], fromLast = TRUE), 1])), ]

AC Del Re wrote:
> Is there an easy way to remove dataframe rows without duplicated values of
> a specified column ('id')?
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/Removing-rows-in-dataframe-w-o-duplicated-values-tp4096582p4096672.html
Sent from the R help mailing list archive at Nabble.com.
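The one-liner above can be unpacked a little. The following is only a sketch of the same idea, not the poster's code: it assumes the sample `dat` from the question, and the names `has_dup` and `res_dup` are made up for illustration. It combines a forward and a backward `duplicated()` scan instead of going through `unique()` and `%in%`.

```r
dat <- data.frame(id = c(1, 1, 1, 2, 3, 3),
                  value = c(5, 6, 7, 4, 5, 4),
                  value2 = c(1, 4, 3, 3, 4, 3))

# duplicated() flags every repeat after the first occurrence of a value.
# Scanning from the end as well (fromLast = TRUE) flags the first occurrence
# too, so an id that occurs only once is flagged by neither scan.
has_dup <- duplicated(dat$id) | duplicated(dat$id, fromLast = TRUE)

# Keep only the rows whose id occurs more than once (drops row 4 here).
res_dup <- dat[has_dup, ]
res_dup
```

Writing `TRUE` rather than `T` is the safer habit, since `T` is an ordinary variable that can be reassigned.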
Hi:

Here's one way:

# L was not defined in the post; the row names in the output below
# are consistent with a split of dat by id
L <- split(dat, dat$id)
do.call(rbind, lapply(L, function(d) if(nrow(d) > 1) return(d)))

    id value value2
1.1  1     5      1
1.2  1     6      4
1.3  1     7      3
3.5  3     5      4
3.6  3     4      3

HTH,
Dennis

On Tue, Nov 22, 2011 at 9:43 AM, AC Del Re <delre at wisc.edu> wrote:
> Is there an easy way to remove dataframe rows without duplicated values of
> a specified column ('id')?
Dimitris Rizopoulos
2011-Nov-22 18:45 UTC
[R] Removing rows in dataframe w/o duplicated values
One approach is the following:

dat <- data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 = c(1,4,3,3,4,3))

ind <- ave(dat$id, dat$id, FUN = length) > 1
dat[ind, ]

I hope it helps.

Best,
Dimitris

On 11/22/2011 6:43 PM, AC Del Re wrote:
> Is there an easy way to remove dataframe rows without duplicated values of
> a specified column ('id')?

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
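To see why the `ave()` approach works, it can help to look at the intermediate vector before the comparison. This is a small sketch on the sample data from the question; the names `n_per_id` and `kept` are illustrative and not from the thread.

```r
dat <- data.frame(id = c(1, 1, 1, 2, 3, 3),
                  value = c(5, 6, 7, 4, 5, 4),
                  value2 = c(1, 4, 3, 3, 4, 3))

# ave() applies FUN within each group of the second argument and returns a
# vector as long as the input, so every row receives the size of its id group.
n_per_id <- ave(dat$id, dat$id, FUN = length)
n_per_id  # 3 3 3 1 2 2

# Rows whose id group has more than one member are kept.
kept <- dat[n_per_id > 1, ]
kept
```

Because the result is aligned with the rows of `dat`, it can be used directly as a logical row index and the original row names are preserved.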
David Winsemius
2011-Nov-22 18:47 UTC
[R] Removing rows in dataframe w/o duplicated values
On Nov 22, 2011, at 12:43 PM, AC Del Re wrote:
> Is there an easy way to remove dataframe rows without duplicated
> values of a specified column ('id')?

dat[ave(dat$id, dat$id, FUN = length) > 1, ]

  id value value2
1  1     5      1
2  1     6      4
3  1     7      3
5  3     5      4
6  3     4      3

David Winsemius, MD
West Hartford, CT
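For data with hundreds of rows, the same `ave()` idea could be wrapped in a small helper. The sketch below is an illustration rather than anything posted in the thread: the function name `keep_repeated_ids` is hypothetical, and it assumes the key column contains no missing values. It passes `seq_along()` as the first argument to `ave()` so that the count does not depend on the key column being numeric.

```r
# Hypothetical helper: keep rows whose value in column `col` occurs more than
# once. Assumes df[[col]] has no NA values.
keep_repeated_ids <- function(df, col) {
  key <- df[[col]]
  # seq_along(key) is only a numeric placeholder; ave() overwrites it with the
  # group size, which also works for character or factor keys.
  n <- ave(seq_along(key), key, FUN = length)
  df[n > 1, , drop = FALSE]
}

dat <- data.frame(id = c(1, 1, 1, 2, 3, 3),
                  value = c(5, 6, 7, 4, 5, 4),
                  value2 = c(1, 4, 3, 3, 4, 3))

res_num <- keep_repeated_ids(dat, "id")

# The same call with a character id column.
dat_chr <- transform(dat, id = as.character(id))
res_chr <- keep_repeated_ids(dat_chr, "id")
```

The `drop = FALSE` argument keeps the result a data frame even if `df` has a single column.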