Hello. I am trying to remove from my dataframe those rows in which the first 7 columns are duplicated, even if subsequent columns make those rows unique.

df <- data.frame(id   = rep(c('amy', 'bob', 'joe'), each = 5),
                 pet1 = sample(LETTERS[1:3], 15, replace = T),
                 pet2 = sample(LETTERS[1:3], 15, replace = T),
                 pet3 = sample(LETTERS[1:5], 15, replace = T))

> df
    id pet1 pet2 pet3
1  amy    C    B    A
2  amy    B    A    A
3  amy    A    A    D
4  amy    B    C    A
5  amy    C    B    B
6  bob    B    A    A
7  bob    C    A    C
8  bob    C    C    A
9  bob    B    C    E
10 bob    C    B    C
11 joe    C    B    A
12 joe    A    B    E
13 joe    C    C    B
14 joe    C    A    D
15 joe    A    C    C

I am trying to identify and remove the rows of df that are duplicates in df[, 1:3].

culled.df <- unique(df[, 1:3])

> culled.df
    id pet1 pet2
1  amy    A    A
2  amy    C    C
3  amy    C    A
5  amy    A    B
6  bob    A    B
7  bob    C    C
8  bob    B    C
10 bob    B    A
11 joe    B    B
13 joe    B    C
14 joe    B    A

This is where I'm hung up. I've been trying match() or %in% to get the rows of df where df[, 1:3] match culled.df:

> df[culled.df %in% df[, 1:3], ]

Is this a reasonable solution, or am I making it more difficult than it needs to be?

Thanks for your suggestions,

Jason
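[Editor's note: the match()/%in% idea can be made to work, but only if each row of the key columns is first collapsed into a single value, because %in% applied to a data frame compares whole columns (list elements), not rows. A minimal sketch under that assumption, taking culled.df to be unique(df[, 1:3]) as above; the separator string is an arbitrary choice:]

## Collapse the first three columns into one character key per row;
## an unlikely separator such as "\r" avoids accidental collisions.
key.df     <- do.call(paste, c(df[, 1:3], sep = "\r"))
key.culled <- do.call(paste, c(culled.df, sep = "\r"))

## match() returns, for each unique key, the first row of df carrying it,
## so indexing with it keeps one row per id/pet1/pet2 combination.
deduped.df <- df[match(key.culled, key.df), ]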
Jean V Adams
2011-Dec-08 12:21 UTC
[R] partial duplicates of dataframe rows, indexing and removal
Try this:

df[!duplicated(df[, 1:3]), ]

Jean
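[Editor's note: a minimal reproducible sketch of that duplicated() approach; set.seed() is added here only so the random example is repeatable, and 1:3 would become 1:7 on a data set keyed by its first seven columns:]

set.seed(1)  # only so this illustrative sample is reproducible
df <- data.frame(id   = rep(c('amy', 'bob', 'joe'), each = 5),
                 pet1 = sample(LETTERS[1:3], 15, replace = TRUE),
                 pet2 = sample(LETTERS[1:3], 15, replace = TRUE),
                 pet3 = sample(LETTERS[1:5], 15, replace = TRUE))

## duplicated() flags every row whose first three columns repeat an earlier
## row; negating it keeps only the first occurrence of each combination.
culled.df <- df[!duplicated(df[, 1:3]), ]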