Hello. I am trying to remove from my data frame those rows in which the first
7 columns are duplicated, even if subsequent columns make those rows unique.
df <- data.frame(id = rep(c('amy', 'bob', 'joe'), each = 5),
   pet1 = sample(LETTERS[1:3], 15, replace = TRUE),
   pet2 = sample(LETTERS[1:3], 15, replace = TRUE),
   pet3 = sample(LETTERS[1:5], 15, replace = TRUE))
> df
    id     pet1 pet2 pet3
1  amy    C    B    A
2  amy    B    A    A
3  amy    A    A    D
4  amy    B    C    A
5  amy    C    B    B
6  bob    B    A    A
7  bob    C    A    C
8  bob    C    C    A
9  bob    B    C    E
10 bob    C    B    C
11 joe    C    B    A
12 joe    A    B    E
13 joe    C    C    B
14 joe    C    A    D
15 joe    A    C    C
I am trying to identify and remove the rows of df that are duplicates in
df[,1:3]. 
culled.df <- unique(df[, 1:3])
> culled.df
    id pet1 pet2
1  amy    A    A
2  amy    C    C
3  amy    C    A
5  amy    A    B
6  bob    A    B
7  bob    C    C
8  bob    B    C
10 bob    B    A
11 joe    B    B
13 joe    B    C
14 joe    B    A
This is where I'm hung up. I've been trying match() or %in% to get the rows
of df where df[,1:3] matches culled.df:
> df[culled.df %in% df[,1:3],]
Is this a reasonable solution, or am I making it more difficult than it needs
to be?
Thanks for your suggestions,
Jason
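
A minimal sketch of the match() idea described above, offered as one possible way to make it work rather than a definitive answer. As far as I can tell, %in% applied to two data frames compares them column by column rather than row by row, so a per-row key is built first. The small data frame, the names key.df, key.culled and first.rows, and the "\r" separator are all assumptions made for illustration.

```r
# Small hand-made data frame (hypothetical values, not the random data above)
df <- data.frame(
  id   = c("amy", "amy", "amy", "bob", "bob"),
  pet1 = c("A",   "A",   "B",   "A",   "A"),
  pet2 = c("C",   "C",   "C",   "B",   "B"),
  pet3 = c("X",   "Y",   "Z",   "X",   "X"),
  stringsAsFactors = FALSE
)

# Unique combinations of the first three columns
culled.df <- unique(df[, 1:3])

# One string key per row, pasted from the first three columns
# ("\r" is an arbitrary separator, assumed not to occur in the data)
key.df     <- do.call(paste, c(df[, 1:3], sep = "\r"))
key.culled <- do.call(paste, c(culled.df, sep = "\r"))

# For each unique key, the index of its first occurrence in df
first.rows <- match(key.culled, key.df)

# Full rows (all columns) for the first occurrence of each combination
df[first.rows, ]
```

This selects whole rows of df, including pet3, for each unique id/pet1/pet2 combination.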
 
Jean V Adams
2011-Dec-08  12:21 UTC
[R] partial duplicates of dataframe rows, indexing and removal
Try this:

df[!duplicated(df[, 1:3]), ]

Jean

Dgnn wrote on 12/07/2011 08:24:01 PM:

> Hello. I am trying to remove from my dataframe, those rows in which the first
> 7 columns are duplicated even if subsequent columns make those rows unique.
> [snip]
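
To see what the duplicated() suggestion does, here is a small sketch under stated assumptions: the data frame below is hand-made (not the random data from the question) so that the result is predictable, and the intermediate name dup is introduced only for illustration.

```r
# Small hand-made data frame (hypothetical values chosen so duplicates are obvious)
df <- data.frame(
  id   = c("amy", "amy", "amy", "bob", "bob"),
  pet1 = c("A",   "A",   "B",   "A",   "A"),
  pet2 = c("C",   "C",   "C",   "B",   "B"),
  pet3 = c("X",   "Y",   "Z",   "X",   "X"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose first three columns repeat an earlier row
dup <- duplicated(df[, 1:3])

# Keep the first occurrence of each id/pet1/pet2 combination,
# retaining all columns (including pet3)
culled.df <- df[!dup, ]
culled.df
```

Here rows 2 and 5 are dropped because their first three columns repeat rows 1 and 4. If the goal is instead to drop every row that takes part in a duplication, including the first occurrence, one option is to combine both directions: df[!(duplicated(df[, 1:3]) | duplicated(df[, 1:3], fromLast = TRUE)), ].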