thr3ads.net - R help - [R] partial duplicates of dataframe rows, indexing and removal [Dec 2011]

Dgnn

2011-Dec-08 02:24 UTC

[R] partial duplicates of dataframe rows, indexing and removal

Hello. I am trying to remove from my data frame those rows in which the first
7 columns are duplicated, even if subsequent columns make those rows unique.
The example below uses a smaller data frame, with the first 3 columns as the key.

df<-data.frame(id=rep(c('amy','bob','joe'), each=5),
   pet1=sample(LETTERS[1:3],15, replace=TRUE),
   pet2=sample(LETTERS[1:3],15, replace=TRUE),
   pet3=sample(LETTERS[1:5],15, replace=TRUE))
>df
    id     pet1 pet2 pet3
1  amy    C    B    A
2  amy    B    A    A
3  amy    A    A    D
4  amy    B    C    A
5  amy    C    B    B
6  bob    B    A    A
7  bob    C    A    C
8  bob    C    C    A
9  bob    B    C    E
10 bob    C    B    C
11 joe    C    B    A
12 joe    A    B    E
13 joe    C    C    B
14 joe    C    A    D
15 joe    A    C    C

I am trying to identify and remove the rows of df that are duplicates in
df[,1:3]. 

culled.df<-unique(df[,1:3])
>culled.df
    id pet1 pet2
1  amy    A    A
2  amy    C    C
3  amy    C    A
5  amy    A    B
6  bob    A    B
7  bob    C    C
8  bob    B    C
10 bob    B    A
11 joe    B    B
13 joe    B    C
14 joe    B    A

This is where I'm hung up. I've been trying match() or %in% to get the rows
of df where df[,1:3] matches culled.df:
> df[culled.df %in% df[,1:3],]
Is this a reasonable solution, or am I making it more difficult than it needs
to be?

Thanks for your suggestions,

Jason
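
A note on why the %in% attempt above does not select rows: %in% treats a data frame as a list of columns, so it returns one value per column of culled.df rather than one per row of df. The sketch below uses a small fixed data frame (the values are illustrative, not the poster's data) and shows one possible workaround, assuming the default row names are intact: unique() keeps the row names of the first occurrences, so they can be used to index the full data frame.

```r
# Small fixed data frame so the result is reproducible (illustrative values).
df <- data.frame(id   = c("amy", "amy", "amy", "bob", "bob"),
                 pet1 = c("A", "A", "B", "A", "A"),
                 pet2 = c("B", "B", "B", "C", "C"),
                 pet3 = c("x", "y", "z", "x", "y"),
                 stringsAsFactors = FALSE)

# Unique combinations of the first three columns.
culled.df <- unique(df[, 1:3])

# %in% works column-wise on data frames: one result per column of culled.df,
# so it cannot be used directly as a row index for df.
col.match <- culled.df %in% df[, 1:3]
length(col.match)   # number of columns in culled.df, not number of rows in df

# unique() preserves the row names of the first occurrences,
# so those names can pull the complete rows (including pet3) from df.
kept <- df[rownames(culled.df), ]
kept
```

This relies on row names surviving the call to unique(); a logical index built directly from the key columns avoids that dependency.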


Jean V Adams

2011-Dec-08 12:21 UTC

Try this:

df[!duplicated(df[, 1:3]), ]

Jean
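
To illustrate how the one-liner above works, here is a minimal sketch on a small fixed data frame (the values are made up for illustration). duplicated() applied to the key columns returns a logical vector marking every row whose key has already appeared, and negating it keeps the first occurrence of each key together with all of its other columns. The fromLast = TRUE variant is an optional extra, not part of the original reply.

```r
# Small fixed data frame so the result is reproducible (illustrative values).
df <- data.frame(id   = c("amy", "amy", "amy", "bob", "bob"),
                 pet1 = c("A", "A", "B", "A", "A"),
                 pet2 = c("B", "B", "B", "C", "C"),
                 pet3 = c("x", "y", "z", "x", "y"),
                 stringsAsFactors = FALSE)

# TRUE for each row whose first three columns repeat an earlier row.
dup <- duplicated(df[, 1:3])
dup   # FALSE TRUE FALSE FALSE TRUE

# Keep the first occurrence of each key; pet3 comes along unchanged.
result <- df[!dup, ]
result

# Optional: keep the last occurrence of each key instead of the first.
last <- df[!duplicated(df[, 1:3], fromLast = TRUE), ]
last
```

For the 7-column case described in the question, the same call with df[, 1:7] in place of df[, 1:3] should apply.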


