thr3ads.net - R help - [R] How to delete duplicate cases? [Jul 2008]


Daniel Wagner

2008-Jul-24 14:00 UTC

[R] How to delete duplicate cases?

Dear R users,
 
I have a data frame with a lot of duplicate cases. I want to delete the duplicates that have a low rank and keep the case that has the highest rank.
For example:
> df1
   cno rank
1 1342 0.23
2 1342 0.14
3 1342 0.56
4 2568 0.15
5 2568 0.89
 
So I want to keep the 3rd and 5th cases, which have the highest rank for each cno (0.56 and 0.89), and delete the rest of the duplicate cases.
Could somebody help me?
 
Regards
 
Daniel
Amsterdam

Jorge Ivan Velez

2008-Jul-24 14:16 UTC

Dear Daniel,

Try this:

x=read.table(textConnection("cno      rank
1  1342    0.23
2  1342    0.14
3  1342    0.56
4  2568    0.15
5  2568    0.89"),header=TRUE,sep="")

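## x must be sorted by cno: add the number of rows before each group
## to the within-group position that which.max() returns
offset <- c(0, head(cumsum(table(x$cno)), -1))
x[offset + tapply(x$rank, x$cno, which.max), ]
   cno rank
3 1342 0.56
5 2568 0.89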
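Index arithmetic like this only works when the rows are already grouped by cno in sorted order. A minimal sketch that does not depend on row order picks a row index per group instead; the shuffled copy of the example data below is made up purely for illustration:

```r
## The example data from the original post, deliberately shuffled
x <- data.frame(cno  = c(2568, 1342, 1342, 2568, 1342),
                rank = c(0.15, 0.23, 0.56, 0.89, 0.14))

## For each cno, find the row number (within all of x) that holds the
## group's largest rank; which.max() returns the first maximum on ties
idx <- tapply(seq_len(nrow(x)), x$cno,
              function(i) i[which.max(x$rank[i])])

## as.vector() drops the array attributes that tapply() adds
best <- x[as.vector(idx), ]
best
```

The result keeps the original row names (3 and 4 here), so it is easy to see which rows survived.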


HTH,

Jorge


Henrique Dallazuanna

2008-Jul-24 14:20 UTC

Try this:

 aggregate(x$rank, list(cno=x$cno), max)
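aggregate() returns one row per cno holding only the grouping column and the maximum (in a column named x), which is all the two-column example needs. If the real data frame has more columns, one way to keep whole rows, sketched here with the example data rebuilt by hand, is to compare each rank with its group maximum from ave():

```r
x <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

## ave() returns, for every row, the maximum rank within that row's cno
group_max <- ave(x$rank, x$cno, FUN = max)

## keep rows whose rank equals their group maximum; every column survives
best <- x[x$rank == group_max, ]
best
```

Unlike which.max(), this keeps all tied rows when two cases share the highest rank; the == comparison is exact because max() returns one of the stored values.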


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Erik Iverson

2008-Jul-24 14:21 UTC

Daniel -

First, use order() to sort the data frame by cno and, within each cno, by decreasing rank.

Then use duplicated() with the negation operator (!) to drop every row whose cno has already appeared, so only the highest-ranked row of each cno remains.
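A minimal sketch of those two steps, assuming the cno and rank columns from the original post (the data frame is rebuilt by hand here):

```r
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

## sort by cno, then by decreasing rank within each cno
sorted <- df1[order(df1$cno, -df1$rank), ]

## !duplicated() is TRUE only for the first row of each cno,
## which after sorting is the row with the highest rank
best <- sorted[!duplicated(sorted$cno), ]
best
```

Negating rank to get a decreasing sort only works because rank is numeric.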


Marc Schwartz

2008-Jul-24 14:34 UTC

For the simple two column case, see ?aggregate:

 > aggregate(df1$rank, list(cno = df1$cno), max)
    cno    x
1 1342 0.56
2 2568 0.89


A more generic approach might be:

 > do.call(rbind, lapply(split(df1, df1$cno),
                         function(x) x[which.max(x$rank), ]))
       cno rank
1342 1342 0.56
2568 2568 0.89


For example, using the iris dataset, get the rows, by Species, with the 
highest Sepal.Length:


 > do.call(rbind, lapply(split(iris, iris$Species),
                         function(x) x[which.max(x$Sepal.Length), ]))
            Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa              5.8         4.0          1.2         0.2     setosa
versicolor          7.0         3.2          4.7         1.4 versicolor
virginica           7.9         3.8          6.4         2.0  virginica


HTH,

Marc Schwartz

Patrizio Frederic

2008-Jul-24 14:34 UTC

This works:

cno  <- c(rep(1342, times = 3), rep(2568, times = 2))
rank <- c(0.23, 0.14, 0.56, 0.15, 0.89)

## sort by cno, then by increasing rank, so the highest rank of each
## cno is the last row of its group
df1 <- data.frame(cno, rank)[order(cno, rank), ]

## the last row of each group sits at the running total of group sizes
where <- cumsum(table(df1$cno))

df1[where, ]


Regards,

PF

