Dear R users,

I have a data frame with a lot of duplicate cases (rows sharing the same cno). For each cno I want to keep only the case with the highest rank and delete the lower-ranked duplicates. For example:

> df1
   cno rank
1 1342 0.23
2 1342 0.14
3 1342 0.56
4 2568 0.15
5 2568 0.89

Here I want to keep the 3rd and 5th cases, which have the highest ranks (0.56 and 0.89), and delete the rest of the duplicate cases. Could somebody help me?

Regards,

Daniel
Amsterdam
(In reply to Daniel Wagner, Thu, Jul 24, 2008 at 10:00 AM)

Dear Daniel,

Try this:

x <- read.table(textConnection("cno rank
1 1342 0.23
2 1342 0.14
3 1342 0.56
4 2568 0.15
5 2568 0.89"), header = TRUE)
# which.max() returns the position of the maximum *within* each cno group,
# so add the number of rows that precede each group (assumes x is sorted by cno)
n <- as.vector(table(x$cno))                          # rows per cno
pos <- as.vector(tapply(x$rank, x$cno, which.max))    # position of max within group
x[cumsum(n) - n + pos, ]

   cno rank
3 1342 0.56
5 2568 0.89

HTH,
Jorge

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
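Index arithmetic based on group sizes, as in Jorge's reply, relies on the rows being grouped by cno. A minimal sketch that does not depend on row order, assuming the same two columns (the shuffled example data below is made up for illustration): pass the row numbers to tapply() and let each group return the row number of its own maximum.

```r
# Example data; rows deliberately NOT grouped by cno (hypothetical reordering)
df1 <- data.frame(cno  = c(1342, 2568, 1342, 1342, 2568),
                  rank = c(0.23, 0.15, 0.14, 0.56, 0.89))

# For each cno, take the row numbers of that group and keep the one whose
# rank is largest; which.max() picks the first maximum if there are ties
keep <- as.vector(tapply(seq_len(nrow(df1)), df1$cno,
                         function(i) i[which.max(df1$rank[i])]))

best <- df1[keep, ]
best
```

Because the result holds original row numbers, it works whatever order the rows are in.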
(In reply to Daniel Wagner, 7/24/08)

Try this:

aggregate(x$rank, list(cno = x$cno), max)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
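Henrique's aggregate() call returns only cno and the maximum rank (in a column named x), so any other columns of the data frame are lost. One way to get the complete rows back, sketched here with merge() (a swapped-in follow-up step, not part of his answer) and a hypothetical extra column obs standing in for "other columns", is to merge the per-cno maxima with the original data:

```r
# Question's data plus a hypothetical extra column 'obs'
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89),
                  obs  = 1:5)

# One row per cno with the maximum rank; the formula interface keeps
# the column name 'rank' instead of 'x'
maxima <- aggregate(rank ~ cno, data = df1, FUN = max)

# merge() joins on the shared columns (cno and rank), recovering full rows;
# if two rows tie for the maximum within a cno, both are returned
best <- merge(df1, maxima)
best
```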
Daniel -

First, use order() to sort the data frame by cno and, within each cno, by decreasing rank. Then use duplicated() with the negation operator (!) to drop every row whose cno has already appeared, which leaves only the highest-ranked case for each cno.
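A minimal sketch of those two steps, assuming the df1 from the question; the minus sign inside order() gives decreasing rank, which works because rank is numeric:

```r
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# 1. Sort by cno, and within each cno by decreasing rank,
#    so the highest-ranked case comes first in every group
sorted <- df1[order(df1$cno, -df1$rank), ]

# 2. duplicated() is TRUE for the 2nd, 3rd, ... occurrence of a cno,
#    so negating it keeps only the first (highest-ranked) row per cno
best <- sorted[!duplicated(sorted$cno), ]
best
```

The original row names (3 and 5) are kept, so it is easy to see which cases survived.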
(In reply to Daniel Wagner, 07/24/2008 09:00 AM)

For the simple two-column case, see ?aggregate:

> aggregate(df1$rank, list(cno = df1$cno), max)
   cno    x
1 1342 0.56
2 2568 0.89

A more generic approach might be:

> do.call(rbind, lapply(split(df1, df1$cno), function(x) x[which.max(x$rank), ]))
      cno rank
1342 1342 0.56
2568 2568 0.89

For example, using the iris dataset, get the rows, by Species, with the highest Sepal.Length:

> do.call(rbind, lapply(split(iris, iris$Species), function(x) x[which.max(x$Sepal.Length), ]))
           Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa              5.8         4.0          1.2         0.2     setosa
versicolor          7.0         3.2          4.7         1.4 versicolor
virginica           7.9         3.8          6.4         2.0  virginica

HTH,

Marc Schwartz
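Another base-R option, not used elsewhere in this thread, is ave(), which returns a per-group summary repeated for every row of that group. A minimal sketch, assuming the question's df1; unlike which.max(), which keeps only the first maximum, this comparison keeps all rows tied for the highest rank within a cno:

```r
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# ave() gives, for every row, the maximum rank of that row's cno group,
# so the comparison is TRUE exactly for the highest-ranked row(s) per cno
is_best <- df1$rank == ave(df1$rank, df1$cno, FUN = max)
best <- df1[is_best, ]
best
```

This keeps every column and the original row order without any sorting.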
(In reply to Daniel Wagner, 2008/7/24)

This works:

cno = c(rep(1342, times = 3), rep(2568, times = 2))
rank = c(.23, .14, .56, .15, .89)
# sort by cno and then by increasing rank, so the highest rank is the last row of each cno
df1 = data.frame(cno, rank)[order(cno, rank), ]
# sort the unique values so the group order matches the sorted df1
cnou = sort(unique(cno))
ind = match(cno, cnou)
where = tapply(rank, ind, length)    # number of rows per cno
where = cumsum(as.numeric(where))    # position of the last row of each cno in df1
df1[where, ]

Regards,
PF