Hi, I have a data frame with 3 columns: ID, year and score. How can I select for each unique ID, the year that has the max score? For example, for data frame ID, year, score tom, 1995, 88 rick, 1994, 90 mary, 2000, 97 tom, 1998, 60 mary, 1998,100 I shall have ID, year, score tom, 1995, 88 rick, 1994, 90 mary, 1998,100 Thanks, Richard [[alternative HTML version deleted]]
Henrique Dallazuanna
2010-Mar-09 18:58 UTC
[R] data frame select max group by like function
Try this: library(sqldf) sqldf("SELECT ID, Year, MAX(score) FROM DF GROUP BY ID") Or do.call(rbind, lapply(split(DF <- DF[order(DF$score),], DF$ID), tail, 1)) On Tue, Mar 9, 2010 at 3:35 PM, Tan, Richard <RTan at panagora.com> wrote:> Hi, I have a data frame with 3 columns: ID, year and score. ?How can I > select for each unique ID, the year that has the max score? ?For > example, for data frame > > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 2000, 97 > tom, 1998, 60 > mary, 1998,100 > > I shall have > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 1998,100 > > Thanks, > Richard > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Henrique Dallazuanna Curitiba-Paran?-Brasil 25? 25' 40" S 49? 16' 22" O
Hi Richard, There are probably better ways, but here is one approach: Dat <- read.table(textConnection("ID, year, score tom, 1995, 88 rick, 1994, 90 mary, 2000, 97 tom, 1998, 60 mary, 1998,100"), header=TRUE, sep=",") MaxScore <- aggregate(Dat$score, list(Dat$ID), max) names(MaxScore) <- c("ID", "score") MaxYear <- merge(Dat, MaxScore) Best, Ista On Tue, Mar 9, 2010 at 1:35 PM, Tan, Richard <RTan at panagora.com> wrote:> Hi, I have a data frame with 3 columns: ID, year and score. ?How can I > select for each unique ID, the year that has the max score? ?For > example, for data frame > > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 2000, 97 > tom, 1998, 60 > mary, 1998,100 > > I shall have > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 1998,100 > > Thanks, > Richard > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
I find ddply() in package plyr handy for this sort of thing: library(plyr) f <- function(x) x[which.max( x[["score"]] ), ] ## x will be a subset of Dat according to ID ddply(Dat, "ID", f) -Peter Ehlers On 2010-03-09 11:59, Ista Zahn wrote:> Hi Richard, > There are probably better ways, but here is one approach: > > Dat<- read.table(textConnection("ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 2000, 97 > tom, 1998, 60 > mary, 1998,100"), header=TRUE, sep=",") > > MaxScore<- aggregate(Dat$score, list(Dat$ID), max) > names(MaxScore)<- c("ID", "score") > MaxYear<- merge(Dat, MaxScore) > > Best, > Ista > > > On Tue, Mar 9, 2010 at 1:35 PM, Tan, Richard<RTan at panagora.com> wrote: >> Hi, I have a data frame with 3 columns: ID, year and score. How can I >> select for each unique ID, the year that has the max score? For >> example, for data frame >> >> ID, year, score >> tom, 1995, 88 >> rick, 1994, 90 >> mary, 2000, 97 >> tom, 1998, 60 >> mary, 1998,100 >> >> I shall have >> ID, year, score >> tom, 1995, 88 >> rick, 1994, 90 >> mary, 1998,100 >> >> Thanks, >> Richard >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > >-- Peter Ehlers University of Calgary
Yet another way to do this with base R:> dat = read.csv(textConnection('ID, year, score+ tom, 1995, 88 + rick, 1994, 90 + mary, 2000, 97 + tom, 1998, 60 + mary, 1998,100'))> do.call(rbind,lapply(split(dat,dat$ID),function(x)x[which.max(x$score),]))ID year score mary mary 1998 100 rick rick 1994 90 tom tom 1995 88 - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spector at stat.berkeley.edu On Tue, 9 Mar 2010, Tan, Richard wrote:> Hi, I have a data frame with 3 columns: ID, year and score. How can I > select for each unique ID, the year that has the max score? For > example, for data frame > > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 2000, 97 > tom, 1998, 60 > mary, 1998,100 > > I shall have > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 1998,100 > > Thanks, > Richard > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Try this. The aggregate call gives a data frame with the ID and max score. Then we merge that back with the original data frame so that we pick up the year too: merge(DF, aggregate(DF['score'], DF['ID'], max)) On Tue, Mar 9, 2010 at 1:35 PM, Tan, Richard <RTan at panagora.com> wrote:> Hi, I have a data frame with 3 columns: ID, year and score. ?How can I > select for each unique ID, the year that has the max score? ?For > example, for data frame > > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 2000, 97 > tom, 1998, 60 > mary, 1998,100 > > I shall have > ID, year, score > tom, 1995, 88 > rick, 1994, 90 > mary, 1998,100 > > Thanks, > Richard > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >