Leif Kirschenbaum
2006-Jan-10 21:03 UTC
[R] Find last row (observation) for each combination of variables
Let's say I have a data.frame like A B C TS other columns 1 1 1 12345 1 1 1 56789 1 2 1 23456 1 2 2 23457 2 4 7 23458 2 4 7 34567 2 4 7 45678 and I want the last row for each unique combination of A/B/C, where by "last" I mean greatest TS. A B C TS other columns 1 1 1 56789 1 2 1 23456 1 2 2 23457 2 4 7 45678 I did this simply in SAS: proc sort data=DF; by A B C descending TS run; proc sort data=DF NODUPKEY; by A B C; run; I tried using "aggregate" to find the maximum TS for each combination of A/B/C, but it's slow. I also tried "by" but it's also slow. My current (faster) solution is: DF$abc<-paste(DF$A,DF$B,DF$C,sep="") abclist<-unique(DF$ABC) numtest<-length(abclist) maxTS<-rep(0,numtest) for(i in 1:numtest){ maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE) } maxTSdf<-data.frame(device=I(abc),maxTS=maxTS ) DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE) DF<-Df[DF$TS==DF$maxTS,,drop=TRUE] DF$maxTS<-NULL This seems a bit lengthy for such a simple task. Any simpler suggestions? -Leif K. Leif Kirschenbaum Senior Yield Engineer Reflectivity, Inc. (408) 737-8100 x307 leif at reflectivity.com
Berton Gunter
2006-Jan-10 21:55 UTC
[R] Find last row (observation) for each combination of variables
Leif: Rather than trying to mimic what you might do in SAS take advantage of R's ability to use arbitrary data structures, e.g. lists. So, one approach is: (your.df is the data frame) your.list<-split(your.df,your.ts[,1:3],drop=TRUE) t(sapply(your.list,function(x)x[which.max(x$TS),])) Cheers, Bert -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leif > Kirschenbaum > Sent: Tuesday, January 10, 2006 1:04 PM > To: r-help at stat.math.ethz.ch > Subject: [R] Find last row (observation) for each combination > of variables > > Let's say I have a data.frame like > A B C TS other columns > 1 1 1 12345 > 1 1 1 56789 > 1 2 1 23456 > 1 2 2 23457 > 2 4 7 23458 > 2 4 7 34567 > 2 4 7 45678 > > and I want the last row for each unique combination of A/B/C, > where by "last" I mean greatest TS. > A B C TS other columns > 1 1 1 56789 > 1 2 1 23456 > 1 2 2 23457 > 2 4 7 45678 > > I did this simply in SAS: > proc sort data=DF; > by A B C descending TS > run; > proc sort data=DF NODUPKEY; > by A B C; > run; > > I tried using "aggregate" to find the maximum TS for each > combination of A/B/C, but it's slow. > I also tried "by" but it's also slow. > My current (faster) solution is: > > DF$abc<-paste(DF$A,DF$B,DF$C,sep="") > abclist<-unique(DF$ABC) > numtest<-length(abclist) > maxTS<-rep(0,numtest) > for(i in 1:numtest){ > maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE) > } > maxTSdf<-data.frame(device=I(abc),maxTS=maxTS ) > DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE) > DF<-Df[DF$TS==DF$maxTS,,drop=TRUE] > DF$maxTS<-NULL > > This seems a bit lengthy for such a simple task. > > Any simpler suggestions? > > -Leif K. > > Leif Kirschenbaum > Senior Yield Engineer > Reflectivity, Inc. > (408) 737-8100 x307 > leif at reflectivity.com > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >