Leif Kirschenbaum
2006-Jan-10 21:03 UTC
[R] Find last row (observation) for each combination of variables
Let's say I have a data.frame like
A B C TS other columns
1 1 1 12345
1 1 1 56789
1 2 1 23456
1 2 2 23457
2 4 7 23458
2 4 7 34567
2 4 7 45678
and I want the last row for each unique combination of A/B/C, where by
"last" I mean greatest TS.
A B C TS other columns
1 1 1 56789
1 2 1 23456
1 2 2 23457
2 4 7 45678
I did this simply in SAS:
proc sort data=DF;
by A B C descending TS
run;
proc sort data=DF NODUPKEY;
by A B C;
run;
I tried using "aggregate" to find the maximum TS for each combination
of A/B/C, but it's slow.
I also tried "by" but it's also slow.
My current (faster) solution is:
DF$abc<-paste(DF$A,DF$B,DF$C,sep="")
abclist<-unique(DF$ABC)
numtest<-length(abclist)
maxTS<-rep(0,numtest)
for(i in 1:numtest){
maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE)
}
maxTSdf<-data.frame(device=I(abc),maxTS=maxTS )
DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE)
DF<-Df[DF$TS==DF$maxTS,,drop=TRUE]
DF$maxTS<-NULL
This seems a bit lengthy for such a simple task.
Any simpler suggestions?
-Leif K.
Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
leif at reflectivity.com
Berton Gunter
2006-Jan-10 21:55 UTC
[R] Find last row (observation) for each combination of variables
Leif: Rather than trying to mimic what you might do in SAS take advantage of R's ability to use arbitrary data structures, e.g. lists. So, one approach is: (your.df is the data frame) your.list<-split(your.df,your.ts[,1:3],drop=TRUE) t(sapply(your.list,function(x)x[which.max(x$TS),])) Cheers, Bert -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leif > Kirschenbaum > Sent: Tuesday, January 10, 2006 1:04 PM > To: r-help at stat.math.ethz.ch > Subject: [R] Find last row (observation) for each combination > of variables > > Let's say I have a data.frame like > A B C TS other columns > 1 1 1 12345 > 1 1 1 56789 > 1 2 1 23456 > 1 2 2 23457 > 2 4 7 23458 > 2 4 7 34567 > 2 4 7 45678 > > and I want the last row for each unique combination of A/B/C, > where by "last" I mean greatest TS. > A B C TS other columns > 1 1 1 56789 > 1 2 1 23456 > 1 2 2 23457 > 2 4 7 45678 > > I did this simply in SAS: > proc sort data=DF; > by A B C descending TS > run; > proc sort data=DF NODUPKEY; > by A B C; > run; > > I tried using "aggregate" to find the maximum TS for each > combination of A/B/C, but it's slow. > I also tried "by" but it's also slow. > My current (faster) solution is: > > DF$abc<-paste(DF$A,DF$B,DF$C,sep="") > abclist<-unique(DF$ABC) > numtest<-length(abclist) > maxTS<-rep(0,numtest) > for(i in 1:numtest){ > maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE) > } > maxTSdf<-data.frame(device=I(abc),maxTS=maxTS ) > DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE) > DF<-Df[DF$TS==DF$maxTS,,drop=TRUE] > DF$maxTS<-NULL > > This seems a bit lengthy for such a simple task. > > Any simpler suggestions? > > -Leif K. > > Leif Kirschenbaum > Senior Yield Engineer > Reflectivity, Inc. > (408) 737-8100 x307 > leif at reflectivity.com > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >