Hi all,

Here is my situation. I have a dataframe, the structure would be something like this,

    TestData <- data.frame(ID=rep(1:10,each=10), TIME=rep(seq(0.1,1,0.1),10), VAR1=rnorm(100), VAR2=5*rnorm(100), VAR3=10*rnorm(100))

Basically, I want to extract the maximum value from each ID for VAR1, VAR2, VAR3, ...

The way I can think of is

    do.call(rbind, lapply(split(TestData, TestData$ID), function(x) x[which.max(x$VAR1), 'VAR1']))

and do this for each of the variables and put the results back. It's kind of clumsy but OK for several variables. I have dozens of them. Is there a better way to do it?

It would be ideal to produce the results like

    ID  VAR1.max   VAR2.max  VAR3.max
    1   1.2828796  8.63276   15.051992
    2   1.1870067  8.691801  10.736301
    3   1.2815352  6.335692  5.827524
    4   1.6719411  5.998597  16.646212
    5   1.5631107  6.067457  15.331046
    6   0.718989   6.610279  7.306005
    7   0.8734315  13.39844  16.965365
    8   2.7447862  10.21613  22.545131
    9   3.490395   10.83543  25.744662
    10  0.4719087  11.73021  7.226687

Thanks for any help.

Jun Shen
How about

    aggregate(TestData[, c('VAR1','VAR2','VAR3')], by=list(id=TestData$ID), FUN=max)

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 6/20/14 12:42 PM, "Jun Shen" <jun.shen.ut at gmail.com> wrote:
> [...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Have you looked at the 'aggregate' function? E.g.,

    aggregate(TestData[c("VAR1","VAR2","VAR3")], by=TestData["ID"], max)

Bill Dunlap
TIBCO Software
wdunlap tibco.com
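Since the question mentions dozens of variables and a ".max" suffix in the output, the aggregate() suggestions above can be extended along the following lines. This is only a sketch: it assumes that every column of interest has a name starting with "VAR" (the real data may need a different selection rule), and the set.seed() call is an addition to make the example repeatable.

```r
# Example data as in the original post; set.seed() is added here
# only so that the numbers are reproducible.
set.seed(1)
TestData <- data.frame(ID   = rep(1:10, each = 10),
                       TIME = rep(seq(0.1, 1, 0.1), 10),
                       VAR1 = rnorm(100),
                       VAR2 = 5 * rnorm(100),
                       VAR3 = 10 * rnorm(100))

# Select every column whose name starts with "VAR"
# (assumed naming pattern; adjust for the real data).
vars <- grep("^VAR", names(TestData), value = TRUE)

# One row per ID, holding the per-ID maximum of each selected column.
res <- aggregate(TestData[vars], by = TestData["ID"], FUN = max)

# Rename the value columns to VAR1.max, VAR2.max, ...
names(res)[-1] <- paste0(names(res)[-1], ".max")
res
```

If some of the columns contain missing values, extra arguments are passed on to FUN, so aggregate(TestData[vars], by = TestData["ID"], FUN = max, na.rm = TRUE) should ignore them.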
You could try:

    library(plyr)
    res <- ddply(TestData[,-2], .(ID), numcolwise(max))
    colnames(res)[-1] <- paste0(colnames(res)[-1], ".max")

A.K.