violet lock
2010-Feb-09 22:00 UTC
[Rd] Aggregate dataframe variables, return more than 2 vars
Hello r-devel,

I have a data.frame with 3 columns, and I would like to group by one column (id), find the max of the date column, and return the data for that max date value along with the id and the corresponding value of decod. Example:

> dat <- data.frame(id = rep(1:3, 3),
+                   date = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"), each = 3)),
+                   decod = c("SCREEN", "SCREEN", "SCREEN", "RAND", "RAND", "RAND",
+                             "COMPLETE", "COMPLETE", "WITHDRAWAL"))

What I need it to return is:

  id x.decod.1.        end
1  1   COMPLETE 2005-08-29
2  2   COMPLETE 2005-08-29
3  3 WITHDRAWAL 2005-08-29

I can get the max date and the id two different ways:

> do.call("rbind", lapply(split(dat, dat$id),
+                         function(x) data.frame(id = x$id[1], max_date = max(x$date))))
  id        end
1  1 2005-08-29
2  2 2005-08-29
3  3 2005-08-29

OR

> aggregate(dat$date, list(USUBJID = dat$id), FUN = "max")
  USUBJID     x
1       1 13024
2       2 13024
3       3 13024

(which oddly returns the number of days since 1970-01-01 instead of the max as a Date value; 13024 days is 2005-08-29).

I'd like to do this without looping or filtering on date and USUBJID if possible. Is there a way to get the index from the max date function that I can then use to index the data.frame?

I came across a function dapply which looks like it might work, but unfortunately the package isn't one I can install in the near future due to some company restrictions.

Any ideas would be appreciated,
VL
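One possible base-R answer to the question above, offered as a hedged sketch rather than as part of the original thread: split the row numbers by id, use which.max() to pick the row holding each group's latest date, and index the data.frame with those rows. It also shows one way to turn aggregate()'s numeric result back into a Date. It assumes only base R (split, lapply, which.max, aggregate, as.Date) and the example data from the post; when several rows tie for the latest date, which.max() keeps the first one.

```r
# Example data from the question; stringsAsFactors = FALSE keeps decod as
# character (the default was TRUE in the R versions current in 2010).
dat <- data.frame(
  id    = rep(1:3, 3),
  date  = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"), each = 3)),
  decod = c("SCREEN", "SCREEN", "SCREEN", "RAND", "RAND", "RAND",
            "COMPLETE", "COMPLETE", "WITHDRAWAL"),
  stringsAsFactors = FALSE
)

# 1. For each id, find the row number of its latest date.
#    split() groups the row numbers by id; which.max() returns the position of
#    the first maximum inside each group, which is mapped back to a row number.
idx <- unlist(lapply(split(seq_len(nrow(dat)), dat$id),
                     function(rows) rows[which.max(dat$date[rows])]))

# Index the original data.frame with those row numbers: one row per id,
# keeping every column (id, date, decod).
latest <- dat[idx, ]
rownames(latest) <- NULL
print(latest)

# 2. aggregate() on a Date vector may come back as plain numbers
#    (days since 1970-01-01) in older R versions; as.Date() with an explicit
#    origin converts them back, and leaves a value that is already a Date as is.
agg <- aggregate(dat$date, list(USUBJID = dat$id), FUN = "max")
agg$x <- as.Date(agg$x, origin = "1970-01-01")
print(agg)
```

With the example data this should give one row per id (COMPLETE, COMPLETE, WITHDRAWAL, all dated 2005-08-29), without filtering on date or id explicitly. The lapply() over groups is still a loop under the hood, but it only touches row numbers, not copies of the data.frame.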
Martin Maechler
2010-Feb-10 07:59 UTC
[Rd] Aggregate dataframe .. *inappropriate* for R-devel
Not at all appropriate to be posted to R-devel:

>>>>> "vl" == violet lock <violetlock at gmail.com>
>>>>>     on Tue, 9 Feb 2010 17:00:48 -0500 writes:

    > Hello r-devel,
    > I have data.frame with 3 columns and I would like to group by 1 column(id),
    > find the max of the third column (date) and return the data for that max
    > date value along with the id and the value in the second column.
    > Example:
    >> dat <- data.frame(id = rep(1:3, 3), date = as.Date(rep(c("2005-08-25",
    > "2005-08-26", "2005-08-29"), each = 3)), decod = c("SCREEN", "SCREEN",
    > "SCREEN", "RAND", "RAND", "RAND", "COMPLETE", "COMPLETE", "WITHDRAWAL") )
    > What I need is it to return is:
    [...............]
    [...............]
    > Any ideas would be appreciated,

E-mails that look like the above *CLEARLY* belong to R-help and *NEVER* to R-devel.
Please read the posting guide

    >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >> and provide commented, minimal, self-contained, reproducible code.

and *then* repost to R-help!

Martin Maechler, ETH Zurich