Hello,

I am relatively new to R, and I am trying to select the last observation within a group, where the group is defined by two variables. One of the variables is a date. In the example below, C3 varies within C2, which varies within C1. I need to select the last observation in C3 for 4 groups (C1*C2): 1x, 1y, 2x, and 2y. In my real dataset, C2 is a date (mm/dd/yy).

    C1 C2 C3
    1  x  1
    1  x  2
    1  y  1
    1  y  2
    2  x  1
    2  x  2
    2  y  1
    2  y  2

I have found code (from UCLA R FAQs and this list's archives) for selecting the last observation when a group is defined by ONE variable (e.g., C1):

    last <- by(mydata, mydata$C1, tail, n=1)
    lastd <- do.call("rbind", as.list(last))

The by function does not seem to allow two variables in the Indices argument:

    last <- by(mydata, mydata$C1 mydata$C2, tail, n=1)    # THIS DOESN'T WORK

I tried creating a new variable C1*C2, but I think this is risky since it may not be unique depending on my values of C1 and C2 (I have a very large dataset).

Thank you for the help,
Peter Alspach
2012-Apr-04 20:52 UTC
[R] Selecting obs within groups defined by 2 variables
Tena koe Naomi

There are lots of ways to do this. Here are a couple (note I've made a minor modification to your example):

    > naomi
      C1 C2 C3
    1  1  x  1
    2  1  x  2
    3  1  y  1
    4  1  y  2
    5  2  x  1
    6  2  x  2
    7  2  x  3
    8  2  y  1
    9  2  y  2

    > tapply(naomi[,3], naomi[,1:2], function(x) x[length(x)])
       C2
    C1  x y
      1 2 2
      2 3 2

    > aggregate(naomi[,3], naomi[,1:2], function(x) x[length(x)])
      C1 C2 x
    1  1  x 2
    2  2  x 3
    3  1  y 2
    4  2  y 2

HTH ....

Peter Alspach

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Naomi Sugie
Sent: Thursday, 5 April 2012 8:21 a.m.
To: r-help at r-project.org
Subject: [R] Selecting obs within groups defined by 2 variables
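Both calls above return only the last C3 value for each group. If the whole row is wanted, and C2 is a date stored as mm/dd/yy text as in the real dataset, one possible approach (not taken from the thread) is to parse the dates with as.Date() and then keep the rows that duplicated(..., fromLast = TRUE) does not flag. This is a minimal sketch: the dates below are made up for illustration, and sorting by C3 so that "last" means the largest C3 in each group is an assumption about what is wanted.

```r
# Hypothetical data in the shape described in the question,
# with C2 stored as mm/dd/yy text (the dates are invented).
mydata <- data.frame(
  C1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  C2 = c("04/01/12", "04/01/12", "04/02/12", "04/02/12",
         "04/01/12", "04/01/12", "04/02/12", "04/02/12"),
  C3 = c(1, 2, 1, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Parse the text into Date objects so that ordering is chronological.
mydata$C2 <- as.Date(mydata$C2, format = "%m/%d/%y")

# Assumption: "last" means the largest C3 within each C1/C2 group,
# so sort by the grouping columns and then by C3.
mydata <- mydata[order(mydata$C1, mydata$C2, mydata$C3), ]

# duplicated(..., fromLast = TRUE) flags every row that is followed by
# another row with the same C1/C2 pair; negating it keeps the last row
# of each group.
last_rows <- mydata[!duplicated(mydata[, c("C1", "C2")], fromLast = TRUE), ]
last_rows
```

Because the comparison is made on the pair of columns itself, no combined C1*C2 key has to be built by hand.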
Hello,

> The by function does not seem to allow two variables in the Indices
> argument:

Yes it does, but you must use a list of variables. (Read the help for 'by': INDICES a factor or a list of factors, each of length nrow(data).)

    mydata <- read.table(text="
    C1 C2 C3
    1 x 1
    1 x 2
    1 y 1
    1 y 2
    2 x 1
    2 x 2
    2 y 1
    2 y 2
    ", header=TRUE)

    last <- by(mydata, list(mydata$C1, mydata$C2), tail, n=1)
    last

    # Another way, output is more useful.
    last2 <- aggregate(mydata, list(mydata$C1, mydata$C2), tail, n=1)
    last2[, -(1:2)]

Hope this helps,

Rui Barradas
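The by() call above returns a list-like object rather than a data frame. As a sketch of one way to get a single data frame with the last row of each group, split() also accepts a list of grouping variables and uses their interaction for the grouping, so there is no need to construct a combined key manually. The names groups and lastd below are only illustrative, and this is an alternative to the code in the thread, not a replacement for it.

```r
mydata <- read.table(text = "
C1 C2 C3
1 x 1
1 x 2
1 y 1
1 y 2
2 x 1
2 x 2
2 y 1
2 y 2
", header = TRUE)

# One data frame per C1/C2 pair; drop = TRUE skips pairs that never occur.
groups <- split(mydata, list(mydata$C1, mydata$C2), drop = TRUE)

# Take the last row of each piece and stack the pieces back together.
lastd <- do.call(rbind, lapply(groups, tail, n = 1))
lastd
```

The same pattern should carry over when C2 is a Date column, since split() converts each grouping variable to a factor before forming the groups.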