Dylan Arena
2013-Feb-05 02:29 UTC
[R] How to subset a data frame to include only first events
Hi there,

I have a data frame with columns ID and Date. There are multiple rows for each ID, but I only want to keep the *first* such row, i.e., the row corresponding to the earliest event. So if I had, say, 1000 rows of 100 IDs doing an average of ten events each, I'd run this trimming procedure and end up with a data frame containing 100 rows (one for each ID), where each row records that ID's first event.

I can think of slow, clumsy, for-loop ways to trim the data frame, but I'm hopeful that there is some slick "R" way to do it that someone here can help me find. But so deep is my ignorance that I can't even come up with useful search terms to use on Rseek.org (I investigated "merge" but had no luck there).

Grateful for any ideas/tips/pointers,
Dylan
Dylan Arena
2013-Feb-05 03:31 UTC
[R] How to subset a data frame to include only first events
Update: A bit more digging and I found duplicated() (http://tolstoy.newcastle.edu.au/R/e4/help/08/06/13592.html). Sorry for the premature request for help!
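For readers landing on this thread, here is a minimal sketch of how duplicated() can be applied to the problem. The data frame `events` and its values are made up for illustration; only the ID and Date column names come from the original question. The sketch assumes that sorting by date first is acceptable, since duplicated() keeps the first row it encounters for each ID rather than the earliest one.

```r
# Hypothetical example data: three IDs, dates deliberately out of order.
events <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 3),
  Date = as.Date(c("2012-01-07", "2012-01-05", "2012-01-14",
                   "2012-01-09", "2012-01-03", "2012-01-24"))
)

# Sort by ID, then Date, so each ID's earliest event comes first.
events_sorted <- events[order(events$ID, events$Date), ]

# duplicated() flags every repeat of an ID after its first occurrence;
# negating it keeps only the first (earliest) row for each ID.
first_events <- events_sorted[!duplicated(events_sorted$ID), ]
first_events
```

Because whole rows are subset, any other columns in the data frame are carried along unchanged.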
Hi,

If the `Date` column is not ordered:

Date1 <- as.Date(c("01/05/2012", "01/07/2012", "01/15/2012",
                   "01/09/2012", "01/14/2012", "01/25/2012",
                   "01/08/2012", "01/24/2012", "01/03/2012"),
                 format = "%m/%d/%Y")
dat1 <- data.frame(ID = rep(1:3, each = 3), Date1)
# Earliest date within each ID
aggregate(Date1 ~ ID, data = dat1, function(x) min(x))
#   ID      Date1
# 1  1 2012-01-05
# 2  2 2012-01-09
# 3  3 2012-01-03

If it is ordered:

Date2 <- as.Date(c("01/05/2012", "01/07/2012", "01/15/2012",
                   "01/09/2012", "01/14/2012", "01/25/2012",
                   "01/03/2012", "01/08/2012", "01/24/2012"),
                 format = "%m/%d/%Y")
dat2 <- data.frame(ID = rep(1:3, each = 3), Date2)
# First date within each ID (the earliest, because the dates are already sorted)
aggregate(Date2 ~ ID, data = dat2, head, 1)
#   ID      Date2
# 1  1 2012-01-05
# 2  2 2012-01-09
# 3  3 2012-01-03

A.K.
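One point worth noting about the aggregate() approach in the reply above: it returns only the ID and date columns, so any other columns in the data frame are dropped. A possible way to recover the full rows, sketched here under the assumption that the data frame has an extra column (the `Event` column and all values below are hypothetical), is to merge the per-ID minimum dates back onto the original data.

```r
# Hypothetical data with an extra column (Event) that aggregate() alone would drop.
events <- data.frame(
  ID    = rep(1:3, each = 3),
  Date  = as.Date(c("2012-01-05", "2012-01-07", "2012-01-15",
                    "2012-01-09", "2012-01-14", "2012-01-25",
                    "2012-01-08", "2012-01-24", "2012-01-03")),
  Event = letters[1:9]
)

# Earliest date per ID, as in the reply above.
first_dates <- aggregate(Date ~ ID, data = events, FUN = min)

# Join back on both ID and Date to recover the complete rows.
first_rows <- merge(events, first_dates, by = c("ID", "Date"))
first_rows
```

If an ID has two events sharing its earliest date, this join keeps both rows, so it yields exactly one row per ID only when the earliest date is unique within each ID.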