Dylan Arena
2013-Feb-05  02:29 UTC
[R] How to subset a data frame to include only first events
Hi there,

I have a data frame with columns ID and Date. There are multiple rows for each ID, but I only want to keep the *first* such row--i.e., the row corresponding to the earliest event. So if I had, say, 1000 rows of 100 IDs doing an average of ten events each, I'd run this trimming procedure and end up with a data frame containing 100 rows (one for each ID), where each row records that ID's first event.

I can think of slow, clumsy, for-loop ways to trim the data frame, but I'm hopeful that there is some slick "R" way to do it that someone here can help me find. But so deep is my ignorance that I can't even come up with useful search terms to use on Rseek.org (I investigated "merge" but had no luck there).

Grateful for any ideas/tips/pointers,
Dylan
Dylan Arena
2013-Feb-05  03:31 UTC
[R] How to subset a data frame to include only first events
Update: A bit more digging and I found duplicated() (http://tolstoy.newcastle.edu.au/R/e4/help/08/06/13592.html). Sorry for the premature request for help!
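For later readers, here is a minimal sketch of how duplicated() could be applied to this problem. It assumes a data frame with columns ID and Date as described in the question; the data frame `events`, its values, and the extra `Event` column are made up for illustration. Sorting by ID and Date first is what makes the kept row the earliest one; without the sort, `!duplicated()` simply keeps whichever row of each ID appears first.

```r
# Hypothetical example data: 3 IDs with 3 events each, dates not in order
events <- data.frame(
  ID    = rep(1:3, each = 3),
  Date  = as.Date(c("2012-01-05", "2012-01-07", "2012-01-15",
                    "2012-01-14", "2012-01-09", "2012-01-25",
                    "2012-01-08", "2012-01-24", "2012-01-03")),
  Event = letters[1:9]
)

# Sort so that, within each ID, the earliest Date comes first
events_sorted <- events[order(events$ID, events$Date), ]

# duplicated() flags every repeat of an ID after its first appearance,
# so negating it keeps exactly one row (the earliest) per ID
first_events <- events_sorted[!duplicated(events_sorted$ID), ]
first_events
```

All other columns of the kept rows come along unchanged, which is the main practical difference from summarising only the Date column.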
Hi,
If the `Date` column is not ordered:
Date1 <- as.Date(c("01/05/2012","01/07/2012","01/15/2012","01/09/2012","01/14/2012","01/25/2012",
"01/08/2012","01/24/2012","01/03/2012"),format="%m/%d/%Y")
dat1 <- data.frame(ID=rep(1:3,each=3),Date1)
aggregate(Date1~ID,data=dat1,function(x) min(x))
#   ID      Date1
# 1  1 2012-01-05
# 2  2 2012-01-09
# 3  3 2012-01-03
# If it is ordered:
Date2 <- as.Date(c("01/05/2012","01/07/2012","01/15/2012","01/09/2012","01/14/2012","01/25/2012",
"01/03/2012","01/08/2012","01/24/2012"),format="%m/%d/%Y")
dat2 <- data.frame(ID=rep(1:3,each=3),Date2)
aggregate(Date2~ID,data=dat2,head,1)
#   ID      Date2
# 1  1 2012-01-05
# 2  2 2012-01-09
# 3  3 2012-01-03
A.K.
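A note for later readers: the aggregate() calls above return only the ID and the date. If the real data frame has further columns that should be kept (an assumption, since the question mentions only ID and Date), one possible sketch is to merge the per-ID minimum date back onto the original rows. The data frame `dat` and its `Value` column are hypothetical.

```r
# Hypothetical data: same IDs and dates as dat1 above, plus an extra column
dat <- data.frame(
  ID    = rep(1:3, each = 3),
  Date  = as.Date(c("2012-01-05", "2012-01-07", "2012-01-15",
                    "2012-01-09", "2012-01-14", "2012-01-25",
                    "2012-01-08", "2012-01-24", "2012-01-03")),
  Value = 1:9
)

# Earliest date per ID (two columns: ID and Date)
first_dates <- aggregate(Date ~ ID, data = dat, FUN = min)

# Keep only the original rows whose (ID, Date) pair matches an earliest date,
# so the remaining columns (here Value) are carried along
first_rows <- merge(dat, first_dates, by = c("ID", "Date"))
first_rows
```

If an ID has two events on its earliest date, this keeps both rows, so it is not guaranteed to return exactly one row per ID.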