Chandra Salgado Kent
2012-Jul-03 06:55 UTC
[R] subset data based on values in multiple columns
Dear list members,
I am trying to create a subset of a data frame based on conditions in two
columns, and after spending much time trying (and search R-help) have not had
any luck. Essentially, I have a data frame that is something like this:
date<-as.POSIXct(as.character(c("2012-01-25","2012-01-25","2012-01-26","2012-01-27","2012-01-27","2012-01-27")))
time<-as.POSIXct(as.character(c("13:20", "13:40",
"14:00", "10:00", "10:20", "10:20")),
format="%H:%M")
count<-c(12,14,11,12,12,8)
data<-data.frame(date,time,count)
which looks like:
date time count
1 2012-01-25 13:20:00 12
2 2012-01-25 13:40:00 14
3 2012-01-26 14:00:00 11
4 2012-01-27 10:00:00 12
5 2012-01-27 10:20:00 12
6 2012-01-27 10:20:00 8
I would like to create a subset by doing the following: for each unique date,
only include one case which will be the case with the max value for the column
labelled "count". So the resulting subset would be:
date time count
2 2012-01-25 13:40:00 14
3 2012-01-26 14:00:00 11
4 2012-01-27 10:00:00 12
Some dates have two cases at which the "count" was the same, but I
only want to include one case (I don't really mind which case it chooses,
but if need be it could be based on the earliest "time" for which the
same counts occurred). I have tried various loops with no success! I'm sure
that there is an easy answer that I have not found! Any help is much
appreciated!!
All the best,
Chandra
[[alternative HTML version deleted]]
Hi> > Dear list members, > > I am trying to create a subset of a data frame based on conditions intwo> columns, and after spending much time trying (and search R-help) havenot> had any luck. Essentially, I have a data frame that is something likethis:> > date<-as.POSIXct(as.character(c >("2012-01-25","2012-01-25","2012-01-26","2012-01-27","2012-01-27","2012-01-27")))> time<-as.POSIXct(as.character(c("13:20", "13:40", "14:00", "10:00", "10: > 20", "10:20")), format="%H:%M") > count<-c(12,14,11,12,12,8) > data<-data.frame(date,time,count) > > which looks like: > > date time count > 1 2012-01-25 13:20:00 12 > 2 2012-01-25 13:40:00 14 > 3 2012-01-26 14:00:00 11 > 4 2012-01-27 10:00:00 12 > 5 2012-01-27 10:20:00 12 > 6 2012-01-27 10:20:00 8 > > I would like to create a subset by doing the following: for each unique > date, only include one case which will be the case with the max valuefor> the column labelled "count". So the resulting subset would be: > > date time count > 2 2012-01-25 13:40:00 14 > 3 2012-01-26 14:00:00 11 > 4 2012-01-27 10:00:00 12 > > Some dates have two cases at which the "count" was the same, but I only > want to include one case (I don't really mind which case it chooses, but> if need be it could be based on the earliest "time" for which the same > counts occurred). I have tried various loops with no success! I'm sure > that there is an easy answer that I have not found! Any help is muchappreciated!! Just few days ago similarquestion was asked (selecting rows by maximum value of one variables in dataframe nested by another Variable). Here is what was recommended. do.call("rbind",lapply(split(data, data$date), function(x) x[which.max(x[,2]),])) Regards Petr> > All the best, > > Chandra > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.
Hi,
Try this:
data1<-data.frame(date,time,count)
dat1<-data1[with(data1,rev(order(count))),]
data2<-subset(dat1,rle(dat1$count)$lengths==1)
dat3<-aggregate(data2$count,list(data2$date),max)
colnames(dat3)<-c("date","count")
data4<-merge(dat3,data2)
?data4<-data4[,c(1,3,2)]
?data4
??????? date??????????????? time count
1 2012-01-25 2012-07-03 13:40:00??? 14
2 2012-01-26 2012-07-03 14:00:00??? 11
3 2012-01-27 2012-07-03 10:00:00??? 12
?
A.K.
----- Original Message -----
From: Chandra Salgado Kent <C.Salgado at cmst.curtin.edu.au>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Tuesday, July 3, 2012 2:55 AM
Subject: [R] subset data based on values in multiple columns
Dear list members,
I am trying to create a subset of a data frame based on conditions in two
columns, and after spending much time trying (and search R-help) have not had
any luck. Essentially, I have a data frame that is something like this:
date<-as.POSIXct(as.character(c("2012-01-25","2012-01-25","2012-01-26","2012-01-27","2012-01-27","2012-01-27")))
time<-as.POSIXct(as.character(c("13:20", "13:40",
"14:00", "10:00", "10:20", "10:20")),
format="%H:%M")
count<-c(12,14,11,12,12,8)
data<-data.frame(date,time,count)
which looks like:
? ? date? ? ? ? time? ? ? count
1 2012-01-25? ? 13:20:00? ? 12
2 2012-01-25? ? 13:40:00? ? 14
3 2012-01-26? ? 14:00:00? ? 11
4 2012-01-27? ? 10:00:00? ? 12
5 2012-01-27? ? 10:20:00? ? 12
6 2012-01-27? ? 10:20:00? ? ? 8
I would like to create a subset by doing the following: for each unique date,
only include one case which will be the case with the max value for the column
labelled "count". So the resulting subset would be:
? ? date? ? ? ? time? ? ? count
2 2012-01-25? ? 13:40:00? ? 14
3 2012-01-26? ? 14:00:00? ? 11
4 2012-01-27? ? 10:00:00? ? 12
Some dates have two cases at which the "count" was the same, but I
only want to include one case (I don't really mind which case it chooses,
but if need be it could be based on the earliest "time" for which the
same counts occurred). I have tried various loops with no success! I'm sure
that there is an easy answer that I have not found! Any help is much
appreciated!!
All the best,
Chandra
??? [[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.