christiaan pauw
2009-Jun-10 19:20 UTC
[R] by and by: using two indices in by() or tapply()
Hi everyone,

I want to apply a function by two indices.

I have a number of surveyors submitting questionnaires. I want to check the time of the first submission of the day for each surveyor, and also see an NA if no submission was made on a particular day.

This generates a sample of the data:

starttime=c("11:07:32","14:07:28","11:32:21","13:27:49","11:45:05",
"12:30:06","10:27:07","10:18:07","15:29:36","16:29:23","13:46:45","10:45:26",
"09:21:14","10:29:51","12:32:56","11:06:02","12:41:36","11:03:47",
"10:58:12","10:05:54")

submitdate=c("2009-05-21","2009-06-02","2009-05-12","2009-05-21",
"2009-05-21","2009-05-07","2009-05-19","2009-05-13","2009-06-05",
"2009-05-13","2009-06-05","2009-05-28","2009-05-15","2009-05-28",
"2009-06-05","2009-05-28","2009-05-12","2009-05-28",
"2009-05-07","2009-05-20")

surveyor=rep(LETTERS[1:4],5)

data=data.frame(surveyor, submitdate, starttime)

I can generate a list of the earliest submission per day:

tapply(starttime,submitdate,min)

or of the earliest submission per surveyor:

tapply(starttime,surveyor,min)

or of the number of submissions per surveyor per day:

table(submitdate,surveyor)

But what I want is the time of the earliest submission per surveyor per day (with NAs where applicable).

Can anyone offer some advice?

Thanks
Christiaan
Henrique Dallazuanna
2009-Jun-10 19:33 UTC
[R] by and by: using two indices in by() or tapply()
Try this:

tapply(starttime,list(submitdate, surveyor),min)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
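For reference, here is a minimal sketch of the two-index call run on the sample data from the original post. The name first_sub is only illustrative and does not appear in the thread. It relies on the times being zero-padded "HH:MM:SS" strings, so that min() on the character values picks the chronologically earliest one.

```r
# Sample data from the original post
starttime <- c("11:07:32", "14:07:28", "11:32:21", "13:27:49", "11:45:05",
               "12:30:06", "10:27:07", "10:18:07", "15:29:36", "16:29:23",
               "13:46:45", "10:45:26", "09:21:14", "10:29:51", "12:32:56",
               "11:06:02", "12:41:36", "11:03:47", "10:58:12", "10:05:54")
submitdate <- c("2009-05-21", "2009-06-02", "2009-05-12", "2009-05-21",
                "2009-05-21", "2009-05-07", "2009-05-19", "2009-05-13",
                "2009-06-05", "2009-05-13", "2009-06-05", "2009-05-28",
                "2009-05-15", "2009-05-28", "2009-06-05", "2009-05-28",
                "2009-05-12", "2009-05-28", "2009-05-07", "2009-05-20")
surveyor <- rep(LETTERS[1:4], 5)

# A list of two grouping vectors yields one cell per (date, surveyor) pair:
# rows are dates, columns are surveyors. Pairs with no submission are NA.
# first_sub is a hypothetical name chosen for this example.
first_sub <- tapply(starttime, list(submitdate, surveyor), min)
print(first_sub)

# Individual cells can be looked up by date and surveyor
first_sub["2009-05-21", "A"]  # two submissions that day, the earlier is kept
first_sub["2009-06-02", "A"]  # no submission by A that day
```

The call uses the standalone vectors rather than the columns of data. In versions of R before 4.0, data.frame() converts character columns to factors by default, and min() is not defined for factors, so a call on data$starttime would likely need as.character() or stringsAsFactors = FALSE when building the data frame.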