Hi, I have something very interesting: Say I have this: GroupID Date 1 1 10-Dec-12 2 1 11-Dec-12 3 2 13-Dec-12 4 2 15-Dec-12 5 3 06-Dec-12 6 3 19-Dec-12 Now, I have time interval, week 1: from 9-Dec-12 to 15-Dec-12, week 2: from 16-Dec-12 to 22-Dec-12, and so on. Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not be counted, 6th row falls into week2. Therefore, by GroupID, I will have GroupID=1, Week1=2, Week2=0 GroupID=2, Week1=2, Week2=0 GroupID=3, Week1=0, Week2=1. I just want to count the valid date that falls into a 7-day week interval, and I shall have new variables for EACH WEEK, and the counts for dates that fall into this week interval. Can anyone please help me on programming this? W [[alternative HTML version deleted]]
Hi, Please use ?dput() to show the dataset.? Also, it is not clear about how you store the time interval. dat <- read.table(text=" GroupID??????? Date 1????? 1????? 10-Dec-12 2????? 1????? 11-Dec-12 3????? 2????? 13-Dec-12 4????? 2????? 15-Dec-12 5????? 3????? 06-Dec-12 6????? 3????? 19-Dec-12",sep="",header=TRUE,stringsAsFactors=FALSE) ?dat2 <- data.frame(Week=1:2, from= c("9-Dec-12", "16-Dec-12"), to=c("15-Dec-12","22-Dec-12"),stringsAsFactors=FALSE) #Check ?findInterval() res <- t(sapply(split(dat,dat$GroupID), function(x) { ??? ??? ??? ??? ??? ??? ??? x$Date <-as.Date(x$Date,"%d-%b-%y") ??? ??? ??? ??? ??? ??? ??? ?unsplit(lapply(split(dat2,dat2$Week),function(y) { ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? y$from <- as.Date(y$from, "%d-%b-%y") ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ?y$to <- as.Date(y$to, "%d-%b-%y") ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ?sum(x$Date > y$from & x$Date <= y$to)}), ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? dat2$Week) ??? ??? ??? ??? ??? ??? ? })) colnames(res) <- paste0("Week",1:2) ?res #? Week1 Week2 #1???? 2???? 0 #2???? 2???? 0 #3???? 0???? 1 A.K. On Tuesday, October 15, 2013 6:24 PM, Weijia Wang <zeleehom at gmail.com> wrote: Hi, I have something very interesting: Say I have this: GroupID? ? ? ? Date 1? ? ? 1? ? ? 10-Dec-12 2? ? ? 1? ? ? 11-Dec-12 3? ? ? 2? ? ? 13-Dec-12 4? ? ? 2? ? ? 15-Dec-12 5? ? ? 3? ? ? 06-Dec-12 6? ? ? 3? ? ? 19-Dec-12 Now, I have time interval, week 1: from 9-Dec-12 to 15-Dec-12, week 2: from 16-Dec-12 to 22-Dec-12, and so on. Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not be counted, 6th row falls into week2. Therefore, by GroupID, I will have GroupID=1, Week1=2, Week2=0 GroupID=2, Week1=2, Week2=0 GroupID=3, Week1=0, Week2=1. I just want to count the valid date that falls into a 7-day week interval, and I shall have new variables for EACH WEEK, and the counts for dates that fall into this week interval. Can anyone please help me on programming this? W ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
HI Weijia, Please check whether this is what you wanted. Weijia <- load("/home/arunksa111/Downloads/arun_help.RData" ) a[sapply(a,is.factor)] <-lapply(a[sapply(a,is.factor)],as.character) str(a) ?b[sapply(b,is.factor)] <- lapply(b[sapply(b,is.factor)],as.character) str(b) ?b$DT_ETP <- as.Date(b$DT_ETP,"%d-%b-%y") ?b$DT_END <- c(b$DT_ETP[-1]-1, b$DT_ETP[length(b$DT_ETP)]+6) res <- do.call(rbind,lapply(split(a,list(a$COUNTRY,a$SITEID),drop=TRUE),function(x){ x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y") do.call(cbind,lapply(split(b,b$TMPT),function(y) { ?sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_ETP) })) ?})) ?colnames(res) <- paste0("WEEK",colnames(res)) A.K. On Wednesday, October 16, 2013 10:48 AM, Weijia Wang <zeleehom at gmail.com> wrote: Hi, Arun Here I attached a R object with two dataframes. The first one is the one I need to count the number of dates that fall into a certain week interval. For example, ?? STUDYID COUNTRY SITEID SUBJID?? SCRNDT??? 1 GRTMD101???? USA???? 13??????? 130101?? ?4-Dec-12? 2 GRTMD101???? USA???? 13?????? ?130102??? 4-Dec-12??? 3 GRTMD101???? USA???? 13??????? 130103??? 4-Dec-12?? 4 GRTMD101???? USA????? 6??????? ?60101??? ??5-Dec-12? 5 GRTMD101???? USA????? 5??????? ?50101??????5-Dec-12??? 6 GRTMD101???? USA???? 13??????? 130104??? 6-Dec-12 So I will need to count number so dates under 'SCRNDT' for each unique 'SITEID' that falls into the following 'DT_ETP' ?STUDYID?????? DT_ETP??? TMPT 1 GRTMD101?? 9-Nov-12?? ? 1??????? 2 GRTMD101 ?16-Nov-12??? 2???? 3 GRTMD101 ?23-Nov-12??? 3??????? 4 GRTMD101 ?30-Nov-12??? 4???? 5 GRTMD101 ? 7-Dec-12? ?? 5?????? 6 GRTMD101 ?14-Dec-12??? 6??? For example, for site 13, the 1st and 2nd SCRNDT both are 4-Dec-12 that falls into the week from '30-Nov-12' to '7-Dec-12', then in the result dataset, it should say, 2 under the variable 'week 4', and the result data frame should look like this: SITEID WEEK1 WEEK2 WEEK3 WEEK4.......WEEK64?? (The values are just an example) ???? 1?????????? 0??????????? 0????????? 1??????????? 2???????????????? 3 ???? 2?????????? 1??????????? 2??????????3??????????? 5???????????????? 6 ???? .???????????? .??????????? .?????????? .???????????? .???????????????? . Here is a code I modified based on the one you sent me, but quite intuitively, and R returns an empty vector to me, lol, ?library(plyr) ?res <- ddply(a,.(COUNTRY, SITEID), function(x) { ?????? ???? x$SCRNDT <-as.Date(x$SCRNDT,"%d-%b-%y") ???????????????????????????? ? unsplit(lapply(split(b,b$TMPT),function(y) { ????????numweek<-as.numeric(length(unique(b$TMPT))) ????????for (i in 1:numweek) { ???????????????????????????????????????? ????y$DT_ETP <- as.Date(y$DT_ETP, "%d-%b-%y") ???????????????????????????????????????? ????sum(x$SCRNDT > y$DT_ETP[i] & x$SCRNDT <= y$DT_ETP[i+1])}}),? ???????????? b$TMPT)}) I really hope you can teach me on this one again! Thank you so much!! W 2013/10/15 arun <smartpink111 at yahoo.com> HI Weijia,>No problem. >Let me know if it works.? One slight modification as I noticed that your weeks are not overlapping >#change > > > >sum(x$Date > y$from & x$Date <= y$to) > >#to >sum(x$Date >= y$from & x$Date <= y$to > >Regards, >Arun > > > > > > >On Tuesday, October 15, 2013 8:14 PM, Weijia wang <zeleehom at gmail.com> wrote: >Thank you Arun! I will try it and get back to you! You are amazing! > >Weijia Wang > > >> On Oct 15, 2013, at 8:00 PM, arun <smartpink111 at yahoo.com> wrote: >> >> Hi, >> >> Please use ?dput() to show the dataset.? Also, it is not clear about how you store the time interval. >> dat <- read.table(text=" >> GroupID? ? ? ? Date >> 1? ? ? 1? ? ? 10-Dec-12 >> 2? ? ? 1? ? ? 11-Dec-12 >> 3? ? ? 2? ? ? 13-Dec-12 >> 4? ? ? 2? ? ? 15-Dec-12 >> 5? ? ? 3? ? ? 06-Dec-12 >> 6? ? ? 3? ? ? 19-Dec-12",sep="",header=TRUE,stringsAsFactors=FALSE) >> >> >> >>? dat2 <- data.frame(Week=1:2, from= c("9-Dec-12", "16-Dec-12"), to=c("15-Dec-12","22-Dec-12"),stringsAsFactors=FALSE) >> >> #Check ?findInterval() >> >> res <- t(sapply(split(dat,dat$GroupID), function(x) { >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ?x$Date <-as.Date(x$Date,"%d-%b-%y") >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? unsplit(lapply(split(dat2,dat2$Week),function(y) { >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?y$from <- as.Date(y$from, "%d-%b-%y") >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? y$to <- as.Date(y$to, "%d-%b-%y") >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? sum(x$Date > y$from & x$Date <= y$to)}), >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?dat2$Week) >>? ? ? ? ? ? ? ? ? ? ? ? ? ?})) >> >> colnames(res) <- paste0("Week",1:2) >>? res >> #? Week1 Week2 >> #1? ? ?2? ? ?0 >> #2? ? ?2? ? ?0 >> #3? ? ?0? ? ?1 >> >> >> A.K. >> >> >> >> >> >> >> >> >> >> >> On Tuesday, October 15, 2013 6:24 PM, Weijia Wang <zeleehom at gmail.com> wrote: >> Hi, I have something very interesting: >> >> Say I have this: >> >> GroupID? ? ? ? ?Date >> 1? ? ? 1? ? ? ?10-Dec-12 >> 2? ? ? 1? ? ? ?11-Dec-12 >> 3? ? ? 2? ? ? ?13-Dec-12 >> 4? ? ? 2? ? ? ?15-Dec-12 >> 5? ? ? 3? ? ? ?06-Dec-12 >> 6? ? ? 3? ? ? ?19-Dec-12 >> >> Now, I have time interval, >> >> week 1: from 9-Dec-12 to 15-Dec-12, >> >> week 2: from 16-Dec-12 to 22-Dec-12, and so on. >> >> Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not >> be counted, 6th row falls into week2. >> >> Therefore, by GroupID, I will have >> >> GroupID=1, Week1=2, Week2=0 >> GroupID=2, Week1=2, Week2=0 >> GroupID=3, Week1=0, Week2=1. >> >> I just want to count the valid date that falls into a 7-day week interval, >> and I shall have new variables for EACH WEEK, and the counts for dates that >> fall into this week interval. >> >> Can anyone please help me on programming this? >> >> W >> >>? ? ?[[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >-- --Sent via Gmail Weijia Wang Division of Epidemiology Department of Population Health, School of Medicine New York University, NY, 10016
Hi Weijia, This will give you the rownames of the split variables. lst1 <- split(a,list(a$COUNTRY,a$SITEID)) ?res <- t(sapply(lst1,function(x) { ??? ??? ??? ??? x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y") ??? ??? ??? ??? unlist(lapply(split(b,b$TMPT),function(y){ ??? ??? ??? ??? ? sum(x$SCRNDT >= y$DT_ETP & x$SCRNDT <= y$DT_END) ??? ??? ??? ??? ??? })) ??? ??? ??? ??? ?})) ?colnames(res) <- paste0("WEEK",colnames(res)) ?res[5:8,1:4] #????? WEEK1 WEEK2 WEEK3 WEEK4 #USA.5???? 0???? 0???? 0???? 4 #USA.6???? 0???? 0???? 0???? 1 #USA.8???? 0???? 0???? 0???? 0 #USA.9???? 0???? 0???? 0???? 0 A.K. On , arun <smartpink111 at yahoo.com> wrote: Yes, Sorry a typo.? I was in a hurry when I sent it. On Wednesday, October 16, 2013 2:32 PM, Weijia Wang <zeleehom at gmail.com> wrote: Arun I think you mean ?sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_END) instead of DT_ETP at the end of this line, right? W 2013/10/16 arun <smartpink111 at yahoo.com> HI Weijia,> >Please check whether this is what you wanted. > >Weijia <- load("/home/arunksa111/Downloads/arun_help.RData" ) >a[sapply(a,is.factor)] <-lapply(a[sapply(a,is.factor)],as.character) >str(a) >?b[sapply(b,is.factor)] <- lapply(b[sapply(b,is.factor)],as.character) >str(b) >?b$DT_ETP <- as.Date(b$DT_ETP,"%d-%b-%y") >?b$DT_END <- c(b$DT_ETP[-1]-1, b$DT_ETP[length(b$DT_ETP)]+6) >res <- do.call(rbind,lapply(split(a,list(a$COUNTRY,a$SITEID),drop=TRUE),function(x){ >x$SCRNDT <- as.Date(x$SCRNDT, "%d-%b-%y") >do.call(cbind,lapply(split(b,b$TMPT),function(y) { >?sum(x$SCRNDT>= y$DT_ETP & x$SCRNDT <= y$DT_ETP) >})) >?})) > >?colnames(res) <- paste0("WEEK",colnames(res)) > > > >A.K. > > > > > >On Wednesday, October 16, 2013 10:48 AM, Weijia Wang <zeleehom at gmail.com> wrote: > >Hi, Arun > >Here I attached a R object with two dataframes. > >The first one is the one I need to count the number of dates that fall into a certain week interval. > >For example, > >?? STUDYID COUNTRY SITEID SUBJID?? SCRNDT??? >1 GRTMD101???? USA???? 13??????? 130101?? ?4-Dec-12? >2 GRTMD101???? USA???? 13?????? ?130102??? 4-Dec-12??? >3 GRTMD101???? USA???? 13??????? 130103??? 4-Dec-12?? >4 GRTMD101???? USA????? 6??????? ?60101??? ??5-Dec-12? >5 GRTMD101???? USA????? 5??????? ?50101??????5-Dec-12??? >6 GRTMD101???? USA???? 13??????? 130104??? 6-Dec-12 > >So I will need to count number so dates under 'SCRNDT' for each unique 'SITEID' that falls into the >following 'DT_ETP' > >?STUDYID?????? DT_ETP??? TMPT >1 GRTMD101?? 9-Nov-12?? ? 1??????? >2 GRTMD101 ?16-Nov-12??? 2???? >3 GRTMD101 ?23-Nov-12??? 3??????? >4 GRTMD101 ?30-Nov-12??? 4???? >5 GRTMD101 ? 7-Dec-12? ?? 5?????? >6 GRTMD101 ?14-Dec-12??? 6??? > >For example, for site 13, the 1st and 2nd SCRNDT both are 4-Dec-12 that falls into the week from '30-Nov-12' to '7-Dec-12', then in the result dataset, it should say, 2 under the variable 'week 4', and the result data frame should look like this: > >SITEID WEEK1 WEEK2 WEEK3 WEEK4.......WEEK64?? (The values are just an example) >???? 1?????????? 0??????????? 0????????? 1??????????? 2???????????????? 3 >???? 2?????????? 1??????????? 2??????????3??????????? 5???????????????? 6 >???? .???????????? .??????????? .?????????? .???????????? .???????????????? . > >Here is a code I modified based on the one you sent me, but quite intuitively, and R returns an empty vector to me, lol, > > >?library(plyr) >?res <- ddply(a,.(COUNTRY, SITEID), function(x) { >?????? ???? x$SCRNDT <-as.Date(x$SCRNDT,"%d-%b-%y") >???????????????????????????? ? unsplit(lapply(split(b,b$TMPT),function(y) { >????????numweek<-as.numeric(length(unique(b$TMPT))) >????????for (i in 1:numweek) { >???????????????????????????????????????? ????y$DT_ETP <- as.Date(y$DT_ETP, "%d-%b-%y") >???????????????????????????????????????? ????sum(x$SCRNDT > y$DT_ETP[i] & x$SCRNDT <= y$DT_ETP[i+1])}}),? >???????????? b$TMPT)}) > > > > >I really hope you can teach me on this one again! Thank you so much!! > >W > > > > > >2013/10/15 arun <smartpink111 at yahoo.com> > >HI Weijia, >>No problem. >>Let me know if it works.? One slight modification as I noticed that your weeks are not overlapping >>#change >> >> >> >>sum(x$Date > y$from & x$Date <= y$to) >> >>#to >>sum(x$Date >= y$from & x$Date <= y$to >> >>Regards, >>Arun >> >> >> >> >> >> >>On Tuesday, October 15, 2013 8:14 PM, Weijia wang <zeleehom at gmail.com> wrote: >>Thank you Arun! I will try it and get back to you! You are amazing! >> >>Weijia Wang >> >> >>> On Oct 15, 2013, at 8:00 PM, arun <smartpink111 at yahoo.com> wrote: >>> >>> Hi, >>> >>> Please use ?dput() to show the dataset.? Also, it is not clear about how you store the time interval. >>> dat <- read.table(text=" >>> GroupID? ? ? ? Date >>> 1? ? ? 1? ? ? 10-Dec-12 >>> 2? ? ? 1? ? ? 11-Dec-12 >>> 3? ? ? 2? ? ? 13-Dec-12 >>> 4? ? ? 2? ? ? 15-Dec-12 >>> 5? ? ? 3? ? ? 06-Dec-12 >>> 6? ? ? 3? ? ? 19-Dec-12",sep="",header=TRUE,stringsAsFactors=FALSE) >>> >>> >>> >>>? dat2 <- data.frame(Week=1:2, from= c("9-Dec-12", "16-Dec-12"), to=c("15-Dec-12","22-Dec-12"),stringsAsFactors=FALSE) >>> >>> #Check ?findInterval() >>> >>> res <- t(sapply(split(dat,dat$GroupID), function(x) { >>>? ? ? ? ? ? ? ? ? ? ? ? ? ? ?x$Date <-as.Date(x$Date,"%d-%b-%y") >>>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? unsplit(lapply(split(dat2,dat2$Week),function(y) { >>>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?y$from <- as.Date(y$from, "%d-%b-%y") >>>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? y$to <- as.Date(y$to, "%d-%b-%y") >>>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? sum(x$Date > y$from & x$Date <= y$to)}), >>>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?dat2$Week) >>>? ? ? ? ? ? ? ? ? ? ? ? ? ?})) >>> >>> colnames(res) <- paste0("Week",1:2) >>>? res >>> #? Week1 Week2 >>> #1? ? ?2? ? ?0 >>> #2? ? ?2? ? ?0 >>> #3? ? ?0? ? ?1 >>> >>> >>> A.K. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tuesday, October 15, 2013 6:24 PM, Weijia Wang <zeleehom at gmail.com> wrote: >>> Hi, I have something very interesting: >>> >>> Say I have this: >>> >>> GroupID? ? ? ? ?Date >>> 1? ? ? 1? ? ? ?10-Dec-12 >>> 2? ? ? 1? ? ? ?11-Dec-12 >>> 3? ? ? 2? ? ? ?13-Dec-12 >>> 4? ? ? 2? ? ? ?15-Dec-12 >>> 5? ? ? 3? ? ? ?06-Dec-12 >>> 6? ? ? 3? ? ? ?19-Dec-12 >>> >>> Now, I have time interval, >>> >>> week 1: from 9-Dec-12 to 15-Dec-12, >>> >>> week 2: from 16-Dec-12 to 22-Dec-12, and so on. >>> >>> Obviously, the 1st, 2nd, 3rd, 4th row falls to week 1, 5th rows should not >>> be counted, 6th row falls into week2. >>> >>> Therefore, by GroupID, I will have >>> >>> GroupID=1, Week1=2, Week2=0 >>> GroupID=2, Week1=2, Week2=0 >>> GroupID=3, Week1=0, Week2=1. >>> >>> I just want to count the valid date that falls into a 7-day week interval, >>> and I shall have new variables for EACH WEEK, and the counts for dates that >>> fall into this week interval. >>> >>> Can anyone please help me on programming this? >>> >>> W >>> >>>? ? ?[[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> > > >-- > >--Sent via Gmail > >Weijia Wang >Division of Epidemiology > >Department of Population Health, School of Medicine >New York University, NY, 10016 >-- --Sent via Gmail Weijia Wang Division of Epidemiology Department of Population Health, School of Medicine New York University, NY, 10016