Benjamin Gillespie
2013-Oct-10 22:35 UTC
[R] Splitting times into groups based on a range of times
Hi all,

I hope you can help with this one!

I have a dataframe 'df' that consists of a vector of times, 'dt2', and a vector of group ids, 'group':

dates2=rep("01/02/13",times=8)
times2=c("12:00:00","12:30:00","12:45:00","13:15:00","13:30:00","14:00:00","14:45:00","17:30:00")
y =paste(dates2, times2)
dt2=strptime(y, "%m/%d/%y %H:%M:%S")
group=c(1,1,2,2,3,3,4,4)
df=data.frame(dt2,group)

I also have a vector 'dt', which is a series of times:

dates=rep("01/02/13",times=20)
times=c("12:00:00","12:15:00","12:30:00","12:45:00","13:00:00","13:15:00","13:30:00","13:45:00","14:00:00","14:15:00","14:30:00","14:45:00","15:00:00","15:15:00","15:30:00","15:45:00","16:00:00","16:15:00","16:30:00","16:45:00","17:00:00","17:15:00","17:30:00","17:45:00")
x =paste(dates, times)
dt=strptime(x, "%m/%d/%y %H:%M:%S")

I wish to create a vector which looks like 'id':

id=c(1,1,1,2,2,2,3,3,3,0,0,4,4,4,4,4,4,4,4,4,4,4,4,0)

The rules I wish to follow to create 'id' are:

1. If a value in 'dt' is equal to, or within, the range of times of group x in dataframe 'df', then the value in 'id' will equal x.

So, for example, in 'df', group 4 is between the times of "14:45:00" and "17:30:00" on "01/02/13". Thus, the 12th to 23rd values in 'id' equal 4, as these values correspond to times in 'dt' that are equal to or within the range of "14:45:00" to "17:30:00" on "01/02/13".

If this doesn't make sense, please ask. I'm not sure where to even start with this... possibly the 'cut' function?

Many thanks in advance,

Ben Gillespie, Research Postgraduate
School of Geography, University of Leeds, Leeds, LS2 9JT
Tel: +44(0)113 34 33345
Mob: +44(0)770 868 7641
http://www.geog.leeds.ac.uk/
@RiversBenG
Hi Ben,

I would look into ?findInterval() or ?cut() for an easier solution.

# positions in 'dt' of each start/end time in 'df'
indx <- match(df[,1], as.POSIXct(dt))
indx2 <- unique(df[,2])
# pair up consecutive start/end positions and expand each pair to a full index range
lst1 <- lapply(split(indx, ((seq_along(indx)-1) %/% 2) + 1), function(x) seq(x[1], x[2]))
# label every position in a range with its group id
res <- unlist(lapply(seq_along(lst1), function(i) {
    val <- rep(indx2[i], length(lst1[[i]]))
    names(val) <- lst1[[i]]
    val
}))
# positions not covered by any range become 0
res1 <- res[match(seq_along(dt), names(res))]
res1[is.na(res1)] <- 0
names(res1) <- NULL
res1
# [1] 1 1 1 2 2 2 3 3 3 0 0 4 4 4 4 4 4 4 4 4 4 4 4 0
identical(id, res1)
#[1] TRUE

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
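For reference, the findInterval() route suggested above might look roughly like the sketch below. It is not code from the thread: it assumes each group in 'df' has one start and one end time, that the intervals are sorted and do not overlap, and it pins the time zone to UTC for reproducibility. The helper names (by_group, starts, ends, grp, tt, idx, inside) are made up for illustration.

```r
# Rebuild the example data from the original post (time zone fixed to UTC here).
dt2 <- as.POSIXct(paste("01/02/13",
                        c("12:00:00", "12:30:00", "12:45:00", "13:15:00",
                          "13:30:00", "14:00:00", "14:45:00", "17:30:00")),
                  format = "%m/%d/%y %H:%M:%S", tz = "UTC")
group <- c(1, 1, 2, 2, 3, 3, 4, 4)
df <- data.frame(dt2, group)

# 24 times at 15-minute steps, 12:00:00 to 17:45:00 on 2013-01-02
dt <- seq(as.POSIXct("2013-01-02 12:00:00", tz = "UTC"),
          by = "15 min", length.out = 24)

# Start and end of each group's interval, as seconds since the epoch.
# Assumption: intervals are sorted by start time and do not overlap.
by_group <- split(as.numeric(df$dt2), df$group)
starts <- unname(vapply(by_group, min, numeric(1)))
ends   <- unname(vapply(by_group, max, numeric(1)))
grp    <- as.numeric(names(by_group))

# For each time, the index of the last interval start that is <= that time
# (0 if the time precedes every start).
tt  <- as.numeric(dt)
idx <- findInterval(tt, starts)

# A time belongs to that interval only if it is also <= the interval's end.
inside <- idx > 0 & tt <= ends[pmax(idx, 1)]

id <- rep(0, length(dt))
id[inside] <- grp[idx[inside]]
id
# [1] 1 1 1 2 2 2 3 3 3 0 0 4 4 4 4 4 4 4 4 4 4 4 4 0
```

Unlike the match()-based code, this does not need the interval endpoints to appear exactly in 'dt'; it only compares times numerically.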
Hi Ben,

It looks like the condition is not met in the majority of the split elements. So when you create a data frame in which one column has zero elements and the other has one element, it throws the error:

data.frame(dt2=NULL,group=1)
#Error in data.frame(dt2 = NULL, group = 1) :
#  arguments imply differing number of rows: 0, 1

You can do this:

lst1 <- split(df.new, df.new$start.end.group)
# times falling in each group's interval; the last row of each group holds the end time
lst2 <- lapply(lst1, function(x) dt.new[dt.new >= x[1,1] & dt.new < x[nrow(x),1]])
# keep only the groups that matched at least one time
lst3 <- lst2[sapply(lst2, length) > 0]
df1.new <- do.call(rbind, lapply(lst1[names(lst1) %in% names(lst3)], function(x) {
    data.frame(dt2=dt.new[dt.new >= x[1,1] & dt.new < x[nrow(x),1]], group=x[1,2])
}))
#You could also do this from `lst3` and create groups as the names of the list elements as both are the same.

head(df1.new)
#                     dt2 group
#61.1 2012-08-02 19:16:14    61
#61.2 2012-08-02 19:18:14    61
#61.3 2012-08-02 19:20:14    61
#61.4 2012-08-02 19:22:14    61
#61.5 2012-08-02 19:24:14    61
#61.6 2012-08-02 19:26:14    61

tail(df1.new[df1.new$group==61,],2)
#                       dt2 group
#61.366 2012-08-03 07:26:14    61
#61.367 2012-08-03 07:28:14    61

lst1[[61]]
#                dt2.new start.end.group
#61  2012-08-02 19:15:00              61
#200 2012-08-03 07:30:00              61

A.K.

On Sunday, October 13, 2013 3:55 PM, Benjamin Gillespie <gybrg at leeds.ac.uk> wrote:

Hi Arun,

This is great - it works perfectly for the data I provided you. However, I've spent almost all of today trying to apply it to my real-world dataset... and for some reason I keep getting the error:

"Error in data.frame(dt2 = dt.new[dt.new >= x[1, 1] & dt.new < x[length(x),  : arguments imply differing number of rows: 0, 1"

when trying to build df1. It's quite odd and I can't figure out why!

I have attached my script file and two data files. logger2.csv is used to create 'df' and discharge.csv is used to define the 'floods' (it includes river discharge data), which are then assigned IDs. As before, I want to then assign these flood IDs to the relevant times in 'df'.
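The zero-row error discussed above can also be avoided by giving the group column the same length as the matched times, so that an interval with no matches yields an empty data frame rather than an error. The sketch below is only an illustration under stated assumptions: the attached CSV files are not available, so 'dt.new' and 'df.new' are small hypothetical stand-ins that reuse the thread's object and column names, and 'pieces' and 'hits' are made-up helper names.

```r
# Hypothetical stand-ins for the thread's objects (the real data is not shown).
dt.new <- seq(as.POSIXct("2012-08-02 19:00:00", tz = "UTC"),
              by = "10 min", length.out = 7)          # 19:00, 19:10, ..., 20:00
df.new <- data.frame(
  dt2.new = as.POSIXct(c("2012-08-02 19:05:00", "2012-08-02 19:35:00",   # group 1
                         "2012-08-02 19:41:00", "2012-08-02 19:49:00"),  # group 2
                       tz = "UTC"),
  start.end.group = c(1, 1, 2, 2)
)

lst1 <- split(df.new, df.new$start.end.group)

pieces <- lapply(lst1, function(x) {
  # times at or after the group's start and strictly before its end
  hits <- dt.new[dt.new >= x[1, 1] & dt.new < x[nrow(x), 1]]
  # rep(..., length(hits)) keeps both columns the same length, even when it is 0
  data.frame(dt2 = hits, group = rep(x[1, 2], length(hits)))
})

# rbind() on data frames drops zero-row arguments, so empty groups vanish here
df1.new <- do.call(rbind, pieces)
nrow(df1.new)
# [1] 3
```

Here group 2 spans 19:41 to 19:49 and contains none of the 10-minute times, so it contributes no rows, and only the three times inside group 1 remain.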