With a data.frame sorted by id, with ties broken by date, as in
your example, you can select rows that are either the start
of a new id group or the start of a run of consecutive dates with:

> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
> which(w)
[1] 1 4 5 7
> uci[w,]
  id       date value
1  1 2005-10-28     1
4  1 2005-11-07     3
5  1 2007-03-19     1
7  2 2004-06-02     2

I'll leave it to you to translate that R syntax into data.table syntax -
it just involves comparing the current row with the previous row.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod at hotmail.com> wrote:
> Dear R users,
>
> I usually work with the data.table package, but I'm sure that my question can also be answered working with an R data frame.
> Working with grouped data (by "id"), I wonder if it is possible to keep in an R data.frame (or R data.table):
> a) Only the first row if there is a row which belongs to a group of rows (from the same "id") that have consecutive dates.
> b) All the rows which do not belong to the above groups.
>
> As an example, I have the "uci" data.frame:
>
> uci <- data.table(id=c(rep(1,6),2),
>     date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07","2007-03-19","2007-03-20","2004-06-02")),
>     value = c(1, 2, 1, 3, 1, 2, 2))
>
>  id       date value
>   1 2005-10-28     1
>   1 2005-10-29     2
>   1 2005-10-30     1
>   1 2005-11-07     3
>   1 2007-03-19     1
>   1 2007-03-20     2
>   2 2004-06-02     2
>
> And the desired output would be:
>
>  id       date value
>   1 2005-10-28     1
>   1 2005-11-07     3
>   1 2007-03-19     1
>   2 2004-06-02     2
>
> # From the following link, I have tried:
> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
>
> setDT(uci)[ ,list(date=date[1L], value = value[1L]), by = .(ind=rleid(date), id)][, ind:=NULL][]
>
> But I get the same data frame, and I do not know the reason.
>
> Thank you very much for any help!!
>
> Frank S.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
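
A note on the rleid() attempt quoted above, offered as a probable explanation rather than a confirmed diagnosis: rleid() starts a new run whenever the value changes, and every date in the example is distinct, so each row ends up in its own group and nothing is collapsed. The base R sketch below illustrates this with rle() and then builds a run identifier from one-day gaps instead. The names run_lengths, new_run, run_id and first_rows are made up for illustration, and the data are assumed to be sorted by id and then by date.

```r
# Example data from the question, as a plain data.frame
uci <- data.frame(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# Runs of *equal* dates: all dates differ, so every run has length 1
run_lengths <- rle(as.numeric(uci$date))$lengths

# A new run starts when the id changes or the gap to the previous
# row is not exactly one day (assumes sorting by id, then date)
new_run <- c(TRUE, diff(uci$id) != 0) | c(TRUE, diff(uci$date) != 1)
run_id  <- cumsum(new_run)

# Keep the first row of each run
first_rows <- uci[!duplicated(run_id), ]
print(first_rows)
```

The run_id vector could equally be used as a grouping variable in a data.table `by =` clause in place of rleid(date).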
> On Dec 4, 2015, at 1:10 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
> With a data.frame sorted by id, with ties broken by date, as in
> your example, you can select rows that are either the start
> of a new id group or the start of run of consecutive dates with:
>
>> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
>> which(w)
> [1] 1 4 5 7
>> uci[w,]
>   id       date value
> 1  1 2005-10-28     1
> 4  1 2005-11-07     3
> 5  1 2007-03-19     1
> 7  2 2004-06-02     2
>
> I'll leave it to you to translate that R syntax into data.table syntax -
> it just involves comparing the current row with the previous row.
>
> On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod at hotmail.com> wrote:
>> And the desired output would be:
>>
>>  id       date value
>>   1 2005-10-28     1
>>   1 2005-11-07     3
>>   1 2007-03-19     1
>>   2 2004-06-02     2

The syntax of `[.data.table` is a bit odd. You can refer to columns by name; I never trust my intuition, though.

Selection is usually done with a logical vector in the "i" position. The diff operator does succeed in the "i" position, with the obvious need to prepend a starting value:

> uci[ c(0,diff(date))!=1, ]
   id       date value
1:  1 2005-10-28     1
2:  1 2005-11-07     3
3:  1 2007-03-19     1
4:  2 2004-06-02     2

The other cases are handled with the converse expression:

> uci[c(0,diff(date)) == 1, ]
   id       date value
1:  1 2005-10-29     2
2:  1 2005-10-30     1
3:  1 2007-03-20     2

David Winsemius
Alameda, CA, USA
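
One caveat about the `c(0,diff(date))` expression in this reply: it compares each row with the previous row regardless of id, so it gives the desired result on the example data but could drop the first row of an id that happens to start one day after the previous id ends. A minimal sketch of a grouped variant follows, assuming the data.table package is installed. The `edge` table is a hypothetical case invented for illustration, not data from the thread, and the `.SD` subsetting shown is only one of several ways to write this.

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# Grouped: diff() restarts within each id, so the first row of
# every id is always kept
first_of_run <- uci[, .SD[c(TRUE, diff(date) != 1)], by = id]
print(first_of_run)

# Hypothetical edge case: id 2 starts the day after id 1's last date
edge <- data.table(
  id    = c(1, 1, 2),
  date  = as.Date(c("2007-03-19", "2007-03-20", "2007-03-21")),
  value = c(1, 2, 2)
)

ungrouped <- edge[c(0, diff(date)) != 1]                     # loses id 2
grouped   <- edge[, .SD[c(TRUE, diff(date) != 1)], by = id]  # keeps id 2
```

Both variants assume the rows are already sorted by id and then by date, as in the original example.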
Many thanks to: William Dunlap, Dennis Murphy and David Winsemius for your quick and efficient answers!!

Best regards,

Frank S.

> Subject: Re: [R] Keep only first date from consecutive dates
> From: dwinsemius at comcast.net
> Date: Fri, 4 Dec 2015 16:34:38 -0800
> CC: f_j_rod at hotmail.com; r-help at r-project.org
> To: wdunlap at tibco.com