thr3ads.net - R help - [R] Keep only first date from consecutive dates [Dec 2015]

William Dunlap

2015-Dec-04 21:10 UTC

[R] Keep only first date from consecutive dates

With a data.frame sorted by id, with ties broken by date, as in
your example, you can select rows that are either the start
of a new id group or the start of run of consecutive dates with:

> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
> which(w)
[1] 1 4 5 7
> uci[w,]
  id       date value
1  1 2005-10-28     1
4  1 2005-11-07     3
5  1 2007-03-19     1
7  2 2004-06-02     2

I'll leave it to you to translate that R syntax into data.table syntax -
it just involves comparing the current row with the previous row.
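
One possible translation into data.table syntax, offered as a sketch rather than the only way to do it: inside `[.data.table` the columns can be named directly, so the same logical vector goes in the i position. The name first_rows and the setorder() call are additions for illustration; the sort is only there to guarantee the ordering the approach relies on.

```r
library(data.table)

# The example data from the original question
uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# The comparison with the previous row assumes rows are sorted by id, then date
setorder(uci, id, date)

# Keep a row if it starts a new id or if the gap to the previous date exceeds one day
first_rows <- uci[c(TRUE, diff(date) > 1) | c(TRUE, diff(id) != 0)]
first_rows
```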

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod at hotmail.com> wrote:
> Dear R users,
>
> I usually work with the data.table package, but I'm sure that my question can also be answered using a plain R data frame.
> Working with grouped data (by "id"), I wonder if it is possible to keep in an R data.frame (or data.table):
> a) Only the first row of each group of rows (from the same "id") that have consecutive dates.
> b) All the rows which do not belong to the above groups.
>
> As an example, I have "uci" data.frame:
>
> uci <- data.table(id = c(rep(1,6),2),
>                 date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07",
>                                  "2007-03-19","2007-03-20","2004-06-02")),
>                 value = c(1, 2, 1, 3, 1, 2, 2))
>
>    id              date   value
>     1  2005-10-28        1
>     1  2005-10-29        2
>     1  2005-10-30        1
>     1  2005-11-07        3
>     1  2007-03-19        1
>     1  2007-03-20        2
>     2  2004-06-02        2
>
> And the desired output would be:
>
>    id              date   value
>     1  2005-10-28        1
>     1  2005-11-07        3
>     1  2007-03-19        1
>     2  2004-06-02        2
>
> # From the following link, I have tried:
>
> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
>
> setDT(uci)[, list(date = date[1L], value = value[1L]),
>            by = .(ind = rleid(date), id)][, ind := NULL][]
>
> But I get the same data frame, and I do not know the reason.
>
> Thank you very much for any help!!
>
> Frank S.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
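
A likely reason the rleid() attempt in the question returns every row: rleid() starts a new run whenever the value changes, and all seven dates are distinct, so each row ends up in its own group. The sketch below shows that and one possible fix, assuming "consecutive" means exactly one day apart within the same id. The column name run and the result name first_rows are made up for illustration.

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# rleid() numbers runs of identical values; every date differs, so each row is its own run
uci[, rleid(date)]

# Within each id, start a new run whenever the gap to the previous date is not one day
uci[, run := cumsum(c(TRUE, diff(date) != 1)), by = id]

# Keep the first row of each (id, run) group, then drop the helper column
first_rows <- uci[, .SD[1L], by = .(id, run)][, run := NULL][]
first_rows
```

Grouping by a run counter built from diff() plays the role that rleid() was meant to play in the original attempt.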

David Winsemius

2015-Dec-05 00:34 UTC

[R] Keep only first date from consecutive dates

The syntax of `[.data.table` is a bit odd: you can refer to columns by name. I
never trust my intuition, though.

Selection is usually done with a logical vector in the 'i' position. The diff
operator does work in the 'i' position, with the obvious need to prepend
a starting value.

> uci[ c(0,diff(date))!=1, ]
   id       date value
1:  1 2005-10-28     1
2:  1 2005-11-07     3
3:  1 2007-03-19     1
4:  2 2004-06-02     2

The other cases are handled with the converse expression:

> uci[c(0,diff(date)) == 1, ]
   id       date value
1:  1 2005-10-29     2
2:  1 2005-10-30     1
3:  1 2007-03-20     2
David Winsemius
Alameda, CA, USA

Frank S.

2015-Dec-09 09:38 UTC

[R] Keep only first date from consecutive dates

Many thanks to: William Dunlap, Dennis Murphy and David Winsemius for your quick
and efficient answers!!
 
Best regards,
 
Frank S.
 
