thr3ads.net - R help - [R] applying cumsum within groups [Apr 2015]

Morway, Eric

2015-Apr-03 12:17 UTC

[R] applying cumsum within groups

This small example will be applied to a problem with 1.4e6 lines of data.
First, here is the dataset and a few lines of R script, followed by an
explanation of what I'd like to get:

dat <- read.table(textConnection("ISEG  IRCH  val
 1    1   265
 1    2   260
 1    3   234
54   39   467
54   40   468
54   41   460
54   42   489
 1    1   265
 1    2   276
 1    3   217
54   39   456
54   40   507
54   41   483
54   42   457
 1    1   265
 1    2   287
 1    3   224
54   39   473
54   40   502
54   41   497
54   42   447
 1    1   230
 1    2   251
 1    3   199
54   39   439
54   40   474
54   41   477
54   42   413
 1    1   230
 1    2   262
 1    3   217
54   39   455
54   40   493
54   41   489
54   42   431
 1    1   1002
 1    2   1222
 1    3   1198
54   39   1876
54   40   1565
54   41   1455
54   42   1427
 1    1   1002
 1    2   1246
 1    3   1153
54   39   1813
54   40   1490
54   41   1518
54   42   1486
 1    1   1002
 1    2   1229
 1    3   1142
54   39   1797
54   40   1517
54   41   1527
54   42   1514"),header=TRUE)

dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
tmp <- diff(dat[dat$seq==1,]$val)!=0
dat$idx <- 0
dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
dat$ts <- cumsum(dat$idx)
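
The nested assignment in the script above can be hard to read. As a rough sketch only, the same three columns can be built with explicit row indices. The `toy` data frame and the names `is_start`, `start_rows` and `changed` are hypothetical and not part of the original post:

```r
# Hypothetical cut-down data: one ISEG==1/IRCH==1 row opens each block
toy <- data.frame(ISEG = c(1, 54, 1, 54, 1, 54),
                  IRCH = c(1, 39, 1, 39, 1, 39),
                  val  = c(265, 467, 265, 456, 230, 439))

# Flag the row that opens each block (same idea as dat$seq)
is_start   <- toy$ISEG == 1 & toy$IRCH == 1
toy$seq    <- ifelse(is_start, 1, 0)
start_rows <- which(is_start)

# A new "ts" begins wherever val on a block-opening row differs from
# val on the previous block-opening row; the first block always starts one
changed <- c(TRUE, diff(toy$val[start_rows]) != 0)

toy$idx <- 0
toy$idx[start_rows[changed]] <- 1
toy$ts  <- cumsum(toy$idx)
print(toy)
```

Indexing `toy$idx` directly with `start_rows[changed]` avoids the chained `[ ][ ]$idx <-` replacement, which some readers may find easier to follow.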

At this point, I'd like to add one more column called "iter" that counts up
by 1 based on "seq", but within each "ts".  So, the result would look like
this (undoubtedly this is a simple problem with something like ddply, but
I've been unable to construct the R for it):

dat
 ISEG IRCH  val seq idx ts iter
    1    1  265   1   1  1    1
    1    2  260   0   0  1    1
    1    3  234   0   0  1    1
   54   39  467   0   0  1    1
   54   40  468   0   0  1    1
   54   41  460   0   0  1    1
   54   42  489   0   0  1    1
    1    1  265   1   0  1    2
    1    2  276   0   0  1    2
    1    3  217   0   0  1    2
   54   39  456   0   0  1    2
   54   40  507   0   0  1    2
   54   41  483   0   0  1    2
   54   42  457   0   0  1    2
    1    1  265   1   0  1    3
    1    2  287   0   0  1    3
    1    3  224   0   0  1    3
   54   39  473   0   0  1    3
   54   40  502   0   0  1    3
   54   41  497   0   0  1    3
   54   42  447   0   0  1    3
    1    1  230   1   1  2    1
    1    2  251   0   0  2    1
    1    3  199   0   0  2    1
   54   39  439   0   0  2    1
   54   40  474   0   0  2    1
   54   41  477   0   0  2    1
   54   42  413   0   0  2    1
    1    1  230   1   0  2    2
    1    2  262   0   0  2    2
    1    3  217   0   0  2    2
   54   39  455   0   0  2    2
   54   40  493   0   0  2    2
   54   41  489   0   0  2    2
   54   42  431   0   0  2    2
    1    1 1002   1   1  3    1
    1    2 1222   0   0  3    1
    1    3 1198   0   0  3    1
   54   39 1876   0   0  3    1
   54   40 1565   0   0  3    1
   54   41 1455   0   0  3    1
   54   42 1427   0   0  3    1
    1    1 1002   1   0  3    2
    1    2 1246   0   0  3    2
    1    3 1153   0   0  3    2
   54   39 1813   0   0  3    2
   54   40 1490   0   0  3    2
   54   41 1518   0   0  3    2
   54   42 1486   0   0  3    2
    1    1 1002   1   0  3    3
    1    2 1229   0   0  3    3
    1    3 1142   0   0  3    3
   54   39 1797   0   0  3    3
   54   40 1517   0   0  3    3
   54   41 1527   0   0  3    3
   54   42 1514   0   0  3    3


David Winsemius

2015-Apr-03 16:17 UTC

[R] applying cumsum within groups

On Apr 3, 2015, at 5:17 AM, Morway, Eric wrote:
> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts".  So, the result would look like
> this (undoubtedly this is a simple problem with something like ddply, but
> I've been unable to construct the R for it):

> dat$iter2 <- ave(dat$seq, dat$ts, FUN=cumsum)
> dat
   ISEG IRCH  val seq idx ts iter iter2
1     1    1  265   1   1  1  1_1     1
2     1    2  260   0   0  1  1_1     1
3     1    3  234   0   0  1  1_1     1
4    54   39  467   0   0  1  1_1     1
5    54   40  468   0   0  1  1_1     1
6    54   41  460   0   0  1  1_1     1
7    54   42  489   0   0  1  1_1     1
8     1    1  265   1   0  1  1_2     2
9     1    2  276   0   0  1  1_2     2
10    1    3  217   0   0  1  1_2     2
11   54   39  456   0   0  1  1_2     2
12   54   40  507   0   0  1  1_2     2
13   54   41  483   0   0  1  1_2     2
14   54   42  457   0   0  1  1_2     2
15    1    1  265   1   0  1  1_3     3
16    1    2  287   0   0  1  1_3     3
17    1    3  224   0   0  1  1_3     3
18   54   39  473   0   0  1  1_3     3
19   54   40  502   0   0  1  1_3     3
20   54   41  497   0   0  1  1_3     3
21   54   42  447   0   0  1  1_3     3
22    1    1  230   1   1  2  2_4     1
23    1    2  251   0   0  2  2_4     1
snipped----->

-- 
David

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
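
Since the original question mentions 1.4e6 rows, here is one possible fully vectorised variant, offered only as a sketch. It assumes the rows of each "ts" group are contiguous, which holds when "ts" comes from cumsum() as in the question. The vectors `seq_flag` and `ts_id` are made-up stand-ins for dat$seq and dat$ts, and whether this is actually faster than ave() on real data has not been measured here:

```r
# Toy stand-ins for dat$seq and dat$ts (made-up values, not the thread's data)
seq_flag <- c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0)
ts_id    <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)

# Running count of flags over the whole vector, ignoring groups
running <- cumsum(seq_flag)

# For every row, the index of the first row of its group
# (relies on each group occupying one contiguous run of rows)
first_row <- match(ts_id, ts_id)

# Remove whatever had accumulated before the group started
iter_vec <- running - running[first_row] + seq_flag[first_row]

# Cross-check against the ave() approach shown in the reply above
iter_ave <- ave(seq_flag, ts_id, FUN = cumsum)
stopifnot(all(iter_vec == iter_ave))
print(iter_vec)
```

The subtraction works because the running total at the first row of a group, minus that row's own flag, is exactly the count carried in from earlier groups.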

peter dalgaard

2015-Apr-03 17:12 UTC

[R] applying cumsum within groups

ave() is your friend (unfortunately named as it may be):

> ave(dat$seq, dat$ts, FUN=cumsum)
 [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1
[39] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3
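
For readers unfamiliar with ave(), a minimal sketch of what the call does, using made-up vectors (`seq_flag` and `ts_id` are hypothetical names standing in for dat$seq and dat$ts):

```r
# Toy stand-ins for dat$seq and dat$ts (made-up values, not the thread's data)
seq_flag <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
ts_id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

# ave() splits seq_flag by ts_id, applies FUN to each piece, and puts
# the results back in the original row order, so the output has the
# same length as the input and can be assigned straight to a column.
iter <- ave(seq_flag, ts_id, FUN = cumsum)
print(iter)

# Without FUN, ave() returns group means, which is where the name comes from
group_mean <- ave(seq_flag, ts_id)
print(group_mean)
```

Because cumsum() restarts for each value of the grouping vector, the counter resets to 1 at the first flagged row of every group.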

> On 03 Apr 2015, at 14:17 , Morway, Eric <emorway at usgs.gov> wrote:
> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts".  So, the result would look like
> this (undoubtedly this is a simple problem with something like ddply, but
> I've been unable to construct the R for it):
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
