Dear R-users,

I have a dataset with categories and numbers.
I would like to compute and add cumulative numbers
to the dataset.
I do not understand the structure of by(...) or
tapply(...) output well enough to handle it.

Here is a small example:
--------------
d <- expand.grid(a=1:5, b=1:3, c=1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c
Sn <- by(d$n, list(d$a, d$c), cumsum)
str(Sn)
---------
List of 10
 $ : num [1:3] 11.1 23.2 36.3
 $ : num [1:3] 21.1 43.2 66.3
 $ : num [1:3] 31.1 63.2 96.3
 $ : num [1:3]  41.1  83.2 126.3
 $ : num [1:3]  51.1 103.2 156.3
 $ : num [1:3] 11.2 23.4 36.6
 $ : num [1:3] 21.2 43.4 66.6
 $ : num [1:3] 31.2 63.4 96.6
 $ : num [1:3]  41.2  83.4 126.6
 $ : num [1:3]  51.2 103.4 156.6
 - attr(*, "dim")= int [1:2] 5 2
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:5] "1" "2" "3" "4" ...
  ..$ : chr [1:2] "1" "2"
 - attr(*, "call")= language by.default(data = d$n, INDICES = list(d$a, d$c), FUN = cumsum)
 - attr(*, "class")= chr "by"
---------
# these give (a) lists of one numerical vector(a)
Sn[5,2]
Sn[cbind(d$a,d$c)]
# how to access the individual cumsum values?
# and assign them to d$Sn?
--------------

Thanks,
Gerrit.

---
Gerrit Draisma
Department of Public Health
Erasmus MC, University Medical Center Rotterdam
Room AE-235
P.O. Box 2040 3000 CA Rotterdam The Netherlands
Phone: +31 10 7043787 Fax: +31 10 7038474
http://mgzlx4.erasmusmc.nl/pwp/?gdraisma
Maybe 'ave' is what you were looking for:

> d$cum <- ave(d$n, d$a, d$c, FUN = cumsum)
> d
   a b c    n   cum
1  1 1 1 11.1  11.1
2  2 1 1 21.1  21.1
3  3 1 1 31.1  31.1
4  4 1 1 41.1  41.1
5  5 1 1 51.1  51.1
6  1 2 1 12.1  23.2
7  2 2 1 22.1  43.2
8  3 2 1 32.1  63.2
9  4 2 1 42.1  83.2
10 5 2 1 52.1 103.2
11 1 3 1 13.1  36.3
12 2 3 1 23.1  66.3
13 3 3 1 33.1  96.3
14 4 3 1 43.1 126.3
15 5 3 1 53.1 156.3
16 1 1 2 11.2  11.2
17 2 1 2 21.2  21.2
18 3 1 2 31.2  31.2
19 4 1 2 41.2  41.2
20 5 1 2 51.2  51.2
21 1 2 2 12.2  23.4
22 2 2 2 22.2  43.4
23 3 2 2 32.2  63.4
24 4 2 2 42.2  83.4
25 5 2 2 52.2 103.4
26 1 3 2 13.2  36.6
27 2 3 2 23.2  66.6
28 3 3 2 33.2  96.6
29 4 3 2 43.2 126.6
30 5 3 2 53.2 156.6
>

On Tue, Dec 7, 2010 at 6:39 AM, Gerrit Draisma <gdraisma at xs4all.nl> wrote:
> Dear R-users,
>
> I have a dataset with categories and numbers.
> I would like to compute and add cumulative numbers
> to the dataset.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
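For readers wondering why ave() returns a vector that lines up with the rows of d: it splits the values by group, applies FUN to each piece, and writes each result back into the original positions. The following is a minimal sketch of that idea using base R's interaction() and the replacement form of split(); the variable names g and cum are only illustrative.

```r
# Rebuild the example data from the original post
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# One grouping factor for every (a, c) combination
g <- interaction(d$a, d$c)

# Start from a copy of n, then overwrite each group's positions
# with that group's cumulative sums
cum <- d$n
split(cum, g) <- lapply(split(cum, g), cumsum)

# Same result as the ave() call in the reply above
all.equal(cum, ave(d$n, d$a, d$c, FUN = cumsum))
```

Because the results are written back by position, the output keeps the row order of d even though the groups are interleaved.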
You can also use 'split' to separate each group:

> split(d, list(d$a, d$c))
$`1.1`
   a b c    n  cum
1  1 1 1 11.1 11.1
6  1 2 1 12.1 23.2
11 1 3 1 13.1 36.3

$`2.1`
   a b c    n  cum
2  2 1 1 21.1 21.1
7  2 2 1 22.1 43.2
12 2 3 1 23.1 66.3

$`3.1`
   a b c    n  cum
3  3 1 1 31.1 31.1
8  3 2 1 32.1 63.2
13 3 3 1 33.1 96.3

$`4.1`
   a b c    n   cum
4  4 1 1 41.1  41.1
9  4 2 1 42.1  83.2
14 4 3 1 43.1 126.3

$`5.1`
   a b c    n   cum
5  5 1 1 51.1  51.1
10 5 2 1 52.1 103.2
15 5 3 1 53.1 156.3

$`1.2`
   a b c    n  cum
16 1 1 2 11.2 11.2
21 1 2 2 12.2 23.4
26 1 3 2 13.2 36.6

$`2.2`
   a b c    n  cum
17 2 1 2 21.2 21.2
22 2 2 2 22.2 43.4
27 2 3 2 23.2 66.6

$`3.2`
   a b c    n  cum
18 3 1 2 31.2 31.2
23 3 2 2 32.2 63.4
28 3 3 2 33.2 96.6

$`4.2`
   a b c    n   cum
19 4 1 2 41.2  41.2
24 4 2 2 42.2  83.4
29 4 3 2 43.2 126.6

$`5.2`
   a b c    n   cum
20 5 1 2 51.2  51.2
25 5 2 2 52.2 103.4
30 5 3 2 53.2 156.6

>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
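The split() approach can also do the computation itself, not just display the groups. The sketch below, which assumes the same example data, splits the data frame, adds the cumulative column inside each piece, and reassembles the pieces with unsplit(). The names pieces and d2 are illustrative and do not appear in the thread.

```r
# Rebuild the example data from the original post
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# The grouping variables, reused for splitting and reassembling
groups <- list(d$a, d$c)

# Split into one data frame per (a, c) group
pieces <- split(d, groups)

# Add the cumulative sum within each piece
pieces <- lapply(pieces, function(p) {
  p$cum <- cumsum(p$n)
  p
})

# Put the rows back in their original order
d2 <- unsplit(pieces, groups)
head(d2)
```

This is more verbose than ave(), but it generalizes to cases where several new columns have to be computed per group at once.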
Thanks Jim,

'ave' does what I wanted.
It is simpler and probably more efficient than unlisting Sn, as I tried.
Still, I remain puzzled by the structure of the by() or tapply() output
and how to access the individual cumsums.
Yes, the split command is useful for checking the result.

Gerrit.

On 12/7/2010 1:43 PM, jim holtman wrote:
> Maybe 'ave' is what you were looking for:
>
>> d$cum <- ave(d$n, d$a, d$c, FUN = cumsum)
> [...]
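On the remaining question about the structure of the output: when FUN returns more than one value per group, tapply() returns a list that carries a dim attribute, that is, a 5 x 2 array whose cells are numeric vectors. Single brackets keep the list wrapper, while double brackets extract the vector stored in a cell. The sketch below uses tapply() rather than by() (an assumption made for simplicity; the str() output in the original post shows the same list-with-dim layout for by()).

```r
# Rebuild the example data from the original post
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# A 5 x 2 list-array: one cumsum vector per (a, c) cell
Sn <- tapply(d$n, list(d$a, d$c), cumsum)
dim(Sn)

Sn[5, 2]         # still a list, of length 1
Sn[[5, 2]]       # the numeric vector for a = 5, c = 2
Sn[["5", "2"]]   # the same cell, addressed by dimnames
Sn[[5, 2]][3]    # a single cumulative value

# The cells are stored with a varying fastest, which matches the
# group order unsplit() expects, so this assigns the values back
# to the rows they came from
d$Sn <- unsplit(Sn, list(d$a, d$c))
all.equal(d$Sn, ave(d$n, d$a, d$c, FUN = cumsum))
```

This also explains why Sn[cbind(d$a, d$c)] returns a list: matrix indexing with single brackets selects cells but does not unwrap them.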