This one should be easy, but it's giving me a hard time, mostly because tapply puts the results in a list. I want to calculate the cumulative sum of a variable in a dataframe, but with the accumulation only within each level of a factor. For a very simple example, take:

> df <- data.frame(x=c(rep(1,5),rep(2,5),rep(3,5)),fac=gl(3,5,labels=letters[1:3]))
> df
   x fac
1  1   a
2  1   a
3  1   a
4  1   a
5  1   a
6  2   b
7  2   b
8  2   b
9  2   b
10 2   b
11 3   c
12 3   c
13 3   c
14 3   c
15 3   c

I'd like to create another column in the dataframe so it looks like this, and make sure that the cumulative sums still match the right levels of the factor. I've included a "willdo" column that's just the cumulative sum within each level, and an "ideal" column that's that cumulative sum minus the current value; the column headings are self-explanatory.

> answer
   x fac willdo ideal
1  1   a      1     0
2  1   a      2     1
3  1   a      3     2
4  1   a      4     3
5  1   a      5     4
6  2   b      2     0
7  2   b      4     2
8  2   b      6     4
9  2   b      8     6
10 2   b     10     8
11 3   c      3     0
12 3   c      6     3
13 3   c      9     6
14 3   c     12     9
15 3   c     15    12
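To see the behaviour described above, here is a minimal base-R check of what tapply() returns when the applied function produces one value per row rather than one value per group. It only reuses the example df from the question; nothing else is assumed.

```r
# Example data from the question
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 fac = gl(3, 5, labels = letters[1:3]))

# cumsum() returns a whole vector for each group, so tapply() cannot
# simplify to one value per level; it returns a list-mode array instead,
# with one element per level of fac
res <- tapply(df$x, df$fac, cumsum)
is.list(res)   # TRUE
res[[2]]       # running total for level "b": 2 4 6 8 10
```

The per-row values are all there; the remaining problem is getting them back into a single column aligned with the rows of df.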
Just use ?unlist

df$willdo <- unlist(tapply(df$x, df$fac, cumsum))
df$ideal <- df$willdo - df$x

Levi Waldron wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
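One caveat, which is a general property of base R rather than something stated in the thread: unlist(tapply(...)) concatenates the per-group results in level order, so assigning it back to the dataframe only lines up when the rows are already grouped by fac, as they are in the example. The sketch below uses a small made-up dataframe with interleaved rows (df2, hypothetical) to show the misalignment, and split()/unsplit() as one order-safe alternative. Sorting first with df[order(df$fac), ] would also work.

```r
# Hypothetical data: rows interleaved across levels, distinct values per level
df2 <- data.frame(x = c(1, 10, 100, 1, 10, 100),
                  fac = factor(c("a", "b", "c", "a", "b", "c")))

# unlist(tapply()) returns results ordered by level (a, a, b, b, c, c),
# which does not match the row order of df2
naive <- unlist(tapply(df2$x, df2$fac, cumsum))
unname(naive)   # 1 2 10 20 100 200: e.g. row 2 (level "b") gets 2 from level "a"

# split() by level, take each group's cumsum, then unsplit() the results
# back into the original row positions
safe <- unsplit(lapply(split(df2$x, df2$fac), cumsum), df2$fac)
safe            # 1 10 100 2 20 200
```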
On Fri, 2008-06-27 at 16:52 -0400, Levi Waldron wrote:
> [...]

> df$willdo <- unlist(tapply(df$x, df$fac, cumsum))
> df$ideal <- df$willdo - df$x
> df
   x fac willdo ideal
1  1   a      1     0
2  1   a      2     1
3  1   a      3     2
4  1   a      4     3
5  1   a      5     4
6  2   b      2     0
7  2   b      4     2
8  2   b      6     4
9  2   b      8     6
10 2   b     10     8
11 3   c      3     0
12 3   c      6     3
13 3   c      9     6
14 3   c     12     9
15 3   c     15    12

HTH

G

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
Gabor Grothendieck
2008-Jun-27 21:17 UTC
[R] cumulative sum of within levels of a dataframe
Try this:

df$willdo <- ave(df$x, df$fac, FUN = cumsum)

On Fri, Jun 27, 2008 at 4:52 PM, Levi Waldron <leviwaldron at gmail.com> wrote:
> [...]
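A minimal sketch extending the same ave() idea to the "ideal" column, assuming the example df from the question. ave() returns a vector the same length as x, with each group's results written back into that group's original row positions, so it does not depend on the rows being sorted by fac. The anonymous function used for "ideal" is an illustrative choice, not from the thread: it builds the running total of the previous rows directly as a lagged cumsum instead of subtracting x afterwards.

```r
# Example data from the question
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 fac = gl(3, 5, labels = letters[1:3]))

# Inclusive running total within each level of fac
df$willdo <- ave(df$x, df$fac, FUN = cumsum)

# Exclusive running total: 0 for the first row of each group, then the
# cumulative sum of the rows before it (drop the last cumsum, prepend 0)
df$ideal <- ave(df$x, df$fac, FUN = function(v) c(0, head(cumsum(v), -1)))

df
```

For integer-valued data like this, the lagged form and willdo - x give the same numbers; with non-integer data the subtraction can differ in the last few bits because of floating-point rounding.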