Hello everybody,

Data is

    myd <- data.frame(id1=rep(c("a","b","c"),each=3),id2=rep(1:3,3),val=rnorm(9))

I want to get a cumulative sum of val within each level of id1. Trying aggregate does not work:

    myd$pcum <- aggregate(myd[,c("val")],list(orig=myd$id1),cumsum)

Please suggest a solution. The real data frame is huge, so looping with for and subsetting is not a great idea (still doable, though).

Thank you

Stephen B
Stephen -
   In version R-2.11.1, I get

    > aggregate(myd[,c("val")],list(orig=myd$id1),cumsum)
      orig         x.1         x.2         x.3
    1    a -0.62754524 -1.16194135 -0.05975811
    2    b  0.21954618 -0.21355521 -0.62970082
    3    c -0.30296239  1.44111610  0.30121880

Since myd has several observations for each value of orig, you obviously can't just assign the output of aggregate as a column of myd. Were you thinking of

    merge(myd,aggregate(myd[,c("val")],list(orig=myd$id1),cumsum))

? (Note that before version 2.11.1, aggregate *would* fail.)

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector at stat.berkeley.edu

On Tue, 12 Oct 2010, Bond, Stephen wrote:

> Hello everybody,
>
> Data is
> myd <- data.frame(id1=rep(c("a","b","c"),each=3),id2=rep(1:3,3),val=rnorm(9))
>
> I want to get a cumulative sum over each of id1. trying aggregate does not work
>
> myd$pcum <- aggregate(myd[,c("val")],list(orig=myd$id1),cumsum)
>
> Please suggest a solution. In real the dataframe is huge so looping with for and subsetting is not a great idea (still doable, though).
> Thank you
>
> Stephen B
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
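To make the shape mismatch concrete, here is a minimal sketch on a small data frame with fixed values (the values and the name agg are illustrative assumptions, not part of the thread). Assuming R 2.11.1 or later, aggregate with cumsum returns one row per group, with the running sums held side by side, so the result has 3 rows where myd has 9:

```r
# Illustrative data: fixed values instead of rnorm(9) so the result is reproducible.
myd <- data.frame(id1 = rep(c("a", "b", "c"), each = 3),
                  id2 = rep(1:3, 3),
                  val = c(1, 2, 3, 10, 20, 30, 100, 200, 300))

# One row per level of id1; the cumulative sums of each group
# end up next to each other in the second column of the result.
agg <- aggregate(myd$val, list(orig = myd$id1), cumsum)

nrow(myd)            # number of rows in the original data
nrow(agg)            # number of groups
is.matrix(agg[[2]])  # the second column is a matrix when all groups have equal size
```

Because the row counts differ, the aggregate result cannot be dropped into myd as a new column; a function that returns one value per original row is needed instead.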
Try ave instead of aggregate. If that does not do it, then look at the plyr package, probably the ddply function in that package.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bond, Stephen
> Sent: Tuesday, October 12, 2010 11:40 AM
> To: r-help at r-project.org
> Subject: [R] aggregate with cumsum
>
> Hello everybody,
>
> Data is
> myd <- data.frame(id1=rep(c("a","b","c"),each=3),id2=rep(1:3,3),val=rnorm(9))
>
> I want to get a cumulative sum over each of id1. trying aggregate does not work
>
> myd$pcum <- aggregate(myd[,c("val")],list(orig=myd$id1),cumsum)
>
> Please suggest a solution. In real the dataframe is huge so looping with for and subsetting is not a great idea (still doable, though).
> Thank you
>
> Stephen B
On Oct 12, 2010, at 1:40 PM, Bond, Stephen wrote:

> Hello everybody,
>
> Data is
> myd <- data.frame(id1=rep(c("a","b","c"),each=3),id2=rep(1:3,3),val=rnorm(9))
>
> I want to get a cumulative sum over each of id1. trying aggregate does not work
>
> myd$pcum <- aggregate(myd[,c("val")],list(orig=myd$id1),cumsum)

Use ave instead of aggregate:

    > ave(myd$val, list(myd$id1), FUN=cumsum)
    [1]  0.362123399 -1.538797831 -2.061733393 -2.038050242 -0.344382401 -1.365281650
    [7]  0.391181119 -0.258668053 -0.007736216

    myd$pcum <- ave(myd$val, list(myd$id1), FUN=cumsum)

> Please suggest a solution. In real the dataframe is huge so looping with for and subsetting is not a great idea (still doable, though).
> Thank you
>
> Stephen B

-- 
David Winsemius, MD
West Hartford, CT
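The ave approach can be checked by hand on a small example. The sketch below uses made-up fixed values (an assumption for illustration, not data from the thread) and deliberately leaves the rows unsorted by id1, to show that ave returns one value per input row, in the original row order:

```r
# Illustrative data with fixed values; rows are deliberately not sorted by id1.
myd <- data.frame(id1 = c("a", "b", "a", "c", "b", "a"),
                  val = c(1, 10, 2, 100, 20, 3))

# ave() applies FUN within each level of id1 and puts each result back
# at the position of the row it came from, so it fits as a new column.
myd$pcum <- ave(myd$val, myd$id1, FUN = cumsum)

myd$pcum
# 1 10 3 100 30 6
```

Group a accumulates 1, 3, 6 across rows 1, 3 and 6; group b gives 10, 30 on rows 2 and 5; group c has the single value 100.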
On Tue, Oct 12, 2010 at 1:40 PM, Bond, Stephen <Stephen.Bond at cibc.com> wrote:

> Hello everybody,
>
> Data is
> myd <- data.frame(id1=rep(c("a","b","c"),each=3),id2=rep(1:3,3),val=rnorm(9))
>
> I want to get a cumulative sum over each of id1. trying aggregate does not work
>
> myd$pcum <- aggregate(myd[,c("val")],list(orig=myd$id1),cumsum)
>
> Please suggest a solution. In real the dataframe is huge so looping with for and subsetting is not a great idea (still doable, though).

Looping can be slow, but it's not necessarily so. Here are three approaches to using ave with cumsum to solve this problem. The benchmark shows that the loop is actually the fastest:

    # Test data: k groups of N rows each
    N <- 1e4
    k <- 10
    myd <- data.frame(id1=rep(letters[1:k],each=N),id2=rep(1:k,N),val=rnorm(k*N))

    library(rbenchmark)

    benchmark(order = "relative", replications = 100,
        # for loop over the two numeric columns
        loop = {
            loop <- myd
            for(i in 2:3) loop[, i] <- ave(myd[, i], myd[, 1], FUN = cumsum)
        },
        # transform, naming each column explicitly
        nonloop1 = {
            nonloop1 <- transform(myd,
                id2 = ave(id2, id1, FUN = cumsum),
                val = ave(val, id1, FUN = cumsum)
            )},
        # lapply over the column indices, then replace
        nonloop2 = {
            f <- function(i) ave(myd[, i], myd[, 1], FUN = cumsum)
            nonloop2 <- replace(myd, 2:3, lapply(2:3, f))
        }
    )

    # Check that all three approaches give the same result
    identical(loop, nonloop1)
    identical(loop, nonloop2)

The output on my laptop is:

          test replications elapsed relative user.self sys.self user.child sys.child
    1     loop          100    8.52 1.000000      8.07     0.10         NA        NA
    3 nonloop2          100    8.94 1.049296      8.29     0.17         NA        NA
    2 nonloop1          100   11.65 1.367371     10.71     0.22         NA        NA

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
On 25.02.2011 16:16, stephenb wrote:

>
> Bill,
>
> what will be the fastest way to output not just single lines but small data
> frames of about 60 rows?
>
> I prefer writing to a text file because the final output is large 47k times
> 60 rows and since I do not know the size of it I have to use rbind to build
> the object which creates the memory problems described here:
>
> http://www.matthewckeller.com/html/memory.html
>
> look at the swiss cheese paragraph.
>
> kind regards
> Stephen

Stephen Bond,

1. You have written to the R-help mailing list rather than to "Bill". Please also reply to him directly: not all posters are on the mailing list.
2. Please quote the original question.

Uwe Ligges
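For the quoted question about writing many small data frames to a text file instead of growing one object with rbind, one common base-R pattern is to append each chunk with write.table. The sketch below is only an illustration under stated assumptions: the helper name append_chunk, the column names, the tab separator, and the five 60-row chunks are all made up, and no claim is made that this is the fastest approach.

```r
# Hypothetical helper: append one small data frame to a text file,
# writing the header line only when the file does not exist yet.
append_chunk <- function(chunk, path) {
  first <- !file.exists(path)
  write.table(chunk, file = path, append = !first,
              col.names = first, row.names = FALSE,
              sep = "\t", quote = FALSE)
}

out <- tempfile(fileext = ".txt")

# Stand-in for the real computation: each iteration yields a 60-row data frame.
for (i in 1:5) {
  chunk <- data.frame(run = i, step = 1:60, value = rnorm(60))
  append_chunk(chunk, out)
}

# Read the accumulated file back to check its shape.
result <- read.delim(out)
dim(result)
```

Only one chunk is held in memory at a time, so the total number of chunks does not need to be known in advance.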