I have a data frame called "pose":

         DESCRIPTION QUANITY CLOSING.PRICE
1       WHEAT May/10       1        467.75
2       WHEAT May/10       2        467.75
3       WHEAT May/10       1        467.75
4       WHEAT May/10       1        467.75
5 COTTON NO.2 May/10       1         78.13
6 COTTON NO.2 May/10       3         78.13
7 COTTON NO.2 May/10       1         78.13

I would like to sum the quantity for each category (i.e. WHEAT and COTTON), but I have no idea how to write it in a simple manner. The number of rows will change every day. Thank you for any help.
Hi Arnaud,

Try the aggregate function.

Regards,
M

arnaud Gaboury wrote:
> I would like to sum the quantity for each category (i.e. WHEAT and
> COTTON), but I have no idea how to write it in a simple manner.

--
Mohamed Lajnef, IE INSERM U955 eq 15
Pôle de Psychiatrie
Hôpital CHENEVIER
40, rue Mesly
94010 CRETEIL Cedex FRANCE
Mohamed.lajnef at inserm.fr
tel : 01 49 81 31 31 (poste 18470)
Sec : 01 49 81 32 90
fax : 01 49 81 30 99
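For instance, a minimal sketch with aggregate(), reconstructing the "pose" data frame from the original post (the POSITION result name here is only illustrative):

# Rebuild the example data from the original post
pose <- data.frame(
  DESCRIPTION   = c(rep("WHEAT May/10", 4), rep("COTTON NO.2 May/10", 3)),
  QUANITY       = c(1, 2, 1, 1, 1, 3, 1),
  CLOSING.PRICE = c(rep(467.75, 4), rep(78.13, 3))
)

# Sum QUANITY within each DESCRIPTION; CLOSING.PRICE is constant per
# category, so it can ride along as a second grouping column
aggregate(list(POSITION = pose$QUANITY),
          by  = list(DESCRIPTION   = pose$DESCRIPTION,
                     CLOSING.PRICE = pose$CLOSING.PRICE),
          FUN = sum)

This gives one row per category with the summed quantity, and needs no packages beyond base R. The number of rows in pose can change from day to day without any change to the call.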
Thank you for your help. The best I have found is to use the ddply function.

> pose
         DESCRIPTION QUANITY CLOSING.PRICE
1       WHEAT May/10       1        467.75
2       WHEAT May/10       1        467.75
3       WHEAT May/10       1        467.75
4       WHEAT May/10       1        467.75
5 COTTON NO.2 May/10       1         78.13
6 COTTON NO.2 May/10       1         78.13
7 COTTON NO.2 May/10       1         78.13
> library(plyr)
> op <- ddply(pose, c("DESCRIPTION", "CLOSING.PRICE"), summarise, POSITION = sum(QUANITY))
> op
         DESCRIPTION CLOSING.PRICE POSITION
1 COTTON NO.2 May/10         78.13        3
2       WHEAT May/10        467.75        4

op is a data.frame object. The trick is done!

***************************
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
Depending on the size of the data frame and the operations you are trying to perform, aggregate or ddply may be better. In the function below, df has the same structure as your data frame. Check out this code, which runs aggregate and ddply for different data frame sizes.

===========================
require(plyr)

CompareAggregation <- function(n) {
  # Build a data frame with three groups of sizes proportional to n
  df <- data.frame(id = c(rep("A", 15 * n), rep("B", 10 * n), rep("C", 20 * n)))
  df$fltval <- rnorm(nrow(df))
  df$intval <- rbinom(nrow(df), 1000, 0.8)
  # Time aggregate()
  t1 <- system.time(zz1 <- aggregate(list(fltsum = df$fltval, intsum = df$intval),
                                     list(id = df$id), sum))
  # Time ddply()
  t2 <- system.time(zz2 <- ddply(df, .(id),
                                 function(x) c(sum(x$fltval), sum(x$intval))))
  return(c(agg = t1[[1]], ddply = t2[[1]]))
}

z <- 10^seq(1, 5)
names(z) <- as.character(z)
res.df <- t(data.frame(lapply(z, CompareAggregation)))
print(res.df)
===========================

On Apr 14, 11:43 am, "arnaud Gaboury" <arnaud.gabo... at gmail.com> wrote:
> Thank you for your help. The best I have found is to use the ddply
> function.
On Thu, Apr 15, 2010 at 1:16 AM, Chuck <vijay.nori at gmail.com> wrote:
> Depending on the size of the data frame and the operations you are
> trying to perform, aggregate or ddply may be better. In the function
> below, df has the same structure as your data frame.

Current version of plyr:

         agg  ddply
X10    0.005  0.007
X100   0.007  0.026
X1000  0.086  0.248
X10000 0.577  3.136
X1e.05 4.493 44.147

Development version of plyr:

         agg ddply
X10    0.003 0.005
X100   0.007 0.007
X1000  0.042 0.044
X10000 0.410 0.443
X1e.05 4.479 4.237

So there are some big speed improvements in the works.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
This is good news, although I have recently encountered what I consider excessive memory usage when adding key columns that don't affect the number of groups. For example, grouping by Year and Month, if I also add MonthBegin, a POSIXct column from which the Year and Month columns were derived, I run out of memory.

hadley wickham <h.wickham at gmail.com> wrote:
> So there are some big speed improvements in the works.
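To make the scenario concrete, here is a minimal sketch of that kind of grouping (the Year, Month, and MonthBegin names come from the report above; the data itself is made up):

require(plyr)

# Made-up data: one POSIXct key column plus two columns derived from it
months <- seq(as.POSIXct("2000-01-01", tz = "UTC"), by = "month", length.out = 120)
df <- data.frame(MonthBegin = sample(months, 1e5, replace = TRUE),
                 x = rnorm(1e5))
df$Year  <- as.integer(format(df$MonthBegin, "%Y"))
df$Month <- as.integer(format(df$MonthBegin, "%m"))

# Grouping by the derived columns alone:
a <- ddply(df, .(Year, Month), summarise, total = sum(x))

# Adding the POSIXct source column as an extra key leaves the number of
# groups unchanged, but is the case reported above as using excessive memory:
b <- ddply(df, .(Year, Month, MonthBegin), summarise, total = sum(x))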
Or try data.table 1.4 on R-Forge; its grouping is faster than aggregate:

          agg datatable
X10     0.012     0.008
X100    0.020     0.008
X1000   0.172     0.020
X10000  1.164     0.144
X1e.05  9.397     1.180

install.packages("data.table", repos = "http://R-Forge.R-project.org")
require(data.table)
dt <- as.data.table(df)
t3 <- system.time(zz3 <- dt[, list(sumflt = sum(fltval),
                                   sumint = sum(intval)), by = id])

Matthew

On Thu, 15 Apr 2010 13:09:17 +0000, hadley wickham wrote:
> So there are some big speed improvements in the works.