Hi,

I have a data frame 'Subset1' with a number of factor variables and 160 numerical variables.

Now I want to sum the numerical variables over all rows that share the same values of the factor variables VAR1, VAR2 and VAR3, whatever their values for the other factor variables.

With the formula given below this works great, but in a situation with 15000 rows and 13 factor variables the calculation takes more than 2 minutes.

So my question is: Does anyone know if there exists a faster alternative?

    Subset1.AGG <- as.data.frame(
        aggregate(Subset1[, (ncol(Subset1)-159):ncol(Subset1)],
                  list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2, VAR3 = Subset1$VAR3),
                  FUN = sum))

Thank you very much for helping me out,

Bert
Mohamed Lajnef
2010-Feb-17 09:01 UTC
[R] Is the aggregate function the best way to do this?
Hi Bert,

Try to use the rowSums function.

Regards,
M

Bert Jacobs wrote:
> So my question is: Does anyone know if there exists a faster alternative?

--
Mohamed Lajnef, IE
INSERM U955 eq 15
Pôle de Psychiatrie, Hôpital CHENEVIER
40, rue Mesly, 94010 CRETEIL Cedex, FRANCE
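A note on the suggestion above: rowSums() adds across the columns of each row, so on its own it does not produce per-group sums. The base R function with the similar name, rowsum(), sums each column within groups of rows and may be what is meant. The sketch below is only an illustration under assumptions: the toy data frame, its column names and the single grouping key built with interaction() are made up for the example and are not taken from the original data.

```r
# Toy stand-in for Subset1: three factor columns and four numeric columns
# (the real data has 160 numeric columns; all names here are illustrative).
set.seed(1)
n <- 20
toy <- data.frame(
  VAR1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  VAR2 = factor(sample(c("x", "y"), n, replace = TRUE)),
  VAR3 = factor(sample(c("p", "q"), n, replace = TRUE)),
  matrix(runif(n * 4), nrow = n)
)
num_cols <- (ncol(toy) - 3):ncol(toy)   # the last four columns

# rowSums() gives one total per row, which is not a grouped sum.
per_row <- rowSums(toy[, num_cols])

# rowsum() sums each column within groups of rows.
# Assumption: the three factors are combined into one grouping key.
key <- interaction(toy$VAR1, toy$VAR2, toy$VAR3, drop = TRUE)
by_group <- rowsum(as.matrix(toy[, num_cols]), group = key)
```

The result is a matrix with one row per observed VAR1/VAR2/VAR3 combination, labelled by the combined key, so the factor columns would have to be recovered from the row names if they are needed separately. Whether this is faster than aggregate() on the real data would have to be checked with system.time().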
Petr PIKAL
2010-Feb-17 09:18 UTC
[R] Re: Is the aggregate function the best way to do this?
Hi

r-help-bounces at r-project.org wrote on 17.02.2010 09:36:45:

> So my question is: Does anyone know if there exists a faster alternative?

I believe the plyr package has optimised code for such aggregations, but I do not use it much myself, so I am not sure.

You could probably speed things up by avoiding ncol(Subset1) inside the aggregate call. Either use the column numbers directly or do the selection first:

    selection <- Subset1[, (ncol(Subset1)-159):ncol(Subset1)]

and avoid the unnecessary coercion to data.frame. Aggregate performs it for you :-)

    Subset1.AGG <- aggregate(selection,
                             list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2, VAR3 = Subset1$VAR3),
                             FUN = sum)

Regards
Petr
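The two steps suggested in this reply can be tried on a small example. This is a minimal sketch, assuming a toy data frame with the same shape as the one described in the question (the sizes, the column contents and the helper variable k are invented for illustration; the original uses 160 numeric columns). It only shows that aggregate() already returns a data frame; it makes no claim about how much time is saved.

```r
# Toy stand-in for Subset1 (column names and sizes are illustrative only).
set.seed(42)
n <- 200
Subset1 <- data.frame(
  VAR1 = factor(sample(letters[1:3], n, replace = TRUE)),
  VAR2 = factor(sample(letters[4:5], n, replace = TRUE)),
  VAR3 = factor(sample(letters[6:7], n, replace = TRUE)),
  matrix(rnorm(n * 5), nrow = n)
)

# Select the numeric block once (here the last k = 5 columns, 160 in the original).
k <- 5
selection <- Subset1[, (ncol(Subset1) - k + 1):ncol(Subset1)]

# aggregate() already returns a data frame, so no as.data.frame() wrapper is needed.
Subset1.AGG <- aggregate(selection,
                         by = list(VAR1 = Subset1$VAR1,
                                   VAR2 = Subset1$VAR2,
                                   VAR3 = Subset1$VAR3),
                         FUN = sum)
```

Wrapping the aggregate() call in system.time() on the real data is a simple way to check whether either change makes a measurable difference.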