thr3ads.net - R help - [R] Is the aggregate function the best way to do this? [Feb 2010]


Bert Jacobs

2010-Feb-17 08:36 UTC

[R] Is the aggregate function the best way to do this?

Hi,

 

I have a data frame 'Subset1' with a number of factor variables and 160
numerical variables.

Now I want to sum the numerical variables over all rows that share the same
values of the factor variables VAR1, VAR2 and VAR3, regardless of the values
of the other factor variables.

With the call given below this works great, but with 15000 rows and 13
factor variables the calculation takes more than 2 minutes.

So my question is: does anyone know whether a faster alternative exists?

 

Subset1.AGG <- as.data.frame(
  aggregate(Subset1[, (ncol(Subset1) - 159):ncol(Subset1)],
            list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2, VAR3 = Subset1$VAR3),
            FUN = sum))

 

Thank you very much for helping me out,

Bert



Mohamed Lajnef

2010-Feb-17 09:01 UTC


Hi Bert,

Try using the rowSums function.
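
A note on the names, since they are easy to confuse: base R's rowSums() adds up the columns within each row, so on its own it does not combine rows that share factor values. The similarly named rowsum() (singular) computes column sums per group and is often quicker than aggregate(). The sketch below uses a made-up toy data frame; the columns num1 and num2 and the object grp are assumptions for illustration, not from the original post, and whether this is faster on the real 15000-row data would have to be timed.

```r
# Toy stand-in for Subset1: three grouping factors plus two numeric columns
# (the real data has 160 numeric columns; these names are illustrative only).
Subset1 <- data.frame(
  VAR1 = factor(c("a", "a", "b", "b", "a")),
  VAR2 = factor(c("x", "x", "y", "y", "x")),
  VAR3 = factor(c("p", "p", "q", "r", "p")),
  num1 = c(1, 2, 3, 4, 5),
  num2 = c(10, 20, 30, 40, 50)
)

# rowSums(): one total per ROW (adds across columns), no grouping involved.
per_row <- rowSums(Subset1[, c("num1", "num2")])

# rowsum(): column sums per GROUP. The three factors are combined into one
# key; drop = TRUE keeps only the combinations that actually occur.
grp <- interaction(Subset1$VAR1, Subset1$VAR2, Subset1$VAR3, drop = TRUE)
per_group <- rowsum(as.matrix(Subset1[, c("num1", "num2")]), group = grp)

print(per_row)
print(per_group)
```

The result of rowsum() is a matrix whose row names are the combined group keys, so the factor columns would have to be rebuilt from those names if they are needed as separate columns.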

Regards
M


-- 


Mohamed Lajnef,IE 
INSERM U955 eq 15
Pôle de Psychiatrie
Hôpital CHENEVIER
40, rue Mesly
94010 CRETEIL Cedex FRANCE
Mohamed.lajnef at inserm.fr
tel : 01 49 81 31 31 (poste 18470)
Sec : 01 49 81 32 90
fax : 01 49 81 30 99

Petr PIKAL

2010-Feb-17 09:18 UTC


Hi


I believe the plyr package has optimised code for such aggregations, but I
do not use it often myself, so I am not sure.
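
For anyone who wants to try the plyr route, a minimal sketch might look like the following. It assumes plyr is installed (the code skips the step otherwise) and uses a made-up toy data frame; the columns OTHER, num1 and num2 and the object agg_plyr are illustrative names, not from the original post. No speed claim is made here; it would need to be timed on the real data.

```r
# Toy stand-in for Subset1 (illustrative names; the real data has 160
# numeric columns and 13 factor variables).
Subset1 <- data.frame(
  VAR1  = factor(c("a", "a", "b", "b", "a")),
  VAR2  = factor(c("x", "x", "y", "y", "x")),
  VAR3  = factor(c("p", "p", "q", "r", "p")),
  OTHER = factor(c("k", "l", "k", "l", "k")),  # a factor NOT used for grouping
  num1  = c(1, 2, 3, 4, 5),
  num2  = c(10, 20, 30, 40, 50)
)

agg_plyr <- NULL
if (requireNamespace("plyr", quietly = TRUE)) {
  # ddply() splits the data frame by the three factors; numcolwise(sum)
  # builds a function that sums every numeric column of each piece.
  agg_plyr <- plyr::ddply(Subset1, c("VAR1", "VAR2", "VAR3"),
                          plyr::numcolwise(sum))
  print(agg_plyr)
}
```

Because numcolwise() only touches numeric columns, factor columns that are not grouping variables (OTHER in this toy example) are left out of the result.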

You could probably speed things up by avoiding ncol(Subset1) inside
aggregate. Either use the column numbers directly or do

selection <- Subset1[,(ncol(Subset1)-159):ncol(Subset1)]

and by avoiding the unnecessary coercion to data.frame. aggregate performs
it for you :-)

Subset1.AGG <- aggregate(selection,
                         list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2,
                              VAR3 = Subset1$VAR3),
                         FUN = sum)
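
To check the point about coercion on a small scale, here is a sketch on a made-up toy data frame. The names n_num, groups and wrapped, and the columns num1 and num2, are assumptions for illustration; n_num plays the role of the 160 in the original call.

```r
# Toy stand-in for Subset1 (illustrative names only).
Subset1 <- data.frame(
  VAR1 = factor(c("a", "a", "b", "b", "a")),
  VAR2 = factor(c("x", "x", "y", "y", "x")),
  VAR3 = factor(c("p", "p", "q", "r", "p")),
  num1 = c(1, 2, 3, 4, 5),
  num2 = c(10, 20, 30, 40, 50)
)
n_num <- 2  # number of trailing numeric columns (160 in the original post)

# Select the numeric columns once, outside the aggregate() call.
selection <- Subset1[, (ncol(Subset1) - n_num + 1):ncol(Subset1)]
groups <- list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2, VAR3 = Subset1$VAR3)

Subset1.AGG <- aggregate(selection, groups, FUN = sum)

# The same call wrapped in as.data.frame(), as in the original question.
wrapped <- as.data.frame(aggregate(selection, groups, FUN = sum))

print(is.data.frame(Subset1.AGG))       # aggregate() already returns a data frame
print(identical(Subset1.AGG, wrapped))  # so the wrapper adds nothing
print(Subset1.AGG)
```

Wrapping calls in system.time() on the real data would show how much, if anything, each change saves.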

Regards
Petr
