Hi netters,

First of all, thanks a lot for all the prompt replies to my earlier question about "merging" data frames in R. That operation is the equivalent of a "join" in MySQL.

Now I have another question. Suppose I have a data frame X with lots of columns/variables: Name, Age, Group, Type, Salary. I want to compute group means of the salaries:

    aggregate(X$Salary, by = list(X$Group, X$Age, X$Type), FUN = mean)

When Group and Type have many levels, it takes R forever to finish the aggregation, and gc() shows that the memory usage is large as well.

However, MySQL finishes a similar job in seconds:

    select Group, Age, Type, avg(Salary) from X group by Group, Age, Type

Is MySQL simply superior at this kind of operation, or is my R command not efficient enough? Why does R need so much memory to do the aggregation?

Thanks again!

Zhihua Li
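A slightly expanded version of that call may be worth noting (a sketch, assuming the data frame X described above): naming the elements of the by list gives the result readable column names instead of Group.1, Group.2, and so on.

    # named list entries become the grouping column names in the output
    Xagg <- aggregate(X$Salary,
                      by = list(Group = X$Group, Age = X$Age, Type = X$Type),
                      FUN = mean)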
Huali, if I were you, I would create a view on the MySQL server to aggregate the data first, and then use R to pull the data through that view. This is not only applicable to R but a general guideline in similar situations. In my understanding and experience, R handles data manipulation reasonably well; still, we should always use the right tool for the job.

On Jan 26, 2008 6:45 PM, zhihuali <lzhtom at hotmail.com> wrote:
> [quoted original message omitted]

--
WenSui Liu
Statistical Project Manager
ChoicePoint Precision Marketing
(http://spaces.msn.com/statcompute/blog)
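A minimal sketch of that server-side approach, assuming a MySQL table X with the columns described above; the connection details and the view name salary_summary are placeholders:

    library(DBI)
    library(RMySQL)

    # connect to the MySQL server (credentials are hypothetical)
    con <- dbConnect(MySQL(), dbname = "mydb", host = "localhost",
                     user = "user", password = "pass")

    # let the server do the aggregation; `Group` needs backticks
    # because GROUP is a reserved word in MySQL
    dbSendQuery(con, "
        CREATE OR REPLACE VIEW salary_summary AS
        SELECT `Group`, Age, Type, AVG(Salary) AS avg_salary
        FROM X
        GROUP BY `Group`, Age, Type")

    # pull only the (much smaller) aggregated result into R
    Xagg <- dbGetQuery(con, "SELECT * FROM salary_summary")
    dbDisconnect(con)

This way only the aggregated rows cross the wire, so R never has to hold the raw table in memory.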
Gabor Grothendieck
2008-Jan-27 00:07 UTC
[R] Comparison of aggregate in R and group by in mysql
How does it compare if you read the data into R and then do your aggregate with sqldf?

    library(sqldf)

    # example using the builtin data set CO2
    CO2agg <- sqldf("select Plant, Type, Treatment, avg(conc) from CO2
                     group by Plant, Type, Treatment")

    # or using your data; "Group" is quoted because GROUP is a
    # reserved word in SQL
    Xagg <- sqldf('select "Group", Age, Type, avg(Salary) from X
                   group by "Group", Age, Type')

On Jan 26, 2008 6:45 PM, zhihuali <lzhtom at hotmail.com> wrote:
> [quoted original message omitted]
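If memory rather than speed is the bottleneck, sqldf can also be pointed at a file-backed SQLite database instead of its default in-memory one via the dbname argument; a sketch, with avgSalary as a made-up output name:

    # a temporary on-disk database keeps the working table out of RAM
    Xagg <- sqldf('select "Group", Age, Type, avg(Salary) as avgSalary
                   from X group by "Group", Age, Type',
                  dbname = tempfile())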
How large is your data frame? How much memory do you have on your system? Are you paging? Here is a test I ran with a data frame of 1,000,000 rows, and it seems to be fast:

    > n <- 1000000
    > x <- data.frame(A=sample(LETTERS, n, TRUE), B=sample(letters[1:4], n, TRUE),
    +                 C=sample(LETTERS[1:4], n, TRUE), data=runif(n))
    > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))
       user  system elapsed
       2.65    0.34    3.00

On Jan 26, 2008 6:45 PM, zhihuali <lzhtom at hotmail.com> wrote:
> [quoted original message omitted]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
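If the grouping factors really do have a huge number of levels, one workaround is to collapse them into a single key first, so that only level combinations that actually occur in the data are kept; a sketch using the same simulated data, relying on base R's interaction() with drop = TRUE:

    # one factor per observed (A, B, C) combination; drop = TRUE discards
    # the combinations that never occur, avoiding a full cross-product
    key <- interaction(x$A, x$B, x$C, drop = TRUE)

    # one pass over the data: mean of 'data' per observed combination
    x.agg2 <- tapply(x$data, key, mean)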