thr3ads.net - R help - [R] Aggregating data (with more than one function) [Mar 2005]

If this information is useful, please help other people find it:
Share via:

Sivakumaran Raman

2005-Mar-29 01:15 UTC

[R] Aggregating data (with more than one function)

I have the data similar to the following in a data frame:
    LastName   Department  Salary
1   Johnson    IT          56000
2   James      HR          54223
3   Howe       Finance     80000
4   Jones      Finance     82000
5   Norwood    IT          67000
6   Benson     Sales       76000
7   Smith      Sales       65778
8   Baker      HR          56778
9   Dempsey    HR          78999
10  Nolan      Sales       45667
11  Garth      Finance     89777
12  Jameson    IT          56786

I want to calculate both the mean salary broken down by Department and 
also the
total amount paid out per department i.e. I want both sum(Salary) and
mean(Salary) for each Department. Right now, I am using aggregate.data.frame
twice, creating two data frames, and then combining them using data.frame.
However, this seems to be very memory and processor intensive and is 
taking a
very long time on my data set. Is there a quicker way to do this?

Thanks in advance,
Siv Raman

Liaw, Andy

2005-Mar-29 01:45 UTC

head link

[R] Aggregating data (with more than one function)

Here's one possible way, using the data you supplied:
> dat <- read.table("clipboard", header=T, row=1)
> do.call("rbind",by(dat$Salary, dat$Department, function(x)
c(mean=mean(x),total=sum(x))))
            mean  total
Finance 83925.67 251777
HR      63333.33 190000
IT      59928.67 179786
Sales   62481.67 187445

If you need the department names as a variable, you can add that easily.

HTH,
Andy
> From: Sivakumaran Raman
> 
> I have the data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by 
> Department and 
> also the
> total amount paid out per department i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using 
> aggregate.data.frame
> twice, creating two data frames, and then combining them 
> using data.frame.
> However, this seems to be very memory and processor intensive and is 
> taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>

Marc Schwartz

2005-Mar-29 01:56 UTC

head link

[R] Aggregating data (with more than one function)

On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman
wrote:> I have the data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by Department and 
> also the
> total amount paid out per department i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using
aggregate.data.frame
> twice, creating two data frames, and then combining them using data.frame.
> However, this seems to be very memory and processor intensive and is 
> taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman

Here is one approach:

Presuming that 'df' is your data frame:

# Create a function that returns both values
my.summ <- function(x)
{
  c(mean = mean(x), sum = sum(x))
}


# Now split() 'df'  by Department
df.s <- split(df$Salary, df$Department)


# Now run the summary, using sapply()> sapply(df.s, my.summ)     Finance  HR       IT       Sales   
mean 83925.67 63333.33 59928.67 62481.67
sum  251777   190000   179786   187445


HTH,

Marc Schwartz

Adaikalavan Ramasamy

2005-Mar-29 01:59 UTC

head link

[R] Aggregating data (with more than one function)

In the Arguments section of help(aggregate), you will find :

     FUN: a scalar function to compute the summary statistics which can
          be applied to all data subsets.


a) So you can try the 'by' function :
> by( df[ , 3], df$Department, function(x) c(mean(x), sum(x)) )  INDICES: Finance
  [1]  83925.67 251777.00
  ------------------------------------------------------------
  INDICES: HR
  [1]  63333.33 190000.00
  ------------------------------------------------------------
  INDICES: IT
  [1]  59928.67 179786.00
  ------------------------------------------------------------
  INDICES: Sales
  [1]  62481.67 187445.00


b) or use tapply more directly :
> tmp <- tapply(df$Salary, df$Department, function(x)                                             c( mean(x), sum(x) ) )
  $Finance
  [1]  83925.67 251777.00

  $HR
  [1]  63333.33 190000.00

  $IT
  [1]  59928.67 179786.00

  $Sales
  [1]  62481.67 187445.00


And using the 'sapply( tmp, c )' gives you a slightly more compact
output as

       Finance        HR        IT     Sales
[1,]  83925.67  63333.33  59928.67  62481.67
[2,] 251777.00 190000.00 179786.00 187445.00


Regards, Adai


On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman
wrote:> I have the data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by Department and 
> also the
> total amount paid out per department i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using
aggregate.data.frame
> twice, creating two data frames, and then combining them using data.frame.
> However, this seems to be very memory and processor intensive and is 
> taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

Robin Schroeder

2005-Mar-29 21:57 UTC

head link

[R] Aggregating data (with more than one function)

Dear list & Andy, 

I am hopelessly stumped, how would one add the department names as a variable?

Robin
> Robin Tori Schroeder
> International Institute for Sustainability 
> P.O. Box 873211
> Arizona State University
> Tempe, Arizona 85287-3211
> Phone: (480) 727-7290
> 
> 

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Liaw, Andy
Sent: Monday, March 28, 2005 6:45 PM
To: 'Sivakumaran Raman'; r-help at stat.math.ethz.ch
Subject: RE: [R] Aggregating data (with more than one function)


Here's one possible way, using the data you supplied:
> dat <- read.table("clipboard", header=T, row=1)
> do.call("rbind",by(dat$Salary, dat$Department, function(x)
c(mean=mean(x),total=sum(x))))
            mean  total
Finance 83925.67 251777
HR      63333.33 190000
IT      59928.67 179786
Sales   62481.67 187445

If you need the department names as a variable, you can add that easily.

HTH,
Andy
> From: Sivakumaran Raman
> 
> I have the data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by 
> Department and 
> also the
> total amount paid out per department i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using 
> aggregate.data.frame
> twice, creating two data frames, and then combining them 
> using data.frame.
> However, this seems to be very memory and processor intensive and is 
> taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Liaw, Andy

2005-Mar-30 01:22 UTC

head link

[R] Aggregating data (with more than one function)

> agg.dat <- do.call("rbind", by(dat$Salary, dat$Department, +                function(x) c(mean=mean(x),
total=sum(x))))> agg.dat <- data.frame(dept=rownames(agg.dat), agg.dat)
> agg.dat           dept     mean  total
Finance Finance 83925.67 251777
HR           HR 63333.33 190000
IT           IT 59928.67 179786
Sales     Sales 62481.67 187445

Andy 
> From: Robin Schroeder 
> 
> Dear list & Andy, 
> 
> I am hopelessly stumped, how would one add the department 
> names as a variable?
> 
> Robin
> 
> > Robin Tori Schroeder
> > International Institute for Sustainability 
> > P.O. Box 873211
> > Arizona State University
> > Tempe, Arizona 85287-3211
> > Phone: (480) 727-7290
> > 
> > 
> 
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Liaw, Andy
> Sent: Monday, March 28, 2005 6:45 PM
> To: 'Sivakumaran Raman'; r-help at stat.math.ethz.ch
> Subject: RE: [R] Aggregating data (with more than one function)
> 
> 
> Here's one possible way, using the data you supplied:
> 
> > dat <- read.table("clipboard", header=T, row=1)
> > do.call("rbind",by(dat$Salary, dat$Department, function(x) 
> c(mean=mean(x),
> total=sum(x))))
>             mean  total
> Finance 83925.67 251777
> HR      63333.33 190000
> IT      59928.67 179786
> Sales   62481.67 187445
> 
> If you need the department names as a variable, you can add 
> that easily.
> 
> HTH,
> Andy
> 
> > From: Sivakumaran Raman
> > 
> > I have the data similar to the following in a data frame:
> >     LastName   Department  Salary
> > 1   Johnson    IT          56000
> > 2   James      HR          54223
> > 3   Howe       Finance     80000
> > 4   Jones      Finance     82000
> > 5   Norwood    IT          67000
> > 6   Benson     Sales       76000
> > 7   Smith      Sales       65778
> > 8   Baker      HR          56778
> > 9   Dempsey    HR          78999
> > 10  Nolan      Sales       45667
> > 11  Garth      Finance     89777
> > 12  Jameson    IT          56786
> > 
> > I want to calculate both the mean salary broken down by 
> > Department and 
> > also the
> > total amount paid out per department i.e. I want both 
> sum(Salary) and
> > mean(Salary) for each Department. Right now, I am using 
> > aggregate.data.frame
> > twice, creating two data frames, and then combining them 
> > using data.frame.
> > However, this seems to be very memory and processor 
> intensive and is 
> > taking a
> > very long time on my data set. Is there a quicker way to do this?
> > 
> > Thanks in advance,
> > Siv Raman
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> > 
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>

bogdan romocea

2005-Apr-21 15:25 UTC

head link

[R] Aggregating data (with more than one function)

I am looking for an answer to a similar question - a generalized
solution that would be able to apply
   (1) any number of functions
   (2) to any number of vectors
   (3) by any number of factors 
(just like SQL's group by). 
The output data frame must contain the values of the by factors, to be
used for joins.

Aggregate() does (2) and (3). The solutions posted to this thread
(split+sapply, by, tapply) do (1) and (3) (or so it seems to me). What
would be the best way to get to (1)+(2)+(3)?

I am inclined to use aggregate() in a loop with 
eval(parse(text="aggregate expression here")). 
Running
groupby <- do.call("rbind", by(var_i, list(a,b,c,d,e,f),
	function(x) c(fct1(x),fct2(x),fct3(x),fct4(x))))
in a loop (var_1, var_2 etc) would be very nice but I don't know how
to add a-f as columns in the output data frame.

Thank you,
b.



On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman
wrote:> I have the data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by Department and 
> also the
> total amount paid out per department i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using
aggregate.data.frame
> twice, creating two data frames, and then combining them using data.frame.
> However, this seems to be very memory and processor intensive and is 
> taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Seemingly Similar Threads

Search for more maybe matching threads

R help - Mar 2005 - Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

[R] Aggregating data (with more than one function)

Seemingly Similar Threads