thr3ads.net - R help - [R] Help on aggregate method [Jun 2010]

Stella Pachidi

2010-Jun-01 14:48 UTC

[R] Help on aggregate method

Dear R experts,

I would really appreciate any ideas on how to use the aggregate method
more efficiently.

More specifically, I would like to calculate the mean of certain
values in a data frame, grouped by various attributes, and then
create a new column in the data frame that holds the corresponding
group mean for every row. I attach part of my code:

matchMean <- function(ind, dataTable, aggrTable)
{
    index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
                   (aggrTable[,2]==dataTable[["Attr2"]][ind]))
    as.numeric(aggrTable[index,3])
}

avgDur <- aggregate(ap.dat[["Dur"]],
                    by = list(ap.dat[["Attr1"]], ap.dat[["Attr2"]]),
                    FUN="mean")
meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
ap.dat <- cbind (ap.dat, meanDur)

As I am dealing with a very large dataset, my matching function takes
a long time to run, so I would be really grateful for any ideas on how
to speed up this matching step.

Thank you very much in advance!

Kind regards,
Stella



--
Stella Pachidi
Master in Business Informatics student
Utrecht University

Erik Iverson

2010-Jun-01 14:58 UTC

[R] Help on aggregate method

It's easiest for us to help if you give us a reproducible example.  We 
don't have your datasets (ap.dat), so we can't run your code below. 
It's easy to create sample data with the random number generators in R, 
or use ?dput to give us a sample of your actual data.frame.
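As a rough illustration of that advice, the sketch below builds a small made-up data frame and prints it with dput() so it can be pasted into a message. The name sample.dat and its values are hypothetical; only the column names Attr1, Attr2 and Dur follow the original post.

```r
# Hypothetical stand-in for a few rows of ap.dat (not the poster's real data)
set.seed(123)
sample.dat <- data.frame(
  Attr1 = sample(c("A", "B"), 6, replace = TRUE),
  Attr2 = sample(c("X", "Y"), 6, replace = TRUE),
  Dur   = round(rexp(6, rate = 0.1), 1)
)

# dput() prints an R expression that recreates the object when pasted
# into another session
dput(sample.dat)

# A reader can rebuild the data frame by evaluating that printed text
dput.text <- paste(capture.output(dput(sample.dat)), collapse = "\n")
restored  <- eval(parse(text = dput.text))
```

For a large data frame, dput(head(ap.dat, 20)) keeps the posted sample short.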

I would guess your problem is solved by ?ave though.
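A minimal sketch of what the ave() suggestion might look like, using simulated data because the real ap.dat is not available. The column names come from the original post; the values are made up.

```r
# Simulated stand-in for ap.dat (assumption: two grouping columns, one numeric)
set.seed(1)
ap.dat <- data.frame(
  Attr1 = rep(c("A", "B"), each = 50),
  Attr2 = rep(c("A", "B"), times = 50),
  Dur   = rnorm(100)
)

# ave() returns a vector as long as its first argument, holding the group
# statistic for each row. FUN must be passed by name, because unnamed
# arguments after the first are treated as grouping variables.
ap.dat$meanDur <- ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2, FUN = mean)

head(ap.dat)
```

This gives the per-row group mean in one vectorised call, with no separate aggregate table and no matching step.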

Joris Meys

2010-Jun-01 15:27 UTC

[R] Help on aggregate method

Take a look at
?split (and unsplit)

eg:
Dur <- rnorm(100)
Attr1=rep(c("A","B"),each=50)
Attr2=rep(c("A","B"),times=50)

ap.dat <-data.frame(Attr1,Attr2,Dur)

split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
ap.list <-split(ap.dat,split.fact)
ap.mean <-lapply(ap.list,function(x){
        x$meanDur=rep(mean(x$Dur),dim(x)[1])
        return(x)
  })

ap.dat.fast <- unsplit(ap.mean,split.fact)

system.time on 1000 replicates gives:

> system.time(replicate(1000,{
+ split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
+ ap.list <-split(ap.dat,split.fact)
+ ap.mean <-lapply(ap.list,functi .... [TRUNCATED]
   user  system elapsed
   4.88    0.00    4.88

> system.time(replicate(1000,{
+ avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
+ ap.dat[["Attr2"]]), FUN="mean")
+ meanDur <- sapp .... [TRUNCATED]
   user  system elapsed
  58.00    0.11   58.13

So the split/unsplit version should be roughly tenfold faster.
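If keeping the aggregate() step is preferred, the row-by-row sapply() lookup from the original code could probably be replaced by a single vectorised match() on a combined key. This is only a sketch on simulated data: the "|" separator is an assumption and must not occur in the attribute values, and the column names Group.1, Group.2 and x are what aggregate() produces for an unnamed by-list and a plain vector.

```r
# Simulated data with the same layout as in the example above
set.seed(42)
ap.dat <- data.frame(
  Attr1 = rep(c("A", "B"), each = 50),
  Attr2 = rep(c("A", "B"), times = 50),
  Dur   = rnorm(100)
)

# One row per Attr1/Attr2 combination; result columns are Group.1, Group.2, x
avgDur <- aggregate(ap.dat[["Dur"]],
                    by = list(ap.dat[["Attr1"]], ap.dat[["Attr2"]]),
                    FUN = mean)

# Build one key per row in both tables (assumes "|" is not in the data)
key.dat <- paste(ap.dat$Attr1, ap.dat$Attr2, sep = "|")
key.agg <- paste(avgDur$Group.1, avgDur$Group.2, sep = "|")

# match() finds the aggregate row for every data row in one call
ap.dat$meanDur <- avgDur$x[match(key.dat, key.agg)]
```

The lookup then costs one pass over the keys instead of one full scan of the aggregate table per row.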

Cheers
Joris



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
Joris.Meys@Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
