Michael Karol
2011-Aug-02 14:32 UTC
[R] Help with aggregate syntax for a multi-column function please.
Dear R-experts:

I am using a function called AUC whose arguments are data, time, id, and dv:

  data is the name of the data frame,
  time is the name of the independent-variable (time) column,
  id is the name of the subject-id column, and
  dv is the name of the dependent-variable column.

The function computes the area under the curve by the trapezoidal rule for each subject id.

I would like to embed it in aggregate() to further subset by each combination of Cycle, DoseDayNominal and Drug, but I can't get the aggregate() syntax right. All the examples I can find use a single-column function such as mean(), whereas this AUC function takes four arguments.

Could someone kindly show me the syntax?

This is what I've tried so far:

AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
                    function(x, tm, pt, conc) {AUC(x)},
                    tm = "TimeBestEstimate", pt = "Pt", conc = "ConcentrationBQLzero")

AUC.DF <- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, PKdata$Drug),
                    function(x) {AUC(x, "TimeBestEstimate", "Pt", "ConcentrationBQLzero")})

AUC syntax is:

> args(AUC)
function (data, time = "TIME", id = "ID", dv = "DV")

Thanks,

Regards,
Michael
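The code of Michael's AUC() was not posted. To experiment with the approaches in this thread, a minimal hypothetical stand-in with the same signature might look like this. It assumes one row per sampling time per subject and uses the linear trapezoidal rule; it is a sketch, not the poster's function, and the toy values are invented.

```r
# Hypothetical stand-in for the poster's AUC(); the real code was not posted.
# Assumes one row per sampling time per subject; linear trapezoidal rule.
AUC <- function(data, time = "TIME", id = "ID", dv = "DV") {
  trap <- function(d) {
    d <- d[order(d[[time]]), , drop = FALSE]  # integrate in time order
    y <- d[[dv]]
    # sum over intervals of (t[k+1] - t[k]) * (y[k] + y[k+1]) / 2
    sum(diff(d[[time]]) * (head(y, -1) + tail(y, -1)) / 2)
  }
  subjects <- split(data, data[[id]])  # one sub-data-frame per subject
  data.frame(id = names(subjects),
             AUC = unname(vapply(subjects, trap, numeric(1))),
             stringsAsFactors = FALSE)
}

# Toy data using the column names from the post (values invented)
pk <- data.frame(Pt = rep(1:2, each = 3),
                 TimeBestEstimate = rep(c(0, 1, 2), 2),
                 ConcentrationBQLzero = c(0, 2, 2, 0, 4, 0))
res <- AUC(pk, "TimeBestEstimate", "Pt", "ConcentrationBQLzero")
res  # subject 1: AUC 3; subject 2: AUC 4
```

Because it returns one row per subject rather than a single number, it behaves like the "return a data frame" case discussed in the replies.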
Jean V Adams
2011-Aug-02 15:12 UTC
Michael,

The function aggregate() is not going to work in your situation. It applies FUN to each column of the subsetted data separately, not to the subsetted data frame as a whole. The help file reads:

"Then, each of the variables (columns) in x is split into subsets of cases (rows) of identical combinations of the components of by, and FUN is applied to each such subset with further arguments in ... passed to it."

If you can rewrite your function so that it takes one argument, the data frame alone, then the by() function should give you what you need. Here is a simple example:

df <- data.frame(a = 1:5, b = 2:6, i = c(1, 1, 1, 2, 2))
# junk() needs two columns at once, so it must receive a data frame
junk <- function(df) {
  sum(df$a^2) + prod(df$b)
}
# by() passes each subset data frame to junk(); as.vector() drops the "by" class
data.frame(index = sort(unique(df$i)),
           results = as.vector(by(df[, c("a", "b")], df$i, junk)))
# results: 38 for i = 1, 71 for i = 2

Hope this helps.

Jean

`·.,,  ><(((º>   `·.,,  ><(((º>   `·.,,  ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA
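To make the distinction concrete, here is a small sketch with invented data; the column names follow Michael's post, and trap_auc() is a hypothetical helper, not his AUC(). The aggregate() call shows that FUN only ever receives a plain vector. The by() call passes each sub-data-frame to the helper; note that by() also forwards extra named arguments through ..., so a one-argument wrapper is not strictly required, and a list of grouping columns handles several factors at once.

```r
# Toy data; column names follow the post, values are invented.
pk <- data.frame(Cycle = 1,
                 Drug = rep(c("A", "B"), each = 3),
                 TimeBestEstimate = rep(c(0, 1, 2), 2),
                 ConcentrationBQLzero = c(0, 2, 2, 0, 4, 0))

# aggregate() calls FUN once per column per group, so FUN never sees a data frame:
agg <- aggregate(pk[c("TimeBestEstimate", "ConcentrationBQLzero")],
                 by = list(Drug = pk$Drug), FUN = is.data.frame)
agg  # every entry is FALSE

# Hypothetical helper: linear trapezoidal AUC of one concentration-time profile.
trap_auc <- function(data, time, dv) {
  data <- data[order(data[[time]]), , drop = FALSE]
  y <- data[[dv]]
  sum(diff(data[[time]]) * (head(y, -1) + tail(y, -1)) / 2)
}

# by() hands FUN the whole sub-data-frame and forwards time/dv via '...'.
# A data frame of grouping columns gives one cell per combination;
# combinations with no rows would come back as NA.
res <- by(pk, pk[c("Cycle", "Drug")], trap_auc,
          time = "TimeBestEstimate", dv = "ConcentrationBQLzero")

# Flatten to a data frame, in the spirit of the example above
data.frame(expand.grid(dimnames(res)), AUC = as.vector(res))
```

expand.grid() varies its first factor fastest, which matches the column-major order of the array that by() returns, so the labels line up with as.vector(res).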
Dennis Murphy
2011-Aug-02 16:23 UTC
Hi:

Another way to do this is to use one of the summarization packages; the following uses plyr. The first step is to write a function that takes a data frame as input and returns either a data frame or a scalar. Here the function returns a scalar, but if you want to carry additional variables along in the output, have it return a data frame containing the variables you want. You don't need to return the grouping variables, but no harm is done if you do.

# This assumes the existence of a function AUC with the arguments
# you stated in your post. I presume it returns a scalar value; if not,
# you should modify it to return a data frame instead. It would probably
# be better to modify AUC and call it in ddply() directly, but without the
# function code there's not much one can do...
myAUC <- function(df) AUC(df, 'TimeBestEstimate', 'Pt', 'ConcentrationBQLzero')

library('plyr')
# One call to myAUC() per Cycle x DoseDayNominal x Drug combination
ddply(PKdata, .(Cycle, DoseDayNominal, Drug), myAUC)

This is obviously untested, so caveat emptor.

Both plyr and data.table can accept functions with multiple arguments and do the right thing. The trick in plyr is to write a function that takes a generic input object (e.g., a (sub)data frame) and then uses it, or the variables within it, to do the necessary calculations. Generally, you want the output of the function to be compatible with the type of output you want from the **ply() function. In this case, ddply() means data frame input, data frame output; alply() would mean array input, list output, etc.

If this doesn't work, please provide a reproducible example.

HTH,
Dennis
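For readers without plyr, a rough base-R analogue of the ddply() call uses split(), lapply() and rbind(); this is a sketch of the same split-apply-combine idea, not what plyr does internally. AUC() here is a hypothetical stand-in (the poster's code was not posted) that returns one row per subject, i.e. the "return a data frame" case, and the data are invented. Separately, ddply() also forwards extra arguments through ..., so ddply(PKdata, .(Cycle, DoseDayNominal, Drug), AUC, time = "TimeBestEstimate", id = "Pt", dv = "ConcentrationBQLzero") should behave like the myAUC() version, though that is untested here.

```r
# Hypothetical stand-in for the poster's AUC(): linear trapezoidal AUC per
# subject, returned as a data frame with one row per subject.
AUC <- function(data, time = "TIME", id = "ID", dv = "DV") {
  per_subject <- lapply(split(data, data[[id]]), function(d) {
    d <- d[order(d[[time]]), , drop = FALSE]
    y <- d[[dv]]
    data.frame(id = d[[id]][1],
               AUC = sum(diff(d[[time]]) * (head(y, -1) + tail(y, -1)) / 2))
  })
  do.call(rbind, per_subject)
}

# One-argument wrapper in the style of myAUC()
myAUC <- function(df) AUC(df, "TimeBestEstimate", "Pt", "ConcentrationBQLzero")

# Toy data (invented values), two drugs x two subjects x three time points
PKdata <- data.frame(Cycle = 1, DoseDayNominal = 1,
                     Drug = rep(c("A", "B"), each = 6),
                     Pt = rep(rep(1:2, each = 3), 2),
                     TimeBestEstimate = rep(c(0, 1, 2), 4),
                     ConcentrationBQLzero = c(0, 2, 2, 0, 4, 0,
                                              0, 1, 1, 0, 2, 2))

# Split on the grouping columns, apply the wrapper, stack the pieces
groups <- c("Cycle", "DoseDayNominal", "Drug")
pieces <- split(PKdata, PKdata[groups], drop = TRUE)  # drop empty combinations
out <- do.call(rbind, lapply(pieces, function(d) {
  res <- myAUC(d)
  # repeat the group labels once per result row, then attach them
  cbind(d[rep(1, nrow(res)), groups, drop = FALSE], res)
}))
rownames(out) <- NULL
out
```

Repeating the label row to match nrow(res) keeps data.frame() from recycling a one-row piece, which would otherwise trigger a warning about discarded row names.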