Bert Gunter
2015-Mar-05 16:59 UTC
[R] R 3.1.2 using a custom function in aggregate() function on Windows 7 OS 64bit
That's not what ?aggregate says:

"aggregate.data.frame is the data frame method. If x is not a data
frame, it is coerced to one, which must have a non-zero number of
rows. Then, each of the variables (columns) in x is split into subsets
of cases (rows) of identical combinations of the components of by, and
FUN is applied to each such subset with further arguments in ...
passed to it."

As I read this, the argument of FUN is a data frame that is a subset
of the original frame, defined by the by variable values.

No?

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll

On Thu, Mar 5, 2015 at 8:55 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I don't see your point. No matter which version of aggregate you use,
> FUN is applied to vectors. Those vectors may be columns in a data frame
> or not, but FUN is always given one vector at a time by aggregate.
>
> Sent from my phone. Please excuse my brevity.
>
> On March 5, 2015 8:12:39 AM PST, Bert Gunter <gunter.berton at gene.com> wrote:
>> Sorry, Jeff. aggregate() is generic.
>>
>> From ?aggregate:
>>
>> "## S3 method for class 'data.frame'
>> aggregate(x, by, FUN, ..., simplify = TRUE)"
>>
>> Cheers,
>> Bert
>>
>> On Thu, Mar 5, 2015 at 7:54 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>>> The aggregate function applies FUN to vectors, not data frames. For
>>> example, the default "mean" function accepts a vector such as a column
>>> in a data frame and returns a scalar (well, a vector of length 1).
>>> Aggregate then calls this function once for each piece of the column(s)
>>> you give it. Your function wants two vectors, but aggregate does not
>>> understand how to give two inputs.
>>>
>>> (In the future, please follow R-help mailing list guidelines and post
>>> using plain text so your code does not get messed up.)
>>>
>>> You could use split to break your data frame into a list of data
>>> frames, and then sapply to extract the results you are looking for. I
>>> prefer to use the plyr or dplyr or data.table packages to do all this
>>> for me.
>>>
>>> d_rule <- function( DF ) {
>>>   i <- which( DF$a==max( DF$a ) )
>>>   if ( length( i ) == 1 ){
>>>     DF[ i, "x" ]
>>>   } else {
>>>     min( DF[ , "x" ] ) # did you mean min( DF$x[i] ) ?
>>>   }
>>> }
>>>
>>> dat <- data.frame( a=c(2,2,1,4,2,5,2,3,4,4)
>>>                  , x = c(1:10)
>>>                  , g = c(1,1,2,2,3,3,4,4,5,5)
>>>                  )
>>> # note that cbind on vectors creates a matrix
>>> # in a matrix all columns must be of the same type
>>> # but data frames generally have a variety of types
>>> # so don't use cbind when making a data frame
>>>
>>> library( dplyr )
>>>
>>> result <- dat %>% group_by( g ) %>% do( answer = d_rule( . ) ) %>% as.data.frame
>>>
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On March 4, 2015 2:02:06 PM PST, Typhenn Brichieri-Colombi via R-help <r-help at r-project.org> wrote:
>>>> Hello,
>>>>
>>>> I am trying to use the following custom function in an aggregate
>>>> function, but cannot get R to recognize my data. I've read the
>>>> help on function() and on aggregate() but am unable to solve my
>>>> problem. How can I get R to recognize the data inputs for the custom
>>>> function nested within aggregate()?
>>>>
>>>> My custom function is found below, as well as the error message I get
>>>> when I run it on a test data set (I will be using this function on a
>>>> much larger dataset (over 600,000 rows))
>>>>
>>>> Thank you for your time and your help!
>>>>
>>>> d_rule<-function(a,x){
>>>> i<-which(a==max(a))
>>>> out<-ifelse(length(i)==1, x[i], min(x))
>>>> return(out)
>>>> }
>>>>
>>>> a<-c(2,2,1,4,2,5,2,3,4,4)
>>>> x<-c(1:10)
>>>> g<-c(1,1,2,2,3,3,4,4,5,5)
>>>> dat<-as.data.frame(cbind(x,g))
>>>>
>>>> test<-aggregate(dat, by=list(g), FUN=d_rule,dat$a, dat$x)
>>>>
>>>> Error in dat$x : $ operator is invalid for atomic vectors
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
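The split-then-sapply route suggested in the quoted reply is described but not shown. A minimal base-R sketch of it follows, reusing the sample data and the data-frame version of d_rule from the thread. The names `pieces`, `answer`, and `result` are chosen here for illustration only, and this is one possible reading of the suggestion, not necessarily what the poster had in mind.

```r
# Sample data from the thread, built with data.frame() rather than cbind()
dat <- data.frame(
  a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
  x = 1:10,
  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# d_rule as rewritten in the reply: it receives one group's rows as a data frame
d_rule <- function(DF) {
  i <- which(DF$a == max(DF$a))
  if (length(i) == 1) {
    DF[i, "x"]        # unique maximum of a: return the matching x
  } else {
    min(DF[, "x"])    # tie in a: fall back to the smallest x in the group
  }
}

# split() yields a list of data frames, one per value of g
pieces <- split(dat, dat$g)

# sapply() calls d_rule once per group and collects the scalar results
answer <- sapply(pieces, d_rule)

# Reassemble into a data frame (names and layout are illustrative)
result <- data.frame(g = as.numeric(names(answer)), answer = unname(answer))
print(result)
```

Because each call sees a whole data frame, d_rule can look at both a and x, which is the thing aggregate() cannot provide.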
Jeff Newmiller
2015-Mar-05 18:47 UTC
[R] R 3.1.2 using a custom function in aggregate() function on Windows 7 OS 64bit
Bert: using the sample data frame from my earlier reply, try to
interpret the output of this:

    aggregate( dat[,1:2], dat[,"g",drop=FALSE], FUN=function(x){print(x);class(x)})

The help text you quote is probably not as clear as it should be.
Would the following be better?

"... and FUN is applied to each column in each such subset with
further arguments in ... passed to it."

I became aware of this "feature" because applying exactly the same
aggregation function to all of my data columns is not convenient for
my day-to-day work. Thus, I don't use "aggregate" very often.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>      Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
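To make the point about vectors concrete, here is a small self-contained variant of the call above. It assumes the three-column sample data frame from the earlier reply and records what aggregate() hands to FUN; the names `classes` and `got_df` are illustrative only.

```r
# Sample data from the thread
dat <- data.frame(
  a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
  x = 1:10,
  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Report the class of whatever FUN receives, per column and per group
classes <- aggregate(dat[, c("a", "x")],
                     by = dat[, "g", drop = FALSE],
                     FUN = function(v) class(v))
print(classes)

# Ask directly whether FUN ever receives a data frame
got_df <- aggregate(dat[, c("a", "x")],
                    by = dat[, "g", drop = FALSE],
                    FUN = is.data.frame)
print(got_df)
```

If the reading in this message is right, the first table should show a vector class in every cell and the second should be FALSE throughout, since FUN is called on one column of one group at a time.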
Bert Gunter
2015-Mar-05 19:19 UTC
[R] R 3.1.2 using a custom function in aggregate() function on Windows 7 OS 64bit
Well, I obviously don't use it either, as I'm just quoting the docs. I
use either by() or tapply().

-- Bert
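For completeness, a hedged sketch of how by() might be applied to the original problem. It assumes the sample data and the data-frame version of d_rule posted earlier in the thread; the names `by_result` and `answer` are illustrative, and this is only one way the by() approach could look.

```r
# Sample data and d_rule as posted earlier in the thread
dat <- data.frame(
  a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
  x = 1:10,
  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

d_rule <- function(DF) {
  i <- which(DF$a == max(DF$a))
  if (length(i) == 1) {
    DF[i, "x"]
  } else {
    min(DF[, "x"])
  }
}

# by() splits the rows of dat by g and passes each subset,
# as a data frame, to d_rule
by_result <- by(dat, dat$g, d_rule)
print(by_result)

# Drop the "by" class and array attributes to get a plain vector
answer <- as.vector(unclass(by_result))
print(answer)
```

Unlike aggregate(), by() hands FUN a data frame per group, so a function that needs two columns at once works without modification.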