Dear R users,

I searched some sources but I did not find an answer. Please give me a hint on the following problem.

I would like to compute a summary statistic for a vector at different factor levels. I know I can use tapply or aggregate, but I do not know whether there is a way to use a function that takes several (two) input variables (like weighted.mean).

I wrote a simple function for a factor-wise weighted mean

fff <- function(x, fact, w)
{
    ws <- tapply(w, fact, sum)
    newx <- x * w
    tapply(newx, fact, sum) / ws
}

which can handle this particular case, but does a more general solution exist for using FUN(X1,X2) in aggregation procedures (tapply, aggregate, by) directly?

Thank you
Petr Pikal
petr.pikal at precheza.cz
p.pik at volny.cz

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Hi Petr,

Probably I don't understand your question correctly. However, you can write any function with several arguments (input variables) and then use tapply. That is:

fn <- function(arg1, arg2, arg3, ....){....}
tapply(arg1, factor, fn, arg2, arg3,....)

Furthermore, you can use the three dots "..." to pass any argument on to the functions called inside your fn().

Hope this helps you!
vito

----- Original Message -----
From: "Petr Pikal" <petr.pikal at precheza.cz>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, January 24, 2002 7:56 AM
Subject: [R] aggregate, by tapply

> which can handle particular case but does exist some more general
> solution how to use FUN(X1,X2) in aggregation procedures
> (tapply, aggregate, by) directly?
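One caveat about the pattern above, shown as a minimal sketch on made-up data (the names x, fact, w and all values are illustrative, not from the thread): tapply splits only its first argument by the factor. Anything supplied after the function is handed to every call unchanged, so a constant option works as expected, while a second data vector arrives at full length in each group.

```r
# Made-up example data (assumption, for illustration only)
x    <- c(1, 2, 3, 10, 20)
fact <- factor(c("a", "a", "a", "b", "b"))
w    <- c(1, 1, 2, 1, 3)

# A constant extra argument is passed unchanged to every call, which is fine:
trimmed <- tapply(x, fact, mean, trim = 0)

# A vector extra argument is also passed unchanged, i.e. it is NOT split by fact.
# Record the length of the second argument that each group's call receives:
seen <- tapply(x, fact, function(xi, wi) length(wi), w)
print(seen)  # both groups see all 5 weights, although they hold 3 and 2 values
```

So the extra-argument route suits options such as na.rm or trim, but by itself it does not pair each group of x with the matching group of w.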
In the case of the *apply functions, the parameters follow the name of the function. I.e., if you want to compute a mean with na.rm=T (which for one single vector would be mean(mivector,na.rm=T)), then

apply(mat,1,mean,na.rm=T)

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es

On Thu, 24 Jan 2002, Petr Pikal wrote:

> which can handle particular case but does exist some more general
> solution how to use FUN(X1,X2) in aggregation procedures
> (tapply, aggregate, by) directly?
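A small runnable illustration of the call described above, assuming a made-up 2 x 3 matrix (mat and its values are only an example):

```r
# Made-up matrix with one missing value (assumption, for illustration only)
mat <- matrix(c(1, NA, 3,
                4,  5, 6), nrow = 2, byrow = TRUE)

# na.rm = TRUE follows the function name and is forwarded to mean() for each row
row_means <- apply(mat, 1, mean, na.rm = TRUE)

# The same computation for a single vector, for comparison with the first row
single <- mean(mat[1, ], na.rm = TRUE)
```

Here the forwarded argument is a constant option; it is the same for every row, which is why this form works without further care.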
In the soon-to-be beta-released Hmisc library, you can do this with the summarize function (attached) in the following way:

summarize(cbind(y,weights), groups, function(x)g(x[,1],x[,2]))

I have attached code for summarize. Also, Hmisc has several functions for weighted estimators.

-Frank Harrell

On Thu, 24 Jan 2002 07:56:59 +0100 Petr Pikal <petr.pikal at precheza.cz> wrote:

> which can handle particular case but does exist some more general
> solution how to use FUN(X1,X2) in aggregation procedures
> (tapply, aggregate, by) directly?

--
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: summarize.txt
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20020124/fce89753/summarize.txt
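The idea behind the call above (bind the variables into one matrix, then give the function the rows of each group) can be sketched in base R. This is not the Hmisc code: summarize_by is a hypothetical helper written only for this illustration, the data are made up, and weighted.mean stands in for the unspecified g.

```r
# Made-up example data (assumption, for illustration only)
y       <- c(1, 2, 3, 10, 20)
weights <- c(1, 1, 2, 1, 3)
groups  <- factor(c("a", "a", "a", "b", "b"))

# Hypothetical helper, NOT Hmisc's summarize: pass FUN the rows of X per group
summarize_by <- function(X, by, FUN) {
  rows <- split(seq_len(nrow(X)), by)            # row numbers for each level
  sapply(rows, function(i) FUN(X[i, , drop = FALSE]))
}

# Column 1 holds the values, column 2 the weights, as in the call above
res <- summarize_by(cbind(y, weights), groups,
                    function(x) weighted.mean(x[, 1], x[, 2]))
```

Because values and weights travel together as columns of one matrix, each group's function call sees matching rows of both.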
On 24 Jan 2002 at 11:54, Agustin Lobo wrote:

> In the case of *apply functions, the paramenters follow
> the name of the function. I.e., if you want to compute a mean
> with na.rm=T(which for one single vector would be
> mean(mivector,na.rm=T), then
>
> apply(mat,1,mean,na.rm=T)
>
> Agus

Thanks to all.

Actually this works for mean, sum, var, sd with na.rm=T. My problem is with weighted.mean. It works as a standalone function, but inside any aggregation function it causes a warning and it ***does not compute correctly***.

> weighted.mean(lll[rrr==2001],ttt[rrr==2001])
[1] -0.9257375

> tapply(lll,rrr,weighted.mean,ttt)
      1997       1998       1999       2000       2001
-0.4495764 -0.4956762 -0.4920173 -0.9416626 -0.9455542
Warning messages:
1: longer object length
        is not a multiple of shorter object length in: x * w
<snip>
5: longer object length
        is not a multiple of shorter object length in: x * w

I traced the problem to ***lapply*** (probably the workhorse for all aggregate functions - see the enclosed code)

> lapply(split(lll,rrr),weighted.mean,ttt)
$"1997"
[1] -0.4495764
<snip>
$"2001"
[1] -0.9455542

Warning messages:
1: longer object length
        is not a multiple of shorter object length in: x * w
<snip>
5: longer object length
        is not a multiple of shorter object length in: x * w

I used a modified version of weighted.mean which works alone

> weighted.mean.modif(lll[rrr==2001],ttt[rrr==2001])
[1] -0.9257375

weighted.mean.modif <- function (x, w)
{
    if (missing(w))
        w <- rep(1, length(x))
    {
        i <- complete.cases(x,w)
        w <- w[i]
        x <- x[i]
    }
    sw <- sum(w)
    sum(x * w)/sw
}

but using it in any aggregate function causes an error, and debugging does not give me any hints.

> tapply(lll,rrr,weighted.mean,ttt)
Error in complete.cases(...) : not all arguments have the same length

debug: rval <- .Internal(lapply(X, FUN))
Browse[1]>
Error in complete.cases(...) : not all arguments have the same length

and this is completely beyond my ability to solve.

I use R 1.4.0, Windows version.

lll is some property of a product
rrr are years
ttt is tonnage of the product

They are all the same length (226), but the number of observations varies from year to year:

> tapply(lll,rrr,length)
1997 1998 1999 2000 2001
  48   51   40   42   45

Please, can anybody tell me where the mistake is?

Petr Pikal
petr.pikal at precheza.cz
p.pik at volny.cz
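A likely explanation for the output above, as a sketch on made-up data that reuses the names lll, rrr and ttt (the values are not Petr's): tapply and lapply split only lll by year, while ttt is passed to every call at its full length. Inside x * w the short group is then recycled against all the weights, which produces the "longer object length" warning and a wrong value. Recent versions of R, as far as I know, stop with an error in weighted.mean when the lengths differ, so the recycling is reproduced by hand here. One way to pair the groups is to split both vectors and walk through them in parallel with mapply.

```r
# Made-up stand-ins for the thread's vectors (assumption, for illustration only)
lll <- c(1, 2, 3, 10, 20)                 # property of the product
ttt <- c(1, 1, 2, 1, 3)                   # tonnage, used as weights
rrr <- c(2000, 2000, 2000, 2001, 2001)    # year

# Roughly what the failing call computed for 2001: the 2 values of that year
# are recycled against all 5 weights (the warning is silenced on purpose)
x_2001 <- lll[rrr == 2001]
bad <- suppressWarnings(sum(x_2001 * ttt) / sum(ttt))

# Split BOTH vectors by year and pair the pieces group by group
good <- mapply(weighted.mean, split(lll, rrr), split(ttt, rrr))

# The standalone computation for one year, for comparison
check_2001 <- weighted.mean(lll[rrr == 2001], ttt[rrr == 2001])
```

With these numbers good gives 2.25 for 2000 and 17.5 for 2001, matching the standalone call, while bad is 12.5.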
On Thu, 24 Jan 2002, Petr Pikal wrote:

> which can handle particular case but does exist some more general
> solution how to use FUN(X1,X2) in aggregation procedures
> (tapply, aggregate, by) directly?

If all your variables are in some data frame df you can do

by(df, df$fact, function(df.i) weighted.mean(df.i$x,df.i$w))

	-thomas
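A runnable version of the by() suggestion, assuming a small made-up data frame with the column names x, w and fact used in the call above:

```r
# Made-up data frame (assumption, for illustration only)
df <- data.frame(x    = c(1, 2, 3, 10, 20),
                 w    = c(1, 1, 2, 1, 3),
                 fact = factor(c("a", "a", "a", "b", "b")))

# by() hands each group's rows to the function as a data frame,
# so x and w stay aligned within every level of fact
res <- by(df, df$fact, function(df.i) weighted.mean(df.i$x, df.i$w))
print(res)

# Plain numeric vector of the per-group results, in the order of the levels
res_vec <- as.numeric(res)
```

Since the function receives whole rows, it can use any number of columns, which answers the FUN(X1,X2) question for an arbitrary number of inputs.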