Dear list,

I wrote a small function to calculate multiple statistics on multiple variables and export the results in exactly the format I want. Everything seems fine until NA appears in the data.

Here is my function:

do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, data[summary.var])), 3))

A test dataset:

test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))

A command like the following

do.stats(test, stats.func = c('mean', 'sd', 'median', 'min', 'max'),
         summary.var = c('CL', 'V1', 'V2', 'ALPHA'))

gives me

         mean    sd  median   min  max
CL     0.1030 0.917  0.0363 -2.32 2.47
V1    -0.0545 1.070 -0.2120 -2.21 2.70
V2     0.0600 1.000  0.0621 -2.80 2.62
ALPHA -0.0113 0.919  0.0284 -2.35 2.31

However, if I have an NA in the data

test$CL[1] <- NA

the same command gives me

         mean    sd  median   min  max
CL         NA    NA      NA    NA   NA
V1    -0.0545 1.070 -0.2120 -2.21 2.70
V2     0.0600 1.000  0.0621 -2.80 2.62
ALPHA -0.0113 0.919  0.0284 -2.35 2.31

I know this is because those functions (mean, sd, etc.) all have na.rm = FALSE by default. How can I pass na.rm = TRUE to all of these functions without manually redefining them?

I appreciate any comments. Thanks for your help.

Jun
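One way to pass na.rm = TRUE without redefining mean, sd, etc. is to give do.stats a `...` argument and forward it to every statistic through mapply's MoreArgs. This is a minimal sketch, not a method proposed in the thread; it assumes every function named in stats.func accepts the extra argument (mean, sd, median, min and max all accept na.rm).

```r
# Sketch: extra arguments given to do.stats (e.g. na.rm = TRUE) are collected
# in `...` and forwarded unchanged to every statistic via mapply's MoreArgs.
do.stats <- function(data, stats.func, summary.var, ...) {
  extra <- list(...)  # e.g. list(na.rm = TRUE); an empty list if nothing is passed
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, data[summary.var], MoreArgs = extra)), 3))
}

set.seed(42)
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

# na.rm = TRUE reaches mean, sd, median, min and max alike
res <- do.stats(test, stats.func = c('mean', 'sd', 'median', 'min', 'max'),
                summary.var = c('CL', 'V1', 'V2', 'ALPHA'), na.rm = TRUE)
print(res)
```

Called without na.rm, the function behaves exactly like the original and still returns NA for CL.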
Why not remove it yourself before passing it to those functions?
--
Sent from my phone. Please excuse my brevity.

On July 28, 2016 5:51:47 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
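Taken literally on the whole data frame, removing NAs first could be done with complete.cases(). A minimal sketch, assuming a second NA (in V1 for ID 2) that is added here only for illustration, shows that this drops entire rows:

```r
# Illustrative data in the same layout as the thread's `test`; the NA in V1
# for ID 2 is an assumption added for this example.
set.seed(1)
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA
test$V1[2] <- NA

# Row-wise removal: keep only rows with no NA in any column
complete <- test[complete.cases(test), ]
nrow(complete)  # 98: the rows for ID 1 and ID 2 are both gone
```

V2 and ALPHA, which have no missing values, would then be summarized over 98 observations instead of 100.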
Because in reality an NA may appear in one variable but not in others. For example, for ID=1, CL may be NA while the other variables are not; for ID=2, V1 may be NA, etc. To keep all the IDs and all the variables in one data frame, some NAs are inevitable.

On Thu, Jul 28, 2016 at 10:22 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> [...]
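Removing NAs per variable rather than per row keeps every non-missing value of every variable. A minimal sketch of that idea, using Jun's scenario (CL missing for ID 1, V1 missing for ID 2); the helper names drop.na and do.stats.narm are hypothetical, not from the thread:

```r
# Hypothetical helper: wrap a statistic (given by name or as a function) so
# that NAs are dropped from each variable just before the statistic is called.
drop.na <- function(func) {
  func <- match.fun(func)
  function(x) func(x[!is.na(x)])
}

# Hypothetical variant of do.stats that applies drop.na to every statistic.
do.stats.narm <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(drop.na(func), data[summary.var])), 3))

set.seed(1)
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA  # ID 1: CL missing
test$V1[2] <- NA  # ID 2: V1 missing

res <- do.stats.narm(test, c('mean', 'sd', 'median', 'min', 'max'),
                     c('CL', 'V1', 'V2', 'ALPHA'))
print(res)
```

Here V2 and ALPHA are still summarized over all 100 values. Unlike forwarding na.rm, this wrapper also works for statistics that have no na.rm argument (length, for instance, would then count non-missing values); a variable that is entirely NA would make min and max return Inf with a warning.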