thr3ads.net - R help - [R] How to pass na.rm=T to a user defined function [Jul 2016]

Jun Shen

2016-Jul-30 00:08 UTC

[R] How to pass na.rm=T to a user defined function

Thanks Jeff/David for the reply. I wasn't clear in the previous message.
The problem with using na.omit is that it omits the whole row wherever there
is at least one NA, even when other variables in that row have non-NA values.

For example: let's define a new function
N <- function(x) length(x[!is.na(x)])

test <- data.frame(ID=1:100, CL=rnorm(100), V1=rnorm(100),
                   V2=rnorm(100), ALPHA=rnorm(100))
test$CL[1] <- NA

do.stats(test,
stats.func=c('mean','sd','median','min','max','N'),
summary.var=c('CL','V1', 'V2','ALPHA'))

gives

         mean    sd  median   min  max  N
CL    -0.0232 0.918 -0.0786 -2.14 3.14 99
V1    -0.0410 0.936 -0.1160 -2.86 2.67 99
V2    -0.1760 0.978 -0.1490 -2.31 2.15 99
ALPHA -0.1380 0.960 -0.2160 -2.41 2.20 99

One non-missing value in each of V1, V2 and ALPHA is omitted along with the NA in CL.
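
A minimal sketch of the effect described above, assuming the same kind of simulated data. The seed and the names `vars`, `rows_kept` and `per_col` are illustrative and not from the thread. na.omit() on the whole data frame keeps only complete rows, so every column ends up summarised over the same 99 rows, even though only CL has a missing value:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA
vars <- c("CL", "V1", "V2", "ALPHA")

# Row-wise (listwise) deletion: one NA in CL removes the whole first row.
rows_kept <- nrow(na.omit(test[vars]))

# Non-missing values actually available in each column.
per_col <- sapply(test[vars], function(x) sum(!is.na(x)))

print(rows_kept)  # 99
print(per_col)    # CL 99; V1, V2 and ALPHA 100 each
```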



On Fri, Jul 29, 2016 at 2:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> > On Jul 28, 2016, at 7:37 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >
> > Because in reality the NA may appear in one variable but not others. For
> > example for ID=1, CL may be NA but not for others, For ID=2, V1 may be NA
> > etc. To keep all the IDs and all the variables in one data frame, it's
> > inevitable to see some NA
>
> That doesn't seem to acknowledge Newmiller's advice. In particular, this
> would have seemed to be an obvious response to that suggestion:
>
> do.stats <- function(data, stats.func, summary.var)
>           as.data.frame(signif(sapply(stats.func,function(func)
> mapply( func,  na.omit( data[summary.var]) )), 3))
>
>
> And please also heed the advice in the Posting Guide to use plain text.
>
> --
> David.
>
>
>
> >
> > On Thu, Jul 28, 2016 at 10:22 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> >
> >> Why not remove it yourself before passing it to those functions?
> >> --
> >> Sent from my phone. Please excuse my brevity.
> >>
> >> On July 28, 2016 5:51:47 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >>> Dear list,
> >>>
> >>> I write a small function to calculate multiple stats on multiple
> >>> variables and export in a format exactly the way I want. Everything
> >>> seems fine until NA appears in the data.
> >>>
> >>> Here is my function:
> >>>
> >>> do.stats <- function(data, stats.func, summary.var)
> >>>   as.data.frame(signif(sapply(stats.func, function(func)
> >>>     mapply(func, data[summary.var])), 3))
> >>>
> >>> A test dataset:
> >>> test <- data.frame(ID=1:100, CL=rnorm(100), V1=rnorm(100),
> >>>                    V2=rnorm(100), ALPHA=rnorm(100))
> >>>
> >>> a command like the following
> >>> do.stats(test, stats.func=c('mean','sd','median','min','max'),
> >>>          summary.var=c('CL','V1','V2','ALPHA'))
> >>>
> >>> gives me
> >>>
> >>>        mean    sd  median   min  max
> >>> CL     0.1030 0.917  0.0363 -2.32 2.47
> >>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
> >>> V2     0.0600 1.000  0.0621 -2.80 2.62
> >>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
> >>>
> >>>
> >>> However if I have a NA in the data
> >>> test$CL[1] <- NA
> >>>
> >>> The same command run gives me
> >>>        mean    sd  median   min  max
> >>> CL         NA    NA      NA    NA   NA
> >>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
> >>> V2     0.0600 1.000  0.0621 -2.80 2.62
> >>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
> >>>
> >>> I know this is because those functions (mean, sd etc.) all have
> >>> na.rm=F by default. How can I pass na.rm=T to all these functions
> >>> without manually redefining those stats functions?
> >>>
> >>> Appreciate any comment.
> >>>
> >>> Thanks for your help.
> >>>
> >>>
> >>> Jun
> >>>
> >>>      [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius

2016-Jul-30 00:52 UTC

[R] How to pass na.rm=T to a user defined function

> On Jul 29, 2016, at 5:08 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>
> Thanks Jeff/David for the reply. I wasn't clear in the previous message.
> the problem of using na.omit is it will omit the whole row where there is
> at least one NA, even when some variables do have non-NA values.

Did you actually run the example I offered, or did you just guess at what would
happen and complain? When applied only to a vector there is no such thing as a
"column".

What you are describing would only have happened if `na.omit` were applied to an
object that was a dataframe. That was not what was offered in the example.
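
A small sketch of the distinction drawn above, using made-up values (the objects `x` and `df` and the result names are illustrative only). It compares na.omit() on a single vector, on a whole data frame, and on each column separately:

```r
x  <- c(1, NA, 3)
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

# On a vector, na.omit() drops only the NA element itself.
len_vec <- length(na.omit(x))

# On a data frame, it drops every row that contains an NA,
# so column b loses its (non-missing) second value as well.
b_after_rowwise <- na.omit(df)$b

# Applied column by column, each column keeps its own non-missing values.
len_by_col <- sapply(lapply(df, na.omit), length)

print(len_vec)          # 2
print(b_after_rowwise)  # 4 6
print(len_by_col)       # a 2, b 3
```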

-- 
David.
David Winsemius
Alameda, CA, USA

David Winsemius

2016-Jul-30 01:00 UTC

[R] How to pass na.rm=T to a user defined function

> On Jul 29, 2016, at 5:52 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Jul 29, 2016, at 5:08 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
>>
>> Thanks Jeff/David for the reply. I wasn't clear in the previous message.
>> the problem of using na.omit is it will omit the whole row where there is
>> at least one NA, even when some variables do have non-NA values.
>
> Did you actually run the example I offered, or did you just guess at what
> would happen and complain? When applied only to a vector there is no such
> thing as a "column".
>
> What you are describing would only have happened if `na.omit` were applied
> to an object that was a dataframe. That was not what was offered in the example.

And then I looked at the code again and realized you were not looping over the
columns as I thought was happening. So what you want is:

do.stats <- function(data, stats.func, summary.var)
         as.data.frame(signif(sapply(stats.func,function(func)
mapply( func, lapply( data[summary.var], na.omit) )), 3))
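
A sketch for checking the version above against the test data from earlier in the thread, with the function body only reindented. For comparison it also includes a hypothetical variant, `do.stats2`, that answers the subject line directly by forwarding `na.rm` through `...`. Neither `do.stats2` nor the `...` in `N` appears in the thread; both are assumptions made for illustration, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

# Count of non-missing values. The `...` is an added assumption so that
# N() ignores na.rm when it is forwarded by do.stats2() below.
N <- function(x, ...) sum(!is.na(x))

# The version from the message above: na.omit() is applied to each
# column separately before the statistic is computed.
do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, lapply(data[summary.var], na.omit))), 3))

# Hypothetical variant: extra arguments (e.g. na.rm = TRUE) are passed
# on to every statistic, so each one must accept them.
do.stats2 <- function(data, stats.func, summary.var, ...)
  as.data.frame(signif(sapply(stats.func, function(func)
    sapply(data[summary.var], func, ...)), 3))

funs <- c("mean", "sd", "median", "min", "max", "N")
vars <- c("CL", "V1", "V2", "ALPHA")

res1 <- do.stats(test, funs, vars)
res2 <- do.stats2(test, funs, vars, na.rm = TRUE)

print(res1$N)  # 99 100 100 100
print(res2$N)  # 99 100 100 100
```

The forwarding variant only works if every function named in stats.func tolerates an na.rm argument, which is why N is given a `...` here; the lapply/na.omit version has no such requirement.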

-- 
David

David Winsemius
Alameda, CA, USA
