Ivan Calandra
2018-Aug-22 14:33 UTC
[R] differing behavior of mean(), median() and sd() with na.rm
Dear useRs,

I have just noticed that when the input consists only of NAs and na.rm=TRUE, mean() returns NaN, whereas median() and sd() return NA. Shouldn't they all behave the same? I think NA makes more sense than NaN in that case.

x <- c(NA, NA, NA)
mean(x, na.rm=TRUE)
[1] NaN
median(x, na.rm=TRUE)
[1] NA
sd(x, na.rm=TRUE)
[1] NA

Thanks for any feedback.

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
Bert Gunter
2018-Aug-22 14:47 UTC
[R] differing behavior of mean(), median() and sd() with na.rm
Actually, the dissonance is a bit more basic.

When every value passed to xxx(..., na.rm=TRUE) is NA, the NAs are removed and you are left with numeric(0). So what you see is actually:

> z <- numeric(0)
> mean(z)
[1] NaN
> median(z)
[1] NA
> sd(z)
[1] NA
> sum(z)
[1] 0

etc.

I imagine that there may be more of these little inconsistencies due to the organic way R evolved over time. In the absence of accepted standards, what the conventions should be may be purely a matter of personal opinion. But I would first look to see what accepted standards there are, if any.

-- Bert
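A minimal R sketch of Bert's point, assuming a numeric all-NA vector (NA_real_ is used so the vector is numeric; Ivan's c(NA, NA, NA) is technically logical, but his transcript shows the same NaN/NA pattern). Removing the NAs by hand leaves a zero-length vector, and the summary functions then give the same results as on numeric(0):

```r
# All-NA numeric vector (NA_real_ keeps the type numeric, not logical)
x <- c(NA_real_, NA_real_, NA_real_)

# What na.rm = TRUE effectively does before summarizing
z <- x[!is.na(x)]
length(z)                 # 0: nothing is left after dropping the NAs
identical(z, numeric(0))  # TRUE

# na.rm = TRUE on x behaves like calling the function on z
mean(x, na.rm = TRUE)     # NaN
mean(z)                   # NaN
median(z)                 # NA
sd(z)                     # NA
sum(z)                    # 0
```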
Duncan Murdoch
2018-Aug-22 14:47 UTC
[R] differing behavior of mean(), median() and sd() with na.rm
The mean can be defined as sum(x)/length(x), so if x has length 0, you get 0/0, which is NaN.

median(x) is documented in its help page to give NA for x of length 0.

sd(x) is documented to give an error for x of length 0 and NA for x of length 1, but it actually gives NA for both.

Duncan Murdoch
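Duncan's 0/0 argument can be checked directly. A minimal sketch using only base R (the name manual_mean is just illustrative): writing the mean out as sum(z) / length(z) reproduces the NaN, and is.nan() and is.na() show how the two missing-value results differ. is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
z <- numeric(0)

# The mean written out as sum / length, as in Duncan's reply
manual_mean <- sum(z) / length(z)   # 0 / 0
manual_mean                         # NaN
identical(manual_mean, mean(z))     # TRUE: both are NaN

# NaN and NA are distinct values ...
is.nan(mean(z))    # TRUE
is.nan(median(z))  # FALSE: median() returns NA, not NaN

# ... but is.na() treats both as missing
is.na(mean(z))     # TRUE
is.na(median(z))   # TRUE
```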
Bert Gunter
2018-Aug-22 14:55 UTC
[R] differing behavior of mean(), median() and sd() with na.rm
... And FWIW (not much, I agree), note that if z = numeric(0), then sum(z) = 0, and mean(z) = NaN makes sense: length(z) = 0, so you are dividing 0 by 0, which gives NaN. So you can see the sorts of issues you may need to consider.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
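If you want the NA that Ivan expected, one option is a small wrapper. This is a sketch only: mean_or_na() is a hypothetical helper, not a base R function, and it simply adopts the convention that an empty input has an undefined (NA) mean instead of 0/0 = NaN.

```r
# Hypothetical helper (not part of base R): like mean(), but returns
# NA_real_ instead of NaN when no values are left to average.
mean_or_na <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]          # drop NAs, as mean() would
  if (length(x) == 0L) return(NA_real_) # empty input: NA, not 0/0
  mean(x)                               # otherwise defer to mean()
}

mean_or_na(c(NA, NA, NA), na.rm = TRUE)  # NA
mean_or_na(c(1, NA, 3), na.rm = TRUE)    # 2
mean_or_na(numeric(0))                   # NA
```

Whether NA or NaN is the better answer is exactly the convention question raised in this thread; the wrapper just picks NA and otherwise leaves the work to mean().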