This is posed out of curiosity, not as a criticism. What is the functional role of the na.rm argument in mean()? With the default na.rm=FALSE, mean() returns NA whenever the vector contains missing values, as in the example below. Is there ever a purpose in computing a mean only to receive NA as a result?

In 10 years of using R, I have always called mean() in order to get a result, which is the opposite of its default behavior when there are NAs. Can anyone suggest a reason why one would actually want NA as the result of mean()?

> x <- rnorm(100)
> x[1] <- NA
> mean(x)
[1] NA
> mean(x, na.rm=TRUE)
[1] 0.08136736

If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better.

Harold

In SQL, aggregate functions such as AVG() ignore NULL (roughly equivalent to NA in R) by default. However, it is dangerous not to verify how much data was actually used in an aggregation, so the logic behind the default na.rm setting may be to encourage the user to take responsibility for missing data.

Jeff Newmiller
DCN:<jdnewmil@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

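To illustrate the point about checking how much data an aggregation actually used, here is a minimal sketch. The helper mean_with_n() is hypothetical (it is not part of base R) and the data are made up; it simply reports the mean together with the number of values used and the number dropped.

```r
# Hypothetical helper, not a base R function: returns the mean of the
# non-missing values along with counts of used and missing values.
mean_with_n <- function(x) {
  ok <- !is.na(x)
  c(mean = mean(x[ok]), n_used = sum(ok), n_missing = sum(!ok))
}

x <- c(2, 4, NA, 6)   # made-up data with one missing value
res <- mean_with_n(x)
print(res)
```

Reporting the counts next to the mean makes it obvious when a result rests on far fewer observations than expected, which na.rm=TRUE on its own would hide.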
On 12/07/2011 12:26 PM, Doran, Harold wrote:
> But, is there ever a purpose in computing a mean only to receive NA as a result?

The general idea in R is that NA stands for "unknown". If some of the values in a vector are unknown, then the mean of the vector is also unknown. NA is sometimes used in other ways as well; in those cases it makes sense to remove it and compute the mean of the other values.

Duncan Murdoch

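The "NA means unknown" reading can be seen in a few base R expressions. This is only a small sketch with made-up values: results that depend on the unknown value come back as NA, while results that are the same whatever the unknown value is do not.

```r
x <- c(1, 2, NA)

r_sum  <- sum(x)    # depends on the unknown value, so NA
r_mean <- mean(x)   # likewise NA
r_cmp  <- NA > 1    # cannot be decided, so NA

# FALSE whatever the unknown value is, so the result is known
r_and <- NA & FALSE

# Explicitly treating the NA as "not applicable" and dropping it
r_mean_rm <- mean(x, na.rm = TRUE)

print(list(r_sum, r_mean, r_cmp, r_and, r_mean_rm))
```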
Hi Harold,

Many (most?) of the statistics functions have a similar argument. I suspect it is there partly to warn the user: you have to be explicit about it, rather than having the program silently remove or ignore values that would not work in the function called.

I can think of one example where I want a missing value returned. In psychology we often create scores on some construct (say, optimism) by averaging an individual's responses to several questions. In certain cases, if a subject does not respond to one question, their overall score should be missing. This is easily accomplished by leaving na.rm = FALSE.

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/
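The scoring example above might look something like the following sketch. The data frame, its column names (q1 to q3), and the values are all made up for illustration; rowMeans() is used here as one convenient way to average across items, not necessarily how any particular study does it.

```r
# Made-up responses of three subjects to three optimism items;
# subject 2 skipped the second question.
responses <- data.frame(
  q1 = c(4, 5, 2),
  q2 = c(3, NA, 2),
  q3 = c(5, 4, 3)
)

# Default na.rm = FALSE: any skipped item makes the whole score missing
strict <- rowMeans(responses)

# na.rm = TRUE: score is the average of whatever items were answered
lenient <- rowMeans(responses, na.rm = TRUE)

print(cbind(strict, lenient))
```

With the default, subject 2 gets a missing score; with na.rm = TRUE the same subject gets the average of the two answered items, which may or may not be what the scoring rules intend.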