A question for R (and perhaps S and SPlus) historians.

Does anyone know the reason for the inconsistency in the way that the action that should be taken when data are missing is specified? There are several variants, na.action, na.omit, "T", TRUE, etc. I know that a foolish consistency is the hobgoblin of a small mind, but consistency can make things easier.

My question is not meant as a complaint. I very much admire the R development team. I simply am curious.

John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

410-605-7119
- NOTE NEW EMAIL ADDRESS: jsorkin at grecc.umaryland.edu
Duncan Murdoch
2005-Sep-03 15:40 UTC
[R] Inconsistence in specifying action for missing data
John Sorkin wrote:
> A question for R (and perhaps S and SPlus) historians.
>
> Does anyone know the reason for the inconsistency in the way that the
> action that should be taken when data are missing is specified? There
> are several variants, na.action, na.omit, "T", TRUE, etc. I know that a
> foolish consistency is the hobgoblin of a small mind, but consistency
> can make things easier.
>
> My question is not meant as a complaint. I very much admire the R
> development team. I simply am curious.

R and S have been developed by lots of people, over a long time. I think that's it.

Duncan Murdoch
Thomas Lumley
2005-Sep-04 16:42 UTC
[R] Inconsistence in specifying action for missing data
On Sat, 3 Sep 2005, John Sorkin wrote:
> A question for R (and perhaps S and SPlus) historians.
>
> Does anyone know the reason for the inconsistency in the way that the
> action that should be taken when data are missing is specified? There
> are several variants, na.action, na.omit, "T", TRUE, etc. I know that a
> foolish consistency is the hobgoblin of a small mind, but consistency
> can make things easier.
>

There's actually a little more consistency than first appears. There are two most common ways to refer to missingness, na.rm and na.action.

Usually na.rm is a logical flag with default FALSE; set it to TRUE (using T is a bug, since T is an ordinary variable that can be reassigned) and it removes NAs from one vector at a time.

na.action usually has default na.omit and works on whole data frames, e.g. na.omit and na.exclude do casewise deletion if any variable is NA.

These aren't completely uniform, and that is simply historical. I think there was once an attempt to make na.fail() the default na.action, but there was too much resistance to change.

-thomas
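For readers who want to see the two conventions side by side, here is a minimal sketch in R. The vector `x` and data frame `df` are made-up illustration data, not anything from the thread, and the comments reflect my understanding of base R behaviour rather than an authoritative statement of it.

```r
# Made-up data with missing values (hypothetical, for illustration only).
x  <- c(1, 2, NA, 4)
df <- data.frame(y = c(1.0, 2.1, NA, 3.9, 5.2),
                 z = c(1, 2, 3, NA, 5))

# na.rm: a logical flag on summary functions, applied to one vector.
mean_default <- mean(x)                 # NA is propagated unless na.rm = TRUE
mean_removed <- mean(x, na.rm = TRUE)   # mean of 1, 2 and 4

# Why T is risky: T is an ordinary variable, TRUE is a reserved word.
T <- FALSE
mean_with_T <- mean(x, na.rm = T)       # NAs are no longer removed
rm(T)                                   # restore the usual meaning of T

# na.action: a function applied to the whole data frame / model frame.
df_complete <- na.omit(df)              # casewise deletion of rows with any NA
fit_omit    <- lm(y ~ z, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ z, data = df, na.action = na.exclude)

# Both fits use the same complete cases; they differ in how residuals
# and fitted values are reported for the dropped rows.
n_resid_omit    <- length(residuals(fit_omit))     # complete cases only
n_resid_exclude <- length(residuals(fit_exclude))  # padded with NA to nrow(df)

# na.fail signals an error instead of dropping rows.
attempt <- try(lm(y ~ z, data = df, na.action = na.fail), silent = TRUE)
failed  <- inherits(attempt, "try-error")
```

The session-wide default used by modelling functions can be inspected with `getOption("na.action")`.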