Dear R-users,

Running R 2.1.1 in WindowsXP, there seems to be a 'bug' in sd(). If

> x <- c(1,2,3,NA,5)
> mean(x)
[1] NA

But
> sd(x)
Or
> var(x)
give
Error in var(x, na.rm = na.rm) : missing observations in cov/cor

There are obvious work-rounds, like
> sd(x, is.na(x)==F)
which gives the result (with error message)
[1] 1.707825
Warning message:
the condition has length > 1 and only the first element will be used in:
if (na.rm) "complete.obs" else "all.obs"

or

> y <- subset(x, is.na(x)==F)
> sd(y)
[1] 1.707825

Am I missing something, or is this a problem with sd()? Why does mean(x) give a simple NA, but var(x) requires some additional work? There was some previous discussion in r-help about this for R v 1.6.0, with mention of a fix for 1.6.1.

Dr Roger Dungan
School of Biological Sciences
University of Canterbury
Christchurch, New Zealand
ph +64 3 366 7001 ext. 4848
fax +64 3 364 2590
Try:

> sd(x, na.rm=TRUE)
[1] 1.707825

Jarek
====================================================\====
 Jarek Tuszynski, PhD.                           o / \
 Science Applications International Corporation  <\__,|
 (703) 676-4192                                   ">  \
 Jaroslaw.W.Tuszynski at saic.com                   `   \

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Roger Dungan
Sent: Monday, October 31, 2005 4:24 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Still a bug with NA in sd() or var()?

[snip]

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
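A minimal sketch of the suggestion above, using the vector from the original post. It checks that passing na.rm=TRUE gives the same answer as dropping the missing value by hand. The variable names s_narm and s_manual are made up for this illustration.

```r
# Vector from the original post, with one missing value
x <- c(1, 2, 3, NA, 5)

# Let sd() drop the NA itself
s_narm <- sd(x, na.rm = TRUE)

# Drop the NA by hand, then call sd() on the complete values
s_manual <- sd(x[!is.na(x)])

print(s_narm)
print(isTRUE(all.equal(s_narm, s_manual)))
```

Both routes should agree, so the explicit subset() step in the original post is not needed when na.rm=TRUE is supplied by name.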
The behavior is actually documented in the var() help file (?var), although it's not clear that this would be the behavior if you just read ?sd. sd() does call var() if you look at the code, so the behavior should not be unexpected.

--Matt
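A small sketch of the relationship described above, assuming only that sd() of a numeric vector is the square root of var() and that na.rm is passed through. It does not reproduce the internals of either function; the names v and s are illustrative.

```r
x <- c(1, 2, 3, NA, 5)

# Variance and standard deviation of the non-missing values
v <- var(x, na.rm = TRUE)
s <- sd(x, na.rm = TRUE)

# sd() should match the square root of var() for the same input
print(isTRUE(all.equal(s, sqrt(v))))

# The actual definition in the running R version can be inspected with:
# print(sd)
```

Because the NA handling lives in var(), whatever var() does with missing values is what sd() reports.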
Roger Dungan wrote:
> [snip]
>
> There are obvious work-rounds, like
> >sd(x, is.na(x)==F)
> which gives the result (with error message)
> [1] 1.707825
> Warning message:
> the condition has length > 1 and only the first element will be used in:
> if (na.rm) "complete.obs" else "all.obs"

What you are doing here looks very odd to me -- you are passing a vector of logicals as the value for the argument na.rm. This is odd because na.rm should be just a single logical value, not a vector of the same length as x (hence the warning message). Only the first element of that vector is used, so you are passing essentially a random value. By luck, in your example, the first element was T, which is why you got a value of 1.707825 as the result, and not NA.

The rest might fall into place when this understanding is cleared up.

-- Tony Plate
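The point above can be sketched without triggering the warning. The second positional argument of sd() is na.rm, and in the session quoted above only the first element of the logical vector was looked at. The code below extracts that first element explicitly; the vector x2 and the names flag, first_flag, s1 and s2 are hypothetical, added only for illustration. As far as I know, newer R releases refuse a condition of length greater than one outright, so the original call is deliberately not repeated here.

```r
x <- c(1, 2, 3, NA, 5)

# What was passed as na.rm: a logical vector as long as x
flag <- is.na(x) == FALSE
print(flag)

# Only the first element mattered; here it happens to be TRUE
first_flag <- flag[1]
s1 <- sd(x, na.rm = first_flag)

# Hypothetical reordering with the NA first: the first element is now FALSE
x2 <- c(NA, 1, 2, 3, 5)
first_flag2 <- (is.na(x2) == FALSE)[1]

# With na.rm = FALSE the outcome depends on the R version (an error in the
# session quoted above, NA in recent releases), so both cases are handled.
s2 <- tryCatch(sd(x2, na.rm = first_flag2), error = function(e) NA_real_)

print(s1)
print(s2)
```

The same data in a different order would therefore have failed, which is why the workaround only appeared to succeed.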