Hi:

I remember a post several days ago by Jon Baron concerning the behavior of sum() when one sets na.rm=TRUE: the result is zero for a vector of all NAs, as in the second row here:

> ss <- data.frame(x=c(1,NA,3,4), y=c(2,NA,4,NA))
> ss
   x  y
1  1  2
2 NA NA
3  3  4
4  4 NA
> apply(ss,1,sum,na.rm=TRUE)
1 2 3 4
3 0 7 4

I am rather alarmed by that zero, because I was just about to place the sum function into an apply() on a rather large data management project, where about 5% of my matrix rows have two missing values. Is there a "safe" way to use sum(), so that such zeroes are not created? A safe.sum() that takes arguments just as general as sum()? I mean, I think I could get around this little problem like this,

> apply(ss,1,function(x){ifelse(all(is.na(x)),NA,sum(!is.na(x))*mean(x,na.rm=TRUE))})
 1  2  3  4
 3 NA  7  4

but is there a safer way to write a sum() function? Or, do these zeroes serve some purpose that I am missing?

Thanks in advance...

Tom

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 25 Apr 2002, Richards, Tom wrote:

> I mean, I think I could get around this little problem like this,
>
> apply(ss,1,function(x){ifelse(all(is.na(x)),NA,sum(!is.na(x))*mean(x,na.rm=TRUE))})
>  1  2  3  4
>  3 NA  7  4
>
> but is there a safer way to write a sum() function? Or, do these zeroes
> serve some purpose that I am missing?

They are the correct answer! The sum of an empty set is zero, by definition.

If that is not what you want, then you don't want the sum and should define a function to do what you do want. That might be

> apply(ss,1,function(x){z <- x[!is.na(x)]; ifelse(length(z), sum(z), NA)})
 1  2  3  4
 3 NA  7  4

Yours accounts for the all-missing case twice (the all() test, and mean() of an empty vector giving NaN).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  stats.ox.ac.uk/~ripley
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
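Tom also asked for a safe.sum() that takes arguments as general as sum(). Below is a minimal sketch of one way to write it, generalizing Ripley's idea; the name safe.sum and its behavior are assumptions for illustration, not an existing R function. It accepts the same "..." and na.rm arguments as sum() and returns NA only when na.rm = TRUE leaves nothing to add.

```r
# safe.sum() is a hypothetical helper, not part of base R.
# It mirrors sum(..., na.rm = FALSE) but returns NA (rather than 0)
# when na.rm = TRUE has removed every value.
safe.sum <- function(..., na.rm = FALSE) {
  vals <- c(...)                      # combine all arguments, as sum() does
  if (na.rm && all(is.na(vals))) {
    return(NA_real_)                  # nothing left to add: report missing
  }
  sum(..., na.rm = na.rm)             # otherwise defer to the built-in sum()
}

ss <- data.frame(x = c(1, NA, 3, 4), y = c(2, NA, 4, NA))
apply(ss, 1, safe.sum, na.rm = TRUE)  # values 3, NA, 7, 4
safe.sum(NA, NA, na.rm = TRUE)        # NA
safe.sum(numeric(0))                  # 0: without na.rm the usual convention applies
```

Returning NA_real_ rather than a bare NA keeps the result numeric, so apply() still simplifies to a numeric vector. With na.rm = FALSE the function simply hands everything to sum(), so the empty-sum convention Ripley describes is left alone except in the one case Tom asked about.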
After stripping NAs, you have a vector of 0 elements. The sum of such a vector is 0, not NA. Try:

sum(numeric(0))

Andy

> -----Original Message-----
> From: Richards, Tom [mailto:richards at upci.pitt.edu]
> Sent: Thursday, April 25, 2002 11:26 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] sum() with na.rm=TRUE, again
>
> > apply(ss,1,sum,na.rm=TRUE)
> 1 2 3 4
> 3 0 7 4
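A minimal illustration of Andy's point, using the all-missing second row of Tom's ss as the example: stripping the NAs leaves a zero-length vector, and summing that gives 0. The variable names row2 and kept are illustrative only.

```r
row2 <- c(NA_real_, NA_real_)   # the all-missing second row of ss
kept <- row2[!is.na(row2)]      # what na.rm = TRUE effectively keeps
length(kept)                    # 0: an empty numeric vector
sum(kept)                       # 0, the same as sum(numeric(0))
sum(row2, na.rm = TRUE)         # 0 as well
```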
"Richards, Tom" <richards at upci.pitt.edu> writes:

> but is there a safer way to write a sum() function? Or, do these zeroes
> serve some purpose that I am missing?
> Thanks in advance...

Mathematically, the sum over an empty set is zero. This serves various consistency purposes (e.g., the sum over the disjoint union of two index sets equals the sum of the two separate sums). Just use something like

mysum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm=TRUE)
apply(ss,1,mysum)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
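Peter's consistency argument can be checked directly. In this small sketch (the vector x and the threshold 10 are arbitrary illustrations), one piece of the split is empty: additivity holds with the built-in sum(), but fails with the NA-returning mysum(), which is the price paid for getting NA on all-missing rows.

```r
mysum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

x <- c(2, 5, 7)
a <- x[x > 10]    # empty piece: no element exceeds 10
b <- x[x <= 10]   # the other piece: all three elements

# Built-in convention, sum(numeric(0)) == 0: the pieces add up to the whole
sum(a) + sum(b) == sum(x)   # TRUE
# NA-returning variant: the empty piece makes the total NA
mysum(a) + mysum(b)         # NA
```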
It is apparently a good convention to have the sum of nothing be zero. But the mean of nothing involves dividing zero by zero, so mean(c(NA,NA,NA), na.rm=TRUE) is NaN, which acts like NA for most purposes (is.na(NaN) is TRUE). So I think the trick is to compute the mean and then multiply by the number of non-missing cases (for an all-missing row that is NaN * 0, which is still NaN), e.g.,

apply(matrix1,1,mean,na.rm=TRUE)*apply(!is.na(matrix1),1,sum)

but the all() method will work too.

Jon Baron
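Jon's recipe applied to Tom's ss data frame, substituted here for his placeholder matrix1. The second version swaps apply() for base R's rowMeans() and rowSums(), which are not mentioned in the thread but compute the same row means and counts without an explicit apply() loop; treat it as an alternative sketch.

```r
ss <- data.frame(x = c(1, NA, 3, 4), y = c(2, NA, 4, NA))

# Row means (NaN for an all-NA row) times the count of non-missing entries
by_apply <- apply(ss, 1, mean, na.rm = TRUE) * apply(!is.na(ss), 1, sum)

# Same computation with rowMeans()/rowSums()
by_rows <- rowMeans(ss, na.rm = TRUE) * rowSums(!is.na(ss))

by_apply          # values 3, NaN, 7, 4
is.na(by_apply)   # FALSE TRUE FALSE FALSE: the NaN counts as missing
```

Note that this yields NaN rather than NA for all-missing rows; is.na() treats both as missing, but is.nan() distinguishes them if that ever matters downstream.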