David Freedman
2010-Nov-13 19:50 UTC
[R] aggregate with missing values, data.frame vs formula
It seems that the formula and data.frame forms of aggregate handle missing values differently. For example:

    (d = data.frame(sex = rep(0:1, each = 3),
                    wt = c(100, 110, 120, 200, 210, NA),
                    ht = c(10, 20, NA, 30, 40, 50)))

    # data.frame form
    x1 = aggregate(d, by = list(d$sex), FUN = mean)
    names(x1)[3:4] = c('list.wt', 'list.ht')

    # formula form
    x2 = aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
    names(x2)[2:3] = c('form.wt', 'form.ht')

    cbind(x1, x2)

      Group.1 sex list.wt list.ht sex form.wt form.ht
    1       0   0     110      NA   0     105      15
    2       1   1      NA      40   1     205      35

So the data.frame form gives an NA for a variable if any value of that variable is missing within the group. The formula form, by contrast, deletes the entire row (missing and non-missing values alike) if any value in that row is missing, and then computes the means from the remaining rows.

Is this what was intended, and is it the best option?

thanks,
david freedman
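A possible explanation, offered as a sketch rather than a definitive answer: the formula method of aggregate has an na.action argument whose documented default is na.omit, so incomplete rows are dropped before FUN is ever called, whereas the data.frame method passes the NAs through to FUN. Assuming that is the cause of the difference, na.action = na.pass should make the formula form behave like the data.frame form, and adding na.rm = TRUE (forwarded to mean through ...) should give per-variable means over the non-missing values. The object names f_pass and f_narm are illustrative only.

```r
# Data from the post
d <- data.frame(sex = rep(0:1, each = 3),
                wt  = c(100, 110, 120, 200, 210, NA),
                ht  = c(10, 20, NA, 30, 40, 50))

# Formula form, but keep incomplete rows: mean() sees the NAs and
# returns NA for the affected group/variable, as in the data.frame form
f_pass <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                    na.action = na.pass)

# Keep incomplete rows and let mean() drop NAs variable by variable
f_narm <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                    na.rm = TRUE, na.action = na.pass)

print(f_pass)
print(f_narm)
```

With the data above, f_pass should reproduce the list.wt and list.ht columns (110, NA and NA, 40), while f_narm uses every non-missing value, giving wt means of 110 and 205 and ht means of 15 and 40.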