Josh Browning
2012-Sep-29 14:10 UTC
[R] Unexpected behavior with weights in binomial glm()
Hi useRs, I'm experiencing something quite weird with glm() and weights, and maybe someone can explain what I'm doing wrong. I have a dataset where each row represents a single case, and I run glm(...,family="binomial") and get my coefficients. However, some of my cases have the exact same values for predictor variables, so I should be able to aggregate up my data frame and run glm(..., family="binomial",weights=wts) and get the same coefficients (maybe this is my incorrect assumption, but I can't see why it would be). Anyways, here's a minimum working example below:> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) ) > glm( RESP ~ INDEP, family="binomial", data=d )Call: glm(formula = RESP ~ INDEP, family = "binomial", data = d) Coefficients: (Intercept) INDEP -1.609 21.176 Degrees of Freedom: 9 Total (i.e. Null); 8 Residual Null Deviance: 13.86 Residual Deviance: 5.407 AIC: 9.407> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length ) > colnames(dAgg) = c("RESP","INDEP","WT") > glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT )Call: glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg, weights = WT) Coefficients: (Intercept) INDEP -1.609 20.975 Degrees of Freedom: 2 Total (i.e. Null); 1 Residual Null Deviance: 13.86 Residual Deviance: 5.407 AIC: 9.407 Thanks for the help! Josh
David Winsemius
2012-Sep-30 01:50 UTC
[R] Unexpected behavior with weights in binomial glm()
On Sep 29, 2012, at 7:10 AM, Josh Browning wrote:> Hi useRs, > > I'm experiencing something quite weird with glm() and weights, and > maybe someone can explain what I'm doing wrong. I have a dataset > where each row represents a single case, and I run > glm(...,family="binomial") and get my coefficients. However, some of > my cases have the exact same values for predictor variables, so I > should be able to aggregate up my data frame and run glm(..., > family="binomial",weights=wts) and get the same coefficients (maybe > this is my incorrect assumption, but I can't see why it would be). > Anyways, here's a minimum working example below: > >> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) ) >> glm( RESP ~ INDEP, family="binomial", data=d ) > > Call: glm(formula = RESP ~ INDEP, family = "binomial", data = d) > > Coefficients: > (Intercept) INDEP > -1.609 21.176 > > Degrees of Freedom: 9 Total (i.e. Null); 8 Residual > Null Deviance: 13.86 > Residual Deviance: 5.407 AIC: 9.407 >> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length ) >> colnames(dAgg) = c("RESP","INDEP","WT") >> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT ) > > Call: glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg, > weights = WT) > > Coefficients: > (Intercept) INDEP > -1.609 20.975 > > Degrees of Freedom: 2 Total (i.e. Null); 1 Residual > Null Deviance: 13.86 > Residual Deviance: 5.407 AIC: 9.407Those two results look very similar and it is with a data situation that seems somewhat extreme. The concern is for the 1% numerical difference in the regression coefficient? Am I reading you correctly? -- David Winsemius, MD Alameda, CA, USA