thr3ads.net - R help - [R] logistic regression (weights) [May 2003]

Edoardo M Airoldi

2003-May-31 22:21 UTC

[R] logistic regression (weights)

Hi all,
 I am fitting a logistic regression model to binary data.  I care about
the fitted probabilities, so I am not worried about infinite
(or non-existent) MLEs.  I use:
> glm(Y~., data=X, weights=wgt, family=binomial(link=logit), maxit=250)
 I understand the three ways to specify the response, and in my case Y is
a single-column factor:
> Y <- c(rep("A",679), rep("B",38))
> Y <- as.factor(Y)
  My question is about the weights.  I can use integer weights, which
make more mathematical sense:
> wgt <- c(rep(1,679), rep(17,38)) 
or I can use
> wgt <- c(rep(38/679,679), rep(1,38))
which makes more sense for my problem, but the mathematics is weak, as I am
using non-integer successes in a Bernoulli trial.  I estimated the accuracy
'out of the bag' over 10000 experiments and got:

          | integer wgt          | non-int wgt
 -------- + -------------------- + --------------------
 accuracy | A = 94.9%  B = 82.3% | A = 94.7%  B = 83.3%
 std.dev. |      2.3%      15.4% |      2.6%      13.2%
 avg. AIC | 707                  | 124
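
An estimate like the one in the table could be computed roughly as follows. This is only a sketch under stated assumptions: the data are simulated (the original X is not shown), "out of the bag" is read as scoring each bootstrap fit on the rows left out of that resample, the 0.5 cut-off is assumed, and `oob_accuracy` is a hypothetical helper name.

```r
# Simulated stand-in for the data: 679 "A" rows, 38 "B" rows, one predictor
set.seed(2)
n_a <- 679; n_b <- 38
dat <- data.frame(
  Y = factor(c(rep("A", n_a), rep("B", n_b))),
  x = c(rnorm(n_a, mean = 0), rnorm(n_b, mean = 2))
)
# Non-integer scheme: down-weight the populous class
dat$wgt <- ifelse(dat$Y == "B", 1, n_b / n_a)

# Hypothetical helper: per-class accuracy on the out-of-bag rows
oob_accuracy <- function(dat, n_rep = 50) {
  acc <- matrix(NA_real_, nrow = n_rep, ncol = 2,
                dimnames = list(NULL, c("A", "B")))
  for (r in seq_len(n_rep)) {
    in_bag <- sample(nrow(dat), replace = TRUE)   # bootstrap resample
    train  <- dat[in_bag, ]
    test   <- dat[-unique(in_bag), ]              # rows never drawn
    fit <- suppressWarnings(
      glm(Y ~ x, data = train, weights = wgt, family = binomial,
          control = glm.control(maxit = 250))
    )
    # Fitted probability of the second factor level ("B")
    p_b  <- predict(fit, newdata = test, type = "response")
    pred <- ifelse(p_b > 0.5, "B", "A")           # assumed 0.5 cut-off
    acc[r, ] <- tapply(pred == test$Y, test$Y, mean)[c("A", "B")]
  }
  acc
}

acc      <- oob_accuracy(dat, n_rep = 50)
acc_mean <- colMeans(acc, na.rm = TRUE)       # mean accuracy per class
acc_sd   <- apply(acc, 2, sd, na.rm = TRUE)   # spread over replicates
```

Swapping the `wgt` column for the integer scheme and rerunning gives the second column of such a comparison; 50 replicates are used here only to keep the sketch quick.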

 As I understand it, non-integer weights are more faithful to what I
observed: instead of inflating the successes in the rare class, which
I did not observe, they simply down-weight the successes in the populous
class.  The populations can be thought of as equal; only the sample sizes
are unbalanced.
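
One way to see how the two schemes relate is to compare fits whose weights differ only by a constant factor. The sketch below uses simulated data, not the original X, and the factor 679/38 is an assumption chosen so that the two weight vectors are exact multiples of each other (the integer scheme above uses 17, which is close but not identical).

```r
# Simulated stand-in for the data: 679 "A" rows, 38 "B" rows, one predictor
set.seed(1)
n_a <- 679; n_b <- 38
dat <- data.frame(
  Y = factor(c(rep("A", n_a), rep("B", n_b))),
  x = c(rnorm(n_a, mean = 0), rnorm(n_b, mean = 1))
)

# Down-weight the populous class ...
w_down <- c(rep(n_b / n_a, n_a), rep(1, n_b))
# ... or up-weight the rare class by the exact reciprocal factor
w_up <- w_down * (n_a / n_b)

# Tight convergence so both fits reach the same optimum numerically;
# binomial() may warn about non-integer successes, hence suppressWarnings()
ctrl <- glm.control(epsilon = 1e-12, maxit = 100)
fit_down <- suppressWarnings(
  glm(Y ~ x, data = dat, weights = w_down, family = binomial, control = ctrl)
)
fit_up <- suppressWarnings(
  glm(Y ~ x, data = dat, weights = w_up, family = binomial, control = ctrl)
)

# Coefficients and fitted probabilities agree between the two fits
same_coef <- isTRUE(all.equal(coef(fit_down), coef(fit_up), tolerance = 1e-5))
same_prob <- isTRUE(all.equal(fitted(fit_down), fitted(fit_up), tolerance = 1e-5))

# The deviance is multiplied by the same constant as the weights
dev_ratio <- deviance(fit_up) / deviance(fit_down)
```

Rescaling every weight by the same constant leaves the fitted probabilities unchanged but rescales the deviance, so likelihood-based summaries such as AIC are not comparable between the two weightings.
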
 Predictions also look better, so I was hoping that the continuity of the
Binomial for N in [0,1] and X in [0,1] would guarantee that my results
still make sense, but I am not sure.  Any thoughts?
Thanks

Edo
