thr3ads.net - R help - [R] glm-logistic on discrete-time methods with individual and aggregated data [Feb 2006]

If this information is useful, please help other people find it:
Share via:

Camarda, Carlo Giovanni

2006-Feb-01 18:31 UTC

[R] glm-logistic on discrete-time methods with individual and aggregated data

Dear R-Users,
without going into details I tried to prepare a simple example to show
you where I would need help.
In particular I prepare two examples-template for a study I'm conduction
on discrete-time methods for survival analysis.
Each of this example has two datasets which are basically equal, with
the exception that in the former one has individual data and in the
latter one aggregated data.
The difference between the two examples is on a single subject: I
substituted to the first example a censored case with a subject who died
at the first time-unit.
Afterward I fitted a logistic model (Fahrmeir and Tutz, 2001) in the glm
context, but whereas there is not difference between individual and
aggregated dataset in the first example, I noted some discrepancies in
the second example. I might guess that something with weights is going
on, but I did not manage to clearly understand.
Hope that the following example will be more clear than my explanations,
Thanks in advance,
Carlo Giovanni Camarda


rm(list = ls())
# working one

timesIND  <- c(rep(1:4, 3), 1, rep(1:2,2), rep(1:3     , 2), rep(1:4,
2))
statusIND <- c(rep(0  ,12), 1, rep(0:1,2), rep(c(0,0,1), 2),
rep(c(0,0,0,1),2))
datiIND <- as.data.frame(cbind(timesIND, statusIND))
datiIND$timesIND <- as.factor(datiIND$timesIND)

timesAGG  <- c(  1:4,    1,   1:2,      1:3,       1:4)
statusAGG <- c(rep(0,4), 1,   0:1,    c(0,0,1), c(0,0,0,1))
weightAGG <- c(rep(3,4), 1, rep(2,2), rep(2,3), rep(2,4))   
datiAGG <- as.data.frame(cbind(timesAGG, statusAGG, weightAGG))
datiAGG$timesAGG <- as.factor(datiAGG$timesAGG)

coef(glm(statusIND ~ timesIND, family=binomial, data=datiIND))
coef(glm(statusAGG ~ timesAGG, family=binomial, data=datiAGG,
weights=weightAGG))

# not working one

timesINDa  <- c(rep(1:4, 4), rep(1:2,2), rep(1:3     , 2), rep(1:4,
2))
statusINDa <- c(rep(0  ,16), rep(0:1,2), rep(c(0,0,1), 2),
rep(c(0,0,0,1),2))
datiINDa <- as.data.frame(cbind(timesINDa, statusINDa))
datiINDa$timesINDa <- as.factor(datiINDa$timesINDa)

timesAGGa  <- c( 1:4,        1:2,     1:3,      1:4)
statusAGGa <- c(rep(0,4),    0:1,   c(0,0,1), c(0,0,0,1))
weightAGGa <- c(rep(4,4), rep(2,2), rep(2,3),  rep(2,4))   
datiAGGa <- as.data.frame(cbind(timesAGGa, statusAGGa, weightAGGa))
datiAGGa$timesAGGa <- as.factor(datiAGGa$timesAGGa)

coef(glm(statusINDa ~ timesINDa, family=binomial, data=datiINDa))
coef(glm(statusAGGa ~ timesAGGa, family=binomial, data=datiAGGa,
weights=weightAGGa))

+++++
This mail has been sent through the MPI for Demographic Rese...{{dropped}}

Spencer Graves

2006-Feb-05 22:02 UTC

head link

[R] glm-logistic on discrete-time methods with individual and aggregated data

You've found a region of infinite extent over which the likelihood 
function is for all practical purposes flat.  This means that the 
maximum likelihood estimates (MLEs) are not unique.  To see this 
consider the following properties of your datiINDa:

 > with(datiINDa, table(statusINDa, timesINDa))
           timesINDa
statusINDa  1  2  3  4
          0 10  8  6  4
          1  0  2  2  2
 > sapply(datiINDa, class)
  timesINDa statusINDa
   "factor"  "numeric"

	  You are estimating 4 parameters, an intercept plus one parameter for 
each level of the factor "timesInda".  The first level occurs only
with
statusINDa = 0, never with statusINDa = 1.  Therefore, the theoretical 
MLE for that level of timesINDa would have slope = +/-Inf (and the 
intercept would also be adjusted to +/-Inf to compensate).  However, glm 
doesn't bother pushing it that far, and gives up with still moderately 
small values for the parameters.  To understand this better, first 
modify your example to store the glm fitted object as follows:

fit.a <- glm(statusINDa ~ timesINDa, family=binomial, data=datiINDa)

	  Then apply "predict" to that object:

predict(fit.a, type="response")

	  The result is that the 10 cases with timesInda = 1 all have a 
Pr{statusINDa = 1} = 3e-9, which glm thinks is essentially 0 and quits.

	  Now let's do the same with your weighted version:

fit.wa <- glm(statusAGGa ~ timesAGGa, family=binomial, data=datiAGGa,
weights=weightAGGa)
sort(predict(fit.wa, type="response"))

	  Those 10 cases now have Pr{statusINDa = 1} = 5.4e-9.  This is 
essentially the issue of "complete separation".  We can request more 
precision as follows:

 > fit.a3 <- glm(statusINDa ~ timesINDa, family=binomial, data=datiINDa,
+           control=glm.control(epsilon=1e-13,
+             maxit=250))
Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = 
Y, weights = weights, start = start, etastart = etastart,

	  In this case, we get a warning.  For more on this, try 
RSiteSearch("complete separation with logistic regression").

	  Sehr interessant, nicht?
	  hope this helps.
	  spencer graves

Camarda, Carlo Giovanni wrote:
> Dear R-Users,
> without going into details I tried to prepare a simple example to show
> you where I would need help.
> In particular I prepare two examples-template for a study I'm
conduction
> on discrete-time methods for survival analysis.
> Each of this example has two datasets which are basically equal, with
> the exception that in the former one has individual data and in the
> latter one aggregated data.
> The difference between the two examples is on a single subject: I
> substituted to the first example a censored case with a subject who died
> at the first time-unit.
> Afterward I fitted a logistic model (Fahrmeir and Tutz, 2001) in the glm
> context, but whereas there is not difference between individual and
> aggregated dataset in the first example, I noted some discrepancies in
> the second example. I might guess that something with weights is going
> on, but I did not manage to clearly understand.
> Hope that the following example will be more clear than my explanations,
> Thanks in advance,
> Carlo Giovanni Camarda
> 
> 
> rm(list = ls())
> # working one
> 
> timesIND  <- c(rep(1:4, 3), 1, rep(1:2,2), rep(1:3     , 2), rep(1:4,
> 2))
> statusIND <- c(rep(0  ,12), 1, rep(0:1,2), rep(c(0,0,1), 2),
> rep(c(0,0,0,1),2))
> datiIND <- as.data.frame(cbind(timesIND, statusIND))
> datiIND$timesIND <- as.factor(datiIND$timesIND)
> 
> timesAGG  <- c(  1:4,    1,   1:2,      1:3,       1:4)
> statusAGG <- c(rep(0,4), 1,   0:1,    c(0,0,1), c(0,0,0,1))
> weightAGG <- c(rep(3,4), 1, rep(2,2), rep(2,3), rep(2,4))   
> datiAGG <- as.data.frame(cbind(timesAGG, statusAGG, weightAGG))
> datiAGG$timesAGG <- as.factor(datiAGG$timesAGG)
> 
> coef(glm(statusIND ~ timesIND, family=binomial, data=datiIND))
> coef(glm(statusAGG ~ timesAGG, family=binomial, data=datiAGG,
> weights=weightAGG))
> 
> # not working one
> 
> timesINDa  <- c(rep(1:4, 4), rep(1:2,2), rep(1:3     , 2), rep(1:4,
> 2))
> statusINDa <- c(rep(0  ,16), rep(0:1,2), rep(c(0,0,1), 2),
> rep(c(0,0,0,1),2))
> datiINDa <- as.data.frame(cbind(timesINDa, statusINDa))
> datiINDa$timesINDa <- as.factor(datiINDa$timesINDa)
> 
> timesAGGa  <- c( 1:4,        1:2,     1:3,      1:4)
> statusAGGa <- c(rep(0,4),    0:1,   c(0,0,1), c(0,0,0,1))
> weightAGGa <- c(rep(4,4), rep(2,2), rep(2,3),  rep(2,4))   
> datiAGGa <- as.data.frame(cbind(timesAGGa, statusAGGa, weightAGGa))
> datiAGGa$timesAGGa <- as.factor(datiAGGa$timesAGGa)
> 
> coef(glm(statusINDa ~ timesINDa, family=binomial, data=datiINDa))
> coef(glm(statusAGGa ~ timesAGGa, family=binomial, data=datiAGGa,
> weights=weightAGGa))
> 
> +++++
> This mail has been sent through the MPI for Demographic Rese...{{dropped}}
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

R help - Feb 2006 - glm-logistic on discrete-time methods with individual and aggregated data

[R] glm-logistic on discrete-time methods with individual and aggregated data

[R] glm-logistic on discrete-time methods with individual and aggregated data