lionel humbert
2006-Mar-23 16:28 UTC
[R] AIC mathematical artefact or computation problem ?
Dear R user,

I have fitted many logistic regressions (glm function) with a second-order polynomial formula on a data set containing 440 observations of 96 variables. I plotted AIC against the frequency (presences/observations) of each variable, and I obtain a nearly perfect arch with an axis of symmetry at a frequency of 0.5. I obtain the same effect with deterministic data. Maybe I have missed something, but I have found nothing in the theoretical calculation that could explain this. Could it be due to the computation in R, or is the AIC value a function of frequency?

Thanks for your consideration

Lionel Humbert
PhD candidate
Inter-University Forest Ecology Research Group
University of Quebec in Montreal
lionel humbert <humbert.lionel <at> courrier.uqam.ca> writes:

> Dear R user,
>
> I have fitted many logistic regressions (glm function) with a second-order
> polynomial formula on a data set containing 440 observations of 96
> variables. I plotted AIC against the frequency
> (presences/observations) of each variable, and I obtain a nearly perfect
> arch with an axis of symmetry at a frequency of 0.5. I obtain the
> same effect with deterministic data. Maybe I have missed something, but I
> have found nothing in the theoretical calculation that could explain
> this. Could it be due to the computation in R, or is the AIC value
> a function of frequency?

    ## simulate a binary response with logit(p) = a + b*x, fit a logistic
    ## regression, and return its AIC
    f <- function(a, b, n = 500) {
      x <- runif(n)
      y <- rbinom(n, size = 1, prob = plogis(a + b * x))
      AIC(glm(y ~ x, family = binomial))
    }

    b <- 0.1
    avec <- seq(-5, 5, length = 50)   # intercepts, i.e. a range of frequencies
    nsim <- 100
    resmat <- matrix(nrow = length(avec), ncol = nsim)
    for (i in seq_along(avec)) {
      resmat[i, ] <- replicate(nsim, f(avec[i], b))
    }
    matplot(avec, resmat, xlab = "a", ylab = "AIC")

    ## or even more simply: the negative log-likelihood of 10000 Bernoulli
    ## draws, evaluated at the true probability
    x2 <- sapply(avec, function(a) {
      -sum(dbinom(rbinom(10000, size = 1, prob = plogis(a)),
                  size = 1, prob = plogis(a), log = TRUE))
    })

I don't think it's an artifact. The curve basically reflects some function of the variance of the binomial distribution: the more variance, the lower the likelihood of any particular outcome, and so the higher the negative log-likelihood and the AIC. Doing a little math would probably get you the exact form of the curve.
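A minimal sketch of that math, under a simplifying assumption that is not in the original posts: the model is intercept-only, so the fitted probability is just the observed frequency f. In that case the maximized log-likelihood is n * [f*log(f) + (1-f)*log(1-f)], and the AIC is 2 + 2*n*H(f), where H(f) is the binary entropy. This is symmetric about f = 0.5 and peaks there, which would give the arch. The helper names binary_entropy and null_aic and the presence counts in k are hypothetical, chosen only for illustration.

```r
# Binary entropy (in nats) of a Bernoulli(f) outcome; 0 * log(0) is treated as 0.
binary_entropy <- function(f) {
  ifelse(f <= 0 | f >= 1, 0, -(f * log(f) + (1 - f) * log(1 - f)))
}

# Closed-form AIC of an intercept-only logistic model with n observations
# and observed frequency f (one estimated parameter, hence the "2 * 1").
null_aic <- function(f, n) 2 * 1 + 2 * n * binary_entropy(f)

n <- 440                          # sample size from the original question
k <- c(22, 110, 220, 330, 418)    # hypothetical numbers of presences
closed_form <- null_aic(k / n, n)

# The same quantity obtained by actually fitting glm(), for comparison.
fitted_aic <- sapply(k, function(ki) {
  y <- rep(c(1, 0), times = c(ki, n - ki))
  AIC(glm(y ~ 1, family = binomial))
})

print(cbind(freq = k / n, closed_form, fitted_aic))
```

With informative predictors, such as the second-order polynomial terms in the question, the AIC would sit below this curve by the reduction in deviance plus a different penalty term, so the arch should be read as an approximate envelope rather than an exact formula for those fits.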