Justin Michell
2014-May-22 11:40 UTC
[R] Non-convergence in boot.stepAIC function with a logit model
Hi all,

I am getting non-convergence warnings when I try to run a bootstrap variable-selection procedure (using the boot.stepAIC function in the bootStepAIC package).

I had previously established which variables were collinear and, within each collinear group, kept the one with the lowest AIC from a univariate regression on each predictor. At the end of this procedure I obtain a candidate list of variables that are not correlated with each other. I then use bootstrapping to revisit the variables that were excluded at each step.

I have read other questions on this topic (such as http://stackoverflow.com/questions/8596160/why-am-i-getting-algorithm-did-not-converge-and-fitted-prob-numerically-0-or) and I see that non-convergence is a common problem with logit models, but in my case convergence fails only in the bootstrapping context.

For the first group of previously excluded variables, I added each variable individually to the candidate list and ran the bootstrap, and then added more than one variable at a time to see whether the algorithm would converge. Sometimes it did and sometimes it did not (marked 'dc' in the workflow below). I suppose the presence of multicollinearity affects this process?

From the next group of excluded variables onwards, I only considered adding variables one at a time and then checked whether AIC improved. If the variable kept from a group is itself in the candidate list, I take it out, add the previously excluded variable in its place, and see whether AIC improves. Following this reasoning, I end up adding two new variables to the list. They are not correlated with any of the variables in the candidate list, nor with each other.

My question is: is this a valid way to arrive at my best set of predictors? And is there a way to monitor more closely what is going on, i.e. to check whether multicollinearity is what mathematically causes the algorithm not to converge for some variables?
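A minimal sketch for the monitoring question, reproducing the idea by hand rather than through boot.stepAIC (so this is not the bootStepAIC implementation): the loop resamples rows, runs a backward MASS::stepAIC on each resample, and records which terms survive plus every warning glm() raised (e.g. "glm.fit: algorithm did not converge" or "fitted probabilities numerically 0 or 1 occurred"). The data, the variables x1 to x4 and the helper boot_step_log() are hypothetical; the helper assumes a weights column named Examind, and x2 is simulated as a near copy of x1. As far as I understand, near-collinearity on its own mainly inflates standard errors and splits selection between the correlated pair, whereas these particular warnings more often point to (quasi-)complete separation, which a bootstrap resample can create even when the full-data fit is fine.

```r
library(MASS)  # for stepAIC(); used here as a stand-in for boot.stepAIC()

set.seed(1)
n <- 300
# Hypothetical simulated data: x2 is nearly a copy of x1
dat <- data.frame(x1 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$x2 <- dat$x1 + rnorm(n, sd = 0.05)
dat$Examind <- sample(5:20, n, replace = TRUE)
dat$Pos <- rbinom(n, dat$Examind, plogis(-0.5 + 1.2 * dat$x1 - 0.8 * dat$x3))

# Hypothetical helper: bootstrap backward stepAIC, logging every warning
# raised while fitting (assumes the binomial weights column is Examind)
boot_step_log <- function(formula, data, B = 50) {
  vars <- attr(terms(formula), "term.labels")
  selected <- matrix(FALSE, B, length(vars), dimnames = list(NULL, vars))
  n_warn <- integer(B)
  messages <- vector("list", B)
  for (b in seq_len(B)) {
    d <- data[sample(nrow(data), replace = TRUE), ]
    warns <- character(0)
    fit <- withCallingHandlers({
      full <- glm(formula, weights = Examind, data = d, family = binomial)
      stepAIC(full, direction = "backward", trace = 0)
    }, warning = function(w) {
      warns <<- c(warns, conditionMessage(w))  # record the message, then silence it
      invokeRestart("muffleWarning")
    })
    selected[b, ] <- vars %in% attr(terms(fit), "term.labels")
    n_warn[b] <- length(warns)
    messages[[b]] <- warns
  }
  list(selection_freq = colMeans(selected), n_warn = n_warn, messages = messages)
}

res <- boot_step_log(Pos / Examind ~ x1 + x2 + x3 + x4, dat, B = 50)
print(round(res$selection_freq, 2))  # share of resamples keeping each term
print(table(res$n_warn > 0))         # resamples with at least one warning
```

Looking at res$messages for the replicates with non-zero counts shows which warning occurred; also storing the resampled row indices would let you refit those resamples and check, for example, range(fitted(fit)) for probabilities pinned at 0 or 1.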
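For the multicollinearity part of the question, a rough base-R sketch of two common diagnostics on the (unweighted) predictor matrix: variance inflation factors, VIF_j = 1/(1 - R_j^2) with R_j^2 from regressing predictor j on the other predictors, and the condition number of the scaled predictor matrix. The data and the helper vif_manual() are hypothetical; with real data you would pass the predictor columns of spatialVars instead. Commonly quoted rules of thumb treat VIFs above about 5 to 10, or condition numbers above about 30, as warning signs, but they are heuristics rather than tests (car::vif() offers a packaged VIF for fitted models).

```r
set.seed(2)
n <- 300
# Hypothetical predictors: x2 is nearly a copy of x1
X <- data.frame(x1 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
X$x2 <- X$x1 + rnorm(n, sd = 0.05)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    f <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(X)
print(round(vifs, 1))

# 2-norm condition number of the centred and scaled predictor matrix
kap <- kappa(scale(as.matrix(X)), exact = TRUE)
print(round(kap, 1))
```

Here x1 and x2 get very large VIFs while x3 and x4 stay close to 1, which is the pattern to look for among the climate variables.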
Here is my workflow using the boot.stepAIC function, mostly in the forward stepwise direction (the forward direction seems to be more robust with respect to convergence). Numbers in parentheses are AIC values. If reproducible code is required I am happy to provide it (with the data via Dropbox).

library(bootStepAIC)  # provides boot.stepAIC()

# kept altitude (15454.23) (but not in candidate list) and excluded:
# meanTemp (14422.72), minTemp (14435.72), bio1 (14767.88), bio6 (didn't converge, dc),
# bio8 (15050.46), bio10 (14285.46), bio11 (14655.82), bio18 (15445.24),
# bio10+bio11 (dc), bio10+bio11+bio18 (dc), bio10+bio18 (dc),
# meanTemp+bio10+bio11 (14160.33), minTemp+bio10+bio11 (14204.41),
# meanTemp+minTemp+bio10+bio11 (14135.49), meanTemp+minTemp+bio10+bio11+bio1 (dc),
# bio10+bio11+bio1 (dc)
fit.1 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI + bio5 + bio9 + bio10,
             weights = Examind, data = spatialVars, family = "binomial")
bootGLM.1 <- boot.stepAIC(fit.1, spatialVars, direction = "backward", alpha = 0.05, B = 1000)
# 15445.24
# add bio10 (lowest AIC: 14285.46)
# (backward direction: dc)

# kept bio2 and excluded: bio7 (dc), -bio2 + bio7 (14710.04), -bio2 - bio10 + bio7 (15676.62)
fit.2 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI + bio5 + bio9 + bio10,
             weights = Examind, data = spatialVars, family = "binomial")
# 15302.59
bootGLM.2 <- boot.stepAIC(fit.2, spatialVars, direction = "forward", alpha = 0.05, B = 1000)
# keep bio2 in candidate list (+bio10)

# kept bio5 and excluded: altitude (dc), -bio5 + altitude (15659.26), -bio5 + maxTemp (15637.91)
fit.3 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI + bio5 + bio9 + bio10,
             weights = Examind, data = spatialVars, family = "binomial")
bootGLM.3 <- boot.stepAIC(fit.3, spatialVars, direction = "forward", alpha = 0.05, B = 1000)
# keep bio5 in candidate list (+bio10)

# kept bio17 (14178.88) (not in candidate list) and excluded: bio12 (dc), bio14 (14168.77),
# bio16 (14287.42), bio19 (14248.65), rain (14287.45), bio17+bio12 (14162.3)
fit.4 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI + bio5 + bio9 + bio10 + rain,
             weights = Examind, data = spatialVars, family = "binomial")
bootGLM.4 <- boot.stepAIC(fit.4, spatialVars, direction = "forward", alpha = 0.05, B = 1000)
# add bio14 to candidate list (+bio10)

# kept bio15 (14168.77) (not included in candidate list) and excluded: bio17 (14161.03)
fit.5 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI + bio5 + bio9 + bio10 + bio14,
             weights = Examind, data = spatialVars, family = "binomial")
bootGLM.5 <- boot.stepAIC(fit.5, spatialVars, direction = "forward", alpha = 0.05, B = 1000)

Thanks very much for any help, advice or thoughts.

Justin
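The manual "add one excluded variable" and "swap a kept variable for an excluded one" comparisons in the workflow can be done in one pass with base R's add1() and update(). The sketch below uses simulated data with hypothetical variables x1 to x4 (x1 and x2 in the candidate model, x3 and x4 previously excluded). It also shows something worth ruling out: according to the MASS::stepAIC documentation, when scope is missing the initial model is used as the upper model, so direction = "forward" started from the full model has nothing to add. If boot.stepAIC passes the direction through to stepAIC without an upper scope (an assumption I have not checked in its source), the forward runs may not be doing any selection at all; an explicit scope avoids that.

```r
library(MASS)  # for stepAIC()

set.seed(3)
n <- 300
# Hypothetical simulated data: only x1 and x3 affect the outcome
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$Examind <- sample(5:20, n, replace = TRUE)
dat$Pos <- rbinom(n, dat$Examind, plogis(0.5 * dat$x1 - 0.7 * dat$x3))

# Candidate model: x1 and x2 kept; x3 and x4 were previously excluded
base_fit <- glm(Pos / Examind ~ x1 + x2, weights = Examind,
                data = dat, family = binomial)

# AIC after adding each excluded variable on its own (rows "<none>", "x3", "x4")
add_tab <- add1(base_fit, scope = ~ . + x3 + x4)
print(add_tab)

# AIC after swapping the kept variable x2 for the excluded variable x3
swap_fit <- update(base_fit, . ~ . - x2 + x3)
print(AIC(base_fit, swap_fit))

# Forward selection with no scope: the start model is also the upper model,
# so no terms can be added
fwd_noscope <- stepAIC(base_fit, direction = "forward", trace = 0)

# Forward selection from the intercept-only model with an explicit upper scope
null_fit <- update(base_fit, . ~ 1)
fwd <- stepAIC(null_fit, scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4),
               direction = "forward", trace = 0)
print(formula(fwd))
```

The same scope argument is what you would want to check, or pass, when calling the bootstrap version in the forward direction.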