Andrew.Hoskins at csiro.au
2014-Oct-23 04:48 UTC
[R] Help with GLM starting values in user defined link function
Hi R-list, I'm trying to fit a binomial GLM with user defined link function (negative exponential), however I seem to be unable to find the correct starting values to initialise such a model. I've tried taking starting values from a logistic and log models fit to the same data and also tried to substitute the intercept from the null model in as the starting value for this model, however all continue to return the same error. Any help or tips with how to determine some valid starting values (or just to confirm that I've specified the link function correctly) would be greatly appreciated. See below for some example code. Andrew ## Example fit of negative exponential binomial GLM ## define link function negexp <- function() { linkfun <- function(mu) 1-exp(-mu) linkinv <- function(eta) -log(1-eta) mu.eta <- function(eta) 1/(1-eta) valideta <- function(eta) TRUE link <- paste0("negexp") structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta, valideta = valideta, name = link), class = "link-glm") } ## create some data y <- c(0,0,0,0,0,1,1,1,0,1,0,1,1,0,0,1,1,1,1,0,1,0,1,0,1,0,1,1,1,0,1,0,1,0,0,1,0 ,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,1,1,0,1,0,1,1,0,1,0,1,0,1,1 ,0,1,0,0,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,1,0,1,0,1,0,0) eco <- c(0.30406146,0.77127034,0.27507740,0.34660849,0.10496959,0.87483283,1.34652163,0.59570289,0.76557945 ,1.07105129,0.85681582,1.15885519,0.62478718,0.82327890,0.61921331,1.40337615,1.69376337,0.96662890 ,0.62756558,1.22148480,0.29509170,1.20822702,1.04490241,0.63994034,0.44537652,0.80908805,1.38793219 ,0.68987695,0.65253113,0.10996619,2.18030035,0.95187860,1.91719194,0.55910638,0.42246265,0.99747093 ,0.65609015,1.56408171,0.09024976,0.49430176,0.89651639,0.13943031,0.72264673,0.33212781,0.53156567 ,0.24478163,0.20439708,0.26577897,0.73061755,1.41380646,0.45361391,0.53961802,0.20099582,0.16278695 ,0.51188479,1.23152701,1.45180489,0.16136045,0.84696597,1.06556860,0.31352700,7.54728452,0.47765713 ,1.62966928,0.51514442,0.87203787,0.33515181,0.71407043,0.84767445,0.33640927,0.70331392,0.41617675 ,1.41137914,0.22586531,0.92797131,1.34627407,0.21408341,0.38903027,0.91690877,0.95946623,0.46114617 ,0.62965571,7.50492235,1.96516642,0.61555184,1.24061426,0.95281453,1.02729643,1.44581350,1.63148077 ,1.02291891,0.80319545,0.92136436,1.22428318,0.59172977,1.56985149,0.35790202,2.23402940,0.98565537 ,0.41658919) geog <- c(254.94615,277.78675,3.69047,47.92363,0.90241,449.44532,1795.89910,985.66843,1063.47287 ,160.27883,58.72738,1093.00270,1423.51194,1232.16769,54.56121,4772.54353,1877.95322,1110.18161 ,174.53805,2829.67281,551.22870,1781.67608,495.13007,44.42326,1057.72959,783.99003,3025.58558 ,1855.59219,1715.27590,41.75478,3687.95693,2125.34324,751.42284,1598.04527,1625.35627,848.40949 ,1484.40835,3332.15554,214.99678,136.60188,1388.07919,77.21198,2366.56327,617.31749,1421.72213 ,537.38636,223.57615,256.35456,3022.63678,4783.64718,45.97153,194.79090,2.65647,69.08392 ,948.66990,1480.70503,2805.30205,144.55345,1134.92666,728.04570,1421.45250,827.57959,1517.75701 ,682.77014,1060.09369,448.44398,848.64842,1437.74925,2887.23135,56.28056,725.78408,91.19194 ,1905.87208,749.92265,261.19075,2529.80027,371.16338,1130.14904,802.23348,1851.86105,1274.20599 ,260.79728,1427.11459,3891.82373,482.58143,2011.86414,1310.10546,975.37470,1087.50127,2195.28667 ,2358.70761,44.82955,1553.35558,2261.60567,1216.64486,1674.70189,165.13405,1463.93362,1542.33074 ,1683.01992) testDat <- data.frame(y,eco,geog) ## Fit model (doesn't work) fit <- glm(y ~ eco + geog,data=testDat,family=binomial(negexp())) ## try logistic to get starting values (doesn't work) fit.logit <- glm(y ~ eco + geog,data=testDat,family=binomial(logit)) fit <- glm(y ~ eco + geog,data=testDat,family=binomial(negexp()),start=coef(fit.logit)) ## try quasibinomial log (doesn't work) fit.log <- glm(y ~ eco + geog,data=testDat,family=quasibinomial(log),start=c(-1,0,0)) fit <- glm(y ~ eco + geog,data=testDat,family=binomial(negexp()),start=coef(fit.log)) ## try null with 0.5 for other coefs (doesn't work) fit.null <- glm(y ~ 1,data=testDat,family=binomial(negexp())) fit <- glm(y ~ eco + geog,data=testDat,family=binomial(negexp()),start=c(coef(fit.null),0.5,0.5)) ## try again (null intercept, logit coefs) fit <- glm(y ~ eco + geog,data=testDat,family=binomial(negexp()),start=c(coef(fit.null),coef(fit.logit)[2:3])) ## try again (null intercept, log coefs) fit <- glm(y ~ eco + geog,data=testDat,family=binomial(negexp()),start=c(coef(fit.null),coef(fit.log)[2:3])) Andrew Hoskins Postdoctoral reasearch fellow Ecosystem Sciences CSIRO E Andrew.Hoskins at csiro.au T +61 2 6246 5902 Black Mountain Laboratories Clunies Ross Street, Acton, ACT 2601, Australia www.csiro.au PLEASE NOTE The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. [[alternative HTML version deleted]]
Ken Knoblauch
2014-Oct-23 15:27 UTC
[R] Help with GLM starting values in user defined link function
<Andrew.Hoskins <at> csiro.au> writes:> I'm trying to fit a binomial GLM with user definedlink function (negative exponential), however I seem to> be unable to find the correct starting values toinitialise such a model. I've tried taking starting> values from a logistic and log models fit to thesame data and also tried to substitute the intercept from> the null model in as the starting value for thismodel, however all continue to return the same error.> > Andrew > > ## Example fit of negative exponential binomialGLM> > ## define link function > negexp <- function() > { > linkfun <- function(mu) 1-exp(-mu) > linkinv <- function(eta) -log(1-eta) > mu.eta <- function(eta) 1/(1-eta) > valideta <- function(eta) TRUE > link <- paste0("negexp") > structure(list(linkfun = linkfun, linkinv = linkinv, > mu.eta = mu.eta, valideta = valideta,name = link),> class = "link-glm") > } >---SNIP--- Take a look at the limits of eta for the extreme values of mu and compare them with the linear predictor of your link applied to say the fitted values of your logit fit. It seems to suggest that two values fall outside the range of valid eta, according to your linkfun: c(62, 83). I got it to work with these removed although there were lots of other warnings that you might have to worry about. Also, when choosing start values you might want to base them on a fit with your link rather than a different one. So, I got start values by trying EV <- negexp()$linkfun(fitted(fit.logit)) LE.lm <- lm(EV ~ eco + geog, testDat) Ec <- coef(LE.lm) with these defined as in your mail (sorry I snipped your code out). So, I found which(fitted(LE.lm) > (1 - exp(-1))) 62 83 62 83 and then glm(y ~ eco + geog, family = binomial(negexp()), data = testDat[-c(62, 83), ], start = Ec) Coefficients: (Intercept) eco geog 1.593e-01 2.085e-01 4.713e-06 Degrees of Freedom: 97 Total (i.e. Null); 95 Residual Null Deviance: 134.4 Residual Deviance: 112.3 AIC: 118.3 There were 27 warnings (use warnings() to see them) HTH> > Andrew Hoskins > Postdoctoral reasearch fellow > Ecosystem Sciences > CSIRO > > E Andrew.Hoskins <at> csiro.au T +61 2 6246 5902 > Black Mountain Laboratories > Clunies Ross Street, Acton, ACT 2601, Australia > www.csiro.au >>-- Kenneth Knoblauch Inserm U846 Stem-cell and Brain Research Institute Department of Integrative Neurosciences 18 avenue du Doyen L?pine 69500 Bron France tel: +33 (0)4 72 91 34 77 fax: +33 (0)4 72 91 34 61 portable: +33 (0)6 84 10 64 10 http://www.sbri.fr/members/kenneth-knoblauch.html