It has always been my understanding that the deviance for GLMs is defined by

D = -2(loglikelihood(model) - loglikelihood(saturated model))

and that it can be (or at least usually is) calculated as

D = -2(loglikelihood(model))

This is how it is done in the code for 'polr' by Brian Ripley (in the package 'MASS'), where the negative log-likelihood is minimised using optim:

res <- optim(s0, fmin, gmin, method = "BFGS", hessian = Hess, ...)
.
.
.
deviance <- 2 * res$value

If so, why is it that

> x = rnorm(10)
> y = rpois(10,lam=exp(1 + 2*x))
> test = glm(formula = y ~ x, family = poisson)
> deviance(test)
[1] 5.483484
> -2*logLik(test)
[1] 36.86335

I'm clearly not understanding something here; can anyone shed any light? Why is

-2*logLik(test) =/= deviance(test) ???

I think this is something that is poorly understood all over the internet (at least judging from my Google searches!).

Thanks,

Jeff

-----------------------------------------
**********************************************
Confidentiality: The contents of this e-mail and any attachments transmitted with it are intended to be confidential to the intended recipient; and may be privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. This e-mail is sent by a William Hill PLC group company. The William Hill group companies include, among others, William Hill PLC (registered number 4212563), William Hill Organization Limited (registered number 278208), William Hill Credit Limited (registered number 413846), WHG (International) Limited (registered number 99191) and WHG Trading Limited (registered number 101439). Each of William Hill PLC, William Hill Organization Limited and William Hill Credit Limited is registered in England and Wales and has its registered office at Greenside House, 50 Station Road, Wood Green, London N22 7TP.
Each of WHG (International) Limited and WHG Trading Limited is registered in Gibraltar and has its registered office at 6/1 Waterport Place, Gibraltar. Unless specifically indicated otherwise, the contents of this e-mail are subject to contract; and are not an official statement, and do not necessarily represent the views, of William Hill PLC, its subsidiaries or affiliated companies. Please note that neither William Hill PLC, nor its subsidiaries and affiliated companies can accept any responsibility for any viruses contained within this e-mail and it is your responsibility to scan any emails and their attachments. William Hill PLC, its subsidiaries and affiliated companies may monitor e-mail traffic data and also the content of e-mails for effective operation of the e-mail system, or for security, purposes.
*********************************************
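A minimal sketch of the polr-style calculation described in the question, assuming a Poisson log-linear model stands in for polr's ordinal likelihood: minimise the negative log-likelihood with optim() and double the minimum. The functions fmin and gmin here are hypothetical stand-ins that only borrow the names from the polr call, and the seed is arbitrary. Doubling the minimised value reproduces -2*logLik, not deviance(), which is exactly the discrepancy being asked about.

```r
# Hypothetical stand-ins that only borrow polr's names: fmin is the negative
# Poisson log-likelihood, gmin its gradient (log link, intercept + slope).
set.seed(1)  # arbitrary seed, for reproducibility only
x <- rnorm(10)
y <- rpois(10, lambda = exp(1 + 2 * x))

fmin <- function(beta) {
  mu <- exp(beta[1] + beta[2] * x)
  -sum(dpois(y, mu, log = TRUE))
}
gmin <- function(beta) {
  mu <- exp(beta[1] + beta[2] * x)
  -c(sum(y - mu), sum((y - mu) * x))
}

# Start at the intercept-only estimate so the first evaluation is finite
res  <- optim(c(log(mean(y)), 0), fmin, gmin, method = "BFGS")
test <- glm(y ~ x, family = poisson)

2 * res$value                    # polr-style "deviance"
-2 * as.numeric(logLik(test))    # matches 2 * res$value
deviance(test)                   # smaller: the saturated log-likelihood is not 0 here
```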
As you mentioned, the deviance does not always reduce to:

D = -2(loglikelihood(model))

It does for ungrouped data, such as for binary logistic regression. So let's stick with the original definition. In this case, we need the log-likelihood for the saturated model.

x = rnorm(10)
y = rpois(10,lam=exp(1 + 2*x))
test = glm(formula = y ~ x, family = poisson)
# saturated model: one parameter per observation, so the fitted values equal y
sm <- glm(y ~ factor(1:10),family=poisson)
# D = -2(logLik(test) - logLik(sm)) = 2(logLik(sm) - logLik(test))
mydev <- as.numeric(2*(logLik(sm)-logLik(test)))
mydev
deviance(test)   # agrees with mydev

On Fri, Apr 15, 2011 at 7:00 AM, Jeffrey Pollock <jpollock at williamhill.co.uk> wrote:
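The saturated Poisson model sets each fitted mean equal to its observation, so its log-likelihood can also be written down directly instead of being fitted with factor(1:10). A short sketch under that reading (the seed is arbitrary), which also checks the usual closed form of the Poisson deviance, 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) taken as 0:

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- rnorm(10)
y <- rpois(10, lambda = exp(1 + 2 * x))
test <- glm(y ~ x, family = poisson)

ll_model <- as.numeric(logLik(test))
# Saturated model: mu_i = y_i. dpois(0, 0) is 1, so zero counts contribute 0.
ll_sat <- sum(dpois(y, lambda = y, log = TRUE))

2 * (ll_sat - ll_model)          # the deviance by its definition
deviance(test)                   # same value

# Same number from the closed form, using 0 * log(0) = 0
mu <- fitted(test)
2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
```

Because ll_sat is negative whenever some y is positive, -2*logLik(test) exceeds deviance(test) by -2*ll_sat.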
On Apr 21, 2011, at 05:14, Juliet Hannah wrote:

> As you mentioned, the deviance does not always reduce to:
>
> D = -2(loglikelihood(model))
>
> It does for ungrouped data, such as for binary logistic regression.

To be precise, it only happens when the log likelihood of the saturated model is 0, which for discrete models implies that the probability of the observed data under the saturated model is 1. Binary data is pretty much the _only_ case where this is true (because individual fitted probabilities become either zero or one).

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
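To see this point in action, here is a sketch with made-up data (the vectors below are hypothetical, not from the thread). For ungrouped 0/1 responses, deviance() and -2*logLik() coincide. For grouped binomial counts, the saturated fit uses the observed proportions, which do not give the data probability 1, so the two quantities separate.

```r
# Ungrouped binary data (hypothetical values): the saturated fit gives each
# observation probability 1, so its log-likelihood is 0
x <- 1:10
y <- c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1)
fit <- glm(y ~ x, family = binomial)
deviance(fit)
-2 * as.numeric(logLik(fit))     # same value

# Grouped binomial data (hypothetical counts): two groups and two parameters,
# so this model is itself saturated and its deviance is essentially 0 ...
s <- c(2, 3)                     # successes per group
n <- c(5, 5)                     # trials per group
g <- c(0, 1)                     # group indicator
gfit <- glm(cbind(s, n - s) ~ g, family = binomial)
deviance(gfit)
# ... yet -2 * logLik is not 0, because the fitted probabilities 0.4 and 0.6
# do not give the observed counts probability 1
-2 * as.numeric(logLik(gfit))
```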
First of all, thank you both for the replies; my understanding is a lot clearer after thinking about the example you showed. One further question, though: looking at the code for 'polr' in MASS suggests that ordinal (and, I would guess, nominal too) data with more than 2 levels (i.e. not binary) also has a saturated-model log-likelihood of 0:

> res <- optim(s0, fmin, gmin, method = "BFGS", hessian = Hess, ...)
> .
> .
> .
> deviance <- 2 * res$value

So am I right in saying that binary data isn't the only case where this is true? It would make sense to me that for a multinomial model you could have a unique factor for each data point and thus obtain a likelihood of 1. I have an example of this similar to the Poisson one:

> library(nnet)
> y <- sample(1:3,replace=TRUE,size=10)
> factor <- as.factor(1:10)
> mod <- multinom(y~factor)
# weights: 33 (20 variable)
initial value 10.986123
iter 10 value 0.073035
final value 0.000086
converged
> mod
Call:
multinom(formula = y ~ factor)

Coefficients:
  (Intercept)   factor2   factor3   factor4  factor5  factor6  factor7  factor8   factor9 factor10
2   -14.33414 -21.50626  28.45961  28.45961 -9.18573 -9.18573 -9.18573 -9.18573  28.45961 -9.18573
3   -12.71240 -27.36982 -14.22512 -14.22512 23.75682 23.75682 23.75682 23.75682 -14.22512 23.75682

Residual Deviance: 0.0001713779
AIC: 40.00017
> logLik(mod)
'log Lik.' -8.568897e-05 (df=20)
> cbind(y,factor)
      y factor
 [1,] 1      1
 [2,] 1      2
 [3,] 2      3
 [4,] 2      4
 [5,] 3      5
 [6,] 3      6
 [7,] 3      7
 [8,] 3      8
 [9,] 2      9
[10,] 3     10

My understanding of this is that if I observe a factor of '1', the model will say with probability 1 that the outcome will be 1, and so on for the other rows in the dataset, and this shows in the estimated coefficients. I think the reason the log-likelihood doesn't return exactly 0 is that the fitting algorithm gets suitably close and then stops. It wouldn't make sense to continue the algorithm until the coefficients were either 'Inf' or '-Inf'.

Please let me know your thoughts on this.
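The reasoning above can be checked without fitting anything. A minimal sketch, assuming the usual multinomial saturated model in which each covariate pattern gets its own observed category proportions; sat_loglik is a hypothetical helper written only for this illustration, and y is the y column printed in the transcript. With one observation per row, every observed category gets probability 1, so the saturated log-likelihood is exactly 0; pooling the same ten observations into a single pattern makes it negative.

```r
# Hypothetical helper: saturated log-likelihood for categorical counts,
# sum over cells of n_ij * log(n_ij / n_i), with 0 * log(0) taken as 0.
# counts: matrix with one row per covariate pattern, one column per category.
sat_loglik <- function(counts) {
  p <- counts / rowSums(counts)
  sum(ifelse(counts > 0, counts * log(p), 0))
}

y <- c(1, 1, 2, 2, 3, 3, 3, 3, 2, 3)   # the y column from the transcript

# Ungrouped: each row is a single observation (one indicator per category)
ungrouped <- t(sapply(y, function(k) tabulate(k, nbins = 3)))
sat_loglik(ungrouped)                    # exactly 0

# Grouped: the same ten observations pooled into one covariate pattern
grouped <- matrix(tabulate(y, nbins = 3), nrow = 1)
sat_loglik(grouped)                      # 2*log(0.2) + 3*log(0.3) + 5*log(0.5)
```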
Thanks again,

Jeff

-----Original Message-----
From: peter dalgaard [mailto:pdalgd at gmail.com]
Sent: 21 April 2011 09:32
To: Juliet Hannah
Cc: Jeffrey Pollock; r-help at r-project.org
Subject: Re: [R] GLM output for deviance and loglikelihood