John Smith
2020-Aug-29 05:30 UTC
[R] How to obtain individual log-likelihood value from glm?
Thanks Prof. Fox.

I am curious: what is the model estimated below?

I guess my inquiry is more complicated than I thought: with y being 0/1, how do I fit a weighted logistic regression with weights < 1, in the sense of weighted least squares? Thanks

> On Aug 28, 2020, at 10:51 PM, John Fox <jfox at mcmaster.ca> wrote:
>
> Dear John
>
> I think that you misunderstand the use of the weights argument to glm() for a binomial GLM. From ?glm: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes." That is, in this case y should be the observed proportion of successes (i.e., between 0 and 1) and the weights are integers giving the number of trials for each binomial observation.
>
> I hope this helps,
>  John
>
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
>> On 2020-08-28 9:28 p.m., John Smith wrote:
>> If the weights are < 1, then we get different values! See the example below.
>> How should I interpret the logLik value then?
>>
>> set.seed(135)
>> y <- c(rep(0, 50), rep(1, 50))
>> x <- rnorm(100)
>> data <- data.frame(x, y)
>> weights <- c(rep(1, 50), rep(2, 50))
>> fit <- glm(y ~ x, data = data, family = binomial(), weights = weights/10)
>> res.dev <- residuals(fit, type = "deviance")
>> res2 <- -0.5 * res.dev^2
>> cat("log-likelihood value", logLik(fit), sum(res2), "\n")
>>
>>> On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard <pdalgd at gmail.com> wrote:
>>> If you don't worry too much about an additive constant, then half the
>>> negative squared deviance residuals should do. (Not quite sure how weights
>>> factor in. Looks like they are accounted for.)
>>>
>>> -pd
>>>
>>>> On 25 Aug 2020, at 17:33, John Smith <jswhct at gmail.com> wrote:
>>>>
>>>> Dear R-help,
>>>>
>>>> The function logLik can be used to obtain the maximum log-likelihood
>>>> value from a glm object. This is an aggregated value, a summation of
>>>> individual log-likelihood values. How do I obtain the individual values?
>>>> In the following example, I would expect 9 numbers, since the response
>>>> has length 9. I could write a function to compute the values, but there
>>>> are many family members in glm, and I am trying not to reinvent wheels.
>>>> Thanks!
>>>>
>>>> counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
>>>> outcome <- gl(3, 1, 9)
>>>> treatment <- gl(3, 3)
>>>> data.frame(treatment, outcome, counts) # showing the data
>>>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
>>>> (ll <- logLik(glm.D93))
>>>
>>> --
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
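A minimal sketch of Peter's suggestion, applied to the Poisson example from the original post (an illustration, assuming per-observation log-likelihood contributions are what's wanted; it is not code from the thread):

counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

## exact per-observation log-likelihood contributions, log f(y_i | mu.hat_i):
ll.exact <- dpois(counts, lambda = fitted(glm.D93), log = TRUE)
sum(ll.exact)   # matches logLik(glm.D93)

## Peter's version: half the negative squared deviance residuals; it differs
## from ll.exact only by the saturated-model term log f(y_i | y_i), i.e. by
## dpois(counts, counts, log = TRUE), which is constant in the parameters:
ll.approx <- -0.5 * residuals(glm.D93, type = "deviance")^2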
John Fox
2020-Aug-29 14:02 UTC
[R] How to obtain individual log-likelihood value from glm?
Dear John,

On 2020-08-29 1:30 a.m., John Smith wrote:
> Thanks Prof. Fox.
>
> I am curious: what is the model estimated below?

Nonsense, as Peter explained in a subsequent response to your prior posting.

> I guess my inquiry is more complicated than I thought: with y being 0/1, how do I fit a weighted logistic regression with weights < 1, in the sense of weighted least squares? Thanks

What sense would that make? WLS is meant to account for non-constant error variance in a linear model, but in a binomial GLM the variance is purely a function of the mean.

If you had binomial (rather than binary 0/1) observations (i.e., binomial trials exceeding 1), then you could account for overdispersion, e.g., by introducing a dispersion parameter via the quasibinomial family; but that isn't equivalent to variance weights in an LM -- rather, it is analogous to the error-variance parameter in an LM.

I guess the question is: what are you trying to achieve with the weights?

Best,
 John
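A small sketch of the binomial-GLM usage Fox describes, with made-up data (the trial counts, success counts, and predictor here are hypothetical, chosen only for illustration):

trials <- c(10, 15, 12, 20)        # hypothetical numbers of trials
successes <- c(3, 9, 5, 14)        # hypothetical numbers of successes
x <- c(-1, 0, 1, 2)                # hypothetical predictor
prop <- successes / trials         # y: observed proportion of successes in [0, 1]

fit1 <- glm(prop ~ x, family = binomial(), weights = trials)

## equivalent two-column (successes, failures) specification:
fit2 <- glm(cbind(successes, trials - successes) ~ x, family = binomial())
all.equal(coef(fit1), coef(fit2))  # TRUE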
John Smith
2020-Aug-29 15:18 UTC
[R] How to obtain individual log-likelihood value from glm?
Thanks for the very insightful thoughts. What I am trying to achieve with the weights is actually not new -- something like https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances. I thought my inquiry was not too strange, and that I could make use of some existing code. It is just an optimization problem at the end of the day, is it not? Thanks

On Sat, Aug 29, 2020 at 9:02 AM John Fox <jfox at mcmaster.ca> wrote:

> Dear John,
>
> What sense would that make? WLS is meant to account for non-constant
> error variance in a linear model, but in a binomial GLM the variance is
> purely a function of the mean.
>
> I guess the question is: what are you trying to achieve with the weights?
>
> Best,
>  John
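Taking the "just an optimization problem" view literally, a sketch (an illustration, not code from the thread) that maximizes the instance-weighted log-likelihood directly with optim(), reusing the data from the earlier example:

set.seed(135)
y <- c(rep(0, 50), rep(1, 50))
x <- rnorm(100)
w <- c(rep(1, 50), rep(2, 50)) / 10  # fractional instance weights
X <- cbind(1, x)                     # design matrix with intercept

## negative weighted log-likelihood: -sum_i w_i * log f(y_i | p_i)
negwll <- function(beta) {
  p <- plogis(drop(X %*% beta))
  -sum(w * (y * log(p) + (1 - y) * log(1 - p)))
}

opt <- optim(c(0, 0), negwll, method = "BFGS")
opt$par  # should agree with coef(glm(y ~ x, family = binomial(), weights = w)),
         # since glm()'s IRLS solves the same weighted score equations; it is
         # the reported logLik(), not the coefficients, that is hard to
         # interpret with non-integer weights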
John Fox
2020-Aug-29 16:39 UTC
[R] How to obtain individual log-likelihood value from glm?
Dear John,

On 2020-08-29 11:18 a.m., John Smith wrote:
> Thanks for the very insightful thoughts. What I am trying to achieve with
> the weights is actually not new -- something like
> https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances.
> I thought my inquiry was not too strange, and that I could make use of some
> existing code. It is just an optimization problem at the end of the day, is
> it not? Thanks

So the object is to fit a regularized (i.e., penalized) logistic regression rather than to fit by ML. glm() won't do that.

I took a quick look at the stackexchange link that you provided and the document referenced in that link. The penalty proposed in the document is just a multiple of the sum of squared regression coefficients -- what is usually called an L2 penalty in the machine-learning literature. There are existing implementations of regularized logistic regression in R -- see the machine learning CRAN task view <https://cran.r-project.org/web/views/MachineLearning.html>. I believe that the penalized package will fit a regularized logistic regression with an L2 penalty.

As well, unless my quick reading was inaccurate, I think that you, and perhaps the stackexchange poster, might have been confused by the terminology used in the document: what's referred to as "weights" in the document is what statisticians more typically call "regression coefficients," and the "bias weight" is the "intercept" or "regression constant."

Perhaps I'm missing some connection -- I'm not the best person to ask about machine learning.

Best,
 John
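To make the pointer concrete, a sketch of an L2-penalized (ridge) logistic fit using the glmnet package -- one of the implementations listed in the CRAN task view, assumed installed here; it also accepts per-observation weights, which is what the stackexchange question asks for. The lambda value below is arbitrary, for illustration only:

library(glmnet)

set.seed(135)
y <- c(rep(0, 50), rep(1, 50))
x <- matrix(rnorm(200), ncol = 2)    # glmnet requires a matrix of >= 2 predictors
w <- c(rep(1, 50), rep(2, 50)) / 10  # fractional instance weights

## alpha = 0 selects the pure L2 (ridge) penalty; lambda is the penalty weight
fit <- glmnet(x, y, family = "binomial", alpha = 0,
              weights = w, lambda = 0.1)
coef(fit)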