John Smith
2020-Aug-29 15:18 UTC
[R] How to obtain individual log-likelihood value from glm?
Thanks for the very insightful thoughts. What I am trying to achieve with the weights is actually not new; it is something like https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances. I thought my inquiry was not too strange, and that I could use some existing code. It is just an optimization problem at the end of the day, is it not? Thanks

On Sat, Aug 29, 2020 at 9:02 AM John Fox <jfox at mcmaster.ca> wrote:

> Dear John,
>
> On 2020-08-29 1:30 a.m., John Smith wrote:
> > Thanks Prof. Fox.
> >
> > I am curious: what is the model estimated below?
>
> Nonsense, as Peter explained in a subsequent response to your prior
> posting.
>
> > I guess my inquiry seems more complicated than I thought: with y being
> > 0/1, how to fit weighted logistic regression with weights < 1, in the
> > sense of weighted least squares? Thanks
>
> What sense would that make? WLS is meant to account for non-constant
> error variance in a linear model, but in a binomial GLM the variance is
> purely a function of the mean.
>
> If you had binomial (rather than binary 0/1) observations (i.e.,
> binomial trials exceeding 1), then you could account for overdispersion,
> e.g., by introducing a dispersion parameter via the quasibinomial
> family, but that isn't equivalent to variance weights in an LM; rather,
> it is analogous to the error-variance parameter in an LM.
>
> I guess the question is what you are trying to achieve with the weights.
>
> Best,
>  John
>
> >> On Aug 28, 2020, at 10:51 PM, John Fox <jfox at mcmaster.ca> wrote:
> >>
> >> Dear John,
> >>
> >> I think that you misunderstand the use of the weights argument to
> >> glm() for a binomial GLM. From ?glm: "For a binomial GLM prior weights
> >> are used to give the number of trials when the response is the
> >> proportion of successes." That is, in this case y should be the
> >> observed proportion of successes (i.e., between 0 and 1) and the
> >> weights are integers giving the number of trials for each binomial
> >> observation.
> >>
> >> I hope this helps,
> >> John
> >>
> >> John Fox, Professor Emeritus
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> web: https://socialsciences.mcmaster.ca/jfox/
> >>
> >>> On 2020-08-28 9:28 p.m., John Smith wrote:
> >>> If the weights are < 1, then we get different values! See the example
> >>> below. How should I interpret the logLik value then?
> >>>
> >>> set.seed(135)
> >>> y <- c(rep(0, 50), rep(1, 50))
> >>> x <- rnorm(100)
> >>> data <- data.frame(cbind(x, y))
> >>> weights <- c(rep(1, 50), rep(2, 50))
> >>> fit <- glm(y ~ x, data, family = binomial(), weights/10)
> >>> res.dev <- residuals(fit, type = "deviance")
> >>> res2 <- -0.5 * res.dev^2
> >>> cat("loglikelihood value", logLik(fit), sum(res2), "\n")
> >>>
> >>>> On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard <pdalgd at gmail.com> wrote:
> >>>> If you don't worry too much about an additive constant, then half
> >>>> the negative squared deviance residuals should do. (Not quite sure
> >>>> how weights factor in. Looks like they are accounted for.)
> >>>>
> >>>> -pd
> >>>>
> >>>>> On 25 Aug 2020, at 17:33, John Smith <jswhct at gmail.com> wrote:
> >>>>>
> >>>>> Dear R-help,
> >>>>>
> >>>>> The function logLik can be used to obtain the maximum
> >>>>> log-likelihood value from a glm object. This is an aggregated
> >>>>> value, a summation of individual log-likelihood values. How do I
> >>>>> obtain the individual values? In the following example, I would
> >>>>> expect 9 numbers, since the response has length 9. I could write a
> >>>>> function to compute the values, but there are lots of family
> >>>>> members in glm, and I am trying not to reinvent wheels. Thanks!
> >>>>>
> >>>>> counts <- c(18,17,15,20,10,20,25,13,12)
> >>>>> outcome <- gl(3,1,9)
> >>>>> treatment <- gl(3,3)
> >>>>> data.frame(treatment, outcome, counts) # showing data
> >>>>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
> >>>>> (ll <- logLik(glm.D93))
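Peter's suggestion can be checked directly on the Poisson example above: minus half the squared deviance residuals reproduce the casewise log-likelihood contributions up to the additive constant contributed by the saturated model. A minimal sketch (the names mu, ll.dev, and ll.exact are introduced here for illustration):

    counts    <- c(18,17,15,20,10,20,25,13,12)
    outcome   <- gl(3,1,9)
    treatment <- gl(3,3)
    glm.D93   <- glm(counts ~ outcome + treatment, family = poisson())

    ## Peter's suggestion: casewise log-likelihood up to an additive constant
    ll.dev <- -0.5 * unname(residuals(glm.D93, type = "deviance"))^2

    ## exact casewise log-likelihood, straight from the Poisson density
    mu       <- unname(fitted(glm.D93))
    ll.exact <- dpois(counts, mu, log = TRUE)

    all.equal(sum(ll.exact), as.numeric(logLik(glm.D93)))            # TRUE
    ## the per-case difference is exactly the saturated model's term:
    all.equal(ll.exact - ll.dev, dpois(counts, counts, log = TRUE))  # TRUE

The nine values of ll.exact are the individual log-likelihoods requested, and ll.dev differs from them only by a constant that depends on the data alone, not on the fitted model.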
John Smith
2020-Aug-29 17:23 UTC
[R] How to obtain individual log-likelihood value from glm?
In the book Modern Applied Statistics with S (4th edition, 2002) by Venables and Ripley, there is a function logitreg on page 445 which does provide the weighted logistic regression I asked about, judging by the loss function. And, interestingly enough, logitreg provides the same coefficients as glm in the example I gave earlier, even with weights < 1. For the residual deviance, too, logitreg yields the same number as glm. Unless I have misunderstood something, I am convinced that glm is a valid tool for weighted logistic regression, despite the description of its weights argument and the somewhat questionable logLik value in the case of non-integer weights < 1. Perhaps this is a bold claim: the description of weights could be modified, and logLik could be updated as well.

The stackexchange inquiry I linked is what I find interesting, not the link within that post. Sorry for the confusion.

On Sat, Aug 29, 2020 at 10:18 AM John Smith <jswhct at gmail.com> wrote:

> [snip: earlier messages quoted in full above]
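Incidentally, the coefficient equality reported here has a simple explanation: multiplying every prior weight by the same positive constant rescales the weighted log-likelihood by that constant, so its maximizer is unchanged, while standard errors and logLik do change with the scale. A small sketch using the earlier example (the names f1 and f10 are mine; suppressWarnings() merely silences glm's complaint about non-integer counts of successes):

    set.seed(135)
    y <- c(rep(0, 50), rep(1, 50))
    x <- rnorm(100)
    w <- c(rep(1, 50), rep(2, 50))

    f1  <- glm(y ~ x, family = binomial(), weights = w)
    f10 <- suppressWarnings(glm(y ~ x, family = binomial(), weights = w/10))

    all.equal(coef(f1), coef(f10))                # TRUE: same maximizer
    ## but the apparent precision scales with the weights:
    sqrt(diag(vcov(f10))) / sqrt(diag(vcov(f1)))  # both ratios ~ sqrt(10)

So rescaled weights leave the point estimates intact while making likelihood-based quantities such as logLik hard to interpret.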
John Fox
2020-Aug-29 18:14 UTC
[R] How to obtain individual log-likelihood value from glm?
Dear John,

If you look at the code for logitreg() in the MASS text, you'll see that the casewise components of the log-likelihood are multiplied by the corresponding weights. As far as I can see, this only makes sense if the weights are binomial trial counts. Otherwise, while the coefficients themselves will be the same as those obtained with proportionally similar integer weights (e.g., using your weights rather than weights/10), quantities such as the maximized log-likelihood, the deviance, and the coefficient standard errors will be uninterpretable.

logitreg() is simply another way to compute the MLE, using a general-purpose optimizer rather than the iteratively reweighted least squares that glm() uses. That the two functions give the same answer within rounding error is unsurprising: they're solving the same problem. A difference between the two functions is that glm() issues a warning about non-integer weights, while logitreg() doesn't. As I understand it, the motivation for writing logitreg() was to provide a function that could easily be modified, e.g., to impose parameter constraints on the solution.

I think that this discussion has become unproductive. If you feel that proceeding with non-integer weights makes sense, for a reason that I don't understand, then you should go ahead.

Best,
 John

On 2020-08-29 1:23 p.m., John Smith wrote:

> [snip: quoted text appears in full above]
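Fox's description is easy to make concrete: a casewise-weighted logistic log-likelihood handed to a general-purpose optimizer reproduces glm's coefficients even with non-integer weights, because both maximize the same function. A condensed sketch in the spirit of logitreg() (this is not the book's code; wnegll is a name introduced here):

    ## weighted negative log-likelihood for logistic regression;
    ## each casewise term y*eta - log(1 + exp(eta)) is multiplied by its weight
    wnegll <- function(beta, X, y, w) {
      eta <- drop(X %*% beta)
      -sum(w * (y * eta - log1p(exp(eta))))
    }

    set.seed(135)
    y <- c(rep(0, 50), rep(1, 50))
    x <- rnorm(100)
    w <- c(rep(1, 50), rep(2, 50)) / 10
    X <- cbind(1, x)

    opt <- optim(c(0, 0), wnegll, X = X, y = y, w = w, method = "BFGS")
    fit <- suppressWarnings(glm(y ~ x, family = binomial(), weights = w))

    rbind(optim = opt$par, glm = unname(coef(fit)))  # agree to rounding error

As Fox notes, the agreement shows only that optim's BFGS and glm's iteratively reweighted least squares find the same maximizer; it says nothing about whether the weighted log-likelihood itself is interpretable when the weights are not binomial trial counts.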