thr3ads.net - R devel - [Rd] predict.glm returns different results for the same model [Apr 2018]

If this information is useful, please help other people find it:
Share via:

Hadley Wickham

2018-Apr-27 13:25 UTC

[Rd] predict.glm returns different results for the same model

Hi all,

Very surprising (to me!) and mystifying result from predict.glm(): the
predictions vary depending on whether or not I use ns() or
splines::ns(). Reprex follows:

library(splines)

set.seed(12345)
dat <- data.frame(claim = rbinom(1000, 1, 0.5))
mns <- c(3.4, 3.6)
sds <- c(0.24, 0.35)
dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd sds[dat$claim
+ 1]))
dat <- dat[order(dat$wind), ]

m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)

# The model coefficients are the same
unname(coef(m1))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
unname(coef(m2))
#> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128

# But the predictions are not!
newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
unname(predict(m1, newdata = newdf))
#> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
unname(predict(m2, newdata = newdf))
#> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814

Is this a bug?

(Motivating issue from this ggplot2 bug report:
https://github.com/tidyverse/ggplot2/issues/2426)

Thanks!

Hadley



-- 
http://hadley.nz

Avraham Adler

2018-Apr-27 14:12 UTC

head link

[Rd] predict.glm returns different results for the same model

Can?t copy from my computer as gmail is blocked at work but if it helps,
the ?predvars? attribute if the terms object is not the same.

Avi

On Fri, Apr 27, 2018 at 9:25 AM Hadley Wickham <h.wickham at gmail.com>
wrote:
> Hi all,
>
> Very surprising (to me!) and mystifying result from predict.glm(): the
> predictions vary depending on whether or not I use ns() or
> splines::ns(). Reprex follows:
>
> library(splines)
>
> set.seed(12345)
> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
> mns <- c(3.4, 3.6)
> sds <- c(0.24, 0.35)
> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd >
sds[dat$claim + 1]))
> dat <- dat[order(dat$wind), ]
>
> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family =
binomial)
>
> # The model coefficients are the same
> unname(coef(m1))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> unname(coef(m2))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
>
> # But the predictions are not!
> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length =
5))
> unname(predict(m1, newdata = newdf))
> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
> unname(predict(m2, newdata = newdf))
> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
>
> Is this a bug?
>
> (Motivating issue from this ggplot2 bug report:
> https://github.com/tidyverse/ggplot2/issues/2426)
>
> Thanks!
>
> Hadley
>
>
>
> --
> http://hadley.nz
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>-- 
Sent from Gmail Mobile

	[[alternative HTML version deleted]]

Joris Meys

2018-Apr-27 14:23 UTC

head link

[Rd] predict.glm returns different results for the same model

Hi Hadley,

This is related to how the terms are constructed. If you look at terms(m1)
versus terms(m2), you see that in the case of m1 the knots are added to the
attribute predvars. Contrary, when using splines::ns() this doesn't happen.
Compare:

mf <- model.frame(claim ~ ns(wind, df = 5), data = dat)
mt <-  terms(mf)

with

mf2 <- model.frame(claim ~ splines::ns(wind, df = 5), data = dat)
mt2 <-  terms(mf)

The culprit is actually in makepredictcall.ns, specifically:

    if (as.character(call)[1L] != "ns")
        return(call)

Obviously the call starting with predict::ns() will cause the function to
return the call unaltered. In the case of ns() it processes the knot
information. And that difference causes the difference in predictions.

So it is a bug imho. Especially since I don't understand the logic. The
variable has class 'ns', so makepredictcall() dispatches correctly. That
extra check in the function makepredictcall.ns seems unnecessary to me and
might be simply removed.

Cheers
Joris

On Fri, Apr 27, 2018 at 3:25 PM, Hadley Wickham <h.wickham at gmail.com>
wrote:
> Hi all,
>
> Very surprising (to me!) and mystifying result from predict.glm(): the
> predictions vary depending on whether or not I use ns() or
> splines::ns(). Reprex follows:
>
> library(splines)
>
> set.seed(12345)
> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
> mns <- c(3.4, 3.6)
> sds <- c(0.24, 0.35)
> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd >
sds[dat$claim + 1]))
> dat <- dat[order(dat$wind), ]
>
> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family =
binomial)
>
> # The model coefficients are the same
> unname(coef(m1))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> unname(coef(m2))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
>
> # But the predictions are not!
> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length =
5))
> unname(predict(m1, newdata = newdf))
> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
> unname(predict(m2, newdata = newdf))
> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
>
> Is this a bug?
>
> (Motivating issue from this ggplot2 bug report:
> https://github.com/tidyverse/ggplot2/issues/2426)
>
> Thanks!
>
> Hadley
>
>
>
> --
> http://hadley.nz
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
<https://maps.google.com/?q=Coupure+links+653,%C2%A0B-9000+Gent,%C2%A0Belgium&entry=gmail&source=g>

-----------
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

	[[alternative HTML version deleted]]

Duncan Murdoch

2018-Apr-27 14:28 UTC

head link

[Rd] predict.glm returns different results for the same model

On 27/04/2018 9:25 AM, Hadley Wickham wrote:> Hi all,
> 
> Very surprising (to me!) and mystifying result from predict.glm(): the
> predictions vary depending on whether or not I use ns() or
> splines::ns(). Reprex follows: >
> library(splines)
> 
> set.seed(12345)
> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
> mns <- c(3.4, 3.6)
> sds <- c(0.24, 0.35)
> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd >
sds[dat$claim + 1]))
> dat <- dat[order(dat$wind), ]
> 
> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family =
binomial)
> 
> # The model coefficients are the same
> unname(coef(m1))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> unname(coef(m2))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> 
> # But the predictions are not!
> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length =
5))
> unname(predict(m1, newdata = newdf))
> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
> unname(predict(m2, newdata = newdf))
> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
> 
> Is this a bug?
The two objects m1 and m2 differ more than they should, so this looks 
like a problem in glm, not just in predict.glm.

 > attr(m1$terms, "predvars")
list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796,
35.4093171222502, 43.038645381669), Boundary.knots = c(12.9423820390783,
108.071583734075), intercept = FALSE))

 > attr(m2$terms, "predvars")
list(claim, splines::ns(wind, df = 5))

This appears to be due to a bug in the splines package.  There, the 
function splines:::makepredictcall.ns looks like this:

makepredictcall.ns <- function(var, call)
{
     if(as.character(call)[1L] != "ns") return(call)
     at <- attributes(var)[c("knots", "Boundary.knots",
"intercept")]
     xxx <- call[1L:2L]
     xxx[names(at)] <- at
     xxx
}

The test fails for m2, because as.character(call)[1L] is "splines::ns"
instead of "ns". I'll see if I can work out a better test and
submit a
patch.

Duncan Murdoch
> 
> (Motivating issue from this ggplot2 bug report:
> https://github.com/tidyverse/ggplot2/issues/2426)
> 
> Thanks!
> 
> Hadley

Hadley Wickham

2018-Apr-27 14:36 UTC

head link

[Rd] predict.glm returns different results for the same model

On Fri, Apr 27, 2018 at 7:28 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:> On 27/04/2018 9:25 AM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> Very surprising (to me!) and mystifying result from predict.glm(): the
>> predictions vary depending on whether or not I use ns() or
>> splines::ns(). Reprex follows: >
>> library(splines)
>>
>> set.seed(12345)
>> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
>> mns <- c(3.4, 3.6)
>> sds <- c(0.24, 0.35)
>> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd
>> sds[dat$claim + 1]))
>> dat <- dat[order(dat$wind), ]
>>
>> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
>> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family
>> binomial)
>>
>> # The model coefficients are the same
>> unname(coef(m1))
>> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674 
4.1564128
>> unname(coef(m2))
>> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674 
4.1564128
>>
>> # But the predictions are not!
>> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length
= 5))
>> unname(predict(m1, newdata = newdf))
>> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
>> unname(predict(m2, newdata = newdf))
>> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
>>
>> Is this a bug?
>
>
> The two objects m1 and m2 differ more than they should, so this looks like
a
> problem in glm, not just in predict.glm.
>
>> attr(m1$terms, "predvars")
> list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796,
> 35.4093171222502, 43.038645381669), Boundary.knots = c(12.9423820390783,
> 108.071583734075), intercept = FALSE))
>
>> attr(m2$terms, "predvars")
> list(claim, splines::ns(wind, df = 5))
>
> This appears to be due to a bug in the splines package.  There, the
function
> splines:::makepredictcall.ns looks like this:
>
> makepredictcall.ns <- function(var, call)
> {
>     if(as.character(call)[1L] != "ns") return(call)
>     at <- attributes(var)[c("knots",
"Boundary.knots", "intercept")]
>     xxx <- call[1L:2L]
>     xxx[names(at)] <- at
>     xxx
> }
>
> The test fails for m2, because as.character(call)[1L] is
"splines::ns"
> instead of "ns". I'll see if I can work out a better test and
submit a
> patch.
Great, thanks!


-- 
http://hadley.nz

Martin Maechler

2018-Apr-27 16:51 UTC

head link

[Rd] predict.glm returns different results for the same model

>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>     on Fri, 27 Apr 2018 10:28:16 -0400 writes:
    > On 27/04/2018 9:25 AM, Hadley Wickham wrote:
    >> Hi all,
    >> 
    >> Very surprising (to me!) and mystifying result from predict.glm():
the
    >> predictions vary depending on whether or not I use ns() or
    >> splines::ns(). Reprex follows: >
    >> library(splines)
    >> 
    >> set.seed(12345)
    >> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
    >> mns <- c(3.4, 3.6)
    >> sds <- c(0.24, 0.35)
    >> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd  
>> sds[dat$claim + 1]))
    >> dat <- dat[order(dat$wind), ]
    >> 
    >> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family =
binomial)
    >> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family
= binomial)
    >> 
    >> # The model coefficients are the same
    >> unname(coef(m1))
    >> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674 
4.1564128
    >> unname(coef(m2))
    >> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674 
4.1564128
    >> 
    >> # But the predictions are not!
    >> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind),
length = 5))
    >> unname(predict(m1, newdata = newdf))
    >> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
    >> unname(predict(m2, newdata = newdf))
    >> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
    >> 
    >> Is this a bug?

    > The two objects m1 and m2 differ more than they should, so this looks 
    > like a problem in glm, not just in predict.glm.

    >> attr(m1$terms, "predvars")
    > list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796,
    > 35.4093171222502, 43.038645381669), Boundary.knots =
c(12.9423820390783,
    > 108.071583734075), intercept = FALSE))

    >> attr(m2$terms, "predvars")
    > list(claim, splines::ns(wind, df = 5))

    > This appears to be due to a bug in the splines package.  There, the 
    > function splines:::makepredictcall.ns looks like this:

    > makepredictcall.ns <- function(var, call)
    > {
    > if(as.character(call)[1L] != "ns") return(call)
    > at <- attributes(var)[c("knots",
"Boundary.knots", "intercept")]
    > xxx <- call[1L:2L]
    > xxx[names(at)] <- at
    > xxx
    > }

    > The test fails for m2, because as.character(call)[1L] is
"splines::ns"
    > instead of "ns". I'll see if I can work out a better test
and submit a
    > patch.

    > Duncan Murdoch

Thank you, Duncan, for the good diagnosis and the patch!

I will deal with it.

Two things however (both *not* needing a new patch; I'll deal
    	   with it after dinner ;-)

1) I'd like to commit that ASAP, but do want to add a *minimal*
   Reprex for regression test, rather than the above, notably as I
   see that the only place our code calls  makepredictcall() is in
   model.frame.default(), so I thought this should not really be
   related to glm at all, and I was right.

2) Reading the help page ?makepredictcall
    --- yes, a good idea, even for experts, believe me, ((notably when it is
from	 a base package, not produced from roxy.. ;-\)) ---
   reveals what I vaguely remembered:
   The function was introduced to fix a bug first encountered in
      lm(. ~  poly(.))

  and indeed:

  -> That help page actually contains an indirect closer to minimal reprex.
  -> I will also patch the  makepredictcall.poly() method.

  [ as I said: after dinner .. ;-)]

--
Martin

Maybe Matching Threads

Search for more maybe matching threads

R devel - Apr 2018 - predict.glm returns different results for the same model

[Rd] predict.glm returns different results for the same model

[Rd] predict.glm returns different results for the same model

[Rd] predict.glm returns different results for the same model

[Rd] predict.glm returns different results for the same model

[Rd] predict.glm returns different results for the same model

[Rd] predict.glm returns different results for the same model

Maybe Matching Threads