Sebastien Bihorel
2017-Oct-24 01:57 UTC
[R] Issue of reproducibility with gam and lm.wfit in different versions of R
Dear R users,
I recently stumbled upon problems of reproducibility while running GAM analyses
in different R and gam package versions. In the example below, a small dataset
is created in which the y and x1 variables are 100% correlated. The intents of
this example were primarily for regression testing and, secondarily, to evaluate
how the gam algorithm behaves under extreme/limit conditions.
I ran this little snippet in 5 different environments and got 100% consistent
results until I switched to R 3.3.2.
* Comparing results from environments 1, 2, and 3 shows that changing the
version of the gam package did not change the results under R 3.3.0.
* Comparing results from environments 3 and 4 shows that changing the version of
R altered the values of the AIC and the output of the step.gam call (changed to
a NULL object)
* Comparing results from environments 4 and 5 shows that reverting to an older
version of the gam package in R 3.3.2 still produced altered AIC values and the
NULL output from step.gam call
Further investigations into these differences seem to show that the lm.wfit call
in the gam.fit function (called from within gam and step.gam) may result in
different values in R 3.3.0 and 3.3.2.
So my questions are 2-fold:
1- Would you have any information about why lm.wfit would produce different
outcomes in R 3.3.0 and 3.3.2? The source code does not appear significantly
different.
2- Is it expected that the step.gam function returns a NULL object in R 3.3.2
while a valid model has been identified? Looking at the source of step.gam, the
line 157 (if(is.null(form.list)) break) seems to be the reason the function
breaks out and returns a NULL value.
I thank you in advance for your time.
Sebastien Bihorel
library(gam)
dat <- data.frame(
y =
c(57,57,98,83,122,69,108,86,80,87,75,76,97,101,121,111,105,84,65,54,61,71,125,60,50,112,102,110,77,45,93,62,120,115,70,113,117,85,46,123,89,95,116,55,110,92,109,100,72,88,105,119,94,45,67,58,60,45,107,73,100,79,47,99,51,53,68,125,90,48,82,85,65,52,70,59,125,49,118,103,91,124,78,81,63,63),
x1 =
c(52,52,93,78,117,64,103,81,75,82,70,71,92,96,116,106,100,79,60,49,56,66,120,55,45,107,97,105,72,40,88,57,115,110,65,108,112,80,41,118,84,90,111,50,105,87,104,95,67,83,100,114,89,40,62,53,55,40,102,68,95,74,42,94,46,48,63,120,85,43,77,80,60,47,65,54,120,44,113,98,86,119,73,76,58,58),
x2 =
c(0.0001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0001,0)
)
summary(dat)
scope <- list(
x1=c('1','x1','ns(x1, df=2)'),
x2=c('1','x2')
)
gam.object <- gam(y~1, data=dat)
step.object <- step.gam(gam.object, scope=scope, trace=2)
is.null(step.object)
# Run this if you want to evaluate one lm.wfit example call made from within gam
functions
x <- cbind(rep(1,nrow(dat)), dat$x1)
names(x) <- c('Intercept', 'x1')
z <- dat$y
w <- rep(1,nrow(dat))
str(eval(expression(lm.wfit(x, z, w, method = "qr", singular.ok =
TRUE))))
######################################################################
#1 R 3.1.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.09.1
######################################################################
> step.object <- step.gam(gam.object, scope=scope, trace=2)
Start: y ~ 1; AIC= 797.0034
Trial: y ~ x1 + 1 ; AIC= -5121.796
Trial: y ~ 1 + x2 ; AIC= 796.915
Step:1 y ~ x1 ; AIC= -5121.796
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816
Trial: y ~ x1 + x2 ; AIC= -5174.408
Step:2 y ~ x1 + x2 ; AIC= -5174.408
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 > is.null(step.object)
[1] FALSE
######################################################################
#2: R 3.3.0 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.12
######################################################################
> step.object <- step.gam(gam.object, scope=scope, trace=2)
Start: y ~ 1; AIC= 797.0034
Trial: y ~ x1 + 1 ; AIC= -5121.796
Trial: y ~ 1 + x2 ; AIC= 796.915
Step:1 y ~ x1 ; AIC= -5121.796
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816
Trial: y ~ x1 + x2 ; AIC= -5174.408
Step:2 y ~ x1 + x2 ; AIC= -5174.408
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 > is.null(step.object)
[1] FALSE
######################################################################
#3: R 3.3.0 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.14-4
######################################################################
> step.object <- step.gam(gam.object, scope=scope, trace=2)
Start: y ~ 1; AIC= 797.0034
Trial: y ~ x1 + 1 ; AIC= -5121.796
Trial: y ~ 1 + x2 ; AIC= 796.915
Step:1 y ~ x1 ; AIC= -5121.796
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816
Trial: y ~ x1 + x2 ; AIC= -5174.408
Step:2 y ~ x1 + x2 ; AIC= -5174.408
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 > is.null(step.object)
[1] FALSE
######################################################################
#4: R 3.3.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.14-4
######################################################################
> step.object <- step.gam(gam.object, scope=scope, trace=2)
Start: y ~ 1; AIC= 797.0034
Trial: y ~ x1 + 1 ; AIC= -5122.886
Trial: y ~ 1 + x2 ; AIC= 796.915
Step:1 y ~ x1 ; AIC= -5122.886
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5177.531
Trial: y ~ x1 + x2 ; AIC= -5177.531
Step:2 y ~ ns(x1, df = 2) ; AIC= -5177.531
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5222.703
Step:3 y ~ ns(x1, df = 2) + x2 ; AIC= -5222.703 > is.null(step.object)
[1] TRUE
######################################################################
#5: R 3.3.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.09.1
######################################################################
> step.object <- step.gam(gam.object, scope=scope, trace=2)
Start: y ~ 1; AIC= 797.0034
Trial: y ~ x1 + 1 ; AIC= -5122.886
Trial: y ~ 1 + x2 ; AIC= 796.915
Step:1 y ~ x1 ; AIC= -5122.886
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5177.531
Trial: y ~ x1 + x2 ; AIC= -5177.531
Step:2 y ~ ns(x1, df = 2) ; AIC= -5177.531
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5222.703
Step:3 y ~ ns(x1, df = 2) + x2 ; AIC= -5222.703 > is.null(step.object)
[1] TRUE
