Andrew Crane-Droesch
2013-May-08 14:12 UTC
[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)
Dear All,

I'm using gam for a project that involves multiple imputation, and it has led me to a question about how F-statistics/p-values work in gam. Specifically, how do the values in summary(gam) get generated? As the dumb example below makes clear, I'm manipulating gam objects to reflect the MI procedures that I'm using. I don't trust the F-statistics/p-values that I'm getting, but I'd like to know how to further manipulate these objects to get trustworthy values. Part of that involves knowing how the output of summary(gam) gets generated, and from which elements of a gam object.

Here is the example:

library(mgcv)
par(mfrow = c(2, 2))

## replace missing values with the supplied imputations
impute <- function(a, a.impute) {
  ifelse(is.na(a), a.impute, a)
}

## make some dumb fake data
x1.knots <- c(-1, -.5, 0, .5, 1)
x2.knots <- c(-1, -.5, 0, .5, 1)
K <- 5
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- .05 * exp(x1) - .5 * x1 + .1 * x2 + x1 * x2 + rnorm(100)

## some of it is missing
x1[81:100] <- NA
x2[71:90]  <- NA

## do a few dumb imputations, and fit models to the partially-imputed data
x1.imp <- impute(x1, rnorm(100))
x2.imp <- impute(x2, rnorm(100))
m1 <- gam(y ~ te(x1.imp, x2.imp, k = c(K, K)),
          knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m1, plot.type = "contour", scheme = 2, main = "Imp 1")

x1.imp <- impute(x1, rnorm(100))
x2.imp <- impute(x2, rnorm(100))
m2 <- gam(y ~ te(x1.imp, x2.imp, k = c(K, K)),
          knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m2, plot.type = "contour", scheme = 2, main = "Imp 2")

x1.imp <- impute(x1, rnorm(100))
x2.imp <- impute(x2, rnorm(100))
m3 <- gam(y ~ te(x1.imp, x2.imp, k = c(K, K)),
          knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m3, plot.type = "contour", scheme = 2, main = "Imp 3")

results <- list(m1, m2, m3)

## combine the results according to Rubin's rules
reps <- 3

## pooled coefficients: average across imputations
bhat <- results[[1]]$coefficients
for (i in 2:reps) {
  bhat <- bhat + results[[i]]$coefficients
}
bhat <- bhat / reps

## within-imputation covariance: average of the Vp matrices
W <- results[[1]]$Vp
for (i in 2:reps) {
  W <- W + results[[i]]$Vp
}
W <- W / reps

## between-imputation covariance
B <- (results[[1]]$coefficients - bhat) %*% t(results[[1]]$coefficients - bhat)
for (i in 2:reps) {
  B <- B + (results[[i]]$coefficients - bhat) %*% t(results[[i]]$coefficients - bhat)
}
B <- B / (reps - 1)

## total covariance
Vb <- W + (1 + 1/reps) * B

## put the results into a convenient gam object
MI <- results[[1]]
MI$coefficients <- bhat
MI$Vp <- Vb

## and summarize
summary(MI)
plot(MI, plot.type = "contour", scheme = 2, main = "MI")

When I do something similar with non-fake data I get wacky F-statistics and p-values. For example, F can be >100 while p equals 1. This probably has something to do with degrees of freedom.

My easy question: what goes on under the hood in gam to generate these values? Which parts of a gam object are used?

My harder question: how might one construct principled analogs of these statistics in an MI context, when degrees of freedom will vary across models depending on the imputed data? Has anybody thought about this, or do I have some serious pencil-and-paper work ahead of me?

Thanks,
Andrew
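P.S. A guess on my part, flagged as such: I suspect summary() also reads pieces of the fit that I haven't pooled, such as $edf, $sig2 and $df.residual, and that leaving those at their values from the first imputation is part of my problem. Below is a minimal sketch of pooling those slots too by simple averaging; the choice of a plain average is my own assumption, not something I have a reference for.

## sketch: also pool the slots that summary() plausibly reads, by simple averaging
MI$edf         <- rowMeans(sapply(results, function(m) m$edf))         # per-coefficient effective df
MI$sig2        <- mean(sapply(results, function(m) m$sig2))            # scale estimate
MI$df.residual <- mean(sapply(results, function(m) m$df.residual))     # residual df
summary(MI)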
Bert Gunter
2013-May-08 15:59 UTC
[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)
?summary.gam   ## the Help page

Since the Help page is presumably not enough, why don't you look at the code?? R is open source.

summary.gam    ## at the prompt

--
Bert Gunter
Genentech Nonclinical Biostatistics
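For example, a minimal sketch of inspecting and stepping through that code, assuming the m1 fit from the example above is still in the workspace:

## summary() dispatches on class "gam", so the relevant source is summary.gam
library(mgcv)
getAnywhere(summary.gam)        # print the method's source
debugonce(mgcv:::summary.gam)   # step through it line by line...
summary(m1)                     # ...on one of the fitted models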
Simon Wood
2013-May-08 16:33 UTC
[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)
For smooths the method is described in:

Wood, S. N. (2013) On p-values for smooth components of an extended generalized additive model. Biometrika 100(1), 221-228.
http://biomet.oxfordjournals.org/content/early/2012/10/18/biomet.ass048.full.pdf+html

best,
Simon

--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603  http://people.bath.ac.uk/sw283
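For a concrete sense of what such a test is built from, here is a rough sketch using the pooled MI object from the example above. It is only the underlying Wald-type quadratic form, not the computation in the paper or in summary.gam, which uses a rank-truncated pseudoinverse tied to the term's effective degrees of freedom and an F reference distribution; MASS::ginv is used here purely for illustration.

## rough sketch: the kind of quantity the smooth-term test is based on
sm   <- MI$smooth[[1]]                       # the te(x1.imp, x2.imp) term
idx  <- sm$first.para:sm$last.para           # positions of its coefficients
b    <- MI$coefficients[idx]                 # pooled coefficients for the term
V    <- MI$Vp[idx, idx]                      # matching block of the pooled covariance
stat <- drop(t(b) %*% MASS::ginv(V) %*% b)   # Wald-type statistic (requires MASS)
stat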