Andrew Crane-Droesch
2013-Aug-15 17:03 UTC
[R] Modifying the internals of mgcv's G object to program a within estimator
I'm trying to write a function to make gam (in mgcv) give me the fixed-effects "within" estimator, which is equivalent to the OLS dummy variable estimator. Basically this involves subtracting the within-group means from the IVs and the DV, adding their overall means back in, and fitting the model. For OLS, this is equivalent to running a regression with a dummy variable for each group. With many groups, this gets computationally prohibitive, especially if you want to bootstrap. Eventually I want to apply this procedure to all of gam's splined terms, but for now I'm simply trying to modify gam's G object to do so only for one parametrically-represented variable. The G object feeds everything to gam needed to fit the model. The code below modifies the design matrix, the DV, and the formula. My question: what else needs to change in G to make the FE estimator equivalent to the dummy variable estimator? Unfortunately I can't find any technical information or well-commented code describing how gam works in this regard, nor good descriptions of what the constituents of G are. Nor can I look at the gam.setup function -- I think that it is compiled in C (or something). Responses that can make the estimate of the coefficient on x (below) equal for the two models would be really welcome, as would pointers to useful documentation. rm(list=ls()) library(mgcv) library(plyr) x = rexp(100) ID = ceiling(runif(100)*10) y = exp(x)+rnorm(100)+ID t = ceiling(runif(100)*5) data = data.frame(y,x,ID,t) data = data[with(data,order(ID,t)),] m = gam(y~x+as.factor(ID),fit=FALSE) transform = function(G,ID){ rep.row = function(x,n){matrix(rep(x,each=n),nrow=n)} X = G$X[,grepl(colnames(data.frame(ID=ID)),colnames(G$X))==FALSE] dmframe = as.data.frame(cbind(ID,X)) dm <- as.matrix(ddply(dmframe, "ID", colwise(scale, center = TRUE, scale = FALSE))) cm = rep.row(colMeans(dmframe[,2:ncol(dmframe)]),nrow(dmframe)) X = dm[,2:ncol(dm)]+cm G$X = X y = G$y dmframe = as.data.frame(cbind(ID,y)) dm <- ddply(dmframe, "ID", colwise(scale, center = TRUE, scale = FALSE)) dm = as.matrix(dm) cm = mean(dmframe[,2]) y = dm[,2]+cm G$y = y spec = as.character(G$formula)[3]; n=nchar(spec) G$formula = as.formula(paste(as.character(G$formula)[2],as.character(G$formula)[1],substr(spec,start=1,stop=n-15))) } #m = gam(y~s(x,k=7)+as.factor(ID),fit=FALSE,sp=0) out = transform(m,data$ID) G=m; ID=data$ID mfe = gam(out) summary(mfe) mlsdv = gam(y~x+as.factor(ID)) #mlsdv = gam(y~s(x,k=7)+as.factor(ID),sp=0) summary(mlsdv) #The coefficient on x should be equal for the two models. The intercept should represent the average of the fixed effects. cbind(as.matrix(t(mlsdv$coef))[,grepl(colnames(data.frame(ID=ID)),colnames(as.matrix(t(mlsdv$coef))))==FALSE], mfe$coef)