Andrew Crane-Droesch
2012-Oct-27 04:00 UTC
[R] [gam] [mgcv] Question on integrating an Eicker-White "sandwich" VCV estimator into GAM
Dear List,
I'm just teaching myself semi-parametric techniques. Apologies in
advance for the long post.
I've got observational data and a longitudinal, semi-parametric model
that I want to fit in GAM (or potentially something equivalent), and I'm
not sure how to do it. I'm posting this to ask whether it is possible
to do what I want to do using "canned" commands and plotting routines.
If not, I'd probably have to spend some time programming from scratch.
The response is modeled as a function of a few dummy variables and
several continuous variables. To control for time-invariant
unobservable characteristics, I'm also including a "fixed effect" in the
econometrics sense of the term, i.e. an individual-specific intercept. I
also want to model the continuous variables flexibly, since I have no good
priors on the proper specification of the functional form. The model is
the following:
y_{it} = \alpha_i + \beta_1 T_{it} + f(continuous.vars_{it}) + e_{it}
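To control for unobserved time-invariant heterogeneity, I want to
de-mean the data within each individual:
y_{it}-\bar{y}_i = \beta_1(T_{it}-\bar{T}_i) + (f(continuous.vars_{it}) - \bar{f}_i) + e_{it} - \bar{e}_i
where \bar{f}_i is the average of f(continuous.vars_{it}) over individual
i's observations. Because f is nonlinear, it is the spline terms f(.)
that get de-meaned, not their arguments.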
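A minimal numerical check of the de-meaning idea, written in Python/NumPy only to show the linear algebra (it does not use or mimic gam()). It assumes a truncated-power linear spline basis (as in Ruppert et al.), one continuous variable, simulated data, and a smoothing parameter lam held fixed. Because the individual intercepts are unpenalized, profiling them out is exactly the within transformation, so demeaning y, T and each basis column should reproduce the coefficients of a fit with explicit intercepts (a penalized version of the Frisch-Waugh-Lovell argument). The helper names `spline_basis` and `demean_by` are hypothetical.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated-power linear spline basis: columns x and (x - k)_+ for each knot."""
    return np.column_stack([x] + [np.clip(x - k, 0, None) for k in knots])

def demean_by(group, A):
    """Subtract each group's mean from every row of A (the 'within' transformation)."""
    A = np.asarray(A, dtype=float)
    counts = np.bincount(group)
    sums = np.zeros((counts.size,) + A.shape[1:])
    np.add.at(sums, group, A)
    means = sums / counts.reshape((-1,) + (1,) * (A.ndim - 1))
    return A - means[group]

# Hypothetical panel: n_units individuals, each observed n_t times.
rng = np.random.default_rng(0)
n_units, n_t = 30, 6
unit = np.repeat(np.arange(n_units), n_t)
n = unit.size
alpha = rng.normal(size=n_units)[unit]              # individual intercepts alpha_i
T = rng.integers(0, 2, size=n).astype(float)        # dummy regressor T_it
x = rng.uniform(0, 1, size=n)                       # one continuous variable
y = alpha + 0.5 * T + np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

knots = np.linspace(0.1, 0.9, 9)
X = np.column_stack([T, spline_basis(x, knots)])    # T plus spline basis for f
p = X.shape[1]
K = np.diag([0.0, 0.0] + [1.0] * len(knots))        # penalize only the knot coefficients
lam = 0.1                                           # smoothing parameter, held fixed here

# (a) Penalized fit with explicit, unpenalized individual intercepts.
D = np.eye(n_units)[unit]                           # n x n_units dummy matrix
XD = np.column_stack([D, X])
KD = np.zeros((n_units + p, n_units + p))
KD[n_units:, n_units:] = K
coef_dummy = np.linalg.solve(XD.T @ XD + lam * KD, XD.T @ y)[n_units:]

# (b) Same penalty, applied to de-meaned y and de-meaned columns of X (no intercepts).
Xw, yw = demean_by(unit, X), demean_by(unit, y)
coef_within = np.linalg.solve(Xw.T @ Xw + lam * K, Xw.T @ yw)

print(np.allclose(coef_dummy, coef_within))         # True
```

The equality holds for any fixed lam; choosing lam on the de-meaned data is a separate step.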
Fitting the de-meaned model should give me coefficient estimates equal to
those from the non-de-meaned model (with individual intercepts), including
the coefficient estimates on the spline terms. However, there is certainly
autocorrelation in the errors, and potentially heteroskedasticity. The
Ruppert et al. textbook on
semiparametric regression uses GLS to account for correlated errors. I
haven't really used GLS much and I don't think it solves the
autocorrelation problem. I'm more accustomed to using a cluster-robust
"sandwich" estimator:
(X'X)^{-1} (\sum_j X_j' e_j e_j' X_j) (X'X)^{-1}
In a penalized spline context, this would be something like the following:
(X'X + \lambda K)^{-1} (\sum_j X_j' e_j e_j' X_j) (X'X + \lambda K)^{-1}
where j indexes clusters (the units on whom observations are repeated),
K is the penalty matrix and \lambda is the smoothing parameter.
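A sketch of the penalized cluster-robust sandwich as written in the formula, in Python/NumPy for illustration only. Assumptions: lam and the penalty matrix K are treated as fixed (no allowance for smoothing-parameter uncertainty), no small-sample degrees-of-freedom correction is applied, and the data are simulated with a shared unit-level shock so that errors are correlated within units. `cluster_sandwich` is a hypothetical helper, not an mgcv function. Since a penalized estimate is biased, this V describes sampling variance around the penalized fit, not total error.

```python
import numpy as np

def cluster_sandwich(X, resid, cluster, lam=0.0, K=None):
    """Cluster-robust sandwich covariance for a (penalized) least-squares fit:

        (X'X + lam*K)^{-1} [sum_j X_j' e_j e_j' X_j] (X'X + lam*K)^{-1}

    where j runs over the distinct values of `cluster`. With lam = 0 this is the
    usual cluster-robust OLS covariance (no small-sample correction applied).
    """
    p = X.shape[1]
    K = np.zeros((p, p)) if K is None else K
    bread = np.linalg.inv(X.T @ X + lam * K)
    meat = np.zeros((p, p))
    for j in np.unique(cluster):
        idx = cluster == j
        s = X[idx].T @ resid[idx]        # X_j' e_j for cluster j
        meat += np.outer(s, s)           # X_j' e_j e_j' X_j
    return bread @ meat @ bread

# Hypothetical example: 40 units x 5 periods, with a shared unit-level shock
# so that errors are correlated within units.
rng = np.random.default_rng(1)
cluster = np.repeat(np.arange(40), 5)
x = rng.uniform(0, 1, size=cluster.size)
shock = rng.normal(size=40)[cluster]
y = np.sin(2 * np.pi * x) + shock + rng.normal(scale=0.3, size=cluster.size)

knots = np.linspace(0.1, 0.9, 9)
X = np.column_stack([np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots])
K = np.diag([0.0, 0.0] + [1.0] * len(knots))   # penalize the knot coefficients only
lam = 0.1
beta = np.linalg.solve(X.T @ X + lam * K, X.T @ y)
resid = y - X @ beta

V = cluster_sandwich(X, resid, cluster, lam, K)
se = np.sqrt(np.diag(V))
```

As a sanity check, with lam = 0 and every observation in its own cluster the function reduces to the familiar heteroskedasticity-robust (HC0) form (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}.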
As far as I can tell, there is nothing theoretically wrong with this
extension of the sandwich estimator to the semiparametric context,
though I'm still learning all of this. If anybody could point out any
potential problems in the above formulation, I'd be grateful.
So unless I am missing something with the theory (which is certainly
quite possible at this stage), my problem is with the implementation:
1. I want to run GAM on de-meaned data, including de-meaned spline
terms. How would I go about doing this? In Octave, I would define a
matrix containing the spline terms and de-mean that whole matrix
before fitting the model (via REML or ML, to get the smoothing
parameters). But if I do this manually in R and feed the matrix into
GAM, GAM simply treats the columns as more data to be splined, unless I
specify them as linear terms. If I do that, I don't get the graphs,
which I want.
2. The graphs given by plot(gam.object) show confidence intervals
based on standard errors that are NOT from the sandwich estimator
above. How could I get GAM to plot confidence intervals based on the
sandwich estimator of the VCV matrix?
3. Some of the terms are interactions (e.g., T*var2). I realize that
GAM has tensor product capabilities, but (1) frankly I don't understand
them yet, and (2) I want my semi-parametric fit to be comparable to
polynomial "parametric" fits. So, in addition to the plots given by
GAM, I'd like to be able to make a graph that adds the estimate of the
coefficient on T to the smooth function of T*var2, together with the
standard error of that sum. Of course, I'd like this to be the standard
error estimated by the "sandwich" estimator above.
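For item 1, a sketch of the "Octave-style" route, in Python/NumPy purely to show the linear algebra (it is not a gam() or mgcv interface): build the basis matrix, de-mean its columns within individuals, choose lam, and evaluate f on a grid for plotting. Assumptions: a truncated-power basis with 19 knots, simulated data, and GCV over a grid of lam values used in place of the REML/ML criterion mentioned in the post. The n_units intercepts absorbed by de-meaning are counted in the effective degrees of freedom. The helpers `basis`, `demean_by` and `gcv` are hypothetical.

```python
import numpy as np

def basis(x, knots):
    """Truncated-power linear spline basis (no constant column)."""
    return np.column_stack([x] + [np.clip(x - k, 0, None) for k in knots])

def demean_by(group, A):
    """Subtract each group's mean from every row of A."""
    counts = np.bincount(group)
    sums = np.zeros((counts.size,) + A.shape[1:])
    np.add.at(sums, group, A)
    means = sums / counts.reshape((-1,) + (1,) * (A.ndim - 1))
    return A - means[group]

# Hypothetical panel data with individual intercepts.
rng = np.random.default_rng(4)
n_units, n_t = 40, 6
unit = np.repeat(np.arange(n_units), n_t)
n = unit.size
x = rng.uniform(0, 1, size=n)
y = rng.normal(size=n_units)[unit] + np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

knots = np.linspace(0.05, 0.95, 19)
X = basis(x, knots)
K = np.diag([0.0] + [1.0] * len(knots))            # penalize knot coefficients only
Xw, yw = demean_by(unit, X), demean_by(unit, y)    # de-mean the basis columns, not x

def gcv(lam):
    """GCV score and coefficients for one lam on the de-meaned problem."""
    A = np.linalg.solve(Xw.T @ Xw + lam * K, Xw.T)  # (Xw'Xw + lam K)^{-1} Xw'
    beta = A @ yw
    rss = np.sum((yw - Xw @ beta) ** 2)             # equals the full-model RSS
    edf = n_units + np.trace(A @ Xw)                # absorbed intercepts + smooth
    return n * rss / (n - edf) ** 2, beta

lams = 10.0 ** np.arange(-6, 3)
scores = [gcv(lam)[0] for lam in lams]
lam_best = lams[int(np.argmin(scores))]
beta = gcv(lam_best)[1]

# Evaluate f on a grid with the raw (not de-meaned) basis; f is only
# identified up to a constant, so center it for display.
grid = np.linspace(0, 1, 101)
f_hat = basis(grid, knots) @ beta
f_hat -= f_hat.mean()
```

The plotting grid uses the raw basis because de-meaning only removes individual-specific constants, which the centering step discards anyway.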
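For item 2, pointwise confidence bands only need the basis evaluated on a plotting grid, B, and a coefficient covariance V: the standard error at each grid point is the square root of the corresponding diagonal element of B V B'. The sketch below (Python/NumPy, illustration only) plugs in a cluster-robust sandwich V. Assumptions: simulated data, a truncated-power basis with columns centered over the data so the smooth is centered, a fixed lam, and, for brevity, a single intercept instead of de-meaned data. `basis` and `pointwise_band` are hypothetical helpers.

```python
import numpy as np

def basis(x, knots):
    """Truncated-power linear spline basis (no constant column)."""
    return np.column_stack([x] + [np.clip(x - k, 0, None) for k in knots])

def pointwise_band(B_grid, beta, V, z=1.96):
    """Curve B_grid @ beta with pointwise band +/- z*se, where se^2 = diag(B V B')."""
    fit = B_grid @ beta
    se = np.sqrt(np.einsum("ij,jk,ik->i", B_grid, V, B_grid))
    return fit, fit - z * se, fit + z * se

# Hypothetical data: 40 units x 5 periods with a shared unit-level shock.
rng = np.random.default_rng(2)
unit = np.repeat(np.arange(40), 5)
x = rng.uniform(0, 1, size=unit.size)
y = (np.sin(2 * np.pi * x) + rng.normal(size=40)[unit]
     + rng.normal(scale=0.3, size=unit.size))

knots = np.linspace(0.1, 0.9, 9)
center = basis(x, knots).mean(axis=0)        # center basis columns so the smooth is centered
X = np.column_stack([np.ones_like(x), basis(x, knots) - center])
K = np.diag([0.0, 0.0] + [1.0] * len(knots))
lam = 0.1

bread = np.linalg.inv(X.T @ X + lam * K)
beta = bread @ X.T @ y
e = y - X @ beta
scores = np.array([X[unit == j].T @ e[unit == j] for j in range(40)])  # row j = X_j' e_j
V = bread @ (scores.T @ scores) @ bread      # cluster-robust sandwich

# Band for the smooth alone: drop the intercept row/column of beta and V.
grid = np.linspace(0, 1, 101)
B_grid = basis(grid, knots) - center
fit, lower, upper = pointwise_band(B_grid, beta[1:], V[1:, 1:])
```

Any covariance matrix can be passed to `pointwise_band`, so swapping the model-based V for the sandwich V is the only change needed to get sandwich-based bands.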
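For item 3, the effect of T at a given value v of var2 is beta_T + B(v) gamma, where gamma are the coefficients of the T*var2 smooth. That is a linear combination c(v)'beta, so its variance is c(v)' V c(v). This includes the covariance between beta_T and gamma, which is why the standard errors of the two pieces cannot simply be added. The sketch below (Python/NumPy, illustration only) assumes simulated data in which the true effect of T is 0.5 + var2^2, a truncated-power basis, a fixed lam, a cluster-robust sandwich V, and, for brevity, an intercept instead of de-meaned data (on de-meaned data the intercept column would be dropped). `basis` is a hypothetical helper.

```python
import numpy as np

def basis(v, knots):
    """Truncated-power linear spline basis (no constant column)."""
    return np.column_stack([v] + [np.clip(v - k, 0, None) for k in knots])

# Hypothetical data: the effect of T is 0.5 + var2**2, i.e. it varies with var2.
rng = np.random.default_rng(3)
n_units, n_t = 50, 4
unit = np.repeat(np.arange(n_units), n_t)
n = unit.size
T = rng.integers(0, 2, size=n).astype(float)
var2 = rng.uniform(0, 1, size=n)
y = (1.0 + (0.5 + var2**2) * T + np.cos(3 * var2)
     + rng.normal(size=n_units)[unit] + rng.normal(scale=0.3, size=n))

knots = np.linspace(0.1, 0.9, 9)
Bv = basis(var2, knots)
q = Bv.shape[1]
# Columns: intercept | T | smooth of var2 | T * smooth of var2 (varying coefficient).
X = np.column_stack([np.ones(n), T, Bv, T[:, None] * Bv])
pen = [0.0] + [1.0] * len(knots)             # penalize knot coefficients of each smooth
K = np.diag([0.0, 0.0] + pen + pen)
lam = 0.1

bread = np.linalg.inv(X.T @ X + lam * K)
beta = bread @ X.T @ y
e = y - X @ beta
scores = np.array([X[unit == j].T @ e[unit == j] for j in range(n_units)])
V = bread @ (scores.T @ scores) @ bread      # cluster-robust sandwich

# Effect of T at var2 = v is beta_T + B(v) @ gamma = C @ beta, so its variance
# is diag(C V C'), which includes the covariance between the two pieces.
grid = np.linspace(0, 1, 51)
C = np.zeros((grid.size, X.shape[1]))
C[:, 1] = 1.0                                # coefficient on T
C[:, 2 + q:] = basis(grid, knots)            # coefficients of the T * var2 smooth
effect = C @ beta
se = np.sqrt(np.einsum("ij,jk,ik->i", C, V, C))
lower, upper = effect - 1.96 * se, effect + 1.96 * se
```

At var2 = 0 this basis is identically zero, so the curve starts at beta_T with standard error sqrt(V[1, 1]), which is a convenient check that the contrast rows are set up correctly.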
Thanks for any help and advice, and for bearing with my long post.
Best,
Andrew
--
*Andrew Crane-Droesch*
Energy and Resources Group
UC Berkeley
+1 215 435 2644
andrewcd@berkeley.edu
skype: andrew.crane-droesch
http://andrewcd.berkeley.edu
