I have a data analysis job for which lme may be used. Prof. Spencer Graves has
already helped me a lot with it, and I really appreciate that. Could anybody
else on the list give me some hints from other perspectives? I hope to learn as
much as possible from this complicated real data set.
Thanks in advance.
Hanhan
To briefly describe my data: my data are health effect measurements (y) and
personal exposures to ozone and some other pollutants (x1, x2, x3, ...). For
each of the 5 subjects in total, 3 weeks of daily data are available, with some
missing values. To pool the 5 subjects together, I use lme in R as

  try1 <- lme(y ~ x1 + x2 + x3, random = ~ 1 | sub, na.action = na.exclude)

Is it proper to do so? (Only the intercept is treated as random.)
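For reference, here is a minimal sketch of the fit I have in mind; the data
frame name dat is made up for illustration:

  library(nlme)

  ## random-intercept model pooling the 5 subjects;
  ## 'dat' is a hypothetical data frame holding y, x1..x3 and sub
  try1 <- lme(y ~ x1 + x2 + x3,
              random = ~ 1 | sub,
              data = dat,
              na.action = na.exclude)
  summary(try1)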
Suppose the initial model contains 8 variables, 3 of them insignificant. My
first step would be to try various corARMA possibilities (a sketch of the
comparison is given after this list). There are several possible outcomes:

1. Maybe there is one corARMA model (with the lowest AIC, of course) that
   makes all eight variables significant. I'll happily stop there! (But this
   is not the case for my data.)

2. There is one corARMA model with a much lower AIC, and an anova test shows
   a significant difference from the original model. But in the new model
   there are still one or two insignificant variables. What can I do next?
   (That is the problem in my data.)

3. There is no correlation structure in the residuals. In this case, I'll use
   drop1 on the original model, and test various corARMA structures again for
   the new model. And the cycle from step 1 to 3 goes on.
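To illustrate what I mean by trying corARMA possibilities, a minimal sketch
(the AR/MA orders here are just examples, not a recommendation):

  ## candidate within-subject correlation structures
  try1.ar1  <- update(try1, correlation = corAR1(form = ~ 1 | sub))
  try1.arma <- update(try1, correlation = corARMA(p = 1, q = 1,
                                                  form = ~ 1 | sub))

  ## same fixed effects throughout, so comparing the correlation
  ## structures by AIC / likelihood ratio under REML is OK
  anova(try1, try1.ar1, try1.arma)

With missing days it may be safer to give an explicit time covariate, e.g.
corAR1(form = ~ day | sub), where day is a (hypothetical) integer day index.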
residuals(try1, level = 0:1) (rather than try1$resid, if I remember correctly)
would give me two columns of residuals, the first column being population-level
residuals (from the fixed effects only) and the second being subject-specific
residuals. I want to analyze the residuals using arima, so which column should
I use?
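For concreteness, this is how I understand the two sets of residuals (please
correct me if I have the column names wrong):

  ## column "fixed": residuals from the fixed effects only;
  ## column "sub": also subtracts the subject-specific random
  ## intercept, i.e. within-subject residuals
  res <- residuals(try1, level = 0:1)
  head(res)

  ## for within-subject serial correlation the "sub" column seems
  ## the natural input; na.pass keeps the gaps from na.exclude
  acf(res[, "sub"], na.action = na.pass)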
In my imagination (maybe a silly idea), to account for all possible time lags
between variables in the regression above, I would first make a pilot pairwise
cross-correlation analysis between y and x1, y and x2, y and x3, etc. After
that, if a linear model is assumed, I would like the equation to be as follows:

  y[t] + b1*y[t-1] + b2*y[t-2] + ...
    = c1*x1[t] + c2*x1[t-1] + c3*x1[t-2]
    + d1*x2[t] + d2*x2[t-1] + d3*x2[t-2]
    + e1*x3[t] + e2*x3[t-1] + e3*x3[t-2] + ...

So I would produce new time-lag variables from y, x1, x2, x3, .... This makes
the equation much more complicated. Is this reasonable? (A sketch of how I
would build the lagged variables follows.)
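A minimal sketch of how I would construct the lags, lagging within each
subject so values do not leak across subjects (dat is again the hypothetical
data frame, assumed sorted by day within sub):

  ## shift a vector down by one step, padding with NA
  lag1 <- function(v) c(NA, v[-length(v)])

  ## ave() applies lag1 separately within each subject
  dat$y.l1  <- ave(dat$y,  dat$sub, FUN = lag1)
  dat$x1.l1 <- ave(dat$x1, dat$sub, FUN = lag1)
  dat$x2.l1 <- ave(dat$x2, dat$sub, FUN = lag1)

  ## note: with missing days this treats consecutive rows as
  ## consecutive days, which may be only approximately right
  try2 <- lme(y ~ y.l1 + x1 + x1.l1 + x2 + x2.l1,
              random = ~ 1 | sub, data = dat,
              na.action = na.exclude)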
Suppose x <- rnorm(100, 5) and y <- 1.5*x + rnorm(100). If we delete the
first value of x and the last value of y, there is no correlation between the
shifted y and x, and the residuals show no pattern. But obviously y[t] ~ x[t-1]
would give a wonderful regression. So, in practice, if we encounter such a
situation (no correlation between two variables and no correlation structure in
the residuals), we should not stop there. But how can this be diagnosed except
by examining time lags? Am I right on this point?
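A small sketch of this toy example, with ccf() as the lag diagnostic I have
in mind:

  set.seed(1)                      # any seed; just for reproducibility
  x <- rnorm(100, 5)
  y <- 1.5 * x + rnorm(100)

  x.s <- x[-1]                     # drop the first value of x
  y.s <- y[-100]                   # drop the last value of y

  cor(x.s, y.s)                    # essentially zero now
  ## y.s[t] = 1.5 * x.s[t-1] + noise, so the lag-1 fit is strong
  summary(lm(y.s[-1] ~ x.s[-length(x.s)]))

  ## cross-correlation function: the spike appears away from lag 0
  ccf(x.s, y.s)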
Xianglu Han
206 Environmental Health Science
University of Georgia 30602
Phone: 706 255 2308