I have a data analysis job for which lme may be used. Prof. Spencer Graves has
already helped me a lot with it, and I really appreciate that. Could anybody
else on the list give me some hints from other perspectives? I hope to learn as
much as possible from this complicated real data set.
Thanks in advance.
Hanhan
To briefly describe my data: my data are health effect measurements (y) and
personal exposures to ozone and some other pollutants (x1, x2, x3, ...). For
each of the 5 subjects in total, 3 weeks of daily data are available, with some
missing values. To pool the 5 subjects together, I use lme in R as

  try1 <- lme(y ~ x1 + x2 + x3, random = ~ 1 | sub, na.action = na.exclude)

Is it proper to do so? (Only the intercept is treated as random.)
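For reference, here is a minimal sketch of the fit I have in mind; the data
frame name dat is made up for illustration:

  library(nlme)

  ## random-intercept model pooling the 5 subjects;
  ## 'dat' is a hypothetical data frame holding y, x1..x3 and sub
  try1 <- lme(y ~ x1 + x2 + x3,
              random = ~ 1 | sub,
              data = dat,
              na.action = na.exclude)
  summary(try1)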
Suppose the initial model contains 8 variables, 3 of them insignificant. My
first step would be to try various corARMA possibilities (a sketch of the
comparison is given after this list). There are several possible outcomes:

1. Maybe there is one corARMA model (with the lowest AIC, of course) that
   makes all eight variables significant. I'll happily stop there! (But this
   is not the case for my data.)

2. There is one corARMA model with a much lower AIC, and an anova test shows
   a significant difference from the original model. But in the new model
   there are still one or two insignificant variables. What can I do next?
   (That is the problem in my data.)

3. There is no correlation structure in the residuals. In this case, I'll use
   drop1 on the original model, and test various corARMA structures again for
   the new model. And the cycle from step 1 to 3 goes on.
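To illustrate what I mean by trying corARMA possibilities, a minimal sketch
(the AR/MA orders here are just examples, not a recommendation):

  ## candidate within-subject correlation structures
  try1.ar1  <- update(try1, correlation = corAR1(form = ~ 1 | sub))
  try1.arma <- update(try1, correlation = corARMA(p = 1, q = 1,
                                                  form = ~ 1 | sub))

  ## same fixed effects throughout, so comparing the correlation
  ## structures by AIC / likelihood ratio under REML is OK
  anova(try1, try1.ar1, try1.arma)

With missing days it may be safer to give an explicit time covariate, e.g.
corAR1(form = ~ day | sub), where day is a (hypothetical) integer day index.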
residuals(try1, level = 0:1) (rather than try1$resid, if I remember correctly)
would give me two columns of residuals, the first column being population-level
residuals (from the fixed effects only) and the second being subject-specific
residuals. I want to analyze the residuals using arima, so which column should
I use?
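For concreteness, this is how I understand the two sets of residuals (please
correct me if I have the column names wrong):

  ## column "fixed": residuals from the fixed effects only;
  ## column "sub": also subtracts the subject-specific random
  ## intercept, i.e. within-subject residuals
  res <- residuals(try1, level = 0:1)
  head(res)

  ## for within-subject serial correlation the "sub" column seems
  ## the natural input; na.pass keeps the gaps from na.exclude
  acf(res[, "sub"], na.action = na.pass)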
In my imagination (maybe a silly idea), to account for all possible time lags
between variables in the regression above, I would first make a pilot pairwise
cross-correlation analysis between y and x1, y and x2, y and x3, etc. After
that, if a linear model is assumed, I would like the equation to be as follows:

  y[t] + b1*y[t-1] + b2*y[t-2] + ...
    = c1*x1[t] + c2*x1[t-1] + c3*x1[t-2]
    + d1*x2[t] + d2*x2[t-1] + d3*x2[t-2]
    + e1*x3[t] + e2*x3[t-1] + e3*x3[t-2] + ...

So I would produce new time-lag variables from y, x1, x2, x3, .... This makes
the equation much more complicated. Is this reasonable? (A sketch of how I
would build the lagged variables follows.)
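A minimal sketch of how I would construct the lags, lagging within each
subject so values do not leak across subjects (dat is again the hypothetical
data frame, assumed sorted by day within sub):

  ## shift a vector down by one step, padding with NA
  lag1 <- function(v) c(NA, v[-length(v)])

  ## ave() applies lag1 separately within each subject
  dat$y.l1  <- ave(dat$y,  dat$sub, FUN = lag1)
  dat$x1.l1 <- ave(dat$x1, dat$sub, FUN = lag1)
  dat$x2.l1 <- ave(dat$x2, dat$sub, FUN = lag1)

  ## note: with missing days this treats consecutive rows as
  ## consecutive days, which may be only approximately right
  try2 <- lme(y ~ y.l1 + x1 + x1.l1 + x2 + x2.l1,
              random = ~ 1 | sub, data = dat,
              na.action = na.exclude)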
Suppose x <- rnorm(100, 5) and y <- 1.5*x + rnorm(100). If we delete the
first value of x and the last value of y, there is no correlation between the
shifted y and x, and the residuals show no pattern. But obviously y[t] ~ x[t-1]
would give a wonderful regression. So, in practice, if we encounter such a
situation (no correlation between two variables and no correlation structure in
the residuals), we should not stop there. But how can this be diagnosed except
by examining time lags? Am I right on this point?
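A small sketch of this toy example, with ccf() as the lag diagnostic I have
in mind:

  set.seed(1)                      # any seed; just for reproducibility
  x <- rnorm(100, 5)
  y <- 1.5 * x + rnorm(100)

  x.s <- x[-1]                     # drop the first value of x
  y.s <- y[-100]                   # drop the last value of y

  cor(x.s, y.s)                    # essentially zero now
  ## y.s[t] = 1.5 * x.s[t-1] + noise, so the lag-1 fit is strong
  summary(lm(y.s[-1] ~ x.s[-length(x.s)]))

  ## cross-correlation function: the spike appears away from lag 0
  ccf(x.s, y.s)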
Xianglu Han
206 Environmental Health Science
University of Georgia 30602
Phone: 706 255 2308