Hello r-list members,
I've been doing some linear modeling with a dataset structured as follows.
Tubes, each containing 500 Trichinella larvae, were held at one of four
temperatures. At each sampling time (daily or every 10 days, depending on the
treatment group), 3 tubes were taken from each treatment and all the dead
larvae in them were counted; the tubes were then discarded. The counts from
the three tubes were averaged.
The variables are:
y = dead larvae (count)
X1 = Day: dead larvae in the three tubes were counted either daily
(tubes at -30 °C) or every 10 days (tubes at -20 °C, 4 °C, and lab
temperature). This was done for each treatment until all the larvae in all
three tubes were dead.
X2 = Temperature treatment (a factor with 4 levels): -30 °C, -20 °C, 4 °C,
and lab temperature.
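Because X2 is a factor, lm() codes it with treatment contrasts by default: the first level is the baseline and every other level gets its own 0/1 column. A minimal sketch, assuming the level labels "-30", "-20", "4" and "Amb" (inferred from the coefficient names Grupo-20, Grupo4 and GrupoAmb in the output; "Amb" is assumed to mean lab temperature):

```r
# Assumed level labels, inferred from the coefficient names in the output.
Group <- factor(c("-30", "-20", "4", "Amb"),
                levels = c("-30", "-20", "4", "Amb"))

# Default treatment contrasts: the first level (-30) is the baseline and
# gets no column of its own, which is why the output has no "Grupo-30" row.
contr <- contrasts(Group)
print(contr)

# Each remaining column is a 0/1 indicator; its lm() coefficient is the shift
# in mean dead larvae relative to the -30 group at the same day.
colnames(contr)
```

relevel(Group, ref = "...") would change which treatment serves as the baseline without changing the fitted values.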
Because we kept counting each treatment until all 500 larvae in each of the
day's three tubes were dead, the experiment ended at a different time for
each treatment (e.g. day 95 for -30 °C, day 200 for -20 °C, and so on). The
final dataset therefore covers a different time range for each treatment,
for example:
Days  Dead_larvae  Group
   1          100  30 below
   2          145  30 below
   3          277  30 below
   4          284  30 below
   5          288  30 below
   6          294  30 below
   7          359  30 below
   .            .  .
   .            .  .
   .            .  .
  95          500  30 below
  10           25  20 below
  20           35  20 below
  30          105  20 below
  40          230  20 below
   .            .  .
   .            .  .
   .            .  .
 200          500  20 below
   .            .  .
   .            .  .
   .            .  .
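To make the unequal sampling windows concrete, here is a sketch that rebuilds only the rows printed in the table above (the elided rows and the 4 °C / lab temperature groups are not reproduced) and tabulates the Days range per treatment. The labels "-30" and "-20" stand in for "30 below" and "20 below" and are an assumption chosen to match the coefficient names in the output.

```r
# Only the rows printed in the table are used; elided rows are not invented.
Data <- data.frame(
  Days        = c(1:7, 95, 10, 20, 30, 40, 200),
  Dead_larvae = c(100, 145, 277, 284, 288, 294, 359, 500,
                  25, 35, 105, 230, 500),
  Group       = factor(rep(c("-30", "-20"), times = c(8, 5)),
                       levels = c("-30", "-20"))
)

# First and last sampling day per treatment: the Days covariate spans
# 1-95 for -30 and 10-200 for -20.
day_range <- tapply(Data$Days, Data$Group, range)
print(day_range)

# Last sampling day per treatment, and the count recorded on that day
# (all 500 larvae dead, which is what ended each treatment).
day_max <- tapply(Data$Days, Data$Group, max)
last_count <- sapply(split(Data, Data$Group),
                     function(d) d$Dead_larvae[which.max(d$Days)])
print(last_count)
```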
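Model specification (the printed output keeps our original Spanish variable
names: Larvas_muertas = dead larvae, Dias = days, Grupo = temperature group,
with -30 °C as the reference level and Amb = lab temperature):
> my_model <- lm(Dead_larvae ~ Days + I(Days^2) + Group, data = Data)
> summary(my_model)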
Call:
lm(formula = Larvas_muertas ~ Dias + I(Dias^2) + Grupo, data = Data)

Residuals:
     Min       1Q   Median       3Q      Max
-356.983  -31.229    3.606   37.768  170.846

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.556e+02  6.634e+00   53.60   <2e-16 ***
Dias         2.403e+00  9.963e-02   24.11   <2e-16 ***
I(Dias^2)   -2.732e-03  1.885e-04  -14.49   <2e-16 ***
Grupo-20    -1.422e+02  1.283e+01  -11.08   <2e-16 ***
Grupo4      -3.117e+02  1.188e+01  -26.23   <2e-16 ***
GrupoAmb    -3.830e+02  1.212e+01  -31.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 67.04 on 348 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.7879,  Adjusted R-squared:  0.7848
F-statistic: 258.5 on 5 and 348 DF,  p-value: < 2.2e-16
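The printed coefficients can be turned into fitted means by hand, which helps when judging Q1. A sketch using the rounded estimates above: because the model has no Days x Group interaction, all four groups share one parabola in Days, shifted vertically by the group coefficient. For comparison, the table shows 100 dead larvae at day 1 and 500 at day 95 for -30, and 500 at day 200 for -20.

```r
# Rounded estimates copied from the summary() output above.
b <- c(intercept = 355.6, day = 2.403, day2 = -2.732e-3)
group_shift <- c("-30" = 0, "-20" = -142.2, "4" = -311.7, "Amb" = -383.0)

# Fitted mean dead larvae: intercept + b1 * Days + b2 * Days^2 + group shift
# (the baseline group -30 has shift 0).
fitted_mean <- function(day, group) {
  unname(b["intercept"] + b["day"] * day + b["day2"] * day^2 +
           group_shift[group])
}

# The shared parabola peaks at -b1 / (2 * b2), the same day for every group.
peak_day <- unname(-b["day"] / (2 * b["day2"]))

peak_day                  # about 439.8 days
fitted_mean(1, "-30")     # about 358.0
fitted_mean(95, "-30")    # about 559.2
fitted_mean(200, "-20")   # about 584.7
```

The peak falls beyond every sampling window, so within the data each fitted curve is still rising, and at the last sampling days shown the fitted means exceed the 500 larvae a tube contains.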
Q1. Is the modeling approach / model specification correct?
Q2. Is it a serious problem that larvae were counted over different periods
of time, so that the range of X1 differs markedly between treatments? Could
this seriously bias the estimates?
Q3. Am I violating the assumption of independent residuals through
correlation between residuals from different time points? If so, how can
one deal with this in R?
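One way to look at what Q3 describes is the lag-1 correlation of the residuals within each treatment, taken in time order. The sketch below demonstrates only the mechanics, on the rows printed in the table (labels "-30"/"-20" assumed as before), so its numbers say nothing about the full dataset. On the real data, the 8 rows dropped for missingness must be excluded before pairing residuals with Days; stats::acf() on each group's ordered residuals gives a similar diagnostic over more lags.

```r
# Rows printed in the table only; not the full dataset.
Data <- data.frame(
  Days        = c(1:7, 95, 10, 20, 30, 40, 200),
  Dead_larvae = c(100, 145, 277, 284, 288, 294, 359, 500,
                  25, 35, 105, 230, 500),
  Group       = factor(rep(c("-30", "-20"), times = c(8, 5)),
                       levels = c("-30", "-20"))
)

fit <- lm(Dead_larvae ~ Days + I(Days^2) + Group, data = Data)
res <- resid(fit)

# Lag-1 correlation of residuals after sorting them by sampling day.
lag1_cor <- function(r, days) {
  r <- r[order(days)]
  cor(r[-1], r[-length(r)])
}

# One value per treatment; values near +1 mean neighbouring time points
# tend to have residuals of the same sign.
by_group <- sapply(split(seq_along(res), Data$Group),
                   function(i) lag1_cor(res[i], Data$Days[i]))
print(by_group)
```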
I know this is as much a statistics question as an R question, so apologies
in advance.
Best wishes,
Luciano