Dylan Beaudette
2008-Jul-15 04:34 UTC
[R] meaning of tests presented in anova(ols(...)) {Design package}
Hi,
I am curious about how to interpret the table produced by
anova(ols(...)), from the Design package. I have a multiple linear
regression model, with some interaction, defined by:
ols(formula = log(ksat * 60 * 60) ~ log(sar) * pol(activity,
3) + log(conc) * pol(sand, 3), data = sm.clean, x = TRUE,
y = TRUE)
n Model L.R. d.f. R2 Sigma
1834 1203 14 0.48 1.2
Residuals:
Min 1Q Median 3Q Max
-5.033 -0.859 0.016 0.739 4.868
Coefficients:
Value Std. Error t Pr(>|t|)
Intercept 11.3886790 2.0220171 5.63 0.0000000205580
sar -4.3991263 1.0157588 -4.33 0.0000156609226
activity -40.0591221 5.6907822 -7.04 0.0000000000027
activity^2 33.0570116 5.0578520 6.54 0.0000000000819
activity^3 -8.1645147 1.3750370 -5.94 0.0000000034548
conc 0.3841260 0.0813200 4.72 0.0000024942478
sand -0.0096212 0.0327415 -0.29 0.7689032898947
sand^2 0.0008495 0.0008589 0.99 0.3227487169683
sand^3 0.0000025 0.0000066 0.39 0.6994987342042
sar * activity 12.8134698 2.9513942 4.34 0.0000149300007
sar * activity^2 -9.9981381 2.6310765 -3.80 0.0001494462966
sar * activity^3 2.1481278 0.7168339 3.00 0.0027662261037
conc * sand -0.0157426 0.0076013 -2.07 0.0384966958735
conc * sand^2 0.0003419 0.0001989 1.72 0.0857381555491
conc * sand^3 -0.0000027 0.0000015 -1.77 0.0777025949762
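(For context on the pol() notation: the coefficient names above, e.g. activity^2, suggest that pol(x, 3) expands to the raw terms x, x^2, x^3, so the fit should correspond to a plain lm() call along the following lines. This is only a sketch, with the polynomials written out via I() so that individual terms can be dropped later; fit_full is a hypothetical name.)

fit_full <- lm(log(ksat * 60 * 60) ~
                 log(sar)  * (activity + I(activity^2) + I(activity^3)) +
                 log(conc) * (sand + I(sand^2) + I(sand^3)),
               data = sm.clean)
summary(fit_full)   # should reproduce the 14 slope coefficients shown above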
Looking at what I think are marginal p-values, i.e. the results of t tests
of H0: coef_i = 0, there are several terms with non-significant coefficients
(at p < 0.05). Does a non-significant coefficient warrant removal from the
model, or perhaps just a mention in the discussion?
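(Such coefficients are usually better judged jointly than one t statistic at a time. For instance, the two nonlinear conc x sand interaction terms, individually p = 0.086 and p = 0.078 above, can be tested together with a single 2-d.f. partial F test; a sketch using the hypothetical fit_full object from the lm() sketch above:)

fit_red <- update(fit_full,
                  . ~ . - log(conc):I(sand^2) - log(conc):I(sand^3))
anova(fit_red, fit_full)   # joint F test on 2 d.f. for the two coefficients

(This should be the same 2-d.f. chunk that shows up below as the "Nonlinear Interaction : f(A,B) vs. AB" line for conc * sand, with P = 0.203.)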
Compared to the above example, what tests are performed when calling
anova() on this object? Here is the output in R:
Analysis of Variance Response: log(ksat * 60 * 60)
Factor                                           d.f.  Partial SS      MS      F       P
sar  (Factor+Higher Order Factors)                  4      168.43   42.11   27.0  <.0001
  All Interactions                                  3      142.13   47.38   30.4  <.0001
activity  (Factor+Higher Order Factors)             6      536.84   89.47   57.3  <.0001
  All Interactions                                  3      142.13   47.38   30.4  <.0001
  Nonlinear (Factor+Higher Order Factors)           4      257.25   64.31   41.2  <.0001
conc  (Factor+Higher Order Factors)                 4      443.02  110.75   71.0  <.0001
  All Interactions                                  3       76.74   25.58   16.4  <.0001
sand  (Factor+Higher Order Factors)                 6     1906.29  317.71  203.6  <.0001
  All Interactions                                  3       76.74   25.58   16.4  <.0001
  Nonlinear (Factor+Higher Order Factors)           4      263.00   65.75   42.1  <.0001
sar * activity  (Factor+Higher Order Factors)       3      142.13   47.38   30.4  <.0001
  Nonlinear                                         2       95.32   47.66   30.5  <.0001
  Nonlinear Interaction : f(A,B) vs. AB             2       95.32   47.66   30.5  <.0001
conc * sand  (Factor+Higher Order Factors)          3       76.74   25.58   16.4  <.0001
  Nonlinear                                         2        4.98    2.49    1.6   0.203
  Nonlinear Interaction : f(A,B) vs. AB             2        4.98    2.49    1.6   0.203
TOTAL NONLINEAR                                     8      455.20   56.90   36.5  <.0001
TOTAL INTERACTION                                   6      218.87   36.48   23.4  <.0001
TOTAL NONLINEAR + INTERACTION                      10      573.36   57.34   36.7  <.0001
REGRESSION                                         14     2631.53  187.97  120.4  <.0001
ERROR                                            1819     2839.25    1.56
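(For reference, each F in this table is just (Partial SS / d.f.) divided by the ERROR mean square, and each P is the corresponding upper tail of an F distribution on (d.f., 1819) degrees of freedom. A quick check of two rows:)

ms_error <- 2839.25 / 1819                  # ERROR mean square, about 1.56
(168.43 / 4) / ms_error                     # about 27.0: the sar chunk F
(4.98 / 2) / ms_error                       # about 1.6: conc * sand nonlinear interaction
pf((4.98 / 2) / ms_error, 2, 1819,
   lower.tail = FALSE)                      # about 0.20, matching the 0.203 entry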
Are more of the 'terms' significant (at p < 0.05) because related model
terms are pooled into single tests? I have looked through Frank's book on
the topic, but can't quite wrap my head around what the table above is
telling me. I am mostly interested in presenting the model as an applied
tool, so the interpretation of terms and interactions is very important.
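(On the pooling: each "Factor+Higher Order Factors" row combines, into one F statistic, every coefficient involving that predictor: its main effect, its nonlinear terms, and all of its interactions. A sketch of how the 6-d.f. sand row could be reproduced from the hypothetical fit_full lm() fit above:)

fit_nosand <- update(fit_full,
                     . ~ . - sand - I(sand^2) - I(sand^3) -
                           log(conc):sand - log(conc):I(sand^2) -
                           log(conc):I(sand^3))
anova(fit_nosand, fit_full)   # 6-d.f. partial F test; should match the
                              # "sand (Factor+Higher Order Factors)" row above

(So a coefficient that is not significant on its own, such as sand^3, can still sit inside a highly significant pooled test.)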
Thanks,
Dylan
Mark Difford
2008-Jul-15 09:24 UTC
[R] meaning of tests presented in anova(ols(...)) {Design package}
Hi Dylan,

>> I am curious about how to interpret the table produced by
>> anova(ols(...)), from the Design package.

Frank will perhaps come in with more detail, but if he doesn't then you can
get an understanding of what's being tested by doing the following on the
saved object from your OLS call (see ?anova.Design):

print(anova(ols$obj), which="sub")
plot(anova(ols$obj))

HTH, Mark.
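(In context, and assuming the fit has been saved under a name, say fit (hypothetical), the two commands would look roughly like this; which = "sub" partially matches the "subscripts" option described in ?anova.Design, which lists the coefficients entering each pooled test.)

library(Design)                        # provides ols(), pol(), anova.Design()
fit <- ols(log(ksat * 60 * 60) ~ log(sar) * pol(activity, 3) +
             log(conc) * pol(sand, 3),
           data = sm.clean, x = TRUE, y = TRUE)
an <- anova(fit)
print(an, which = "sub")   # show which coefficients enter each pooled line
plot(an)                   # dot chart summarizing the anova table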
Frank E Harrell Jr
2008-Jul-16 01:25 UTC
[R] meaning of tests presented in anova(ols(...)) {Design package}
Dylan Beaudette wrote:

> Looking at what I think are marginal p-values, i.e. the results of t tests
> of H0: coef_i = 0, there are several terms with non-significant
> coefficients (at p < 0.05). Does a non-significant coefficient warrant
> removal from the model, or perhaps a mention in the discussion?

No

> Compared to the above example, what tests are performed when calling
> anova() on this object?

Mark Difford gave a nice response for that.

Frank
--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University