Dylan Beaudette
2008-Jul-15 04:34 UTC
[R] meaning of tests presented in anova(ols(...)) {Design package}
Hi,

I am curious about how to interpret the table produced by anova(ols(...)), from the Design package. I have a multiple linear regression model, with some interactions, defined by:

ols(formula = log(ksat * 60 * 60) ~ log(sar) * pol(activity, 3) + log(conc) * pol(sand, 3),
    data = sm.clean, x = TRUE, y = TRUE)

   n  Model L.R.  d.f.    R2  Sigma
1834        1203    14  0.48    1.2

Residuals:
    Min      1Q  Median      3Q     Max
 -5.033  -0.859   0.016   0.739   4.868

Coefficients:
                       Value  Std. Error      t  Pr(>|t|)
Intercept         11.3886790   2.0220171   5.63  0.0000000205580
sar               -4.3991263   1.0157588  -4.33  0.0000156609226
activity         -40.0591221   5.6907822  -7.04  0.0000000000027
activity^2        33.0570116   5.0578520   6.54  0.0000000000819
activity^3        -8.1645147   1.3750370  -5.94  0.0000000034548
conc               0.3841260   0.0813200   4.72  0.0000024942478
sand              -0.0096212   0.0327415  -0.29  0.7689032898947
sand^2             0.0008495   0.0008589   0.99  0.3227487169683
sand^3             0.0000025   0.0000066   0.39  0.6994987342042
sar * activity    12.8134698   2.9513942   4.34  0.0000149300007
sar * activity^2  -9.9981381   2.6310765  -3.80  0.0001494462966
sar * activity^3   2.1481278   0.7168339   3.00  0.0027662261037
conc * sand       -0.0157426   0.0076013  -2.07  0.0384966958735
conc * sand^2      0.0003419   0.0001989   1.72  0.0857381555491
conc * sand^3     -0.0000027   0.0000015  -1.77  0.0777025949762

Looking at what I think are "marginal p-values", i.e. the results of a test of coef_i = 0, there are several terms with non-significant coefficients (at p < 0.05). Does a non-significant coefficient warrant removal from the model, or perhaps a mention in the discussion?

Compared to the above example, what tests are performed when calling anova() on this object? Here is the output in R:

Analysis of Variance    Response: log(ksat * 60 * 60)

Factor                                         d.f.  Partial SS      MS      F       P
sar  (Factor+Higher Order Factors)                4      168.43   42.11   27.0  <.0001
  All Interactions                                3      142.13   47.38   30.4  <.0001
activity  (Factor+Higher Order Factors)           6      536.84   89.47   57.3  <.0001
  All Interactions                                3      142.13   47.38   30.4  <.0001
  Nonlinear (Factor+Higher Order Factors)         4      257.25   64.31   41.2  <.0001
conc  (Factor+Higher Order Factors)               4      443.02  110.75   71.0  <.0001
  All Interactions                                3       76.74   25.58   16.4  <.0001
sand  (Factor+Higher Order Factors)               6     1906.29  317.71  203.6  <.0001
  All Interactions                                3       76.74   25.58   16.4  <.0001
  Nonlinear (Factor+Higher Order Factors)         4      263.00   65.75   42.1  <.0001
sar * activity  (Factor+Higher Order Factors)     3      142.13   47.38   30.4  <.0001
  Nonlinear                                       2       95.32   47.66   30.5  <.0001
  Nonlinear Interaction : f(A,B) vs. AB           2       95.32   47.66   30.5  <.0001
conc * sand  (Factor+Higher Order Factors)        3       76.74   25.58   16.4  <.0001
  Nonlinear                                       2        4.98    2.49    1.6   0.203
  Nonlinear Interaction : f(A,B) vs. AB           2        4.98    2.49    1.6   0.203
TOTAL NONLINEAR                                   8      455.20   56.90   36.5  <.0001
TOTAL INTERACTION                                 6      218.87   36.48   23.4  <.0001
TOTAL NONLINEAR + INTERACTION                    10      573.36   57.34   36.7  <.0001
REGRESSION                                       14     2631.53  187.97  120.4  <.0001
ERROR                                          1819     2839.25    1.56

Are more of the 'terms' significant (at p < 0.05) because model terms are pooled? I have looked through Frank's book on the topic, but can't quite wrap my head around what the above is telling me. I am mostly interested in presenting a model for use as an applied tool, and interpretation of terms / interactions is very important.

Thanks,

Dylan
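[A note for readers following this thread later: each row of the anova(ols(...)) table above is a pooled, multi-degree-of-freedom partial F test on a group of coefficients, not a test of a single coefficient. The sketch below is a base-R illustration of the kind of nested-model comparison that corresponds to one such row; it assumes the sm.clean data frame from the post and substitutes lm()/poly(..., raw = TRUE) for Design's ols()/pol(), so it is illustrative rather than a reproduction of Design's internals.]

## full model: base-R analogue of the ols() fit above
full <- lm(log(ksat * 60 * 60) ~ log(sar) * poly(activity, 3, raw = TRUE) +
             log(conc) * poly(sand, 3, raw = TRUE), data = sm.clean)

## reduced model: drop the three log(conc) x sand interaction coefficients
reduced <- lm(log(ksat * 60 * 60) ~ log(sar) * poly(activity, 3, raw = TRUE) +
                log(conc) + poly(sand, 3, raw = TRUE), data = sm.clean)

## 3-d.f. partial F test, analogous to the
## "conc * sand (Factor+Higher Order Factors)" row in the anova(ols(...)) table
anova(reduced, full)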
Mark Difford
2008-Jul-15 09:24 UTC
[R] meaning of tests presented in anova(ols(...)) {Design package}
Hi Dylan,>> I am curious about how to interpret the table produced by >> anova(ols(...)), from the Design package.Frank will perhaps come in with more detail, but if he doesn't then you can get an understanding of what's being tested by doing the following on the saved object from your OLS call (see ?anova.Design): print(anova(ols$obj), which="sub") plot(anova(ols$obj)) HTH, Mark. Dylan Beaudette-2 wrote:> > Hi, > > I am curious about how to interpret the table produced by > anova(ols(...)), from the Design package. I have a multiple linear > regression model, with some interaction, defined by: > > ols(formula = log(ksat * 60 * 60) ~ log(sar) * pol(activity, > 3) + log(conc) * pol(sand, 3), data = sm.clean, x = TRUE, > y = TRUE) > > n Model L.R. d.f. R2 Sigma > 1834 1203 14 0.48 1.2 > > Residuals: > Min 1Q Median 3Q Max > -5.033 -0.859 0.016 0.739 4.868 > > Coefficients: > Value Std. Error t Pr(>|t|) > Intercept 11.3886790 2.0220171 5.63 0.0000000205580 > sar -4.3991263 1.0157588 -4.33 0.0000156609226 > activity -40.0591221 5.6907822 -7.04 0.0000000000027 > activity^2 33.0570116 5.0578520 6.54 0.0000000000819 > activity^3 -8.1645147 1.3750370 -5.94 0.0000000034548 > conc 0.3841260 0.0813200 4.72 0.0000024942478 > sand -0.0096212 0.0327415 -0.29 0.7689032898947 > sand^2 0.0008495 0.0008589 0.99 0.3227487169683 > sand^3 0.0000025 0.0000066 0.39 0.6994987342042 > sar * activity 12.8134698 2.9513942 4.34 0.0000149300007 > sar * activity^2 -9.9981381 2.6310765 -3.80 0.0001494462966 > sar * activity^3 2.1481278 0.7168339 3.00 0.0027662261037 > conc * sand -0.0157426 0.0076013 -2.07 0.0384966958735 > conc * sand^2 0.0003419 0.0001989 1.72 0.0857381555491 > conc * sand^3 -0.0000027 0.0000015 -1.77 0.0777025949762 > > > Looking at what I 'think' are "marginal p-values" i.e. results of a > test against coef_i != 0, there are several terms with non-significant > coefficients (at p<0.05). Does a non-significant coefficient warrant > removal from the model, or perhaps a mention in the discussion? > > Compared to the above example, what tests are performed when calling > anova() on this object? Here is the output in R: > > Analysis of Variance Response: log(ksat * 60 * 60) > > Factor d.f. Partial SS MS F > sar (Factor+Higher Order Factors) 4 168.43 42.11 > 27.0 > All Interactions 3 142.13 47.38 > 30.4 > activity (Factor+Higher Order Factors) 6 536.84 89.47 > 57.3 > All Interactions 3 142.13 47.38 > 30.4 > Nonlinear (Factor+Higher Order Factors) 4 257.25 64.31 > 41.2 > conc (Factor+Higher Order Factors) 4 443.02 110.75 > 71.0 > All Interactions 3 76.74 25.58 > 16.4 > sand (Factor+Higher Order Factors) 6 1906.29 317.71 > 203.6 > All Interactions 3 76.74 25.58 > 16.4 > Nonlinear (Factor+Higher Order Factors) 4 263.00 65.75 > 42.1 > sar * activity (Factor+Higher Order Factors) 3 142.13 47.38 > 30.4 > Nonlinear 2 95.32 47.66 > 30.5 > Nonlinear Interaction : f(A,B) vs. AB 2 95.32 47.66 > 30.5 > conc * sand (Factor+Higher Order Factors) 3 76.74 25.58 > 16.4 > Nonlinear 2 4.98 2.49 > 1.6 > Nonlinear Interaction : f(A,B) vs. AB 2 4.98 2.49 > 1.6 > TOTAL NONLINEAR 8 455.20 56.90 > 36.5 > TOTAL INTERACTION 6 218.87 36.48 > 23.4 > TOTAL NONLINEAR + INTERACTION 10 573.36 57.34 > 36.7 > REGRESSION 14 2631.53 187.97 > 120.4 > ERROR 1819 2839.25 1.56 > P > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > 0.203 > 0.203 > <.0001 > <.0001 > <.0001 > <.0001 > > Are more of the 'terms' significant (at p<0.05) due to pooling of > model terms? 
I have looked through Frank's book on the topic, but > can't quite wrap my head around what the above is telling me. I am > mostly interested in presenting a model for use as a applied tool, and > interpretation of terms / interaction is very important. > > Thanks, > > Dylan > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- View this message in context: http://www.nabble.com/meaning-of-tests-presented-in-anova%28ols%28...%29%29-%7BDesign-package%7D-tp18458438p18461125.html Sent from the R help mailing list archive at Nabble.com.
Frank E Harrell Jr
2008-Jul-16 01:25 UTC
[R] meaning of tests presented in anova(ols(...)) {Design package}
Dylan Beaudette wrote:> Hi, > > I am curious about how to interpret the table produced by > anova(ols(...)), from the Design package. I have a multiple linear > regression model, with some interaction, defined by: > > ols(formula = log(ksat * 60 * 60) ~ log(sar) * pol(activity, > 3) + log(conc) * pol(sand, 3), data = sm.clean, x = TRUE, > y = TRUE) > > n Model L.R. d.f. R2 Sigma > 1834 1203 14 0.48 1.2 > > Residuals: > Min 1Q Median 3Q Max > -5.033 -0.859 0.016 0.739 4.868 > > Coefficients: > Value Std. Error t Pr(>|t|) > Intercept 11.3886790 2.0220171 5.63 0.0000000205580 > sar -4.3991263 1.0157588 -4.33 0.0000156609226 > activity -40.0591221 5.6907822 -7.04 0.0000000000027 > activity^2 33.0570116 5.0578520 6.54 0.0000000000819 > activity^3 -8.1645147 1.3750370 -5.94 0.0000000034548 > conc 0.3841260 0.0813200 4.72 0.0000024942478 > sand -0.0096212 0.0327415 -0.29 0.7689032898947 > sand^2 0.0008495 0.0008589 0.99 0.3227487169683 > sand^3 0.0000025 0.0000066 0.39 0.6994987342042 > sar * activity 12.8134698 2.9513942 4.34 0.0000149300007 > sar * activity^2 -9.9981381 2.6310765 -3.80 0.0001494462966 > sar * activity^3 2.1481278 0.7168339 3.00 0.0027662261037 > conc * sand -0.0157426 0.0076013 -2.07 0.0384966958735 > conc * sand^2 0.0003419 0.0001989 1.72 0.0857381555491 > conc * sand^3 -0.0000027 0.0000015 -1.77 0.0777025949762 > > > Looking at what I 'think' are "marginal p-values" i.e. results of a > test against coef_i != 0, there are several terms with non-significant > coefficients (at p<0.05). Does a non-significant coefficient warrant > removal from the model, or perhaps a mention in the discussion?No> > Compared to the above example, what tests are performed when calling > anova() on this object? Here is the output in R:Mark Difford gave a nice response for that. Frank> > Analysis of Variance Response: log(ksat * 60 * 60) > > Factor d.f. Partial SS MS F > sar (Factor+Higher Order Factors) 4 168.43 42.11 27.0 > All Interactions 3 142.13 47.38 30.4 > activity (Factor+Higher Order Factors) 6 536.84 89.47 57.3 > All Interactions 3 142.13 47.38 30.4 > Nonlinear (Factor+Higher Order Factors) 4 257.25 64.31 41.2 > conc (Factor+Higher Order Factors) 4 443.02 110.75 71.0 > All Interactions 3 76.74 25.58 16.4 > sand (Factor+Higher Order Factors) 6 1906.29 317.71 203.6 > All Interactions 3 76.74 25.58 16.4 > Nonlinear (Factor+Higher Order Factors) 4 263.00 65.75 42.1 > sar * activity (Factor+Higher Order Factors) 3 142.13 47.38 30.4 > Nonlinear 2 95.32 47.66 30.5 > Nonlinear Interaction : f(A,B) vs. AB 2 95.32 47.66 30.5 > conc * sand (Factor+Higher Order Factors) 3 76.74 25.58 16.4 > Nonlinear 2 4.98 2.49 1.6 > Nonlinear Interaction : f(A,B) vs. AB 2 4.98 2.49 1.6 > TOTAL NONLINEAR 8 455.20 56.90 36.5 > TOTAL INTERACTION 6 218.87 36.48 23.4 > TOTAL NONLINEAR + INTERACTION 10 573.36 57.34 36.7 > REGRESSION 14 2631.53 187.97 120.4 > ERROR 1819 2839.25 1.56 > P > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > <.0001 > 0.203 > 0.203 > <.0001 > <.0001 > <.0001 > <.0001 > > Are more of the 'terms' significant (at p<0.05) due to pooling of > model terms? I have looked through Frank's book on the topic, but > can't quite wrap my head around what the above is telling me. I am > mostly interested in presenting a model for use as a applied tool, and > interpretation of terms / interaction is very important. 
> > Thanks, > > Dylan > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University