Hi all!

I'm fairly new to R and not too experienced with regression. Because of one or both of those traits, I'm not seeing why some terms are being dropped from my model when doing a regression using lm().

I am trying to do a regression on some experimental data d, which has two numeric predictors, p1 and p2, and one numeric response, r. The aim is to compare polynomial models in p1 and p2 up to third order. I don't understand why lm() doesn't return coefficients for the p1^3 and p2^3 terms. Similar loss of terms happened when I tried orthonormal polynomials to third order.

I'm satisfied with the second-order regression, by the way, but I'd still like to understand why the third-order regression doesn't work like I'd expect.

Can anyone offer a pointer to help me understand this?

Here's what I'm seeing in R 1.9.1 for Windows. Note the NA's for p1^3 and p2^3 in the last summary.

> d <- read.csv("DOE_data.csv")
> d$p1
 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0
[34] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0
[67] 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> d$p2
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
[34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> summary(d$r)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.68   19.88   21.94   21.48   22.64   24.36
> summary(d.lm1 <- lm(r ~ p1 + p2, data=d))

Call:
lm(formula = r ~ p1 + p2, data = d)

Residuals:
     Min       1Q   Median       3Q      Max
-0.58107 -0.09248  0.02492  0.26061  0.49617

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.66417    0.06591  283.17   <2e-16 ***
p1           1.96145    0.04036   48.60   <2e-16 ***
p2           0.85801    0.04036   21.26   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.3126 on 87 degrees of freedom
Multiple R-Squared: 0.97,       Adjusted R-squared: 0.9693
F-statistic:  1407 on 2 and 87 DF,  p-value: < 2.2e-16

> summary(d.lm2 <- update(d.lm1, . ~ . + I(p1^2) + I(p2^2) + I(p1 * p2)))

Call:
lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2), data = d)

Residuals:
      Min        1Q    Median        3Q       Max
-0.106813 -0.021568  0.003214  0.025083  0.084687

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.701098   0.011265 1660.14   <2e-16 ***
p1           2.674525   0.019511  137.08   <2e-16 ***
p2           0.984765   0.019511   50.47   <2e-16 ***
I(p1^2)     -0.489210   0.008875  -55.12   <2e-16 ***
I(p2^2)     -0.196050   0.008875  -22.09   <2e-16 ***
I(p1 * p2)   0.265345   0.006275   42.28   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.03969 on 84 degrees of freedom
Multiple R-Squared: 0.9995,      Adjusted R-squared: 0.9995
F-statistic: 3.598e+04 on 5 and 84 DF,  p-value: < 2.2e-16

> summary(d.lm3 <- update(d.lm2, . ~ . + I(p1^3) + I(p2^3) + I(p1^2*p2) +
+ I(p1*p2^2)))

Call:
lm(formula = r ~ p1 + p2 + I(p1^2) + I(p2^2) + I(p1 * p2) + I(p1^3) +
    I(p2^3) + I(p1^2 * p2) + I(p1 * p2^2), data = d)

Residuals:
      Min        1Q    Median        3Q       Max
-0.089823 -0.017707  0.001952  0.020820  0.059302

Coefficients: (2 not defined because of singularities)
              Estimate Std. Error  t value Pr(>|t|)
(Intercept)  18.728958   0.009657 1939.365  < 2e-16 ***
p1            2.604190   0.022970  113.376  < 2e-16 ***
p2            0.860080   0.022970   37.444  < 2e-16 ***
I(p1^2)      -0.463725   0.010950  -42.348  < 2e-16 ***
I(p2^2)      -0.137955   0.010950  -12.598  < 2e-16 ***
I(p1 * p2)    0.432505   0.024486   17.664  < 2e-16 ***
I(p1^3)             NA         NA       NA       NA
I(p2^3)             NA         NA       NA       NA
I(p1^2 * p2) -0.025485   0.008482   -3.005  0.00353 **
I(p1 * p2^2) -0.058095   0.008482   -6.849 1.26e-09 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.03097 on 82 degrees of freedom
Multiple R-Squared: 0.9997,      Adjusted R-squared: 0.9997
F-statistic: 4.221e+04 on 7 and 82 DF,  p-value: < 2.2e-16

Thanks and best regards,
John
John Pitney wrote:
> Coefficients: (2 not defined because of singularities)

Did you miss reading the above line? It seems you supplied a singular model to `lm': p1 and p2 each take only the three values 0, 1 and 2, so on those design points the cubic terms are exact linear combinations of the lower-order columns. Since the default for `lm' is `singular.ok = TRUE', it just pivoted those columns out in the QR decomposition.

--sundar
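To see the singularity Sundar describes, here is a minimal sketch (using simulated levels rather than the original DOE_data.csv): on the three points 0, 1 and 2, x^3 coincides with 3*x^2 - 2*x, so the cubic column of the model matrix is exactly aliased and the matrix drops rank.

## Each predictor takes only the three levels 0, 1, 2 (90 runs, as in d):
x <- rep(0:2, each = 30)

## On {0, 1, 2}, x^3 is an exact linear combination of lower powers:
all.equal(x^3, 3 * x^2 - 2 * x)   # TRUE (both sides give 0, 1, 8)

## So the cubic model matrix is rank-deficient:
X <- cbind(1, x, x^2, x^3)
qr(X)$rank                        # 3, not 4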
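And a sketch of how lm() handles the aliased column (with an illustrative response, not John's r): under the default singular.ok = TRUE the coefficient comes back as NA, just as in the summary above, while singular.ok = FALSE (in recent versions of R) turns the silent drop into an error.

## Same three-level predictor, with an illustrative response:
x <- rep(0:2, each = 30)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(length(x), sd = 0.1)

## Default singular.ok = TRUE: the aliased I(x^3) column is pivoted
## out of the QR decomposition and its coefficient reported as NA:
coef(lm(y ~ x + I(x^2) + I(x^3)))

## singular.ok = FALSE makes the fit fail loudly instead
## ("singular fit encountered"):
try(lm(y ~ x + I(x^2) + I(x^3), singular.ok = FALSE))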