Tal Galili
2011-Dec-02 14:51 UTC
[R] Unexplained behavior of level names when using ordered factors in lm?
Hello dear all,

I am unable to understand why, when I run the following three lines:

  set.seed(4254)
  a <- data.frame(y = rnorm(40), x = ordered(sample(1:5, 40, replace = TRUE)))
  summary(lm(y ~ x, a))

the output I get includes factor levels that are not relevant to what I am actually using:

  Call:
  lm(formula = y ~ x, data = a)

  Residuals:
      Min      1Q  Median      3Q     Max
  -1.4096 -0.6400 -0.1244  0.5886  2.1891

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept) -0.03276    0.15169  -0.216    0.830
  x.L         -0.28968    0.33866  -0.855    0.398
  x.Q         -0.38813    0.33851  -1.147    0.259
  x.C         -0.27183    0.34027  -0.799    0.430
  x^4          0.25993    0.33935   0.766    0.449

  Residual standard error: 0.9564 on 35 degrees of freedom
  Multiple R-squared: 0.08571,  Adjusted R-squared: -0.01878
  F-statistic: 0.8202 on 4 and 35 DF,  p-value: 0.5211

I am guessing that this has something to do with the contrast matrix that is used, but this is not clear to me. Can anyone suggest a good read, or an explanation?

Thanks.

----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------
David Winsemius
2011-Dec-02 15:04 UTC
[R] Unexplained behavior of level names when using ordered factors in lm?
On Dec 2, 2011, at 9:51 AM, Tal Galili wrote:

> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.03276    0.15169  -0.216    0.830
> x.L         -0.28968    0.33866  -0.855    0.398
> x.Q         -0.38813    0.33851  -1.147    0.259
> x.C         -0.27183    0.34027  -0.799    0.430
> x^4          0.25993    0.33935   0.766    0.449

Those are polynomial contrasts: linear (.L), quadratic (.Q), cubic (.C) and quartic (^4). If you don't want contrasts based on ordered factors, just use regular (unordered) factors. You should probably be looking at:

?"C"

(...yet another function whose name should be avoided when naming data-objects.)

--
David.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
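A minimal R sketch of David's point, assuming a fresh R session with the default `contrasts` option: ordered factors are coded with `contr.poly`, and for five levels that matrix has exactly the columns `.L`, `.Q`, `.C` and `^4` seen in the output. The integer scores in `int_scores` are the classical orthogonal-polynomial coefficients for five equally spaced levels; they and the other variable names are illustrative additions, not part of the thread.

```r
# Default contrasts in a fresh R session: treatment coding for unordered
# factors, polynomial coding for ordered factors
getOption("contrasts")

# The contrast matrix behind x.L, x.Q, x.C and x^4 for a 5-level ordered factor
cp <- contr.poly(5)
print(round(cp, 4))

# The columns are orthonormal: t(cp) %*% cp is the 4 x 4 identity matrix
stopifnot(isTRUE(all.equal(crossprod(cp), diag(4), check.attributes = FALSE)))

# Illustrative: the classical integer orthogonal-polynomial scores for
# 5 equally spaced levels, rescaled to unit length, give the same matrix
int_scores <- cbind(L  = c(-2, -1,  0,  1, 2),
                    Q  = c( 2, -1, -2, -1, 2),
                    C  = c(-1,  2,  0, -2, 1),
                    Q4 = c( 1, -4,  6, -4, 1))
unit_scores <- sweep(int_scores, 2, sqrt(colSums(int_scores^2)), "/")
stopifnot(isTRUE(all.equal(unit_scores, cp, check.attributes = FALSE)))
```

So x.L measures a linear trend across the ordered levels, x.Q a quadratic bend, and so on; none of them corresponds to a single level of x.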
Bert Gunter
2011-Dec-02 15:06 UTC
[R] Unexplained behavior of level names when using ordered factors in lm?
?ordered
?C
?contr.poly

If you don't know what polynomial contrasts are, consult any good linear models text. MASS (Venables and Ripley, "Modern Applied Statistics with S") has a good, though somewhat terse, section on this.

-- Bert
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
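Following the pointers to `?ordered`, `?C` and `?contr.poly`, here is a minimal sketch of three standard ways to get treatment-coded coefficients for the same data. The object names (`fit_poly`, `x_unord`, and so on) are illustrative. Because `sample()` changed its default algorithm in R 3.6.0, re-running this seed in a current R may not reproduce the exact numbers printed in the original post.

```r
set.seed(4254)
a <- data.frame(y = rnorm(40), x = ordered(sample(1:5, 40, replace = TRUE)))

# Default for an ordered factor: polynomial contrasts (x.L, x.Q, x.C, x^4)
fit_poly <- lm(y ~ x, data = a)

# Option 1: drop the ordering so the unordered default (contr.treatment) applies
a$x_unord <- factor(a$x, ordered = FALSE)
fit_trt1 <- lm(y ~ x_unord, data = a)

# Option 2: keep x ordered but attach treatment contrasts with C()
fit_trt2 <- lm(y ~ C(x, contr.treatment), data = a)

# Option 3: override the contrasts for this one fit only
fit_trt3 <- lm(y ~ x, data = a, contrasts = list(x = "contr.treatment"))

# Coefficients are now differences from the first level (x2, x3, ...)
coef(fit_trt3)

# Reparameterising changes the coefficients, not the fitted model
stopifnot(isTRUE(all.equal(fitted(fit_poly), fitted(fit_trt1))),
          isTRUE(all.equal(fitted(fit_poly), fitted(fit_trt2))),
          isTRUE(all.equal(fitted(fit_poly), fitted(fit_trt3))))
```

All four fits share the same residuals, R-squared and overall F-test; only the coefficient labels and their interpretation (polynomial trend components versus differences from the first level) change.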