Hello;

I am having a problem with the interpretation of models using ordered or unordered predictors.
I am running models in lmer, but I will try to give a simplified example data set using lm.
Both in the example and in my real data set I use a predictor variable referring to 3 consecutive days of an experiment. It is a factor, and I thought it would be more correct to treat it as ordered.
Below is my example code with my comments/ideas alongside it.
Can someone help me to understand what is happening?

Thanks a lot in advance;

Catarina Miranda


y<-c(72,25,24,2,18,38,62,30,78,34,67,21,97,79,64,53,27,81)

Day<-c(rep("Day 1",6),rep("Day 2",6),rep("Day 3",6))

dataf<-data.frame(y,Day)

str(dataf) #Day is not ordered
#'data.frame':   18 obs. of  2 variables:
# $ y  : num  72 25 24 2 18 38 62 30 78 34 ...
# $ Day: Factor w/ 3 levels "Day 1","Day 2",..: 1 1 1 1 1 1 2 2 2 2 ...

summary(lm(y~Day,data=dataf))  #Day 2 is not significantly different from Day 1, but Day 3 is.
#
#Call:
#lm(formula = y ~ Day, data = dataf)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-39.833 -14.458  -3.833  13.958  42.167
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)   29.833      9.755   3.058  0.00797 **
#DayDay 2      18.833     13.796   1.365  0.19234
#DayDay 3      37.000     13.796   2.682  0.01707 *
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 23.9 on 15 degrees of freedom
#Multiple R-squared: 0.3241,     Adjusted R-squared: 0.234
#F-statistic: 3.597 on 2 and 15 DF,  p-value: 0.05297
#

dataf$Day<-ordered(dataf$Day)

str(dataf) # "Day 1"<"Day 2"<"Day 3"
#'data.frame':   18 obs. of  2 variables:
# $ y  : num  72 25 24 2 18 38 62 30 78 34 ...
# $ Day: Ord.factor w/ 3 levels "Day 1"<"Day 2"<..: 1 1 1 1 1 1 2 2 2 2 ...

summary(lm(y~Day,data=dataf)) #Significances reversed (or are "Day.L" and "Day.Q" not synonymous with "Day 2" and "Day 3"?): Day 2 (".L") is significantly different from Day 1, but Day 3 (".Q") isn't.

#Call:
#lm(formula = y ~ Day, data = dataf)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-39.833 -14.458  -3.833  13.958  42.167
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  48.4444     5.6322   8.601 3.49e-07 ***
#Day.L        26.1630     9.7553   2.682   0.0171 *
#Day.Q        -0.2722     9.7553  -0.028   0.9781
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 23.9 on 15 degrees of freedom
#Multiple R-squared: 0.3241,     Adjusted R-squared: 0.234
#F-statistic: 3.597 on 2 and 15 DF,  p-value: 0.05297
Ordered factors use orthogonal polynomial contrasts by default. The .L and .Q stand for the linear and quadratic terms, not for Day 2 and Day 3. Unordered factors use "treatment" contrasts by default (although they are not actually contrasts), which are interpreted as you described.

If you do not know what this means, you need to do some reading on linear models/multiple regression. Try posting on http://stats.stackexchange.com/ or, as always, consult your local statistician for help. V&R's MASS book also contains a useful but terse discussion of these issues.

Cheers,
Bert

On Tue, Nov 15, 2011 at 7:00 AM, Catarina Miranda <catarina.miranda@gmail.com> wrote:
> Hello;
>
> I am having a problem with the interpretation of models using ordered or
> unordered predictors.
> I am running models in lmer but I will try to give a simplified example
> data set using lm.
> Both in the example and in my real data set I use a predictor variable
> referring to 3 consecutive days of an experiment. It is a factor, and I
> thought it would be more correct to consider it ordered.
> Below is my example code with my comments/ideas along it.
> Can someone help me to understand what is happening?

--
Bert Gunter
Genentech Nonclinical Biostatistics
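To make the point about polynomial contrasts concrete, here is a minimal sketch that reuses the `y` values from the original post. The object names (`m`, `fit_ord`, `fit_trt`, `lin`, `quad`) are only illustrative and are not from the thread. It assumes R's default contrast settings and a balanced design (6 observations per day), which is what lets the coefficients be written directly in terms of the three day means.

```r
# Data from the original post; Day starts as an unordered factor.
y   <- c(72, 25, 24, 2, 18, 38, 62, 30, 78, 34, 67, 21, 97, 79, 64, 53, 27, 81)
Day <- factor(rep(c("Day 1", "Day 2", "Day 3"), each = 6))

# Contrast matrix R uses by default for a 3-level ordered factor:
# .L is proportional to (-1, 0, 1), .Q to (1, -2, 1).
contr.poly(3)

# Day means as a plain numeric vector (Day 1, Day 2, Day 3).
m <- as.numeric(tapply(y, Day, mean))

# Same model fitted with the two codings.
fit_trt <- lm(y ~ Day)            # treatment coding: differences from Day 1
fit_ord <- lm(y ~ ordered(Day))   # polynomial coding: linear and quadratic trend

# With equal group sizes the polynomial coefficients are functions of the means:
lin  <- (m[3] - m[1]) / sqrt(2)             # linear trend across the three days
quad <- (m[1] - 2 * m[2] + m[3]) / sqrt(6)  # curvature: Day 2 vs. average of Days 1 and 3

coef(fit_ord)                     # compare with mean(y), lin, quad
c(mean(y), lin, quad)

# The two fits are one model in two parameterisations.
all.equal(fitted(fit_ord), fitted(fit_trt))
```

In this balanced example `Day.L` is the Day 3 minus Day 1 difference divided by `sqrt(2)`, which would explain why its t value (2.682) matches the one for `DayDay 3` in the first summary. `Day.Q` asks whether Day 2 departs from the straight line between Day 1 and Day 3, so neither term is a "Day 2 vs Day 1" comparison.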
On Tue, Nov 15, 2011 at 9:00 AM, Catarina Miranda <catarina.miranda at gmail.com> wrote:
> Hello;
>
> I am having a problem with the interpretation of models using ordered or
> unordered predictors.
> I am running models in lmer but I will try to give a simplified example
> data set using lm.
> Both in the example and in my real data set I use a predictor variable
> referring to 3 consecutive days of an experiment. It is a factor, and I
> thought it would be more correct to consider it ordered.
> Below is my example code with my comments/ideas along it.
> Can someone help me to understand what is happening?

Dear Catarina:

I have had the same question, and I hope my answers help you understand what's going on.

The short version:
http://pj.freefaculty.org/R/WorkingExamples/orderedFactor-01.R

The longer version, "Working with Ordinal Predictors":
http://pj.freefaculty.org/ResearchPapers/MidWest09/Midwest09.pdf

HTH
pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas