Hello, Everybody:
This may not be a "bug", but for me it is an unexpected outcome. A
factor variable's levels do not retain their ordering after the levels
function is used. I supply an example in which a factor with values
"BC" "AD" (in that order) is unintentionally re-alphabetized by the
levels function.
To me, this is very bad behavior. Would you agree?
# Paul Johnson 2012-02-05
x <- c("AD","BC","AD","BC","AD","BC")
xf <- factor(x, levels=c("BC", "AD"),
             labels=c("Before Christ","After Christ"))
y <- rnorm(6)
m1 <- lm (y ~ xf )
plot(y ~ xf)
abline (m1)
## Just a little problem the line does not "go through" the box
## plot in the right spot because contrasts(xf) is 0,1 but
## the plot uses xf in 1,2.
xlevels <- levels(xf)
newdf <- data.frame(xf=xlevels)
ypred <- predict(m1, newdata=newdf)
##Watch now: the plot comes out "reversed", AC before BC
plot(ypred ~ newdf$xf)
## Ah. Now I see:
levels(newdf$xf)
## Why doesn't newdf$xf respect the ordering of the levels?
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
On 25.02.2012 19:16, Paul Johnson wrote:
> Hello, Everybody:
>
> This may not be a "bug", but for me it is an unexpected outcome. A
> factor variable's levels
> do not retain their ordering after the levels function is used. I
> supply an example in which
> a factor with values "BC" "AD" (in that order) is unintentionally
> re-alphabetized by the levels
> function.
>
> To me, this is very bad behavior. Would you agree?
>
>
> # Paul Johnson 2012-02-05
>
> x<- c("AD","BC","AD","BC","AD","BC")
> xf<- factor(x, levels=c("BC", "AD"), labels=c("Before Christ","After Christ"))
> y<- rnorm(6)
>
> m1<- lm (y ~ xf )
>
> plot(y ~ xf)
>
> abline (m1)
> ## Just a little problem the line does not "go through" the box
> ## plot in the right spot because contrasts(xf) is 0,1 but
> ## the plot uses xf in 1,2.
>
> xlevels<- levels(xf)
> newdf<- data.frame(xf=xlevels)
>
> ypred<- predict(m1, newdata=newdf)
>
> ##Watch now: the plot comes out "reversed", AC before BC
> plot(ypred ~ newdf$xf)
>
> ## Ah. Now I see:
>
> levels(newdf$xf)
> ## Why doesnt newdf$xf respect the ordering of the levels?

Because xlevels was a character vector, and you coerced it to a factor
by calling data.frame(xf=xlevels) on it without saying anything about
the ordering of the levels; hence the levels got sorted
lexicographically.

Uwe Ligges
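
To make that explanation concrete, here is a minimal sketch of one way to keep the intended order: build the factor explicitly and pass the levels argument, rather than letting data.frame() do the conversion. The set.seed() call is an addition for reproducibility and is not in the original post. Note also that since R 4.0.0 data.frame() defaults to stringsAsFactors = FALSE, so on a current R the column would stay a character vector instead of becoming a sorted factor; the explicit factor() call should behave the same either way.

```r
x  <- c("AD", "BC", "AD", "BC", "AD", "BC")
xf <- factor(x, levels = c("BC", "AD"),
             labels = c("Before Christ", "After Christ"))
set.seed(1)   # assumption: added only so the example is reproducible
y  <- rnorm(6)
m1 <- lm(y ~ xf)

# levels() returns a plain character vector, in the order we asked for
xlevels <- levels(xf)

# Converting that vector to a factor without 'levels' sorts the levels
sorted_levels <- levels(factor(xlevels))

# Passing 'levels' explicitly keeps the original order
newdf <- data.frame(xf = factor(xlevels, levels = xlevels))
kept_levels <- levels(newdf$xf)

# predict() now sees a factor whose levels match those used in the fit
ypred <- predict(m1, newdata = newdf)
```

With newdf built this way, plot(ypred ~ newdf$xf) should draw "Before Christ" first, matching plot(y ~ xf).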