Ravi Varadhan
2015-Jan-21 22:37 UTC
[R] Re-order levels of a categorical (factor) variable
Hi,
I have a fairly elementary problem that I am unable to figure out. I have a
continuous variable, Y, repeatedly measured at multiple times, T. The variable
T, however, is coded as a factor variable having these levels:
c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365").
When I plot the boxplot, e.g., boxplot(Y ~ T), it displays the boxes in this
order: c("Day 180", "Day 30", "Day 365", "Day 60", "Presurgery").
Is there a way to control the order of the boxes such that they are plotted in a
particular order that I want, for example:
c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")?
More generally, is there a simple way to redefine the ordering of the
categorical variable such that this ordering will be used in whatever operation
is done? I looked at relevel, reorder, etc., but they did not seem to be
applicable to my problem.
Thanks for any help.
Best,
Ravi
William Dunlap
2015-Jan-22 00:36 UTC
[R] Re-order levels of a categorical (factor) variable
Are you sure the levels of T are in the order you think they are? (Are you
sure you are using the expected version of T?) Use print(levels(T)) to make
sure.
I tried
  timeCats <- c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")
  d <- data.frame(T = factor(rep(timeCats, 11:15), levels = timeCats),
                  Y = seq_len(sum(11:15)))
boxplot(Y ~ T, data=d)
and the boxes and labels are in the order given in 'timeCats'.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
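
If print(levels(T)) does show the alphabetical order, one common fix is to rebuild the factor with an explicit levels argument. The sketch below reuses the timeCats vector from the message above; the data frame d here is a hypothetical stand-in for the real data, built without levels so that factor() falls back to its default sorted order. For context, relevel() only moves a single reference level to the front and reorder() sorts levels by a summary of another variable, which may be why neither seemed to fit.

```r
# Desired order, as given in the original question
timeCats <- c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")

# Hypothetical data: no `levels` argument, so factor() sorts the labels
d <- data.frame(T = factor(rep(timeCats, 11:15)),
                Y = seq_len(sum(11:15)))
print(levels(d$T))              # default (sorted) order

# Keep a copy of the labels to confirm the data values do not change
before <- as.character(d$T)

# Rebuild the factor with the level order we want
d$T <- factor(d$T, levels = timeCats)
print(levels(d$T))              # now follows timeCats

# Only the level order changed, not the observations
stopifnot(identical(as.character(d$T), before))

# Boxes are drawn in level order
boxplot(Y ~ T, data = d)
```

Once the levels are stored on the factor itself, other functions that work in level order (table(), summary(), model contrasts, lattice plots) should pick up the same ordering without further arguments.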
Bill/Ravi:

I believe the problem is that the factor is automatically created when a data
frame is created by read.table(). By default, the levels are lexicographically
ordered. The following reproduces the problem and gives a solution.

> library(lattice)
> ## stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no
> ## longer converted to factors by default
> z <- data.frame(y = 1:9, x = rep(c("pre", "day2", "day10"), 3),
+                 stringsAsFactors = TRUE)
> xyplot(y ~ x, data = z)  ## x axis order is day10, day2, pre
> levels(z$x)
[1] "day10" "day2"  "pre"
> z$x <- factor(as.character(z$x), levels = levels(z$x)[3:1])  ## explicitly defines level order
> xyplot(y ~ x, data = z)  ## desired plot

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll
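
Reversing the sorted levels with [3:1] works for this three-level example but would not generalize to labels such as "Day 30" and "Day 180". A rough sketch of one alternative is to read the column as character and derive the level order from the day numbers. The inline text passed to read.table() and the names txt, labs, days, dayNum and ord are illustrative assumptions, not part of the original messages.

```r
# Hypothetical input standing in for a file read with read.table()
txt <- "y x
1 pre
2 day2
3 day10
4 pre
5 day2
6 day10"

# Read strings as plain character so no factor is created implicitly
z <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Derive the order: "pre" first, then the day labels sorted numerically
labs   <- unique(z$x)
days   <- setdiff(labs, "pre")
dayNum <- as.numeric(sub("^day", "", days))
ord    <- c("pre", days[order(dayNum)])

# Create the factor once, with the level order made explicit
z$x <- factor(z$x, levels = ord)
print(levels(z$x))
```

Sorting on the numeric part avoids the lexicographic trap in which "day10" sorts before "day2"; the regular expression would need adjusting for labels in a different format.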