Ravi Varadhan
2015-Jan-21 22:37 UTC
[R] Re-order levels of a categorical (factor) variable
Hi,

I have a fairly elementary problem that I am unable to figure out. I have a continuous variable, Y, repeatedly measured at multiple times, T. The variable T is, however, coded as a factor variable having these levels: c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365").

When I plot the boxplot, e.g., boxplot(Y ~ T), it displays the boxes in this order: c("Day 180", "Day 30", "Day 365", "Day 60", "Presurgery"). Is there a way to control the order of the boxes so that they are plotted in the order I want, for example: c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")?

More generally, is there a simple way to redefine the ordering of the categorical variable so that this ordering is used in whatever operation is done? I looked at relevel, reorder, etc., but they did not seem to be applicable to my problem.

Thanks for any help.

Best,
Ravi
William Dunlap
2015-Jan-22 00:36 UTC
[R] Re-order levels of a categorical (factor) variable
Are you sure the levels of T are in the order you think they are? (Are you sure you are using the expected version of T?) Use print(levels(T)) to make sure.

I tried

    timeCats <- c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")
    d <- data.frame(T = factor(rep(timeCats, 11:15), levels=timeCats),
                    Y = seq_len(sum(11:15)))
    boxplot(Y ~ T, data=d)

and the boxes and labels are in the order given in 'timeCats'.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
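As a rough illustration of the check suggested above, the sketch below compares the levels of a factor built without an explicit 'levels' argument against one built with it. The names T_default and T_explicit and the small made-up data are assumptions for illustration only; they are not from Ravi's data.

```r
# Hypothetical data: the five time labels, repeated twice each.
timeCats <- c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")
x <- rep(timeCats, 2)

# Without 'levels', factor() sorts the unique values as character strings,
# which gives the order Ravi reported seeing in the boxplot.
T_default <- factor(x)
print(levels(T_default))

# With 'levels', the given order is kept.
T_explicit <- factor(x, levels = timeCats)
print(levels(T_explicit))
```

If print(levels(T)) on the real data shows the sorted order, the factor was most likely created without an explicit 'levels' argument somewhere upstream.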
Bill/Ravi:

I believe the problem is that the factor is automatically created when a data frame is created by read.table(). By default, the levels are lexicographically ordered. The following reproduces the problem and gives a solution.

    library(lattice)
    ## stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are
    ## no longer converted to factors by default
    z <- data.frame(y = 1:9, x = rep(c("pre", "day2", "day10")),
                    stringsAsFactors = TRUE)
    xyplot(y ~ x, data = z)  ## x axis order is day10, day2, pre
    levels(z$x)
    ## [1] "day10" "day2"  "pre"
    z$x <- factor(as.character(z$x), levels = levels(z$x)[3:1])  ## explicitly defines level order
    xyplot(y ~ x, data = z)  ## desired plot

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
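For Ravi's labels specifically, the same idea might look like the sketch below. The data frame d and its values are made up for illustration, and stringsAsFactors = TRUE is set only to imitate a factor that was created automatically with sorted levels.

```r
# Hypothetical data frame standing in for the real one.
timeCats <- c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")
d <- data.frame(T = rep(timeCats, times = 3),
                Y = seq_len(15),
                stringsAsFactors = TRUE)

print(levels(d$T))   # sorted order, not the chronological one

# Rebuild the factor with the desired level order; the data values are unchanged.
d$T <- factor(d$T, levels = timeCats)
print(levels(d$T))

# Plots and tables that follow level order should now use the order in timeCats.
boxplot(Y ~ T, data = d)
print(table(d$T))
```

Note that assigning with levels(d$T) <- timeCats would not do the same thing: it renames the existing levels in place, so the observations would end up with the wrong labels. Calling factor() with a 'levels' argument only changes the order.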