Schwab,Wilhelm K
2010-Feb-25 23:51 UTC
[R] Ordering categories on a boxplot - a serious trap??
Hello all,

I think I probably did something stupid, and R's part was to allow me to do it. My goal was to control the order of factor levels appearing horizontally on a boxplot. Enter search engines and perhaps some creative stupidity on my part, and I came up with the following:

   v=read.table("factor-order.txt",header=TRUE);
   levels(v$doseGroup) = c("L", "M", "H");
   boxplot(v$dose~v$doseGroup);

A good way to see the trap is to evaluate:

   v=read.table("factor-order.txt",header=TRUE);
   par(mfrow=c(2,1));
   boxplot(v$dose~v$doseGroup);
   levels(v$doseGroup) = c("L", "M", "H");
   boxplot(v$dose~v$doseGroup);
   par(mfrow=c(1,1));

The above creates two plots, one correct with the factors in an inconvenient order, and one that is WRONG. In the latter, the labels appear in the desired order, but the data does not "move with them." I did not discover the problem until I repeated the same type of plot with something that had a known relationship with the levels, and the result was clearly not correct.

I *think* the problem is to assign to the return value of levels(). How did I think to do that? I'm not really sure, but please look at

   https://stat.ethz.ch/pipermail/r-help/2008-August/171884.html

Perhaps it does not say to do exactly what I did, but it sure was easy to follow to the mistake, it appeared to do what I wanted, and the consequences of the mistake are ugly. Perhaps levels() should return something that is immutable?? If I am looking at this correctly, levels() is an accident waiting to happen.

What should I have done? It seems:

   # read data and order factor levels
   v=read.table("factor-order.txt",header=TRUE);
   group = factor(v$doseGroup,levels = c("L", "M", "H") );
   boxplot(v$dose~group);

One disappointment is that the above factor() call apparently needs to be repeated for any subset of v - I'm still trying to get my mind around that one.

Can anyone confirm this? It strikes me as a trap that should be addressed so that an error results rather than a garbage graph.
Bill

---
Wilhelm K. Schwab, Ph.D.
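The trap described in this message can be reproduced without factor-order.txt, which was not posted to the list. The following is a minimal sketch under stated assumptions: the data frame is made up, with dose values chosen so that dose has an obvious relationship to the group, and group medians stand in for the boxplots.

```r
# Made-up stand-in for factor-order.txt (an assumption, not the poster's data)
v <- data.frame(
  doseGroup = factor(rep(c("L", "M", "H"), each = 3)),
  dose      = c(1, 2, 3, 10, 11, 12, 100, 101, 102)
)
levels(v$doseGroup)                      # default order is alphabetical: "H" "L" "M"

# The trap: levels<- only renames the existing levels, position by position
wrong <- v
levels(wrong$doseGroup) <- c("L", "M", "H")

# The fix: factor(..., levels=) reorders the levels and keeps the data attached
right <- v
right$doseGroup <- factor(right$doseGroup, levels = c("L", "M", "H"))

med_wrong <- tapply(wrong$dose, wrong$doseGroup, median)
med_right <- tapply(right$dose, right$doseGroup, median)
med_wrong   # L = 101, M = 2,  H = 11   (labels no longer match the data)
med_right   # L = 2,   M = 11, H = 101
```

Because the default levels are H, L, M, the assignment relabels H as L, L as M, and M as H, which is why the box labelled "L" ends up showing the high doses.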
William Dunlap
2010-Feb-26 00:13 UTC
[R] Ordering categories on a boxplot - a serious trap??
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Schwab,Wilhelm K
> Sent: Thursday, February 25, 2010 3:51 PM
> To: r-help at r-project.org
> Subject: [R] Ordering categories on a boxplot - a serious trap??
>
> v=read.table("factor-order.txt",header=TRUE);
> levels(v$doseGroup) = c("L", "M", "H");
> boxplot(v$dose~v$doseGroup);

levels<- translated the current level labels into another language; it did not change the integer codes of the factor. If you want to reorder the levels, call factor(..., levels=). E.g.,

   > z <- factor(c("Small","Large","Medium","Small"))
   > str(z)
    Factor w/ 3 levels "Large","Medium",..: 3 1 2 3
   > str(factor(z, levels=c("Small","Medium","Large")))
    Factor w/ 3 levels "Small","Medium",..: 1 3 2 1

You can also relabel them by using the labels= argument to factor:

   > str(factor(z, levels=c("Small","Medium","Large"), labels=c("S","M","L")))
    Factor w/ 3 levels "S","M","L": 1 3 2 1

Calling levels<- changes nothing but the level labels:

   > zcopy <- z
   > levels(zcopy) <- c("Small","Medium","Large")
   > str(zcopy)
    Factor w/ 3 levels "Small","Medium",..: 3 1 2 3

levels<- is handy for low-level manipulations but not for general use.
Even factor(,levels=) can be a bit dangerous: if a new level is misspelled it will silently add NA's to the data:

   > str(factor(z, levels=c("Smal", "Medium", "Large")))
    Factor w/ 3 levels "Smal","Medium",..: NA 3 2 NA

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
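One way to get an error instead of silent NA's from a misspelled level is to compare the requested levels with the existing ones before calling factor(). The following is only a sketch: reorder_levels is a hypothetical helper name, not a function from base R, and it assumes the new order should contain exactly the levels already present.

```r
# Hypothetical helper (not part of base R): reorder levels, fail on any mismatch
reorder_levels <- function(x, new_order) {
  x <- as.factor(x)
  missing_levels <- setdiff(levels(x), new_order)   # present in data, not requested
  unknown_levels <- setdiff(new_order, levels(x))   # requested, not present in data
  if (length(missing_levels) > 0 || length(unknown_levels) > 0) {
    stop("level mismatch; not in new_order: ",
         paste(missing_levels, collapse = ", "),
         "; not in data: ", paste(unknown_levels, collapse = ", "))
  }
  factor(x, levels = new_order)
}

z <- factor(c("Small", "Large", "Medium", "Small"))

ok <- reorder_levels(z, c("Small", "Medium", "Large"))
str(ok)

# The misspelled "Smal" now raises an error; capture its message for inspection
bad <- tryCatch(reorder_levels(z, c("Smal", "Medium", "Large")),
                error = function(e) conditionMessage(e))
bad
```

The strict check is a design choice: it also rejects deliberately dropping a level, which plain factor(, levels=) allows.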
Schwab,Wilhelm K
2010-Feb-26 01:40 UTC
[R] Ordering categories on a boxplot - a serious trap??
Phil,

That works[*], but I still think there is a big problem given how easy it is to do the wrong thing, and that searches lead to dangerous instructions. Hopefully this will serve to keep others out of trouble, but so might an immutable return value from levels().

[*] I have not yet done anything with selecting parts of the data frame. Using a separate factor, I quickly hit trouble with size mismatches, though I could probably work around them by recreating the factor after any such change. Proceeding with caution...

Bill

---
Wilhelm K. Schwab, Ph.D.

-----Original Message-----
From: Phil Spector [mailto:spector at stat.berkeley.edu]
Sent: Thursday, February 25, 2010 7:06 PM
To: Schwab,Wilhelm K
Subject: Re: [R] Ordering categories on a boxplot - a serious trap??

Wilhelm -
   I don't know if this is correct for your problem because you didn't provide a reproducible example, but perhaps you could try

   v$doseGroup = factor(v$doseGroup,levels=c("L", "M", "H"))

instead of setting the levels directly.

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spector at stat.berkeley.edu
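On the question of subsets: when the reordered factor is stored back in the data frame, as Phil suggests, rows selected from the data frame carry the level order with them, so the factor() call should not need repeating and there is no separate vector to fall out of step. A minimal sketch, assuming a made-up data frame in place of factor-order.txt:

```r
# Made-up data (an assumption; the original file was not posted)
v <- data.frame(
  doseGroup = c("L", "M", "H", "L", "M", "H"),
  dose      = c(1, 10, 100, 2, 11, 101)
)

# Store the reordered factor in the data frame itself
v$doseGroup <- factor(v$doseGroup, levels = c("L", "M", "H"))

# Subsetting rows keeps the level order (and keeps "L" as an empty level)
sub <- v[v$dose > 5, ]
levels(sub$doseGroup)                        # "L" "M" "H"

# Optional: drop levels that no longer occur; the remaining order is preserved
sub$doseGroup <- droplevels(sub$doseGroup)
levels(sub$doseGroup)                        # "M" "H"
```

Note that droplevels() was added in R 2.12.0, which is newer than this thread; calling factor(sub$doseGroup) drops unused levels in the same way on older versions.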