Thaler,Thorn,LAUSANNE,Applied Mathematics
2012-Mar-19 14:44 UTC
[Rd] Design choice of plot.design for formulas
Dear all,

Today I figured out that the formula interface of plot.design is rather counterintuitive. Suppose the following setting:

    ddf <- expand.grid(a=factor(1:3), b=factor(1:3))
    ddf$y <- rnorm(9)
    plot.design(y ~ a + b, data=ddf)

This does what it should do, basically plotting the means of y for the respective levels of the factors. I had to learn, however, that the function does not care at all whether I specify a variable on the LHS or the RHS of the formula. Thus, the following commands are all equivalent to the call above:

    plot.design(~ y + a + b, data=ddf)
    plot.design(a ~ y + b, data=ddf)
    plot.design(b ~ y + a, data=ddf)

A closer look at the code revealed that the function basically checks whether a variable is numeric or a factor. All factors are taken to be stratification factors, while all numeric variables are taken to be responses. While the former assumption makes sense, the latter is misleading in conjunction with the formula interface:

    ddf$z <- sample(3, 9, TRUE)
    plot.design(y ~ a + z, data=ddf)

In my reading, that should produce a plot where a and z are regarded as stratification factors, while y is the response. Instead, the function regards both y and z as responses.

So my question: is there a particular reason why the type of a variable in a data frame (factor vs. numeric) takes precedence over the specification in the formula interface of plot.design? Is it the case that one could not specify multiple responses otherwise? In that case, I was wondering whether an approach like the one in lattice, where one can specify multiple responses on the LHS, would be useful:

    ddf$y.new <- rnorm(9)
    lattice::xyplot(y + y.new ~ a, data = ddf, pch = 15)

Thanks for your feedback.

Kind regards,

-Thorn
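P.S. A possible workaround until this is clarified is sketched below. It relies only on the behaviour described above (factors are used for stratification, numeric variables as responses), so it converts z to a factor before the call. The names is_fac, ddf2 and is_fac2 and the set.seed call are added here for illustration only; this is a sketch, not a statement about how plot.design is meant to be used.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible

ddf <- expand.grid(a = factor(1:3), b = factor(1:3))
ddf$y <- rnorm(9)
ddf$z <- sample(3, 9, TRUE)

# a and b are factors; y and z are numeric (z is an integer vector)
is_fac <- vapply(ddf, is.factor, logical(1))
print(is_fac)

# Convert z explicitly so that it is a factor like a and b
ddf2 <- transform(ddf, z = factor(z))
is_fac2 <- vapply(ddf2, is.factor, logical(1))
print(is_fac2)

# With z stored as a factor, y is the only numeric variable left in the formula
plot.design(y ~ a + z, data = ddf2)
```

This sidesteps the issue rather than resolving it: the role of each variable is still determined by its class, not by its position in the formula.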