To whom it may concern,

I happened to run the following R code just to check the layout of the output, but found that the code doesn't work the way I thought it should:

> lm(rnorm(100) ~ rnorm(100))

Call:
lm(formula = rnorm(100) ~ rnorm(100))

Coefficients:
(Intercept)
   -0.07966

Warning messages:
1: In model.matrix.default(mt, mf, contrasts) :
  the response appeared on the right-hand side and was dropped
2: In model.matrix.default(mt, mf, contrasts) :
  problem with term 1 in model.matrix: no columns are assigned

It appears that rnorm(100) produces the same array of numbers on both sides of the ~ sign. This can be further verified by the fact that x <- rnorm(100) followed by lm(x ~ x) gives the same warning messages. I would expect the two rnorm(100) calls inside lm() to return two different arrays of numbers, but am open to hearing reasons from the other side.

Thanks,
-- 
*Xu Tian*
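rnorm() itself is still random here. The base-R sketch below pulls the two sides out of such a formula. Creating the formula does not call rnorm() at all. Both sides are stored as unevaluated calls that are identical as expressions, and evaluating each one separately does give two different samples. The object names f, lhs, rhs, a and b are only illustrative.

```r
f <- rnorm(9) ~ rnorm(9)

# Creating the formula evaluates nothing: both sides are stored as calls.
lhs <- f[[2]]          # left-hand side, the call rnorm(9)
rhs <- f[[3]]          # right-hand side, the call rnorm(9)
is.call(lhs)           # TRUE
identical(lhs, rhs)    # TRUE: as expressions the two sides are the same

# Evaluating each side on its own, in the formula's environment,
# performs two separate draws.
a <- eval(lhs, environment(f))
b <- eval(rhs, environment(f))
identical(a, b)        # FALSE: two independent samples
```

So the two sides only become "the same numbers" if something downstream treats the two identical expressions as one variable.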
Martin Maechler
2017-Aug-03 16:11 UTC
[Rd] rnorm is not truly random used in the lm function
>>>>> Victor Tian <tianxu03 at gmail.com>
>>>>>     on Thu, 3 Aug 2017 09:49:57 -0400 writes:

    > To whom it may concern,
    > I happened to run the following R code just to check the layout of the
    > output, but found that the code doesn't work the way I thought it should
    > work.

yes, your expectations were wrong.

    >> lm(rnorm(100) ~ rnorm(100))

    > Call:
    > lm(formula = rnorm(100) ~ rnorm(100))

    > Coefficients:
    > (Intercept)
    > -0.07966

    > Warning messages:
    > 1: In model.matrix.default(mt, mf, contrasts) :
    > the response appeared on the right-hand side and was dropped
    > 2: In model.matrix.default(mt, mf, contrasts) :
    > problem with term 1 in model.matrix: no columns are assigned

    > It appears that rnorm(100) produces the same array of numbers on both sides
    > of the ~ sign.

Indeed. And all this has nothing to do with lm() but rather with
how formulas in R have been treated probably "forever".
[I assume not only in R, but rather since the time formulas
 were introduced into the S language (for "S version 3") a few
 years before R was born. But I can no longer verify or disprove
 this assumption.]

Even more revealing may be this:

> f <- rnorm(9) ~ rnorm(9)
> str(f)
Class 'formula' language rnorm(9) ~ rnorm(9)
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
> (mm <- model.matrix(f))
  (Intercept)
1           1
2           1
3           1
4           1
5           1
6           1
7           1
8           1
9           1
attr(,"assign")
[1] 0
Warning messages:
1: In model.matrix.default(f) :
  the response appeared on the right-hand side and was dropped
2: In model.matrix.default(f) :
  problem with term 1 in model.matrix: no columns are assigned
>

---------

BTW: One of the goals of formulas, notably in R since they got an
environment attached, is a clean way to deal with non-standard
evaluation (=: NSE).
[ Some of us would claim it is the only clean way to deal with NSE in R,
  and all new functionality using NSE should use formulas,
  but recently tidyverse-scholars have claimed to be able to deal
  with it cleanly w/o the use of formulas, but via "tidy evaluation" ]

Using random expressions in a formula is therefore typically not
a good idea, because you don't really know when the terms in the
formula will be evaluated.
For lm() and all other good formula-based statistical modeling
functions, the evaluation happens via model.matrix().

As you've noticed from that warning, model.matrix() tries to
help the user by checking terms and eliminating those that
appear on both sides of the '~'.
This has been documented on the help page [ ?model.matrix ] for
(almost exactly 14) years, the "Details:" section ending with

_> By convention, if the response variable also appears on the
_> right-hand side of the formula it is dropped (with a warning),
_> although interactions involving the term are retained.

I hope this explains the issue.
And yes: Do *not* use rnorm() in formulas.

Martin

-- 
Martin Mächler
Seminar für Statistik, ETH Zürich  //  R Core Team
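The convention quoted from ?model.matrix can be reproduced without any randomness, which is consistent with the point that rnorm() is not the culprit. In the sketch below the data frame d and its values are made up. The warnings are collected with withCallingHandlers() instead of being printed, so they can be inspected afterwards.

```r
# Made-up data; any numeric response and predictor would do.
d <- data.frame(y = c(1, 3, 2, 5), x = c(2, 1, 4, 3))

# Collect warning messages instead of letting them print.
msgs <- character(0)
mm <- withCallingHandlers(
  model.matrix(y ~ y + x, data = d),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

colnames(mm)  # "(Intercept)" "x": the y on the right-hand side was dropped
msgs          # includes "the response appeared on the right-hand side and was dropped"
```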
I did it purely based on the intuition I built elsewhere, and maybe in R as well.

To summarise, it's basically an evaluation-order issue. It looks like the model.matrix() function takes precedence over rnorm(100), i.e., evaluation goes outside-in rather than inside-out in this specific case? If the inner parts were evaluated first, as in most cases, the two rnorm(100) expressions would no longer be the same. I guess it's because they appear the same to model.matrix()?

This raises another question: how does model.matrix() judge whether a variable appears on both sides of the ~ sign? By the input literal? Please clarify.

Thanks,
Victor

-- 
*Xu Tian*
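On the question of how the two sides are judged to be "the same", the terms object built from the formula can be inspected directly. It appears to record each distinct variable expression only once, so both occurrences of rnorm(9) collapse into a single variable. model.frame() then evaluates that variable a single time. The sketch below shows this observed behaviour with base R's terms() and model.frame(). It is not a description of the internal matching code. The second part shows the usual pattern of drawing first and naming each draw. The seed and the names x and y are arbitrary.

```r
## Part 1: the problematic formula
f  <- rnorm(9) ~ rnorm(9)
tt <- terms(f)
attr(tt, "variables")    # list(rnorm(9)): one variable, not two
attr(tt, "response")     # 1: that single variable is the response ...
attr(tt, "term.labels")  # "rnorm(9)": ... and also the only right-hand-side term

# model.frame() evaluates each recorded variable once,
# so the frame has a single column of 9 random numbers.
mf <- model.frame(f)
dim(mf)                  # 9 1

## Part 2: draw first, then refer to the draws by distinct names
set.seed(1)              # arbitrary seed, for reproducibility only
y <- rnorm(9)
x <- rnorm(9)
fit <- lm(y ~ x)         # two distinct variables, nothing is dropped
names(coef(fit))         # "(Intercept)" "x"
```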