Dear List-members, Hopefully someone will help through my confusion: In order to get the same coefficients as we get from the following ## require (MASS) summary ( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) ...................... we need to do the following (if we use model.matrix to specify the model) ## summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) ) ...................... That is, we need to take out "two intercepts." Is this "correct"? Shouldn't lm check to see if an intercept has been requested as part of the model formula? If I do ## summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside)), data=whiteside)) ....................... we get a strange model. But the formula part of this model qualifies as a valid formula ## as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside)) ---------------- just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside) But we know that the _correct_ formula is the following ## as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1) ----------------- (Sorry, this is getting really long) --- So, my question/confusion comes down to wanting to know why lm() doesn't check to see if an intercept has been specified when the model has been specified using model.matrix. Regards, Mark. -- View this message in context: http://www.nabble.com/lm-model.matrix-confusion-%28--bug%29-tp14292188p14292188.html Sent from the R help mailing list archive at Nabble.com.
Whoops! Sorry, forgot my session stuff, just in case ... R version 2.6.1 RC (2007-11-22 r43520) i386-pc-mingw32 locale: LC_COLLATE=English_South Africa.1252;LC_CTYPE=English_South Africa.1252;LC_MONETARY=English_South Africa.1252;LC_NUMERIC=C;LC_TIME=English_South Africa.1252 attached base packages: [1] tcltk splines stats graphics grDevices utils datasets methods base other attached packages: [1] debug_1.1.0 mvbutils_1.1.1 MASS_7.2-38 sfsmisc_1.0-0 ade4_1.4-5 Design_2.1-1 [7] survival_2.34 Hmisc_3.4-3 loaded via a namespace (and not attached): [1] cluster_1.11.9 gamlss_1.7-0 grid_2.6.1 lattice_0.17-2 latticeExtra_0.3-1 [6] rcompgen_0.1-17 tools_2.6.1 Mark Difford wrote:> > Dear List-members, > > Hopefully someone will help through my confusion: > > In order to get the same coefficients as we get from the following > > ## > require (MASS) > summary ( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) > > ...................... > > we need to do the following (if we use model.matrix to specify the model) > > ## > summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) ) > > ...................... > > That is, we need to take out "two intercepts." Is this "correct"? > Shouldn't lm check to see if an intercept has been requested as part of > the model formula? > > If I do > ## > summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1, > data=whiteside)), data=whiteside)) > > ....................... > > we get a strange model. But the formula part of this model qualifies as a > valid formula > ## > as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside)) > > ---------------- > > just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside) > > But we know that the _correct_ formula is the following > > ## > as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1) > > ----------------- > > (Sorry, this is getting really long) --- So, my question/confusion comes > down to wanting to know why lm() doesn't check to see if an intercept has > been specified when the model has been specified using model.matrix. > > Regards, > Mark. > >-- View this message in context: http://www.nabble.com/lm-model.matrix-confusion-%28--bug%29-tp14292188p14292212.html Sent from the R help mailing list archive at Nabble.com.
G'day Mark, On Wed, 12 Dec 2007 02:05:54 -0800 (PST) Mark Difford <mark_difford at yahoo.co.uk> wrote:> In order to get the same coefficients as we get from the following[...]> we need to do the following (if we use model.matrix to specify the > model)By why would you want to do this?> ## > summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data > whiteside) ) > > That is, we need to take out "two intercepts." Is this "correct"?Yes.> Shouldn't lm check to see if an intercept has been requested as part > of the model formula?No, it does not. In the Details section of lm's help page you will find the following: A formula has an implied intercept term. To remove this use either 'y ~ x - 1' or 'y ~ 0 + x'. See 'formula' for more details of allowed formulae. Thus, except if you explicitly ask for a constant term not be included, lm will add a constant term (a column of ones) additionally to what ever you have specified on the right hand side of the formula.> If I do > ## > summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1, > data=whiteside)), data=whiteside)) > > we get a strange model.Well, you get a model in which not all parameters are identifiable, and a particular parameter that is not identifiable is estimated by NA. I am not sure what is strange about this.> But the formula part of this model qualifies > as a valid formula > ## > as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside))Debatable, the above command only shows that it can be coerced into a valid formula. :)> just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside) > > But we know that the _correct_ formula is the following> ## > as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1)Why is this formula any more correct than the other one? Both specify exactly the same model. It is just that one does it in an overparameterised way.> (Sorry, this is getting really long) --- So, my question/confusion > comes down to wanting to know why lm() doesn't check to see if an > intercept has been specified when the model has been specified using > model.matrix.Because lm() is documented not to check this. If you do not want to have an intercept in the model you have to specifically ask it for. Also, comparing the output of summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) and summary( lm(Gas ~ Insul/Temp, data = whiteside ) ) you can see that lm() does not check whether there is an implicit intercept in the model. Compare the (Adjusted) R-squared values returned; one case is using the formula for models with no intercept the other one the formula for models with intercept. Similar story with the reported F-statistics. Cheers, Berwin =========================== Full address ============================Berwin A Turlach Tel.: +65 6515 4416 (secr) Dept of Statistics and Applied Probability +65 6515 6650 (self) Faculty of Science FAX : +65 6872 3919 National University of Singapore 6 Science Drive 2, Blk S16, Level 7 e-mail: statba at nus.edu.sg Singapore 117546 http://www.stat.nus.edu.sg/~statba
On Wed, 2007-12-12 at 02:05 -0800, Mark Difford wrote:> Dear List-members, > > Hopefully someone will help through my confusion: > > In order to get the same coefficients as we get from the following > > ## > require (MASS) > summary ( lm(Gas ~ Insul/Temp - 1, data = whiteside) ) > > ...................... > > we need to do the following (if we use model.matrix to specify the model) > > ## > summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data = whiteside) ) > > ...................... > > That is, we need to take out "two intercepts." Is this "correct"?Yes - if you insist on doing things that way> > Shouldn't lm check to see if an intercept has been requested as part of the > model formula?It does, (i.e. whether the user specified -1 in the formula argument), but you specified it inside model.matrix() so the formula parsing code never sees it. One wouldn't want lm to need to know about all functions that have/could ever be written in R, past or future, to know how to divine that here you wanted "-1" to mean "remove intercept please Mr. R Parser" and not "subtract 1 from this result please Mr. R Parser". You are really abusing the reason for having the formula interface. The whole point of it is to make it easy for user to specify a model, from which R generates the model matrix for you. Why use the formula at all if you have already produced your model matrix? See ?lm.fit for a fast way of fitting linear models (used within lm() ) if you have all the bits in place (i.e. you already have a model matrix) and are happy to take care of other details yourself. <snip />> (Sorry, this is getting really long) --- So, my question/confusion comes > down to wanting to know why lm() doesn't check to see if an intercept has > been specified when the model has been specified using model.matrix.I guess because you can't programme around every possible usage that a user might dream up to try. If it did, lm would be an absolute pig and run like one. It is slow enough as it is with all the user-friendly stuff in there to help. (By slow enough, I mean it is perfectly speedy in routine use, but you wouldn't want to use it in a permutation test-like environment if you were after speed - you'd be better off working with lm.fit directly) The model.matrix bit is returning a matrix (without an entry for an intercept), which is fine on the rhs of a formula. As all you've done here is (effectively) create the following model:> form <- formula(paste("Gas ~",+ paste(colnames(model.matrix (~ Insul/Temp-1, + data=whiteside)), collapse = " + ")))> formGas ~ InsulBefore + InsulAfter + InsulBefore:Temp + InsulAfter:Temp I.e., that is what your convoluted formula is being interpreted as, you then still need to remove the intercept that R will automagically add to the model when it subsequently creates the model matrix. HTH G> > Regards, > Mark. >-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%