Michael Pronath
2000-Sep-26 08:12 UTC
[R] lm -- significance of x coefficient when I(x^2) is used
In "Modern Applied Statistics with S-Plus" 3rd ed., footnote on page 153
regarding a model lm(Gas~Insul/(Temp+I(Temp^2))-1,whiteside), I read

"Notice that when the quadratic terms are present, first degree
coefficients mean 'the slope of the curve at temperature zero', so a
non-significant value does not mean that the linear term is not
needed. Removing the non-significant linear term for the 'after'
group, for example, would be unjustified."

AFAIK, the t-test for significance of a coefficient is not based on the
assumption that the variables of the linear model are "independent". What
if I only have the model matrix X and I don't know that one column is
simply the square of another: do I have to examine the model matrix for
polynomial dependencies between its columns to know whether t-test
significance is "significant"?

If |t| is small for the 'slope of the curve at temperature zero', doesn't
that just mean that the 'slope of the curve at temperature zero' is not
significantly different from 0 and that I had better set it to 0, i.e. omit
the linear term?

My only explanation for this is that R somehow "detects" polynomial
expressions in model formulae and treats them specially.

Could anybody tell me a bit more on this subject?

Michael Pronath
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
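For readers who want to look at the numbers behind the question, the following is a minimal sketch, assuming the MASS package (which ships the whiteside data) is installed. It fits the model quoted from the footnote and then refits it from the bare model matrix with lm.fit(). The names fit, X, y, fit_raw and same_coefs are illustrative only. The point is that lm() ends up working with a plain numeric matrix, so nothing in the fitting step can know that one column is the square of another.

```r
# Sketch only: assumes the MASS package (source of the whiteside data) is installed.
library(MASS)

# The model from the footnote: a separate quadratic in Temp for each Insul level.
fit <- lm(Gas ~ Insul / (Temp + I(Temp^2)) - 1, data = whiteside)
print(coef(summary(fit)))   # estimates, standard errors, t values, p values

# What the fitting routine actually sees: a plain numeric matrix and a response.
X <- model.matrix(fit)
y <- whiteside$Gas

# Refit from the bare matrix. There is no formula here, so no polynomial
# structure is available to be "detected".
fit_raw <- lm.fit(X, y)

# Compare the coefficients from the two routes.
same_coefs <- isTRUE(all.equal(unname(coef(fit)), unname(fit_raw$coefficients)))
print(same_coefs)
```

If the two routes agree, the t-statistics are a property of the columns of X alone, which is why their interpretation has to come from knowing what the columns mean.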
Martyn Plummer
2000-Sep-26 09:00 UTC
[R] lm -- significance of x coefficient when I(x^2) is used
On 26-Sep-00 Michael Pronath wrote:
>
> In "Modern Applied Statistics with S-Plus" 3rd ed., footnote on page 153
> regarding a model lm(Gas~Insul/(Temp+I(Temp^2))-1,whiteside), I read
>
> "Notice that when the quadratic terms are present, first degree
> coefficients mean 'the slope of the curve at temperature zero', so a
> non-significant value does not mean that the linear term is not
> needed. Removing the non-significant linear term for the 'after'
> group, for example, would be unjustified."
>
> AFAIK, t-test for significance of a coefficient is not based on the
> assumption that the variables of the linear model are "independent". What
> if I only got the model matrix X and I don't know, that one column is
> simply the square of another: Do I have to examine the model matrix for
> polynomial dependencies between its columns, to know if t-test significance
> is "significant"?

I would hope that you never conduct an analysis without knowing what
the variables mean!

Seriously, though, it is not a question that can be answered by the formal
representation of the model, but by its interpretation. If `the slope of
the curve at x=0' has some physical interpretation, you _might_ be justified
in eliminating the linear term while keeping the quadratic one. But
temperature is a good example of where this is not the case. If you
eliminate the linear term, then change your temperature scale (from
Centigrade to Fahrenheit), the linear term will pop back out again.
In effect you are testing a "hypothesis" that has no meaning. Don't do this.

Martyn
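Martyn's point about changing the temperature scale can be checked numerically. The sketch below uses simulated data (the variable names, the coefficients 5, -0.35, 0.01 and the noise level are all made up for illustration, not taken from whiteside). It compares a full quadratic and a quadratic without the linear term, each fitted once in Celsius and once in Fahrenheit.

```r
# Simulated data; every number here is an illustrative assumption.
set.seed(1)
temp_c <- runif(40, min = -1, max = 10)                 # "Centigrade"
gas    <- 5 - 0.35 * temp_c + 0.01 * temp_c^2 + rnorm(40, sd = 0.3)
temp_f <- 32 + 1.8 * temp_c                             # same temperatures in Fahrenheit

# Full quadratic in each scale.
full_c <- lm(gas ~ temp_c + I(temp_c^2))
full_f <- lm(gas ~ temp_f + I(temp_f^2))

# Quadratic with the linear term dropped, in each scale.
no_lin_c <- lm(gas ~ I(temp_c^2))
no_lin_f <- lm(gas ~ I(temp_f^2))

# Do the two scales describe the same fitted curve?
same_full    <- isTRUE(all.equal(fitted(full_c), fitted(full_f)))
same_reduced <- isTRUE(all.equal(fitted(no_lin_c), fitted(no_lin_f)))
print(c(full = same_full, reduced = same_reduced))

# Re-express the Celsius curve without a linear term as a quadratic in
# Fahrenheit: the linear coefficient "pops back out".
popped <- coef(lm(fitted(no_lin_c) ~ temp_f + I(temp_f^2)))
print(popped)
```

The full quadratic spans the same set of curves in either scale, so its fit does not depend on the units. Dropping the linear term gives a different model in Celsius than in Fahrenheit, which is the sense in which the "hypothesis" being tested has no scale-free meaning.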
Bill Venables
2000-Sep-26 12:33 UTC
[R] lm -- significance of x coefficient when I(x^2) is used
At 10:12 26/09/00 +0200, Michael Pronath wrote:
>
> In "Modern Applied Statistics with S-Plus" 3rd ed., footnote on page 153
> regarding a model lm(Gas~Insul/(Temp+I(Temp^2))-1,whiteside), I read
>
> "Notice that when the quadratic terms are present, first degree
> coefficients mean 'the slope of the curve at temperature zero', so a
> non-significant value does not mean that the linear term is not
> needed. Removing the non-significant linear term for the 'after'
> group, for example, would be unjustified."

I accept full responsibility... and I am sticking to my guns. I think this
is a crucial point of inference often misunderstood. It is at the core of
the reason why putting significance stars routinely on t-statistics is NOT
a good idea - by doing so you encourage confusion and unjustified inferences.

> AFAIK, t-test for significance of a coefficient is not based on the
> assumption that the variables of the linear model are "independent".

Quite correct, it is not.

> What
> if I only got the model matrix X and I don't know, that one column is
> simply the square of another: Do I have to examine the model matrix for
> polynomial dependencies between its columns, to know if t-test significance
> is "significant"?

Let me start to answer by asking you "What does a `significant' t-test
result mean?" To me it means that if someone were to pose the null
hypothesis that the mean (or in this case regression coefficient) were zero,
you would have, by convention, strong enough evidence to reject it. If the
result were `non-significant', by contrast, it does NOT allow you to assert
that the regression coefficient IS zero, it only means that you do not have
enough EVIDENCE to reject such a claim. The real question is whether or not
it is a claim anyone would have good reason to make in the first place -
sometimes it would be, sometimes not.
In the case above I would say from the context there is no good reason for
anyone a priori to claim that the derivative at temperature 0 should be
zero, that is, that the curve should necessarily be flat at that rather
arbitrary point.

> If |t| is small for the 'slope of the curve at temperature zero', doesn't
> that just mean that 'slope of the curve at temperature zero' is not
> significantly different from 0

yes it does, but it is most likely not significantly different from 0 in a
range of temperature values near 0 degC as well, so where should you
constrain the curve to be flat? It comes back to the question of where
someone would have good reason a priori to pose the question "Is the curve
flat HERE, at this very special temperature?" If there is no good reason to
pose the question, why force the model to conform to this arbitrary
restriction? Notice that this is not quite the same thing as variable
selection, where Occam's principle is the good reason for considering
whether or not coefficients are zero.

> and that I had better set it to 0, i.e. omit
> the linear term?

Ahh ... and you were doing so well, too! Shocking as it may sound,
non-significance, by itself, is not a good enough reason to omit terms in a
regression (provided of course you had a good reason for including them in
the first place).

> My only explanation for this is, that R somehow "detects" polynomial
> expressions in model formulae and treats them specially.

No, it doesn't, but it would be nice if it could. The same sort of
consideration comes into play when you have factor models with interactions:
no main effect term is removed when a higher-way interaction involving it is
still present in the model.
This is exactly the same principle at work.

> Could anybody tell me a bit more on this subject?

Only that it is often called "the marginality principle", it has caused
endless, heated and ultimately futile debates in the past, and that when you
finally get to see why it makes sense to think this way you immediately see
the exceptions and you start to get a deeper understanding of what
significance tests and model selection are all about.

What strikes me as the crucial question in model selection problems like
this is "what group of transformations of the regressor variables should the
model selection process be invariant with respect to?" For simple
polynomial regressions it is often (but certainly not always) reasonable to
require the model selection process to be invariant with respect to changes
of origin and scale in the predictor. This immediately tells you that you
should at every stage be considering the leading (highest degree)
coefficient only and not those lower down, since by a change of origin and
scale you leave the highest degree coefficient t-statistic invariant, but
you can make any of the other t-statistics just about any value you damn
well like, and certainly zero.

Similarly with spatial regressions it may be reasonable to require that the
selection process be invariant with respect to affine transformations of
latitude and longitude, and, of course, sometimes not.

Think about it and then, usually, I would say think some more...

Bill Venables.

--
Bill Venables, Statistician           Tel. +61 7 3826 7251
CSIRO Marine Laboratories,            Fax. +61 7 3826 7304
Cleveland, Qld, 4163                  Email: Bill.Venables at cmis.csiro.au
AUSTRALIA                             http://www.cmis.csiro.au/bill.venables/
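The invariance argument can be made concrete. Writing x = h + s*u (a change of origin h and scale s > 0), the quadratic a + b*x + c*x^2 becomes (a + b*h + c*h^2) + s*(b + 2*c*h)*u + s^2*c*u^2. The leading coefficient is only rescaled, while the linear coefficient depends on h and vanishes at h = -b/(2c). The sketch below checks this on simulated data; the data, the helper t_stats() and all other names are hypothetical and chosen only for illustration.

```r
# Simulated data; the true curve and noise level are illustrative assumptions.
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 2 + 0.5 * x - 0.08 * x^2 + rnorm(50, sd = 0.5)

# t-statistics of a quadratic fit after moving the origin to `shift`
# and dividing by `scale` (hypothetical helper for this sketch).
t_stats <- function(shift, scale) {
  u <- (x - shift) / scale
  fit <- lm(y ~ u + I(u^2))
  coef(summary(fit))[, "t value"]
}

# One row per choice of origin: the "I(u^2)" column should stay constant,
# while the "u" column moves around.
shifts <- c(0, 2, 4, 6)
tab <- t(sapply(shifts, t_stats, scale = 1))
rownames(tab) <- paste0("shift=", shifts)
print(round(tab, 3))

# Changing origin and scale together still leaves the leading t-statistic alone.
print(t_stats(0, 1)[["I(u^2)"]])
print(t_stats(3, 1.8)[["I(u^2)"]])

# Put the origin at the vertex of the fitted parabola: the linear
# t-statistic is then driven to (numerically) zero.
base_fit   <- lm(y ~ x + I(x^2))
b          <- coef(base_fit)[["x"]]
c2         <- coef(base_fit)[["I(x^2)"]]
flat_shift <- -b / (2 * c2)
t_flat     <- t_stats(flat_shift, 1)
print(t_flat)
```

Since the linear t-statistic can be pushed to zero simply by choosing where to put the origin, its non-significance at one particular origin says little about whether the linear term belongs in the model.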
John Maindonald
2000-Sep-26 22:25 UTC
[R] lm -- significance of x coefficient when I(x^2) is used
> From: Jim Lindsey <james.lindsey at luc.ac.be>
>
> Bill's excellent discussion (which I do not reproduce here) describes
> precisely why I was (and still am) dead against having stars printed
> as the default option in lm and glm. Let's get rid of that and let
> people turn them on, if they want, in those cases in which they make
> sense.

I agree. When using R for teaching, they are a serious distraction,
and I find it absolutely necessary to turn them off. They are very
obtrusive. If there was a way to vary the shade of grey in which
the coefficients or t-statistics were printed, that would be more
acceptable!

John Maindonald               email : john.maindonald at anu.edu.au
Statistical Consulting Unit,  phone : (6249)3998
c/o CMA, SMS,                 fax   : (6249)5549
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia
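For anyone who wants to turn the stars off as described above, here is a short sketch using the built-in cars data set (chosen only because it ships with R). It shows the session-wide option show.signif.stars and the per-call signif.stars argument of the print method for lm summaries. The object names are illustrative.

```r
# Any fitted model will do; cars is a small data set that ships with R.
fit <- lm(dist ~ speed, data = cars)

# Session-wide: switch the stars off, remembering the previous setting.
old <- options(show.signif.stars = FALSE)
print(summary(fit))
options(old)                      # restore whatever was set before

# Per call: leave the option alone and ask this one printout to omit stars.
print(summary(fit), signif.stars = FALSE)

# Capture both variants as text so the difference can be inspected.
with_stars    <- capture.output(print(summary(fit), signif.stars = TRUE))
without_stars <- capture.output(print(summary(fit), signif.stars = FALSE))
print(any(grepl("Signif. codes", with_stars, fixed = TRUE)))
print(any(grepl("Signif. codes", without_stars, fixed = TRUE)))
```

Putting the options() call in a startup file such as .Rprofile is one way to make the setting stick for teaching sessions.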