Today's GNU R tutorial in http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics points out how bad statistical practice is being further perpetuated, by virtue of "significance stars" still being the default in printed output from lm models.

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Regression-stars-tp4657795.html
Sent from the R devel mailing list archive at Nabble.com.
Dear Frank,

I'd like to second your implicit motion to make options(show.signif.stars=FALSE) the default. Thanks for raising this point.

John

On Thu, 7 Feb 2013 05:32:04 -0800 (PST) Frank Harrell <f.harrell at vanderbilt.edu> wrote:
> Today's GNU R tutorial in
> http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics
> points out how bad statistical practice is being further perpetuated, by
> virtue of "significance stars" still being the default in printed output
> from lm models.
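For readers unfamiliar with the option being discussed, the following is a minimal sketch of its effect on printed lm output. The built-in cars data set, the model, and the helper names printed_summary and has_legend are illustrative assumptions and do not come from the thread.

```r
# Fit a small model on the built-in 'cars' data (an illustrative choice,
# not taken from the thread).
fit <- lm(dist ~ speed, data = cars)

# Hypothetical helper: capture what print(summary(fit)) shows under a
# given setting of the show.signif.stars option.
printed_summary <- function(fit, stars) {
  old <- options(show.signif.stars = stars)
  on.exit(options(old))  # restore the previous setting afterwards
  capture.output(print(summary(fit)))
}

with_stars    <- printed_summary(fit, TRUE)
without_stars <- printed_summary(fit, FALSE)

# Hypothetical helper: does the printed output contain the star legend?
has_legend <- function(out) any(grepl("Signif. codes", out, fixed = TRUE))

has_legend(with_stars)
has_legend(without_stars)
```

The same effect can be had for a single call, without touching the global option, by passing signif.stars = FALSE to print() on the summary object.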
There are only a few things in R where we override the global defaults on a departmental level -- we really don't like to do so. But "show.signif.stars" is one of the three. The other two, if you are curious: set stringsAsFactors=FALSE and make NA included by default in the output of table(). We've been overriding both of these for 10+ years.

Terry Therneau
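The three overrides mentioned above could look roughly like the sketch below. How the department actually implements them is not stated in the thread; the use of a site-wide startup file and the wrapper name table_na are assumptions made for illustration only.

```r
# Sketch of settings that might live in a site-wide startup file such as
# Rprofile.site. The file location and the wrapper name are assumptions,
# not a description of the actual departmental setup.

# 1. Turn off significance stars in printed coefficient tables.
options(show.signif.stars = FALSE)

# 2. Keep character columns as character. Since R 4.0.0 this is already
#    the default; passing the argument explicitly also works on older
#    versions.
df <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)

# 3. Include missing values in tabulations. table() leaves NA out by
#    default; a hypothetical wrapper can request them through useNA.
table_na <- function(...) table(..., useNA = "ifany")

x <- c("yes", NA, "no", "yes")
table_na(x)
```

A wrapper is used here rather than replacing table() itself, so that code relying on the stock behaviour keeps working.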
Thanks for bringing this up, Frank.

Since many of us are "educators," I'd like to suggest a bolder approach. Discontinue even offering the stars as an option. Sadly, we can't stop reporting p-values, as the world expects them, but does R need to cater to that attitude by offering star display? For that matter, why not have R report confidence intervals as a default?

Many years ago, I wrote a short textbook on stat, and included a substantial section on the dangers of significance testing. All three internal reviewers liked it, but the funny part is that all three said, "I agree with this, but no one else will." :-)

Norm
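As a rough illustration of the confidence-interval suggestion, the sketch below puts intervals next to the coefficient estimates using functions R already provides. The cars model and the layout of the report matrix are assumptions chosen for the example, not a proposal for how summary() output should actually change.

```r
# Illustrative model on the built-in 'cars' data (not from the thread).
fit <- lm(dist ~ speed, data = cars)

# Coefficient table from summary(): estimate, std. error, t value, p-value.
coefs <- coef(summary(fit))

# 95% confidence intervals for the same coefficients.
ci <- confint(fit, level = 0.95)

# Hypothetical combined display: estimates with interval bounds, no stars.
report <- cbind(Estimate = coefs[, "Estimate"], ci)
print(round(report, 3))
```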
I appreciate Tim's comments. I myself have a "social science" paper coming out soon in which I felt forced to use p-values, given their ubiquity. However, I also told readers of the paper that confidence intervals are much more informative and I do provide them.

As I said earlier, there is no avoiding that, and R needs to report p-values for that reason. Instead, the question is what to do about the stars; I proposed eliminating them altogether. Star-crazed users know how to determine them themselves from the p-values, but deleting them from R would send a message.

I did say my proposal was "bold," which really meant I was suggesting that R do SOMETHING to send that message, not necessarily star elimination. One such "something" would be the proposal I made, which would be to add confidence intervals to the output. This too could be just an option, but again offering that option would send a message. Indeed, I would suggest that the help page explain that confidence intervals are more informative. (The help page could make a similar statement regarding the stars.)

When I pitch R to people, I say that in addition to the large function and library base and the nice graphics capabilities, R is above all Statistically Correct--it's written by statisticians who know what they are doing, rather than some programmer simply implementing a formula from a textbook. I know that a lot of people feel this is one of R's biggest strengths. Given that, one might argue that R should do what it can to help users engage in good statistical practice. I think this was Frank's point.

Norm
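The remark that users can work out the stars from the p-values themselves can be sketched with symnum(), using the cutpoints shown in R's printed legend (0.001, 0.01, 0.05, 0.1). The cars model is an assumption made for the example.

```r
# Illustrative model on the built-in 'cars' data (not from the thread).
fit <- lm(dist ~ speed, data = cars)

# p-values from the coefficient table.
p <- coef(summary(fit))[, "Pr(>|t|)"]

# Map each p-value to a star code using the conventional cutpoints.
stars <- symnum(p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols   = c("***", "**", "*", ".", " "))

data.frame(p.value = signif(p, 3), stars = as.character(stars))
```

Since the mapping is a simple binning of the p-value, the stars add no information beyond what the p-value column already carries.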
Please do not change the defaults for the show.signif.stars option or for the default.stringsAsFactors option. Backward compatibility is more important than your convenience. The same sort of argument could be made for changing the default of the "[" function from drop = TRUE to drop = FALSE. It would lead to fewer gotchas when coding and make R a saner programming language (less infernoish), but would annoy and confuse ordinary users and is not "the R way".

In any case your philosophical arguments about signif stars are bogus. Non-simultaneous confidence intervals have exactly the same problem as these "regression stars". As I once said in a paper, they are something "users think they can interpret" with the unstated implication that they really cannot. Charlie's law of users says ordinary users of statistics actually ignore confidence levels and treat all confidence intervals as if they cover (i.e., take the true confidence level to be 100%). You cannot fix lack of user understanding of statistics by any such simplistic idea.

Yes, R is a prime example of "worse is better", but it is the way it is. Don't try to turn it into C++. Thank you.

--
Charles Geyer
Professor, School of Statistics
University of Minnesota
charlie@stat.umn.edu
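The drop = TRUE default of "[" mentioned above can be seen in a minimal sketch. The matrix is an arbitrary example, not something from the thread.

```r
m <- matrix(1:6, nrow = 2)

# Default behaviour (drop = TRUE): selecting one row discards the
# dimensions and returns a plain vector.
row_default <- m[1, ]

# With drop = FALSE the result stays a 1 x 3 matrix.
row_kept <- m[1, , drop = FALSE]

dim(row_default)
dim(row_kept)
```

Code that expects a matrix back from a subsetting step has to pass drop = FALSE explicitly, which is the kind of gotcha the message refers to.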