I apologize if this question is not completely appropriate for this list.

I have been using SAS for a while and am now in the process of learning some C and R as a part of my graduate studies. All of the statistical packages I have used generally yield p-values as a default output to standard procedures.

This week I have been reading "Testing Precise Hypotheses" by J.O. Berger & Mohan Delampady, Statistical Science, Vol. 2, No. 3, 317-355, and "Bayesian Analysis: A Look at Today and Thoughts of Tomorrow" by J.O. Berger, JASA, Vol. 95, No. 452, pp. 1269-1276, both as supplements to my Math Stat course.

It appears, based on these articles, that p-values are more or less useless. If this is indeed the case, then why is a p-value typically given as a default output? For example, I know that PROC MIXED and lme( ) both yield p-values for fixed effects terms.

The theory I am learning does not seem to match what is commonly available in the software, and I am just wondering why.

Thanks,
Greg
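For concreteness, here is a minimal sketch of the kind of default output Greg refers to, using lme() from the nlme package with its built-in Orthodont data; the data set and model are chosen purely for illustration:

  ## Default p-values for fixed effects from nlme::lme().
  ## The Orthodont data and this simple model are illustrative only.
  library(nlme)

  fit <- lme(distance ~ age + Sex, random = ~ 1 | Subject, data = Orthodont)

  ## The fixed-effects t-table includes a "p-value" column by default,
  ## much as PROC MIXED reports Pr > |t| for its fixed-effect estimates.
  summary(fit)$tTable
  anova(fit)   # F-tests for the fixed-effect terms, again with p-values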
On 27-Apr-04 Greg Tarpinian wrote:
> I apologize if this question is not completely
> appropriate for this list.

Never mind! (I'm only hoping that my response is ... )

> [...]
> This week I have been reading "Testing Precise
> Hypotheses" by J.O. Berger & Mohan Delampady,
> Statistical Science, Vol. 2, No. 3, 317-355 and
> "Bayesian Analysis: A Look at Today and Thoughts of
> Tomorrow" by J.O. Berger, JASA, Vol. 95, No. 452,
> p. 1269 - 1276, both as supplements to my Math Stat.
> course.
>
> It appears, based on these articles, that p-values are
> more or less useless.

I don't have these articles available, but I'm guessing that they stress the Bayesian approach to inference. Saying "p-values are more or less useless" is controversial. Bayesians consider p-values to be approximately irrelevant to the real question, which is what you can say about the probability that a hypothesis is true or false, or about the probability that a parameter lies in a particular range (sometimes the same question); and the "probability" they refer to is a posterior probability distribution over hypotheses, or over parameter values.

The P-value emitted at the end of a standard analysis is not such a probability. It is instead the probability, under the hypothesis being tested, of that part of the sample space cut off by the observed value of a test statistic calculated from the data, i.e. of outcomes at least as extreme as the one observed. So they are different entities. Numerically they may coincide; indeed, for statistical problems with a certain structure the P-value is equal to the Bayesian posterior probability when a particular prior distribution is adopted.

> If this is indeed the case,
> then why is a p-value typically given as a default
> output? For example, I know that PROC MIXED and
> lme( ) both yield p-values for fixed effects terms.

P-values are not as useless as sometimes claimed. They at least offer a measure of discrepancy between data and hypothesis (the smaller the P-value, the more discrepant the data), and they offer this measure on a standard scale, the "probability scale": the chance of getting something at least as discrepant, if the hypothesis being tested is true. What "discrepant" objectively means is defined by the test statistic used in calculating the P-value: larger values of the test statistic correspond to more discrepant data. Confidence intervals are essentially aggregates of hypotheses which have not been rejected at a significance level equal to 1 minus the confidence level.

The P-value/confidence-interval approach (often called the "frequentist" approach) gives results which do not depend on assuming any prior distribution on the parameters/hypotheses, and could therefore be called "objective" in that it avoids the accusation of importing "subjective" information into the inference in the form of a Bayesian prior distribution. This can have the consequence that your confidence interval may include values in a range which, a priori, you do not accept as plausible, or exclude a range of values in which you are a priori confident that the real value lies. The Bayesian comment on this situation is that the frequentist approach is "incoherent", to which the frequentist might respond "well, I just got an unlucky experiment this time" (which is bound to occur with due frequency).

> The theory I am learning does not seem to match what
> is commonly available in the software, and I am just
> wondering why.

The standard ritual for evaluating statistical estimates and hypothesis tests is frequentist (as above). Rightly interpreted, it is by no means useless.
For complex historical reasons, it has become the norm in "research methodology", and this is essentially why it is provided by the standard software packages (otherwise pharmaceutical companies would never buy the software, since they need this output to get past the FDA or other regulatory authorities). However, because this is the "norm", such results often have more meaning attributed to them than they can support, by people disinclined to delve into what "rightly interpreted" might mean.

This is not a really clean answer to your question; but then your question touches on complex and conflicting issues! Hoping this helps (and hoping that I am not poking a hornets' nest here)!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 27-Apr-04  Time: 22:25:22
------------------------------ XFMail ------------------------------
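To make Ted's remark about numerical coincidence concrete: for a one-sided test of a normal mean with known sigma, the frequentist P-value equals the Bayesian posterior probability of the null under a flat (improper) prior. A small sketch, with invented numbers:

  ## One-sided test of H0: mu <= 0 for normal data with known sigma.
  ## Under a flat prior, mu | data ~ N(xbar, sigma^2 / n), so the posterior
  ## P(mu <= 0 | data) coincides with the one-sided p-value.
  n     <- 25
  sigma <- 2
  xbar  <- 0.9                        # invented sample mean
  z     <- xbar / (sigma / sqrt(n))   # usual z statistic

  p.value   <- 1 - pnorm(z)                                 # frequentist p-value
  post.prob <- pnorm(0, mean = xbar, sd = sigma / sqrt(n))  # Bayesian P(mu <= 0 | data)

  c(p.value = p.value, posterior = post.prob)   # numerically identical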
The Bayesian framework is surely a good framework for thinking about inference, and for exploring common misinterpretations of p-values. P-values are surely unhelpful, and to be avoided, in cases where there is `strong' prior evidence. I will couch the discussion that follows in terms of confidence intervals rather than p-values, which makes the discussion simpler.

The prior evidence is, in my sense, strong if it leads to a Bayesian credible interval that is very substantially different from the frequentist confidence interval (though I prefer the term `coverage interval'). Typically the intervals will be similar if a "diffuse" prior is used, i.e., if all values over a wide enough range are, on some suitable scale, a priori equally likely. This is, in my view, the message that you should take from your reading.

Examples of non-diffuse priors are what Berger focuses on. Consider for example his discussion of one of Jeffreys' analyses, where Jeffreys puts 50% of the probability on a point value of a continuous parameter, i.e., there is a large spike in the prior at that point. Berger commonly has scant commentary on the specific features of his priors that make the Bayesian results seem very different (at least to the extent of having a different "feel") from the frequentist results. His paper in vol. 18, no. 1 of Statistical Science (pp. 1-32; pp. 12-27 are comments from others) seems more judicious in this respect than some of his earlier papers.

It is interesting to speculate how R's model fitting routines might be tuned to allow a Bayesian interpretation. What family or families of priors would be on offer, and/or used by default? What default mechanisms would be suitable and useful for indicating the sensitivity of results to the choice of prior?

John Maindonald.

John Maindonald            email: john.maindonald at anu.edu.au
phone: +61 2 (6125)3473    fax: +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
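To make John's distinction concrete, here is a sketch for the simplest case, a normal mean with known sigma and a conjugate normal prior; the numbers, including the informative prior, are invented purely for illustration:

  ## Frequentist 95% interval vs Bayesian 95% credible intervals for a
  ## normal mean (sigma known), under a diffuse and an informative prior.
  n     <- 20
  sigma <- 1
  xbar  <- 1.5
  se    <- sigma / sqrt(n)

  ## Frequentist 95% confidence ("coverage") interval
  freq.ci <- xbar + c(-1, 1) * qnorm(0.975) * se

  ## Conjugate N(m0, tau^2) prior: the posterior is again normal
  credible <- function(m0, tau) {
    post.var  <- 1 / (1 / tau^2 + n / sigma^2)
    post.mean <- post.var * (m0 / tau^2 + n * xbar / sigma^2)
    post.mean + c(-1, 1) * qnorm(0.975) * sqrt(post.var)
  }

  rbind(frequentist       = freq.ci,
        diffuse.prior     = credible(m0 = 0, tau = 100),   # close to the frequentist interval
        informative.prior = credible(m0 = 0, tau = 0.2))   # pulled strongly towards the prior mean

With the diffuse prior the two intervals are effectively the same; with an informative prior centred well away from the sample mean they differ substantially, which is exactly the situation in which John suggests the frequentist summaries become unhelpful.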
This is, of course, not strictly about R. But if there should be a decision to pursue such matters on this list, then we'd need another list to which such discussion might be diverted.

I've pulled Frank's "Regression Modeling Strategies" down from my shelf and looked to see what he says about inferential issues. There is a suggestion, in the introduction, that modeling provides the groundwork that can be used as a point of departure for a variety of inferential interpretations. As far as I can see, Bayesian interpretations are never really explicitly discussed, though the word Bayesian does appear in a couple of places in the text.

Frank, do you now have ideas on how you would (perhaps, in a future edition, will) push the discussion in a more overtly Bayesian direction? What might be the style of a modeling book, aimed at practical data analysts who of necessity must (mostly, at least) use off-the-shelf software, that "seriously entertains" the Bayesian approach?

R provides a lot of help for those who want a frequentist interpretation, even including by default the *, **, *** labeling that some of us deplore. There is no similar help for those who want at least the opportunity to place the output from a modeling exercise in a Bayesian context of some description. There is surely a strong argument for a more neutral form of default output, even to the extent of excluding p-values, on the argument that they too push too strongly in the direction of a frequentist interpretative framework.

There seems, unfortunately, to be a dearth of good ideas on how to assist the placing of output from modeling functions such as R provides in an explicitly Bayesian framework. Or is it, at least in part, that I am unaware of what is out there? That, I guess, is the point of my question to Frank. Is it just too technically demanding to go much beyond trying to get users to understand that a Bayesian credible interval can, if there is an informative prior, be very different from a frequentist CI -- that they really do need to pause if there is an informative prior lurking somewhere in the undergrowth?

John Maindonald.

Frank Harrell wrote:
> They [p-values] are objective only in the sense that
> subjectivity is deferred in a difficult to document way
> when P-values are translated into decisions.
>
> The statement that frequentist methods are the norm, which I'm
> afraid is usually true, is a sad comment on the state of much
> of "scientific" inquiry. IMHO P-values are so defective that
> the imperfect Bayesian approach should be seriously entertained.

John Maindonald            email: john.maindonald at anu.edu.au
phone: +61 2 (6125)3473    fax: +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
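On the *, **, *** labelling specifically: R does at least allow the stars to be switched off globally (the p-values themselves remain). A small sketch with a toy lm() fit:

  ## Suppressing the significance stars in standard printed output.
  ## The lm() fit on the built-in cars data is a toy example only.
  options(show.signif.stars = FALSE)    # global default for print methods

  fit <- lm(dist ~ speed, data = cars)
  summary(fit)                           # coefficient table now prints without stars

  ## The same switch is available per call in the underlying print helper:
  printCoefmat(coef(summary(fit)), signif.stars = FALSE)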
I am sure you are aware of this, but for the record I wanted to mention that the book "Bayesian Data Analysis", 2nd Edition, by Gelman, Carlin, Stern, and Rubin, published by Chapman and Hall/CRC, contains an appendix (Appendix C) on computations with R and BUGS. Perhaps Frank will have such a section in his book in the future?

John Maindonald <john.maindonald at anu.edu.au> wrote on 04/29/2004:
> [John's message of 29 April, quoted in full above.]
One might begin by considering _conditional_ p-values, as elaborated by Hubbard and Bayarri and especially by Sellke, Bayarri, and Berger.

@article{Hubbard2003,
  Author   = {Hubbard, R. and Bayarri, M. J.},
  Title    = {Confusion over measures of evidence ($p$'s) versus errors ($\alpha$'s) in classical statistical testing},
  Journal  = {The American Statistician},
  Volume   = {57},
  Number   = {3},
  Pages    = {171--182},
  Year     = {2003},
  Abstract = {Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers, most of whom erroneously believe that such tests are prescribed by a single coherent theory of statistical evidence.},
  Keywords = {p-values; Bayesian analysis; Fisher; hypothesis test; conditional error probabilities; conditional alpha; Bayes factor; posterior probability; significance probability; significance test; Neyman-Pearson theory}
}

@article{Sellke2001,
  Author   = {Sellke, T. and Bayarri, M. J. and Berger, J. O.},
  Title    = {Calibration of $p$ values for testing precise null hypotheses},
  Journal  = {The American Statistician},
  Volume   = {55},
  Number   = {1},
  Pages    = {62--71},
  Year     = {2001},
  Abstract = {$P$ values are the most commonly used tool to measure evidence against a hypothesis or hypothesized model. Unfortunately, they are often incorrectly viewed as an error probability for rejection of the hypothesis or, even worse, as the posterior probability that the hypothesis is true.},
  Keywords = {Bayes factor; Bayesian robustness; conditional alpha; conditional error probabilities; odds}
}

Joe

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ted Harding
Sent: Thursday, April 29, 2004 10:39 AM
To: Thomas Lumley
Cc: r-help at stat.math.ethz.ch; John Maindonald
Subject: Re: [R] p-values

On 29-Apr-04 Thomas Lumley wrote:
> On Thu, 29 Apr 2004, John Maindonald wrote:
>
>> This is, of course, not strictly about R. But if there should be
>> a decision to pursue such matters on this list, then we'd need
>> another list to which such discussion might be diverted.
>
> Ted Harding started such a list (stats-discuss) quite some time ago.
> IIRC it was to divert discussions like this from allstat.

I did indeed! But it hardly ever received any postings -- my suspicion, which was reinforced by private comments from a number of people, was that because it was *not* allstat (and therefore would not catch the eye of the UK people that posters might hope to reach) it could not be expected to. As one person put it: "One list to find them all ...".

The R list is special, in many ways, and you can get views and information on practically anything from some of the best in the world, so long as it is R-related (even sometimes remotely). The present thread was started by Greg Tarpinian asking a question in a place where he thought he might get a response, even though not in an R context (though it seems one may develop -- linking R to Bayesian inference). After a few public postings, interested parties have retired to another room (where others are welcome to join us) for a while; we now number 6. Agreed, it could be continued on another list (even stats-discuss, which still exists though totally dormant), but this may not suit everyone. Things are doing fine at the moment.
But it might prove to be a useful overspill area from the R list -- someone starts a ball rolling which doesn't really belong here, and others could chase it on stats-discuss. So I'm keeping options open.

There's also a list, stat-l at lists.mcgill.ca, which is active, though comfortably low-traffic.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 29-Apr-04  Time: 16:39:04
------------------------------ XFMail ------------------------------
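Following up Joe's pointer to Sellke, Bayarri and Berger: their paper proposes a simple calibration of p-values, with -e p log(p) as a lower bound (for p < 1/e) on the Bayes factor in favour of a precise null, together with the corresponding conditional error probability. A sketch of that calibration:

  ## Calibration of p-values from Sellke, Bayarri & Berger (2001).
  ## For p < 1/e, -e * p * log(p) bounds the Bayes factor in favour of a
  ## precise null from below; alpha is the conditional error probability.
  calibrate <- function(p) {
    stopifnot(all(p > 0 & p < exp(-1)))
    B     <- -exp(1) * p * log(p)   # lower bound on the Bayes factor for H0
    alpha <- B / (1 + B)            # conditional error probability
    cbind(p = p, bayes.factor.bound = B, conditional.alpha = alpha)
  }

  round(calibrate(c(0.05, 0.01, 0.001)), 4)
  ## A p-value of 0.05 corresponds to a conditional error probability of
  ## about 0.29 -- considerably weaker evidence than "1 in 20" suggests.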