R-Listers:

The following is a rant originally sent privately to Frank Harrell in response to remarks he made on this list. The ideas are not new or original, but he suggested I share it with the list, as he felt it might nonetheless be of wider interest. I have real doubts about this, and I apologize in advance to those who agree that I should have kept my remarks private. In view of this, if you wish to criticize my remarks on list, that's fine, but I won't respond (I've said enough already!). I would be happy to discuss the issues (a little) further off list with anyone who wishes to bother, but not on list.

Also, Frank sent me a relevant reference for those who might wish to read a more thoughtful consideration of the issues:

@ARTICLE{far92cos,
  author  = {Faraway, J. J.},
  year    = 1992,
  title   = {The cost of data analysis},
  journal = {Journal of Computational and Graphical Statistics},
  volume  = 1,
  pages   = {213--229},
  annote  = {bootstrap; validation; predictive accuracy; modeling strategy;
             regression diagnostics; model uncertainty}
}

I welcome further relevant references, pro or con!

Finally, I need to emphasize that these are very much my personal views and do not reflect those of my company or colleagues.

Cheers to all ...
-----------

The relevant portion of Frank's original comment was in a thread about K-S tests for the goodness of fit of a parametric distribution:

...
> If you use the empirical CDF to select a parametric distribution, the final
> estimate of the distribution will inherit the variance of the ECDF. The main
> reason statisticians think that parametric curve fits are far more efficient
> than nonparametric ones is that they don't account for model uncertainty in
> their final confidence intervals.
>
> -- Frank Harrell

My reply:

That's a perceptive remark, but I would go further... You mentioned **model** uncertainty. In fact, in any data analysis in which we explore the data first to choose a model, fit the model (parametric or non-), and then use whatever we like (pivots from parametric analysis; bootstrapping; ...) to say something about "model uncertainty," we are always kidding ourselves and our colleagues, because we fail to take into account the considerable variability introduced by our initial subjective exploration and our subsequent choice of modeling strategy. One can only say (at best) that the stated model uncertainty is an underestimate of the true uncertainty. And very likely a considerable underestimate, given the subjectivity of the model choice.

Now, I in no way wish to discourage or abridge data exploration; I only wish to point out that we statisticians have promulgated a self-serving and unrealistic view of the value of formal inference in quantifying true scientific uncertainty when we do such exploration -- and that there is therefore something fundamentally contradictory in our own rhetoric and methods. Taking a larger view, I think this remark is part of the deeper epistemological issue of characterizing what can be scientifically "known" or, indeed, of defining the difference between, say, science and art. My own view is that scientific certainty is a fruitless concept: we build models that we benchmark against our subjective measurements of "reality" (subjective because the measurements themselves depend on earlier scientific models). Insofar as data can limit or support our flights of modeling fancy, they do; but in the end, it is neither an objective process nor one whose "uncertainty" can be strictly quantified.
In creating the illusion that "statistical methods" can overcome these limitations, I think we have both done science a disservice and relegated ourselves to an isolated, fringe role in scientific inquiry.

Needless to say, opposing viewpoints to such iconoclastic remarks are cheerfully welcomed.

Best regards,

Bert Gunter
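(A minimal R sketch of the point above, not part of the original exchange: the sample size, the two candidate families, and the KS-based selection rule are illustrative assumptions only. Each simulated analysis "explores" by choosing exponential or lognormal according to its KS distance from the ECDF, fits the winner, and records both the resulting estimate of the mean and the standard error that the chosen fit alone would report.)

set.seed(1)
n    <- 50
nsim <- 2000

est_mean <- numeric(nsim)   # estimate of the mean after model selection
naive_se <- numeric(nsim)   # SE reported by the selected fit alone

for (i in seq_len(nsim)) {
  x <- rexp(n, rate = 1)                    # true mean is 1

  ## selection step: compare fitted CDFs with the ECDF via the KS statistic
  d_exp   <- ks.test(x, "pexp", rate = 1 / mean(x))$statistic
  d_lnorm <- ks.test(x, "plnorm",
                     meanlog = mean(log(x)), sdlog = sd(log(x)))$statistic

  if (d_exp <= d_lnorm) {
    est_mean[i] <- mean(x)                  # exponential: mean = 1/rate
    naive_se[i] <- mean(x) / sqrt(n)        # SE of the mean under the Exp model
  } else {
    mu <- mean(log(x)); s <- sd(log(x))     # lognormal: mean = exp(mu + s^2/2)
    est_mean[i] <- exp(mu + s^2 / 2)
    naive_se[i] <- est_mean[i] * sqrt(s^2 / n + s^4 / (2 * (n - 1)))  # delta method
  }
}

sd(est_mean)     # variability of the whole explore-then-fit strategy
mean(naive_se)   # what any single "final" analysis would typically report

Comparing the two numbers shows how much variability the selection step adds beyond what any single "final" model admits to; how large the gap is depends on the sample size and the candidate families. Faraway's paper cited above is, roughly, the recipe for repairing this: bootstrap the entire strategy (exploration, selection, and fit), not just the final model.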
On Wed, 12 Jan 2005, Berton Gunter wrote:

> [...]
> My own view is that scientific certainty is a fruitless concept: we build
> models that we benchmark against our subjective measurements (as the
> measurements themselves depend on earlier scientific models) of "reality."
> Insofar as data can limit or support our flights of modeling fancy, they do;
> but in the end, it is neither an objective process nor one whose
> "uncertainty" can be strictly quantified.
I totally agree with the above, and I am totally unqualified to comment on the below. You (and others) might find these papers interesting:

http://www.santafe.edu/~chaos/chaos/pubs.htm

Specifically papers like:

Synchronizing to the Environment: Information Theoretic Constraints on Agent Learning.
http://www.santafe.edu/~cmg/papers/stte.pdf

Is Anything Ever New? Considering Emergence.
http://www.santafe.edu/~cmg/papers/EverNew.pdf

Observing Complexity and The Complexity of Observation.
http://www.santafe.edu/~cmg/papers/OCACO.pdf

What Lies Between Order and Chaos?
http://www.santafe.edu/~cmg/papers/wlboac.pdf

And probably many more.

> In creating the illusion that "statistical methods" can overcome these
> limitations, I think we have both done science a disservice and relegated
> ourselves to an isolated, fringe role in scientific inquiry.
>
> Needless to say, opposing viewpoints to such iconoclastic remarks are
> cheerfully welcomed.

Does it make any difference to the mass of Saturn?

Dan.
I have often noted that "statistics can't prove a damn thing, but they can be really useful in disproving something." Having spent most of the 80s and half of the 90s with the Australian Bureau of Statistics finding out how you collect these numbers, I am disconcerted at the apparent disregard for measurement issues such as bias, input error, questionnaire design, etc. ... Science wars ... the real world ... and the not so real world.

Having only recently discovered what our esteemed J Baron does, I should say that a lot of his work requires us to ask how we use (abuse?) the tools we have. Having said that, some of my most influential work has come from data exploration within fields where I would describe myself as a complete novice. Using only the phrase "the data seems to indicate" relationship x with y, or some variant, and asking whether this is an accepted norm, has produced some unexpected paradigm shifts.

Someone on the list has a signature line along the lines of "All models are wrong, but some of them are useful." I think this is attributed to Box. As most of us know, some of the advice on this list is more sage than the rest. All of which is to say that the manner in which we deal with non-model uncertainty affects the degree to which we do a disservice to science and to ourselves.

I think you are being unduly pessimistic, but then again I might just be a cynic masquerading as a realist.

Tom

> -----Original Message-----
> [...]