Mulholland, Tom
2005-Feb-03 01:30 UTC
[R] Displaying a distribution -- was: Combining two histograms
I am immediately reminded of something I read which goes:

"A sufficiently trained statistician can read the vagaries of a Q-Q plot like a shaman can read a chicken's entrails, with a similar recourse to scientific principles. Interpreting Q-Q plots is more a visceral than an intellectual exercise. The uninitiated are often mystified by the process. Experience is the key here."
http://www.maths.murdoch.edu.au/units/statsnotes/samplestats/qqplot.html

Having said that, I would suggest that many people have difficulty understanding density plots but think that they can understand histograms.

I am currently undergoing shaman training ;-) and find that my interpretation of the plots owes more to experience than it does to a structured method of analysis. I see the technique as an addition to density estimates rather than a replacement for them.

As for the order of exploration, I tend to be non-linear in my explorations. In my perfect world I would like them to be simultaneous. The order in which information is presented can affect the outcome, so I tend to keep lists of processes to be done without pre-ordaining the order. It could be that I see exploration as a different process from analysis. That is, I am more ad hoc in generating the pieces of the puzzle and more structured in putting the picture together.

Tom.

> -----Original Message-----
> From: Berton Gunter [mailto:gunter.berton at gene.com]
> Sent: Thursday, 3 February 2005 12:52 AM
> To: 'Deepayan Sarkar'; r-help at stat.math.ethz.ch
> Subject: [R] Displaying a distribution -- was: Combining two histograms
>
> May I take this off topic a little to seek collective wisdom (and so feel
> free to reply privately).
>
> The catalyst is Deepayan's remark:
>
> > Histograms were appropriate for drawing density estimates by
> > hand in the good old days, but I can imagine very few situations where I
> > would not prefer to use smoother density estimates when I have the
> > computational power to do so.
> >
> > Deepayan
>
> Generally, I agree; but the appearance and thus one's perception and
> interpretation of both histograms and density plots depend upon the
> parameters chosen for the display (bin boundaries for histograms; bandwidth
> and kernel for density plots). Important data peculiarities like arbitrary
> rounding, favoring of certain values, resolution limitations, and so forth
> are therefore often lost. I would instead advocate that simple quantile
> plots -- plot(ppoints(x),sort(x)) -- or perhaps normal qqplots always be the
> first plot used to explore univariate data distributions. I believe this
> conforms to Bill Cleveland's recommendations, who says in the first sentence
> on p. 17 of VISUALIZING DATA on visualizing univariate data: "Quantiles are
> essential to visualizing distributions."
>
> While it is true that many people may be unfamiliar with quantile plots, I
> think we need to improve modern statistical practice not only by abandoning
> histograms in favor of density plots, but also by always first using
> quantile plots and explaining why this is necessary.
>
> Difficult issue: What should one do when there are, say, a million
> values?
>
> Alternative views?
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific learning
> process." - George E. P. Box
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
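
A minimal R sketch of the comparison discussed in the quoted message, for readers who want to try it. The data are simulated and the rounding to the nearest 0.5 is an assumption chosen purely for illustration; neither comes from the thread. It draws the same sample as a histogram, a kernel density estimate, and the quantile plot plot(ppoints(x), sort(x)) suggested above, so the three displays can be compared side by side.

```r
# Simulated data (an assumption for illustration, not data from this thread):
# normal values that were recorded only to the nearest 0.5.
set.seed(1)
x <- round(rnorm(500, mean = 10, sd = 2) * 2) / 2

op <- par(mfrow = c(1, 3))

# Histogram: its appearance depends on the bin boundaries chosen.
hist(x, main = "Histogram")

# Kernel density estimate: its appearance depends on bandwidth and kernel.
plot(density(x), main = "Density estimate")

# Quantile plot as suggested in the quoted message: every observation is
# drawn, so runs of tied values show up as flat steps.
plot(ppoints(x), sort(x), main = "Quantile plot",
     xlab = "f-value", ylab = "Sorted x")

par(op)

# The ties behind those steps can also be counted directly.
n_distinct <- length(unique(x))
```

With rounded data like this, the quantile plot would be expected to show a staircase pattern, while the other two displays depend on the default bins and bandwidth; how clearly the rounding shows in each is worth checking against your own data rather than taking from this sketch.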