> >>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:
>
> PD> "Venables, Bill (CMIS, Cleveland)" <Bill.Venables at cmis.CSIRO.AU>
> PD> writes:
> >> The fact that every elementary book on statistics does it this way
> >> does not make it correct. To be helpful, a histogram really has to
> >> be a non-parametric density estimator, period.
> >>
> >> Enough already of polemics.
>
> PD> Not quite! There is a reason for doing it the other way, namely
> PD> that the concept of a histogram generally comes before the concept
> PD> of a probability density, pedagogically. It is very easy to explain
> PD> that you chop up the axis into bins and count the number of data
> PD> points that fall in each of them. I bet that half of the MDs that I
> PD> teach never quite understand the density (hell, the author of the
> PD> textbook I use managed to plot three identical gaussian curves with
> PD> identical y axis but different x axes... and he's a
> PD> statistician). So for the basic uses of the histogram, one would be
> PD> replacing a perfectly intuitive simple unit with a substantially
> PD> more complex one.
>
> I agree 100% with Peter.
> Being a mathematician I agree with Bill that for us, a histogram is a
> (very suboptimal) density estimate; but average statistics software users
> *do* learn histograms differently..

I hope there are many of us that agree 100% with Bill. Bad practice,
as enshrined in the default behaviour of histogram, should be
discouraged. We should aim to introduce density-based histograms from
the outset, and the default behaviour of histograms in many packages
acts against this principle. The current default behaviour conveys a
misleading and arguably useless summary, and I don't go with the
argument that we should persist with it because it is simple to
understand where the numbers come from.

Cheers, David.
---------------------------------------------------------------------
David Wooff, Director, Statistics and Mathematics Consultancy Unit,
Department of Mathematical Sciences, University of Durham.
Science Laboratories, South Road, Durham, DH1 3LE, UK.
Tel. 0191 374 4531, Fax 0191 374 7388.
---------------------------------------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>>>>> D A Wooff writes:

> [...]
> I hope there are many of us that agree 100% with Bill. Bad practice,
> as enshrined in the default behaviour of histogram, should be
> discouraged. We should aim to introduce density-based histograms from
> the outset, and the default behaviour of histograms in many packages
> acts against this principle. The current default behaviour conveys a
> misleading and arguably useless summary, and I don't go with the
> argument that we should persist with it because it is simple to
> understand where the numbers come from.

I side with Peter. In an elementary stats course ...

Maybe have densityplot(..., method = "histogram") for the real thing?
-k
On 08-Jun-99 D.A.Wooff at durham.ac.uk wrote:
>
> I hope there are many of us that agree 100% with Bill. Bad practice,
> as enshrined in the default behaviour of histogram, should be
> discouraged. We should aim to introduce density-based histograms from
> the outset, and the default behaviour of histograms in many packages
> acts against this principle. The current default behaviour conveys a
> misleading and arguably useless summary and I don't go with the
> argument that we should persist with it because it is simple to
> understand where the numbers come from.

What's going on? There's NOTHING wrong with a histogram as such.

"Bad practice, as enshrined in the default behaviour of histogram";
"The current default behaviour conveys a misleading and arguably
useless summary"; -- I respectfully disagree. Aka b****cks.

If the histogram bin size matches the discretization of the data, then
the histogram is equivalent to the data but simply represents it
differently. What's wrong with that?

If the bin size is coarser, then some information is lost of course.
But the nature of the loss (no discrimination within bins) is well
defined and unambiguous, and there is no interference between different
bins. What (apart from the loss of this specific info) is wrong with
that?

I recently had some data of which I did histos with bin-size equal to
data resolution. The following leapt to the eye (summarised in tabular
form):

  X:  2.0  2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  etc
  N:  856    0  730    0    0  723    0  584    0    0  425    1  319    0    0  220  etc

Misleading and useless? Highly informative, according to me; and I
probably would not have noticed it so readily without looking at the
histogram. A density estimate would have made a real mush of it. A
histogram binned to width 0.2 would have completely (but cleanly)
concealed 90 per cent of it: the 10 per cent being the zero count for
2.8-2.9, 3.8-3.9, ... so in the end I would have done a raw histo
anyway!
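The width-0.2 remark can be checked directly from the table. What follows is only a small sketch, in Python rather than R, and the helper name rebin is made up for illustration. It uses just the 16 counts shown above, since the rest of the data was elided with "etc".

```python
# Counts at the data resolution (0.1), copied from the table above.
# The elided "etc" tail of the data is NOT reproduced here.
x = [round(2.0 + 0.1 * i, 1) for i in range(16)]          # 2.0, 2.1, ..., 3.5
n = [856, 0, 730, 0, 0, 723, 0, 584, 0, 0, 425, 1, 319, 0, 0, 220]


def rebin(counts, factor):
    """Merge each run of `factor` adjacent bins into one coarser bin.

    `rebin` is a hypothetical helper, not a function from any package.
    """
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


# Width 0.2 means merging pairs of width-0.1 bins.
coarse = rebin(n, 2)

for i, c in enumerate(coarse):
    print(f"{x[2 * i]:.1f}-{x[2 * i + 1]:.1f}: {c}")
```

After merging, the only zero left among these bins is the one for 2.8-2.9; the gaps at 2.1, 2.3, 2.4, 2.6 and so on are absorbed into their neighbours, while every observation is still counted in exactly one bin.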
Density estimates also lose information. Of course the nature of the
loss is, theoretically, described in the definition of the smoothing
procedure. But in practice it's far more difficult to hypothesise what
may underlie a quirk in a density estimate, because of the interference
between neighbouring data values.

Density estimates have the merit of producing pictures which are much
more suggestive of a continuously varying probability density curve. In
some cases this may be usefully informative; in particular the density
estimate is sensitive to any variation in data value. In other cases it
may be merely cosmetic. In the worst cases it may give a seriously
misleading impression (as of course histograms also could).

Both methods have their uses, their (somewhat complementary) merits,
and their (somewhat complementary) demerits. As usual, it's horses for
courses.

But, specifically (as I said to start with): There's NOTHING wrong with
histograms as such. I don't understand why people suggest that there
is. There may, however, be something seriously wrong with the way many
people interpret them, or with the uses that software packages make of
them. But those are different -- and possibly much more appropriate --
targets.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Date: 08-Jun-99                                       Time: 12:43:54
------------------------------ XFMail ------------------------------
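To illustrate the "interference between neighbouring data values" point, here is a rough sketch of a Gaussian kernel density estimate computed from the 16 tabulated counts in the message above. Several things here are assumptions and not part of the original post: the choice of a Gaussian kernel, the bandwidth H = 0.15, the function name kde, and the use of Python instead of R. The elided "etc" portion of the data is left out.

```python
import math

# The 16 tabulated values and counts from the message above ("etc" tail omitted).
x = [round(2.0 + 0.1 * i, 1) for i in range(16)]          # 2.0, 2.1, ..., 3.5
n = [856, 0, 730, 0, 0, 723, 0, 584, 0, 0, 425, 1, 319, 0, 0, 220]
total = sum(n)


def kde(t, h):
    """Gaussian kernel density estimate at point t with bandwidth h.

    Each observed value x[i] contributes n[i] kernels centred on it.
    `kde` is a hypothetical helper written for this sketch only.
    """
    norm = total * h * math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((t - xi) / h) ** 2)
               for xi, c in zip(x, n)) / norm


H = 0.15  # assumed bandwidth, chosen only for illustration

for xi, c in zip(x, n):
    print(f"{xi:.1f}  count={c:4d}  kde={kde(xi, H):.3f}")
```

With this bandwidth the estimate at 2.1, a value that was never observed, is well above zero, because it borrows mass from the large counts at 2.0 and 2.2. A much smaller bandwidth would bring the gaps back, but then the estimate is little more than the raw histogram again.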
PMFJI. Isn't this a correct definition? I am not a professional
statistician, but this is the definition given in 3 different
dictionaries and pretty well compares with the descriptions in the 6 or
so statistics books I have sitting on my shelf. It seems to describe
what I learned to call a "frequency histogram", as opposed to a
"density histogram".

    histogram n : a bar chart representing a frequency distribution;
    heights of the bars represent observed frequencies

Dr. Marc R. Feldesman
email: feldesmanm at pdx.edu
email: feldesman at ibm.net
fax: 503-725-3905

"Math is hard. Let's go to the mall"  Barbie

Powered by: Monstrochoerus - the 300 MHz Pentium II
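The two kinds of histogram named above differ only in how the bar heights are computed: a frequency histogram uses the bin count itself, while a density histogram uses count / (n * bin width), so that the bar areas sum to 1. A minimal sketch follows, in Python rather than R. The data values, the break points, and the function name histogram are all made up for illustration, and the binning rule (left-closed bins, last bin closed on the right) is a choice made for this sketch, not a statement about any package's default.

```python
def histogram(data, breaks):
    """Return (counts, densities) for the bins defined by `breaks`.

    Bins are [breaks[i], breaks[i+1]), except the last, which also
    includes its right end point. Hypothetical helper for illustration.
    """
    nbins = len(breaks) - 1
    counts = [0] * nbins
    for v in data:
        for i in range(nbins):
            is_last = i == nbins - 1
            if breaks[i] <= v < breaks[i + 1] or (is_last and v == breaks[-1]):
                counts[i] += 1
                break
    n = len(data)
    # Density height = count / (n * bin width), so bar AREAS sum to 1.
    densities = [c / (n * (breaks[i + 1] - breaks[i]))
                 for i, c in enumerate(counts)]
    return counts, densities


data = [1, 2, 2, 3, 3, 3, 4, 6, 8, 9]   # made-up values
breaks = [0, 2, 4, 10]                  # deliberately unequal bin widths

counts, dens = histogram(data, breaks)
area = sum(d * (breaks[i + 1] - breaks[i]) for i, d in enumerate(dens))

print("counts:   ", counts)
print("densities:", [round(d, 4) for d in dens])
print("total area:", round(area, 6))
```

With equal bin widths the two pictures have the same shape and differ only in the labelling of the vertical axis. With unequal widths, as here, the wide last bin has a count close to that of the middle bin but a much lower density, which is where the two definitions start to tell different stories.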