thr3ads.net - R help - [R] histograms [Jun 1999]

D.A.Wooff@durham.ac.uk

1999-Jun-08 10:14 UTC

[R] histograms

> >>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:
>
>     PD> "Venables, Bill (CMIS, Cleveland)" <Bill.Venables at cmis.CSIRO.AU>
>     PD> writes:
>     >> The fact that every elementary book on statistics does it this way
>     >> does not make it correct.  To be helpful, a histogram really has to
>     >> be a non-parametric density estimator, period.
>     >>
>     >> Enough already of polemics.
>
>     PD> Not quite! There is a reason for doing it the other way, namely
>     PD> that the concept of a histogram generally comes before the concept
>     PD> of a probability density, pedagogically. It is very easy to explain
>     PD> that you chop up the axis into bins and count the number of data
>     PD> points that fall in each of them. I bet that half of the MDs that I
>     PD> teach never quite understand the density (hell, the author of the
>     PD> textbook I use managed to plot three identical gaussian curves with
>     PD> identical y axis but different x axes... and he's a
>     PD> statistician). So for the basic uses of the histogram, one would be
>     PD> replacing a perfectly intuitive simple unit with a substantially
>     PD> more complex one.
>
> I agree 100% with Peter.
> Being a mathematician I agree with Bill that for us, a histogram is a
> (very suboptimal) density estimate;  but average statistics software users
> *do* learn histograms differently..

I hope there are many of us that agree 100% with Bill. Bad practice,
as enshrined in the default behaviour of histogram, should be
discouraged.  We should aim to introduce density-based histograms from
the outset, and the default behaviour of histograms in many packages
acts against this principle. The current default behaviour conveys a
misleading and arguably useless summary, and I don't go with the
argument that we should persist with it because it is simple to
understand where the numbers come from.
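
Editor's sketch, not part of the original post: the two conventions argued over here differ only in how the bar heights are scaled. Assuming current base R, where hist() returns both a counts and a density component, the relationship can be checked directly. The simulated data and the variable names are illustrative assumptions.

```r
# Frequency vs. density scaling for the same bins (simulated data, illustration only).
set.seed(1)
x <- rnorm(200)

h <- hist(x, plot = FALSE)   # compute the bins without drawing anything
w <- diff(h$breaks)          # bin widths

# Density height = count / (n * bin width), so the bar areas sum to 1.
dens <- h$counts / (length(x) * w)

same_scaling <- isTRUE(all.equal(dens, h$density))
total_area   <- sum(h$density * w)

# To draw each version:
#   hist(x, freq = TRUE)    # bar heights are counts
#   hist(x, freq = FALSE)   # bar heights are densities
```

With equal bin widths the two pictures have the same shape and differ only in the y-axis scale; the choice changes the shape only when bin widths are unequal.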

Cheers,

David.

---------------------------------------------------------------------
  David Wooff, Director, Statistics and Mathematics Consultancy Unit,
  Department of Mathematical Sciences, University of Durham.
  Science Laboratories, South Road, Durham, DH1 3LE, UK.
  Tel. 0191 374 4531, Fax 0191 374 7388.
---------------------------------------------------------------------
  

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Kurt Hornik

1999-Jun-08 10:38 UTC

[R] histograms

>>>>> D A Wooff writes:

>> >>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:

>> PD> "Venables, Bill (CMIS, Cleveland)" <Bill.Venables at cmis.CSIRO.AU>
>> PD> writes:
>> >> The fact that every elementary book on statistics does it this way
>> >> does not make it correct.  To be helpful, a histogram really has to
>> >> be a non-parametric density estimator, period.
>> >>
>> >> Enough already of polemics.

>> PD> Not quite! There is a reason for doing it the other way, namely
>> PD> that the concept of a histogram generally comes before the concept
>> PD> of a probability density, pedagogically. It is very easy to explain
>> PD> that you chop up the axis into bins and count the number of data
>> PD> points that fall in each of them. I bet that half of the MDs that I
>> PD> teach never quite understand the density (hell, the author of the
>> PD> textbook I use managed to plot three identical gaussian curves with
>> PD> identical y axis but different x axes... and he's a
>> PD> statistician). So for the basic uses of the histogram, one would be
>> PD> replacing a perfectly intuitive simple unit with a substantially
>> PD> more complex one.

>> I agree 100% with Peter.
>> Being a mathematician I agree with Bill that for us, a histogram is a
>> (very suboptimal) density estimate;  but average statistics software users
>> *do* learn histograms differently..

> I hope there are many of us that agree 100% with Bill. Bad practice,
> as enshrined in the default behaviour of histogram, should be
> discouraged.  We should aim to introduce density-based histograms from
> the outset, and the default behaviour of histograms in many packages
> acts against this principle. The current default behaviour conveys a
> misleading and arguably useless summary, and I don't go with the
> argument that we should persist with it because it is simple to
> understand where the numbers come from.

I side with Peter.  In an elementary stats course ...

Maybe have densityplot(..., method = "histogram") for the real thing?
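
Editor's sketch, not part of the original post: one way the suggested interface might look, written against current base R. The function name densityplot_sketch, its plot argument, and the "kernel" alternative are hypothetical, chosen here only for illustration and to avoid clashing with any existing function called densityplot.

```r
# Hypothetical wrapper in the spirit of the suggestion; not an existing R function.
densityplot_sketch <- function(x, method = c("histogram", "kernel"),
                               plot = TRUE, ...) {
  method <- match.arg(method)
  if (method == "histogram") {
    h <- hist(x, plot = FALSE, ...)          # bins and counts only
    if (plot) plot(h, freq = FALSE,          # draw on the density scale
                   main = "Density-scaled histogram")
    invisible(h)
  } else {
    d <- density(x, ...)                     # kernel density estimate
    if (plot) plot(d, main = "Kernel density estimate")
    invisible(d)
  }
}

# Usage on simulated data, with drawing switched off:
set.seed(2)
y  <- rexp(100)
r1 <- densityplot_sketch(y, method = "histogram", plot = FALSE)
r2 <- densityplot_sketch(y, method = "kernel", plot = FALSE)
```

Keeping the plain count histogram as a separate function would leave the elementary-course default untouched while giving the density version its own name.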

-k


(Ted Harding)

1999-Jun-08 11:43 UTC

[R] histograms

On 08-Jun-99 D.A.Wooff at durham.ac.uk wrote:
>
> I hope there are many of us that agree 100% with Bill. Bad practice,
> as enshrined in the default behaviour of histogram, should be
> discouraged.  We should aim to introduce density-based histograms from
> the outset, and the default behaviour of histograms in many packages
> acts against this principle. The current default behaviour conveys a
> misleading and arguably useless summary and I don't go with the
> argument that we should persist with it because it is simple to
> understand where the numbers come from.

What's going on? There's NOTHING wrong with a histogram as such.
"Bad practice, as enshrined in the default behaviour of histogram";
"The current default behaviour conveys a misleading and arguably useless
summary"; -- I respectfully disagree. Aka b****cks.

If the histogram bin size matches the discretization of the data,
then the histogram is equivalent to the data but simply represents
it differently. What's wrong with that?
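
Editor's sketch, not part of the original post: a small check of the equivalence claimed above, assuming data recorded to one decimal place and current base R. The simulated values and the choice of breaks centred on each recorded value are assumptions made for illustration.

```r
# Data recorded to a resolution of 0.1 (simulated, illustration only).
set.seed(42)
x <- sample(20:35, 500, replace = TRUE) / 10

# One bin per recorded value: breaks at 1.95, 2.05, ..., 3.55.
breaks <- (19.5 + 0:16) / 10
h <- hist(x, breaks = breaks, plot = FALSE)

# Plain tabulation of the same data, keeping empty values as zero counts.
tab <- table(factor(round(x * 10), levels = 20:35))

identical_counts <- all(h$counts == as.vector(tab))
```

When the bins line up with the data resolution, the histogram counts and the frequency table carry exactly the same information.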

If the bin size is coarser, then some information is lost of course.
But the nature of the loss (no discrimination within bins) is well
defined and unambiguous, and there is no interference between
different bins. What (apart from the loss of this specific info)
is wrong with that?

I recently had some data of which I did histos with bin-size equal to
data resolution. The following leapt to the eye (summarised in tabular
form):

X: 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 etc

N: 856   0 730   0   0 723   0 584   0   0 425   1 319   0   0 220 etc

Misleading and useless? Highly informative, according to me; and
I probably would not have noticed it so readily without looking at the
histogram. A density estimate would have made a real mush of it.
A histogram binned to width 0.2 would have completely (but cleanly)
concealed 90 per cent of it: the 10 per cent being the zero count
for 2.8-2.9, 3.8-3.9, ... so in the end I would have done a raw histo
anyway!
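
Editor's sketch, not part of the original post: the coarser binning described above, applied only to the sixteen values shown in the table (the "etc" tail is not reconstructed). Pairing adjacent values into bins of width 0.2 is done by index, which is an implementation choice made here.

```r
# Counts copied from the table above, for X = 2.0, 2.1, ..., 3.5 only.
x <- (20:35) / 10
n <- c(856, 0, 730, 0, 0, 723, 0, 584, 0, 0, 425, 1, 319, 0, 0, 220)

# Pair adjacent values: (2.0, 2.1), (2.2, 2.3), ..., (3.4, 3.5).
bin    <- (seq_along(n) - 1) %/% 2
coarse <- as.vector(tapply(n, bin, sum))
names(coarse) <- sprintf("%.1f-%.1f", x[seq(1, 15, by = 2)], x[seq(2, 16, by = 2)])

print(coarse)
# 2.0-2.1 2.2-2.3 2.4-2.5 2.6-2.7 2.8-2.9 3.0-3.1 3.2-3.3 3.4-3.5
#     856     730     723     584       0     426     319     220
```

The alternating pattern of empty values disappears; only the empty 2.8-2.9 bin survives as a hint that something unusual is going on.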

Density estimates also lose information. Of course the nature of the
loss is, theoretically, described in the definition of the smoothing
procedure. But in practice it's far more difficult to hypothesise
what may underlie a quirk in a density estimate, because of the
interference between neighbouring data values.
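
Editor's sketch, not part of the original post: an illustration of that interference using only the sixteen tabulated values above (the "etc" tail is not reconstructed). It assumes current base R's density(); the bandwidth of 0.1 is an arbitrary choice made here, not something stated in the thread.

```r
# Expand the tabulated counts (X = 2.0, ..., 3.5 only) back into raw observations.
x   <- (20:35) / 10
n   <- c(856, 0, 730, 0, 0, 723, 0, 584, 0, 0, 425, 1, 319, 0, 0, 220)
obs <- rep(x, times = n)

# Gaussian kernel density estimate; bw = 0.1 is an illustrative assumption.
d <- density(obs, bw = 0.1)

# Estimated density at 2.1, a value that was never observed:
at_gap <- approx(d$x, d$y, xout = 2.1)$y
```

The estimate at 2.1 is positive even though no observation took that value, because mass from 2.0 and 2.2 is spread into it; the histogram at data resolution shows a clean zero there.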

Density estimates have the merit of producing pictures which are much
more suggestive of a continuously varying probability density curve. In
some cases this may be usefully informative; in particular the density
estimate is sensitive to any variation in data value. In other cases it
may be merely cosmetic. In the worst cases it may give a seriously
misleading impression (as of course histograms also could).

Both methods have their uses, their (somewhat complementary) merits,
and their (somewhat complementary) demerits. As usual, it's horses
for courses.

But, specifically (as I said to start with): There's NOTHING wrong with
histograms as such.

I don't understand why people suggest that there is. There may, however,
be something seriously wrong with the way many people interpret them, or
with the uses that software packages make of them. But those are
different -- and possibly much more appropriate -- targets.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Date: 08-Jun-99                                       Time: 12:43:54
------------------------------ XFMail ------------------------------

Marc R. Feldesman

1999-Jun-08 18:08 UTC

[R] histograms

PMFJI.  Isn't this a correct definition?  I am not a professional
statistician, but this is the definition given in 3 different dictionaries
and pretty well compares with the descriptions in the 6 or so statistics
books I have sitting on my shelf.  It seems to describe what I learned to
call a "frequency histogram", as opposed to a "density histogram".


histogram n : a bar chart representing a frequency distribution; heights of the
bars represent observed frequencies 




Dr. Marc R. Feldesman
email:  feldesmanm at pdx.edu
email:  feldesman at ibm.net
fax:    503-725-3905

"Math is hard.  Let's go to the mall"  Barbie

Powered by:  Monstrochoerus - the 300 MHz Pentium II 