Hello all.
I noticed that the default setting for breaks in the construction of histograms
in hist() is "right = TRUE".
I think "right = FALSE" would be more consistent with usual definitions of lower
and upper limits for bins in applied statistics, and I suggest that you consider
making it the default for hist().
For example, I generated the following frequency distribution for duration of
hospitalization with a script in R specifying the cuts to be "right = FALSE"
(from an exercise in Bernard Rosner's Fundamentals of Biostatistics book).
Interval   Number   Proportion
[0,5)           5         0.20
[5,10)         12         0.48
[10,15)         6         0.24
[15,20)         1         0.04
[20,25)         0         0.00
[25,30]         1         0.04
The actual boundaries for each bin are: 0-4, 5-9, 10-14, ... and so on since the
limits on the right are "open", with the exception of the last bin. This format
is in agreement with usual practice and recommendations. Actually, it is
compatible with the process described by Romer in his book ("from y inclusive to
y exclusive").
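A minimal sketch of how a frequency distribution like this one can be built with cut() follows. The "duration" vector is a hypothetical stand-in: its 25 values were chosen only so that they reproduce the counts shown in this message, because the original data are not included here. It is not necessarily the script that was actually used.

```r
# Hypothetical data: 25 durations (days) chosen only to reproduce the counts
# reported in this message; the original data set is not part of the post.
duration <- c(5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8,
              8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4)
breaks <- seq(0, 30, by = 5)

# right = FALSE makes each interval closed on the left and open on the right.
# include.lowest = TRUE closes the last interval, so the value 30 is counted.
bins <- cut(duration, breaks = breaks, right = FALSE, include.lowest = TRUE)

freq <- table(bins)
data.frame(interval   = names(freq),
           number     = as.vector(freq),
           proportion = as.vector(prop.table(freq)))
```

With include.lowest = TRUE and right = FALSE, the last interval is labelled [25,30], which is why the final bin is the one exception to the "open on the right" rule.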
If I use R to generate a histogram with 6 bins, I get the following:
[Attachment: histogram1.pdf, <https://stat.ethz.ch/pipermail/r-devel/attachments/20171108/8db83513/attachment.pdf>]
... which actually presents the histogram of the frequency distribution when the
"right" parameter is set as "TRUE":
Interval   Number   Proportion
[0,5]           9         0.36
(5,10]          9         0.36
(10,15]         5         0.20
(15,20]         1         0.04
(20,25]         0         0.00
(25,30]         1         0.04
In this case, the real limits of the bins are 0-5, 6-10, 11-15, ... and so on.
If I edit the histogram command adding "right = FALSE", I can get the histogram
for my original frequency distribution. Compare bins 1 and 2 in both
distributions and histograms.
[Attachment: Histogram2.pdf, <https://stat.ethz.ch/pipermail/r-devel/attachments/20171108/8db83513/attachment-0001.pdf>]
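The difference between the two settings can be checked without drawing anything, by asking hist() for the bin counts only. This is a sketch under one assumption: the "duration" vector is hypothetical, with values chosen only to reproduce the two frequency distributions reported in this message, since the original data are not included.

```r
# Hypothetical data: values chosen only to reproduce the reported counts.
duration <- c(5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8,
              8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4)
breaks <- seq(0, 30, by = 5)

# plot = FALSE returns the histogram object without plotting it.
h_default <- hist(duration, breaks = breaks, plot = FALSE)                 # right = TRUE
h_left    <- hist(duration, breaks = breaks, right = FALSE, plot = FALSE)  # [a, b) bins

# Side-by-side counts per bin for the two settings.
rbind(right_TRUE = h_default$counts, right_FALSE = h_left$counts)
```

The values equal to 5 and 10 are the ones that change bins, which is why only the first three counts differ between the two rows.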
The value chosen for the "right" argument may be a matter of preference, but I
think most users of R would benefit from using bins with limits that are closed
on the left and open on the right, and therefore from having this setting as
the default for hist().
I am aware I am writing from the limited perspective of my own field
(epidemiology and biostatistics), but there are plenty of examples that show the
need to consider changing the default. Here are just a few:
https://www.statcan.gc.ca/eng/concepts/definitions/age2
https://seer.cancer.gov/stdpopulations/stdpop.19ages.html
https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-hinc/hinc-01.html
Thank you.
José
José G. Conde, MD, MPH
Professor, School of Medicine
Director, CentIT2
UPR Medical Sciences Campus
Tel (787) 763-9401 Fax (787) 758-5206
Email: jose.conde1 at upr.edu
URL: http://rcmi.rcm.upr.edu