thr3ads.net - R help - [R] Question about histogram [Jan 2011]


Longe

2011-Jan-13 18:37 UTC

[R] Question about histogram

Dear list,

I'm new to R, please bear with my silly questions.  I'm trying to
understand why the results I get from a call to hist() are not what
I expected.  When I use the parameter freq=FALSE, I expect that no bar
in the plot will be taller than 1, because the heights are
probabilities.  But for my code, the bars exceeded 1.

The actual data seems immaterial.  I tried with dummy data:

 > hist(runif(1000), freq=FALSE)

and the histogram includes bars well over 1 in height.  The man page
says that freq=FALSE produces densities, so that the total area is 1.
Clearly, if all the values are between 0 and 1, as is the case here, some
of the bars must rise above 1 for the area to be 1.  I thought that it
is the sum of the bar heights that would be 1, so that the bars reflect
probabilities for each interval, rather than densities.  So, the answer
to my question would be "because it's densities, not probabilities", but
then the question is, why densities and not probabilities?

Regards,
L.

Jonathan P Daily

2011-Jan-13 20:21 UTC

[R] Question about histogram

Because a histogram is descriptive and makes no assumptions about what it 
describes? Attaching a probability to the bars assumes that some random 
draw is being made. Suppose my data is a count of computers running a 
particular OS. What would be the value in reporting this as a probability 
that a randomly chosen computer is running Ubuntu? Density is more 
universal, IMO.
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly


Greg Snow

2011-Jan-13 20:22 UTC

[R] Question about histogram

Densities allow you to plot a reference distribution, the result of a
call to density(), or other density-based lines on top of your histogram,
with everything appropriately scaled and little extra work.
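
A rough sketch of that kind of overlay follows. The simulated normal data, the seed, and the choice of dnorm() as the reference distribution are assumptions made for illustration; they are not part of the original message.

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
x <- rnorm(1000)             # illustrative data (assumption)

# With freq = FALSE the bar heights are densities, so the total area is 1
h <- hist(x, freq = FALSE)

# Both curves below are densities too, so they share the histogram's scale
curve(dnorm(x), add = TRUE, lty = 2)   # reference N(0, 1) density
lines(density(x))                      # kernel density estimate
```

With freq = TRUE the bars would be counts, and the same curves would have to be rescaled by the sample size and the bin width before they lined up.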

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


Dennis Murphy

2011-Jan-13 21:45 UTC

[R] Question about histogram

Hi:

On Thu, Jan 13, 2011 at 10:37 AM, Longe <longeliver@gmail.com> wrote:
> Dear list,
>
> I'm new to R, please bear with my silly questions.  I'm trying to get an
> understanding of why the results I get from a call to hist() are not as I
> thought I would get.  When I use the parameter freq=FALSE, I think the plot
> will contain bars that none of them is larger than 1, because they're
> probabilities.  But for my code, the bars exceeded 1.
>
Your perception is incorrect, I'm afraid; the bar heights in a histogram
drawn with freq=FALSE are not probabilities, but rather crude estimates of
the density in each subinterval. The *area* of each rectangle gives an
approximation to the probability content (the integral of the density) in
the corresponding interval. (Think of the process of Riemann integration
from calculus as an analogy.)
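
As a quick check of the area interpretation, here is a minimal sketch using the same kind of dummy data as in the question. The seed and the variable names are illustrative assumptions.

```r
set.seed(42)                   # arbitrary seed, only for reproducibility
x <- runif(1000)

h <- hist(x, freq = FALSE)     # returns the bin breaks, counts and densities
widths <- diff(h$breaks)       # width of each bin
probs  <- h$density * widths   # area of each bar = estimated probability

sum(probs)                                   # total area: 1, up to rounding
all.equal(probs, h$counts / sum(h$counts))   # areas are the relative frequencies
max(h$density)                               # heights themselves may exceed 1

# If heights that sum to 1 are wanted, one option is to put the relative
# frequencies in the counts slot of a copy and plot that copy as "counts"
p <- h
p$counts <- probs
plot(p, freq = TRUE, ylab = "Relative frequency")
```

The rescaled plot is only directly comparable to a density curve when every bin has width 1, which is the reason the density scale is the one that generalizes.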

An example of a continuous distribution whose density is greater than 1 is
the Uniform(0, 0.5) distribution (or any uniform distribution defined on an
interval of width < 1). Its density is a rectangle with width 0.5 and
area 1 (since all continuous probability densities have total area 1 under
the density function by definition). The height of the rectangle is the
density of the uniform distribution...

As the width of the interval gets smaller, the density (height) must get
bigger since the area is fixed, and is in fact the reciprocal of its width
in the uniform case.
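
The reciprocal relationship can be checked directly with dunif(). The particular widths below are arbitrary choices for illustration.

```r
# Density of Uniform(0, 0.5) at a point inside the interval
dunif(0.25, min = 0, max = 0.5)   # 2, i.e. 1 / 0.5

# Narrower intervals give taller densities; the area stays fixed at 1
widths  <- c(1, 0.5, 0.1, 0.01)
heights <- dunif(widths / 2, min = 0, max = widths)  # evaluated at each midpoint

heights            # 1 / widths
heights * widths   # width * height = 1 in every case
```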
>
> The actual data seems immaterial.  I tried with dummy data:
>
> > hist(runif(1000), freq=FALSE)
>
> and the histogram includes bars well over 1 in height.  The man page says
> that freq=FALSE produces densities, so that the total area is 1.  Clearly if
> all the values are between 0 and 1, as is the case here, some of the bars
> stand out above 1, for the area to be 1.  I thought that it is the sum of
> the bar heights that would be 1, so that the bars reflect probabilities for
> each interval, rather than densities.  So, the answer to my question would
> be "because it's densities, not probabilities", but then the question is,
> why densities and not probabilities?
>
Histograms are meant to estimate continuous probability density functions.
OTOH, in a bar chart of a discrete distribution, relative frequencies are
estimated probabilities of each category because the probabilities are point
masses that add to 1. Perhaps this is the source of your confusion - a
histogram does not have the same interpretation as a bar chart, because it's
estimating a smooth curve over a continuous interval rather than a set of
(probability) masses at fixed points.
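
For contrast, here is a small sketch of the discrete case, where the bar heights are estimated probabilities and do sum to 1. The simulated die rolls are an assumption made for illustration.

```r
set.seed(7)                                  # arbitrary seed
rolls <- sample(1:6, 600, replace = TRUE)    # simulated rolls of a fair die

# Relative frequency of each face: an estimate of its point mass
rel_freq <- table(rolls) / length(rolls)

barplot(rel_freq, xlab = "Face", ylab = "Relative frequency")
sum(rel_freq)    # 1: the heights themselves add up, no widths involved
```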

HTH,
Dennis
>
> Regards,
> L.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
