Dear list,

I'm new to R, so please bear with my silly questions. I'm trying to understand why the results I get from a call to hist() are not what I expected. When I use the parameter freq=FALSE, I expect that no bar in the plot will be taller than 1, because the bars are probabilities. But with my code, the bars exceed 1.

The actual data seem immaterial. I tried with dummy data:

> hist(runif(1000), freq=FALSE)

and the histogram includes bars well over 1 in height. The man page says that freq=FALSE produces densities, so that the total area is 1. Clearly, if all the values lie between 0 and 1, as is the case here, some of the bars must rise above 1 for the area to be 1. I thought that it was the sum of the bar heights that would be 1, so that the bars would reflect probabilities for each interval rather than densities. So the answer to my question would be "because it's densities, not probabilities", but then the question is: why densities and not probabilities?

Regards,
L.
Because a histogram is descriptive and makes no assumptions about what it describes? Attaching a probability to the bars assumes that some random draw is being made. Suppose my data is a count of computers running a particular OS. What would be the value in reporting this as a probability that a randomly chosen computer is running Ubuntu? Density is more universal, IMO.

--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly

r-help-bounces at r-project.org wrote on 01/13/2011 01:37:01 PM:

> [R] Question about histogram
> Longe to: r-help, 01/13/2011 03:11 PM
> Sent by: r-help-bounces at r-project.org
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Densities allow you to plot a reference distribution, the result of a call to density(), or other density-based lines on top of your histogram, with everything appropriately scaled and little extra effort.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
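A minimal sketch of the kind of overlay described in this reply, assuming simulated standard-normal data. The sample `x`, the seed, and the choice of dnorm() as the reference distribution are illustrative assumptions, not part of the original thread:

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
x <- rnorm(1000)             # hypothetical sample

# freq = FALSE puts the y-axis on the density scale (total bar area = 1)
h <- hist(x, freq = FALSE, main = "Histogram with density overlays")

# Reference distribution: the standard normal density, drawn without rescaling
curve(dnorm(x), add = TRUE, lty = 2)

# Kernel density estimate computed from the same data
lines(density(x))
```

Because the bars, the reference curve, and the kernel estimate are all on the density scale, they can share one y-axis. With freq = TRUE the curves would first have to be multiplied by the sample size and the bin width.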
Hi:

On Thu, Jan 13, 2011 at 10:37 AM, Longe <longeliver@gmail.com> wrote:

> Dear list,
>
> I'm new to R, please bear with my silly questions. I'm trying to get an
> understanding of why the results I get from a call to hist() are not as I
> thought I would get. When I use the parameter freq=FALSE, I think the plot
> will contain bars that none of them is larger than 1, because they're
> probabilities. But for my code, the bars exceeded 1.

Your perception is incorrect, I'm afraid; the bars in a histogram are not probabilities, but rather crude estimates of the density in each subinterval. The *area* of each rectangle gives an approximation to the probability content (the integral of the density) in the corresponding interval. (Think of the process of Riemann integration from calculus as an analogy.)

An example of a continuous distribution whose density is greater than 1 is the Uniform(0, 0.5) distribution (or any uniform distribution defined on an interval of width < 1). Its density is a rectangle with width 0.5 and area 1 (since all continuous probability densities have total area 1 under the density function by definition). The height of the rectangle is the density of the uniform distribution, here 1/0.5 = 2. As the width of the interval gets smaller, the density (height) must get bigger since the area is fixed; in the uniform case the density is in fact the reciprocal of the width.

> The actual data seems immaterial. I tried with dummy data:
>
> > hist(runif(1000), freq=FALSE)
>
> and the histogram includes bars well over 1 in height. The man page says
> that freq=FALSE produces densities, so that the total area is 1. Clearly if
> all the values are between 0 and 1, as is the case here, some of the bars
> stand out above 1, for the area to be 1. I thought that it is the sum of
> the bar heights that would be 1, so that the bars reflect probabilities for
> each interval, rather than densities. So, the answer to my question would
> be "because it's densities, not probabilities", but then the question is,
> why densities and not probabilities?

Histograms are meant to estimate continuous probability density functions. OTOH, in a bar chart of a discrete distribution, relative frequencies are estimated probabilities of each category because the probabilities are point masses that add to 1. Perhaps this is the source of your confusion - a histogram does not have the same interpretation as a bar chart, because it's estimating a smooth curve over a continuous interval rather than a set of (probability) masses at fixed points.

HTH,
Dennis
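The area argument above can be checked numerically. The sketch below assumes the same dummy data as in the question, with an arbitrary seed added; the names `h`, `widths`, and `probs` are illustrative only:

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
h <- hist(runif(1000), plot = FALSE)      # compute the bins without drawing

widths <- diff(h$breaks)                  # width of each bin
sum(h$density * widths)                   # total area of the bars

# Per-bin relative frequencies, i.e. the "probabilities" the question expected
probs <- h$counts / sum(h$counts)
sum(probs)                                # heights of this kind add up to 1
all.equal(probs, h$density * widths)      # area of a bar equals its relative frequency

# Density of Uniform(0, 0.5): the reciprocal of the interval width
dunif(0.25, min = 0, max = 0.5)
```

If bars whose heights are the relative frequencies are really what is wanted, one option is to draw them as a bar chart, for example with barplot(probs, names.arg = h$mids). That plot then has the bar-chart interpretation rather than the density interpretation.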