Hello,

I'm doing some experiments with the various histogram functions and I have two questions about the "prob" option and binning.

First, here's a simple plot of my data using the default hist() function:

> hist(data[,1], prob = TRUE, xlim = c(0, 35))

http://go.sneakymustard.com/tmp/hist.jpg

My first question is regarding the resulting plots from hist.scott() and hist.FD(), from the MASS package. I'm setting prob to TRUE in these functions, but as can be seen in the images below, the value for the first bar of the histogram is well above 1.0. Shouldn't the total area be 1.0 in the case of prob = TRUE?

> hist.scott(data[,1], prob = TRUE, xlim=c(0, 35))

http://go.sneakymustard.com/tmp/scott.jpg

> hist.FD(data[,1], prob = TRUE, xlim=c(0, 35))

http://go.sneakymustard.com/tmp/FD.jpg

Is there anything I can do to "fix" these plots?

My second question is related to binning. Is there a function or package that allows one to use logarithmic binning in R, that is, to create bins such that the length of each bin is a multiple of the length of the one before it?

Pointers to the appropriate docs are welcome. I've been searching for this and couldn't find any info.

Best regards,
Andre
On 10/02/2008 8:14 PM, Andre Nathan wrote:
> Hello
>
> I'm doing some experiments with the various histogram functions and I
> have two questions about the "prob" option and binning.
>
> First, here's a simple plot of my data using the default hist()
> function:
>
> > hist(data[,1], prob = TRUE, xlim = c(0, 35))
>
> http://go.sneakymustard.com/tmp/hist.jpg
>
> My first question is regarding the resulting plot from hist.scott() and
> hist.FD(), from the MASS package. I'm setting prob to TRUE in these
> functions, but as it can be seen in the images below, the value for the
> first bar of the histogram is well above 1.0. Shouldn't the total area
> be 1.0 in the case of prob = TRUE?
>
> > hist.scott(data[,1], prob = TRUE, xlim=c(0, 35))

It looks to me as though the area is one. The first bar is about 3.6 units high, and about 0.2 units wide: area is 0.72. There are no gaps between bars in an R histogram, so the gaps you see in this jpg are bars with zero height.

Duncan Murdoch

> http://go.sneakymustard.com/tmp/scott.jpg
>
> > hist.FD(data[,1], prob = TRUE, xlim=c(0, 35))
>
> http://go.sneakymustard.com/tmp/FD.jpg
>
> Is there anything I can do to "fix" these plots?
>
> My second question is related to binning. Is there a function or package
> that allows one to use logarithmic binning in R, that is, create bins
> such that the length of a bin is a multiple of the length of the one
> before it?
>
> Pointers to the appropriate docs are welcome, I've been searching for
> this and couldn't find any info.
>
> Best regards,
> Andre
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
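
The area argument in the reply above can be checked numerically. What follows is only a sketch under stated assumptions: the poster's data[,1] is not available, so the sample is simulated with rexp(), and base hist() with breaks = "Scott" stands in for MASS's hist.scott() purely to avoid depending on an extra package. Duncan's reading of the first bar corresponds to 3.6 * 0.2 = 0.72 of the total area.

```r
# Simulated stand-in for the poster's data (assumption: the real data are not shown).
set.seed(1)
x <- rexp(1000, rate = 5)

# plot = FALSE returns the histogram object without drawing it.
# breaks = "Scott" asks base hist() to choose bins by Scott's rule.
h <- hist(x, breaks = "Scott", plot = FALSE)

widths <- diff(h$breaks)       # width of each bin
areas  <- h$density * widths   # area of each bar = height x width

max(h$density)   # tallest bar; exceeds 1 here because the bins are narrow
sum(areas)       # total area of all bars; equals 1 up to rounding error
```

Bars with zero count have zero density, so they contribute nothing to the sum and show up as apparent gaps in the plot.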
Andre,

Regarding your first question, it is by no means clear there is anything to fix; in fact, I'm sure there is nothing to fix. The fact that the height of any bar is greater than one is irrelevant: the width of the bar is much less than one, as is the product of height by width. Area is height x width, not just height.

Regarding the second question, logarithmic breaks: I'm not aware of anything currently available to do this, but the tools are there for you to do it yourself. The 'breaks' argument to hist allows you to specify your breaks explicitly (among other things), so it's just a matter of setting up the logarithmic (or, more precisely, 'geometric progression') bins yourself and relaying them on to hist.

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andre Nathan
Sent: Monday, 11 February 2008 11:14 AM
To: r-help at r-project.org
Subject: [R] Questions about histograms
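
As a rough illustration of the 'geometric progression' bins suggested in the reply above, here is a minimal sketch. The data are simulated, and the names ratio, lo, hi and n_bins are hypothetical choices for this example, not part of any package. It assumes strictly positive data, since the breaks are built by repeated multiplication from the smallest value.

```r
# Simulated positive data (assumption: stands in for the poster's data[,1]).
set.seed(1)
x <- rlnorm(1000, meanlog = 1, sdlog = 1)

ratio <- 2                 # each bin is `ratio` times as wide as the previous one
lo <- min(x)
hi <- max(x)

# Number of bins needed so that lo * ratio^n_bins reaches past hi.
n_bins <- floor(log(hi / lo) / log(ratio)) + 1
breaks <- lo * ratio^(0:n_bins)

# Guard against rounding leaving the largest value outside the last bin.
stopifnot(breaks[1] <= lo, max(breaks) >= hi)

# Pass the explicit breaks to hist(); plot = FALSE only returns the object.
h <- hist(x, breaks = breaks, plot = FALSE)

widths <- diff(h$breaks)
widths[-1] / widths[-length(widths)]   # ratio of successive bin widths
sum(h$density * widths)                # total area of the density histogram
```

Dropping plot = FALSE draws the histogram. With unequal bin widths, hist() plots densities by default, which is the appropriate scale when the bars have different widths.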