There were some questions about hist() a couple of days ago which
triggered this post. My question/suggestion is about the y-axis in hist.
There are reasons to prefer making the y-axis density = relative
frequency/bin width. One reason is that the height of the plot does not
depend on the bin width; another is that if your histogram is in density
then you can easily superimpose a smooth theoretical pdf on top--they will
be on the same scale. (BTW the best intro stats book--Freedman et al.--only
shows students how to make density histograms.)

It doesn't seem easy to make density on the y-axis with the current
hist(). The freq argument only lets you choose counts (TRUE) or relative
frequency (FALSE). I would like to suggest that freq (or some renamed
version of the argument) take on the values

    counts
    freqs
    densities

so that you can easily do a density histogram.

The strange thing to me is that if you do

    h<-hist(...)

you get h$intensities, which is the densities. So hist calculates the
densities, but doesn't let you plot them? May I suggest you call it
$densities? I don't know why you call it intensities--I thought in stats
intensity meant the instantaneous rate of a point process.

Also I find the help quite obscure:

    intensities
        values f^(x[i]), as estimated density values. If
        all(diff(breaks) == 1), they are the relative frequencies counts/n
        and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where
        b[i] breaks[i].

May I suggest:

    densities
        estimated densities calculated by relative frequency/bin width,
        where relative frequency is count/n

(I can't figure out why you've got powering ^ in there!)

Bill Simpson

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
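For readers following along, here is a minimal R sketch of the scalings Bill distinguishes, computed by hand from the counts and breaks that hist(..., plot = FALSE) returns. It assumes a reasonably current R, where the returned histogram object also carries a density component; the seed and the deliberately unequal breaks are arbitrary illustrative choices, not anything taken from the thread.

```r
# Sketch: count, relative frequency and density for each bin,
# derived from the counts and breaks returned by hist().
set.seed(1)                      # arbitrary seed, for reproducibility
x <- rnorm(500, sd = 100)

# Deliberately unequal bin widths (illustrative only); the outer
# breaks at +/-600 (6 sd) cover any plausible draw.
brks <- c(-600, -200, -100, 0, 100, 200, 600)
h <- hist(x, breaks = brks, plot = FALSE)

n       <- sum(h$counts)
width   <- diff(h$breaks)
relfreq <- h$counts / n          # relative frequency: count/n
dens    <- relfreq / width       # density: (count/n)/bin width

# Relative frequencies sum to 1; densities times bin widths sum to 1.
sum(relfreq)
sum(dens * width)

# On the density scale a theoretical pdf can be overlaid directly.
plot(h, freq = FALSE)
curve(dnorm(x, sd = 100), add = TRUE)
```

With unequal widths the relfreq and dens vectors differ bin by bin, which is exactly the gap between "relative frequency" and "density" on the y-axis.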
Bill Simpson <wsi at gcal.ac.uk> writes:

> There were some questions about hist() a couple of days ago which
> triggered this post. My question/suggestion is about the y-axis in hist.
> There are reasons to prefer making the y-axis density=relative
> frequency/bin width. One reason is that the height of the plot does not
> depend on the bin width; another is that if your histogram is in density
> then you can easily superimpose a smooth theoretical pdf on top--they will
> be on the same scale. (BTW the best intro stats book--Freedman et al--only
> shows students how to make density histograms)
>
> It doesn't seem easy to make density on the y-axis with the current
> hist(). The freq argument only lets you choose counts (TRUE) or relative
> frequency (FALSE). I would like to suggest that freq (or some renamed
> version of the argument) take on the values
> counts
> freqs
> densities
> So that you can easily do a density histogram.

Look closer: freq=FALSE *does* give densities. Try for instance

    x<-rnorm(500,sd=100)
    hist(x,freq=F)
    curve(dnorm(x,sd=100),add=T)

freq=T gives absolute frequencies, i.e. counts.

> The strange thing to me is that if you do
> h<-hist(...)
> you get h$intensities, which is the densities. So hist calculates the
> densities, but doesn't let you plot them?

Well, it does...

> May I suggest you call it $densities? I don't know why you call it
> intensities--I thought in stats intensity meant the instantaneous rate of
> a point process.

I tend to agree here. Maybe singular $density is better, though.

> Also I find the help quite obscure:
>
> intensities
> values f^(x[i]), as estimated density values. If
> all(diff(breaks) == 1), they are the relative frequencies counts/n and in
> general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i]
> breaks[i].
>
> May I suggest:
>
> densities estimated densities calculated by relative frequency/bin
> width, where relative frequency is count/n
> (I can't figure out why you've got powering ^ in there!)

I think that should read as "f hat". No particular reason to single out
the case of unit bin width, I agree.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
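Reading the "^" as "f hat", the identity in the help text can be written out as below. Here n_i, the count in bin i, is a symbol introduced for this sketch, n is the total count, and b_i = breaks[i]:

```latex
\hat f(x_i) = \frac{n_i}{n\,(b_{i+1} - b_i)}, \qquad
\sum_i \hat f(x_i)\,(b_{i+1} - b_i) = \sum_i \frac{n_i}{n} = 1 .
```

When every bin has unit width the factor b_{i+1} - b_i is 1, so f hat reduces to the relative frequency n_i/n; that is the only sense in which the unit-width case is special.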
On 17 Nov 2000, Peter Dalgaard BSA wrote:

> Bill Simpson <wsi at gcal.ac.uk> writes:
> > It doesn't seem easy to make density on the y-axis with the current
> > hist(). The freq argument only lets you choose counts (TRUE) or relative
> > frequency (FALSE). I would like to suggest that freq (or some renamed
> > version of the argument) take on the values
> > counts
> > freqs
> > densities
> > So that you can easily do a density histogram.
>
> Look closer: freq=FALSE *does* give densities. Try for instance
>
> x<-rnorm(500,sd=100)
> hist(x,freq=F)
> curve(dnorm(x,sd=100),add=T)
>
> freq=T gives absolute frequencies i.e. counts

OK then, the help page and the plot are wrong. The help page says:

    freq
        logical; if TRUE, the histogram graphic is to present a
        representation of frequencies, i.e, the counts component of the
        result; if FALSE, relative frequencies (``probabilities''), the
        rel.freqs, are plotted. Defaults to TRUE iff breaks are
        equidistant.

It says that freq=FALSE produces a plot with relative frequency on the
y-axis. In general rel freq != density; the two are equal only if the bin
width is 1. So my suggestion still stands. Right now it seems you can have
counts or densities on the y-axis (mislabelled on the plot, and mislabelled
in the help as "relative frequency"). Maybe some people want relative
frequency, so allow them to do it too.

    rel freq = count/n
    density  = (count/n)/bin width

So at least the help page should be rewritten, and the axis label for the
hist when freq=F should be changed to "Density". New help:

    freq
        logical; if TRUE, the histogram graphic is to present a
        representation of frequencies, i.e., the counts component of the
        result; if FALSE, densities, the rel.freqs/bin width, are plotted.
        Defaults to TRUE iff breaks are equidistant.

There must be lots of people who have been confused by histograms that
were supposedly rel freq but were really density.
Bill
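To make Bill's proposed three-way choice concrete, here is a sketch of a hypothetical helper, hist_heights(). It is not part of R; the function name and its type argument are invented for illustration. It returns the bar heights for each scaling and checks numerically that relative frequency and density coincide when the bin width is 1 but differ otherwise. The uniform sample and seed are arbitrary.

```r
# Hypothetical helper (not in R): bar heights under the proposed
# "counts" / "freqs" / "densities" choice of y-axis scaling.
hist_heights <- function(x, breaks = "Sturges",
                         type = c("counts", "freqs", "densities")) {
  type <- match.arg(type)
  h <- hist(x, breaks = breaks, plot = FALSE)
  n <- sum(h$counts)
  switch(type,
         counts    = h$counts,                       # count
         freqs     = h$counts / n,                   # rel freq = count/n
         densities = h$counts / n / diff(h$breaks))  # (count/n)/bin width
}

set.seed(3)                         # arbitrary seed
x <- runif(200, min = 0, max = 4)   # all values lie inside [0, 4]

# Bin width 1: relative frequency and density coincide.
f1 <- hist_heights(x, breaks = 0:4, type = "freqs")
d1 <- hist_heights(x, breaks = 0:4, type = "densities")

# Bin width 0.5: each density is twice the relative frequency.
f2 <- hist_heights(x, breaks = seq(0, 4, by = 0.5), type = "freqs")
d2 <- hist_heights(x, breaks = seq(0, 4, by = 0.5), type = "densities")
```

Plotting is left out on purpose: the density heights are the ones hist(x, freq = FALSE) already draws, as Peter points out, so only the "freqs" option would need new plotting support.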