On Mon, 20 Nov 2006, Ido M. Tamir wrote:
> Dear all,
>
> I would like to change the scale of a density plot
> to something that represents counts/bandwidth.
>
> What I am actually plotting is density of genomic
> data along the chromosome, and I need a simple
> representation of e.g. genes/50000 bases.
>
> pos <- c(123423,124313,124790,,....)
> den <- density(pos,bw=50000)
> plot(den, yaxt="n", ylab="features/50kb")
> etc...
>
> Is the following correct?
>
> library("MASS")
> par(mfrow=c(2,2))
> hist(geyser$duration)
> den <- density(geyser$duration)
> plot(den)
> hist(geyser$duration,breaks=20)
> plot(den,yaxt="n", ylab="")
> fm <- den$bw * max(den$y) * den$n
> at <- pretty(c(0,max(den$y)), n=5)
> lab <- pretty(c(0,fm), n=5)
> axis(2,at=at[1:6],lab=lab[1:6])
No. Your second histogram plot has bins of width 0.2, and so the bin
heights need to be divided by 299*0.2 to get frequencies. If you want to
plot what is effectively the expected counts, do something like
> library("MASS")
> hist(geyser$duration,breaks=20)
> den <- density(geyser$duration)
> lines(den$x, 299*0.2*den$y)
which is similar to figure 5.9 in MASS apart from the y scale.
(If you have read MASS you will be aware that this is a very poor example,
not even from a continuous distribution.)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595