Using R-2.12.1 and latticeExtra-0.6-14, I would like to understand why a rootogram displaying samples from the Poisson distribution looks like I expected it, whereas a rootogram using the normal distribution does not: library(latticeExtra) rootogram(~rpois(1000, lambda = 50), dfun = function(x) dpois(x, lambda = 50)) rootogram(~rnorm(1000), dfun = function(x) dnorm(x,mean(x),sd(x))) I probably can't attach figures here. Thus a textual description of what I get may suffice: With increasing sample size, the rootogram using random samples from the Poisson distribution shows decreasing differences (bars are quickly approaching the zero line), whereas the displayed differences for random samples of the normal distribution are always large. The differences even increase with sample size, i.e, the hanging bars tend to vanish for very large samples.
On Sun, 16 Jan 2011, Hugo Mildenberger wrote:> Using R-2.12.1 and latticeExtra-0.6-14, I would like to understand > why a rootogram displaying samples from the Poisson distribution looks like I > expected it, whereas a rootogram using the normal distribution does not: > > library(latticeExtra) > rootogram(~rpois(1000, lambda = 50), dfun = function(x) dpois(x, lambda = 50)) > > rootogram(~rnorm(1000), dfun = function(x) dnorm(x,mean(x),sd(x))) > > I probably can't attach figures here. Thus a textual description of what > I get may suffice: With increasing sample size, the rootogram using > random samples from the Poisson distribution shows decreasing > differences (bars are quickly approaching the zero line), whereas the > displayed differences for random samples of the normal distribution are > always large. The differences even increase with sample size, i.e, the > hanging bars tend to vanish for very large samples.The normal distribution is a continuous distribution, i.e., the frequency for each observed value will essentially be 1/n and not converge to the density function. Hence, you would need to look at histogram or smoothed densities. Rootograms, on the other hand, are intended for discrete distributions. Z> ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
I was distracted enough by the possibility of hijacking hist() for this to give it a go. The following code implements a basic hanging rootogram based on a normal density with hist() breaks used as bins and bin midpoints used as the hanging location (not exact, I suspect, but perhaops good enough). Extensions to other distributions are reasonably obvious. S Ellison rootonorm <- function(x, breaks="Sturges", col="lightgrey", gap=0.2, ...) { h<-hist(x, breaks=breaks) nbins<-length(h$counts) mu<-mean(x) s<-sd(x) normdens<-dnorm(h$mids, mu, s) plot.range <- range(pretty(h$breaks)) plot(z <- seq(plot.range[1], plot.range[2], length.out=200), dens<-dnorm(z, mu,s), type="n", ...) d.gap <- min(diff(h$breaks)) * gap /2 for(i in 1:nbins) { rect(h$breaks[i]+d.gap, normdens[i]-h$density[i], h$breaks[i+1]-d.gap, normdens[i], col=col) } lines(z, dens, lwd=2) points(h$mids, normdens) } set.seed(17*13) y <- rnorm(500, 10,3) rootonorm(y)>>> Deepayan Sarkar <deepayan.sarkar at gmail.com> 17/01/2011 05:06:54 >>>On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger <Hugo.Mildenberger at web.de> wrote:> Thank you very much for your qualified answers, and also for the > link to the Tukey paper. I appreciate Tukey's writings very much.Yes, thanks to Hadley for the nice reference, I hadn't seen it before.> Looking at the lattice code (below), a possible implementation might > involve binning, not so? > > I see a problematic part here: > > xx <- sort(unique(x)) > > Unique certainly works well with Poisson distributed data, but is > essentially a no-op when confronted with continous floating-point > numbers.True, but as Achim said, rootogram() is intended to work with data arising from discrete distributions, not continuous ones. I see now that this is not as explicit as it could be in the help page (although "frequency distribution" gives a hint), which I will try to improve. I don't think automatic handling of continuous distributions is simple (because it is not clear how you would specify the reference distribution). However, a little preliminary work will get you close with the current implementation: xnorm <- rnorm(1000) ## 'discretize' by binning and replacing data by bin midpoints h <- hist(xnorm, plot = FALSE) # add arguments for more control xdisc <- with(h, rep(mids, counts)) ## Option 1: Assume bin probabilities proportional to dnorm() norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm))) rootogram(~ xdisc, dfun = function(x) { dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor }) ## Option 2: Compute probabilities explicitly using pnorm() ## pdisc <- diff(pnorm(h$breaks)) ## or estimated: pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm))) pdisc <- pdisc / sum(pdisc) rootogram(~ xdisc, dfun = function(x) { f <- factor(x, levels = h$mids) pdisc[f] }) -Deepayan ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}}
On Mon, Jan 17, 2011 at 8:24 PM, S Ellison <S.Ellison at lgc.co.uk> wrote:> I was distracted enough by the possibility of hijacking hist() for this > to give it a go. > > The following code implements a basic hanging rootogram based on a > normal density with hist() breaks used as bins and bin midpoints used as > the hanging location (not exact, I suspect, but perhaops good enough). > Extensions to other distributions are reasonably obvious.This implements the "hanging", but misses the "root" part --- you are supposed to plot the square roots of the frequencies. Easy enough to fix though. -Deepayan> > S Ellison > > > rootonorm <- function(x, breaks="Sturges", col="lightgrey", gap=0.2, > ...) { > ? ? ? ?h<-hist(x, breaks=breaks) > ? ? ? ?nbins<-length(h$counts) > ? ? ? ?mu<-mean(x) > ? ? ? ?s<-sd(x) > ? ? ? ?normdens<-dnorm(h$mids, mu, s) > > ? ? ? ?plot.range <- range(pretty(h$breaks)) > > ? ? ? ?plot(z <- seq(plot.range[1], plot.range[2], length.out=200), > ? ? ? ? ? ? ? ? ? ? ? ?dens<-dnorm(z, mu,s), type="n", ...) > > ? ? ? ?d.gap <- min(diff(h$breaks)) * gap /2 > > ? ? ? ?for(i in 1:nbins) { > ? ? ? ? ? ? ? ?rect(h$breaks[i]+d.gap, normdens[i]-h$density[i], > h$breaks[i+1]-d.gap, normdens[i], col=col) > > ? ? ? ?} > > ? ? ? ?lines(z, dens, lwd=2) > > ? ? ? ?points(h$mids, normdens) > > } > > set.seed(17*13) > y <- rnorm(500, 10,3) > rootonorm(y) > > >>>> Deepayan Sarkar <deepayan.sarkar at gmail.com> 17/01/2011 05:06:54 >>>> > On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger > <Hugo.Mildenberger at web.de> wrote: >> Thank you very much for your qualified answers, and also for the >> link to the Tukey paper. I appreciate Tukey's writings very much. > > Yes, thanks to Hadley for the nice reference, I hadn't seen it before. > >> Looking at the lattice code (below), a possible implementation might >> involve ?binning, not so? >> >> I see a problematic part here: >> >> ? xx <- sort(unique(x)) >> >> Unique certainly works well with Poisson distributed data, but is >> essentially a no-op when confronted with continous floating-point >> numbers. > > True, but as Achim said, rootogram() is intended to work with data > arising from discrete distributions, not continuous ones. I see now > that this is not as explicit as it could be in the help page (although > "frequency distribution" gives a hint), which I will try to improve. > > I don't think automatic handling of continuous distributions is simple > (because it is not clear how you would specify the reference > distribution). However, a little preliminary work will get you close > with the current implementation: > > xnorm <- rnorm(1000) > > ## 'discretize' by binning and replacing data by bin midpoints > h <- hist(xnorm, plot = FALSE) # add arguments for more control > xdisc <- with(h, rep(mids, counts)) > > ## Option 1: Assume bin probabilities proportional to dnorm() > norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm))) > > rootogram(~ xdisc, > ? ? ? ? ?dfun = function(x) { > ? ? ? ? ? ? ?dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor > ? ? ? ? ?}) > > ## Option 2: Compute probabilities explicitly using pnorm() > > ## pdisc <- diff(pnorm(h$breaks)) ## or estimated: > pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm))) > pdisc <- pdisc / sum(pdisc) > > rootogram(~ xdisc, > ? ? ? ? ?dfun = function(x) { > ? ? ? ? ? ? ?f <- factor(x, levels = h$mids) > ? ? ? ? ? ? ?pdisc[f] > ? ? ? ? ?}) > > -Deepayan > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ******************************************************************* > This email and any attachments are confidential. Any u...{{dropped:9}}