thr3ads.net - R help - [R] rootogram for normal distributions [Jan 2011]

If this information is useful, please help other people find it:
Share via:

Hugo Mildenberger

2011-Jan-16 10:45 UTC

[R] rootogram for normal distributions

Using R-2.12.1 and latticeExtra-0.6-14, I would like to understand 
why a rootogram displaying samples from the Poisson distribution looks like I 
expected it, whereas a rootogram using the normal distribution does not:

library(latticeExtra)
rootogram(~rpois(1000, lambda = 50), dfun = function(x) dpois(x, lambda = 50))

rootogram(~rnorm(1000), dfun = function(x) dnorm(x,mean(x),sd(x)))

I probably can't attach figures here. Thus a textual description of what I
get may
suffice: With increasing sample size, the rootogram using random samples 
from the Poisson distribution shows decreasing differences (bars are quickly 
approaching the zero line), whereas the displayed differences for random 
samples of the normal distribution are always large. The differences even
increase
with sample size, i.e,  the hanging bars tend to vanish for very large samples.

Achim Zeileis

2011-Jan-16 14:28 UTC

head link

[R] rootogram for normal distributions

On Sun, 16 Jan 2011, Hugo Mildenberger wrote:
> Using R-2.12.1 and latticeExtra-0.6-14, I would like to understand
> why a rootogram displaying samples from the Poisson distribution looks like
I
> expected it, whereas a rootogram using the normal distribution does not:
>
> library(latticeExtra)
> rootogram(~rpois(1000, lambda = 50), dfun = function(x) dpois(x, lambda =
50))
>
> rootogram(~rnorm(1000), dfun = function(x) dnorm(x,mean(x),sd(x)))
>
> I probably can't attach figures here. Thus a textual description of
what
> I get may suffice: With increasing sample size, the rootogram using 
> random samples from the Poisson distribution shows decreasing 
> differences (bars are quickly approaching the zero line), whereas the 
> displayed differences for random samples of the normal distribution are 
> always large. The differences even increase with sample size, i.e, the 
> hanging bars tend to vanish for very large samples.
The normal distribution is a continuous distribution, i.e., the frequency 
for each observed value will essentially be 1/n and not converge to the 
density function. Hence, you would need to look at histogram or smoothed 
densities. Rootograms, on the other hand, are intended for discrete 
distributions.
Z
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

S Ellison

2011-Jan-17 14:54 UTC

head link

[R] rootogram for normal distributions

I was distracted enough by the possibility of hijacking hist() for this
to give it a go. 

The following code implements a basic hanging rootogram based on a
normal density with hist() breaks used as bins and bin midpoints used as
the hanging location (not exact, I suspect, but perhaops good enough).
Extensions to other distributions are reasonably obvious.

S Ellison


rootonorm <- function(x, breaks="Sturges",
col="lightgrey", gap=0.2,
...) {
	h<-hist(x, breaks=breaks)
	nbins<-length(h$counts)
	mu<-mean(x)
	s<-sd(x)
	normdens<-dnorm(h$mids, mu, s)
	
	plot.range <- range(pretty(h$breaks))
	
	plot(z <- seq(plot.range[1], plot.range[2], length.out=200),
			dens<-dnorm(z, mu,s), type="n", ...)
	
	d.gap <- min(diff(h$breaks)) * gap /2
	
	for(i in 1:nbins) {
		rect(h$breaks[i]+d.gap, normdens[i]-h$density[i],
h$breaks[i+1]-d.gap, normdens[i], col=col)
	
	}
	
	lines(z, dens, lwd=2)
	
	points(h$mids, normdens) 
	
}

set.seed(17*13)
y <- rnorm(500, 10,3)
rootonorm(y)
 
>>> Deepayan Sarkar <deepayan.sarkar at gmail.com> 17/01/2011
05:06:54
>>>On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger
<Hugo.Mildenberger at web.de> wrote:> Thank you very much for your qualified answers, and also for the
> link to the Tukey paper. I appreciate Tukey's writings very much.
Yes, thanks to Hadley for the nice reference, I hadn't seen it before.
> Looking at the lattice code (below), a possible implementation might
> involve  binning, not so?
>
> I see a problematic part here:
>
>   xx <- sort(unique(x))
>
> Unique certainly works well with Poisson distributed data, but is
> essentially a no-op when confronted with continous floating-point
> numbers.
True, but as Achim said, rootogram() is intended to work with data
arising from discrete distributions, not continuous ones. I see now
that this is not as explicit as it could be in the help page (although
"frequency distribution" gives a hint), which I will try to improve.

I don't think automatic handling of continuous distributions is simple
(because it is not clear how you would specify the reference
distribution). However, a little preliminary work will get you close
with the current implementation:

xnorm <- rnorm(1000)

## 'discretize' by binning and replacing data by bin midpoints
h <- hist(xnorm, plot = FALSE) # add arguments for more control
xdisc <- with(h, rep(mids, counts))

## Option 1: Assume bin probabilities proportional to dnorm()
norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm)))

rootogram(~ xdisc,
          dfun = function(x) {
              dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor
          })

## Option 2: Compute probabilities explicitly using pnorm()

## pdisc <- diff(pnorm(h$breaks)) ## or estimated:
pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm)))
pdisc <- pdisc / sum(pdisc)

rootogram(~ xdisc,
          dfun = function(x) {
              f <- factor(x, levels = h$mids)
              pdisc[f]
          })

-Deepayan

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}

Deepayan Sarkar

2011-Jan-18 10:23 UTC

head link

[R] rootogram for normal distributions

On Mon, Jan 17, 2011 at 8:24 PM, S Ellison <S.Ellison at lgc.co.uk>
wrote:> I was distracted enough by the possibility of hijacking hist() for this
> to give it a go.
>
> The following code implements a basic hanging rootogram based on a
> normal density with hist() breaks used as bins and bin midpoints used as
> the hanging location (not exact, I suspect, but perhaops good enough).
> Extensions to other distributions are reasonably obvious.
This implements the "hanging", but misses the "root" part 
--- you are
supposed to plot the square roots of the frequencies. Easy enough to
fix though.

-Deepayan
>
> S Ellison
>
>
> rootonorm <- function(x, breaks="Sturges",
col="lightgrey", gap=0.2,
> ...) {
> ? ? ? ?h<-hist(x, breaks=breaks)
> ? ? ? ?nbins<-length(h$counts)
> ? ? ? ?mu<-mean(x)
> ? ? ? ?s<-sd(x)
> ? ? ? ?normdens<-dnorm(h$mids, mu, s)
>
> ? ? ? ?plot.range <- range(pretty(h$breaks))
>
> ? ? ? ?plot(z <- seq(plot.range[1], plot.range[2], length.out=200),
> ? ? ? ? ? ? ? ? ? ? ? ?dens<-dnorm(z, mu,s), type="n", ...)
>
> ? ? ? ?d.gap <- min(diff(h$breaks)) * gap /2
>
> ? ? ? ?for(i in 1:nbins) {
> ? ? ? ? ? ? ? ?rect(h$breaks[i]+d.gap, normdens[i]-h$density[i],
> h$breaks[i+1]-d.gap, normdens[i], col=col)
>
> ? ? ? ?}
>
> ? ? ? ?lines(z, dens, lwd=2)
>
> ? ? ? ?points(h$mids, normdens)
>
> }
>
> set.seed(17*13)
> y <- rnorm(500, 10,3)
> rootonorm(y)
>
>
>>>> Deepayan Sarkar <deepayan.sarkar at gmail.com> 17/01/2011
05:06:54
>>>>
> On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger
> <Hugo.Mildenberger at web.de> wrote:
>> Thank you very much for your qualified answers, and also for the
>> link to the Tukey paper. I appreciate Tukey's writings very much.
>
> Yes, thanks to Hadley for the nice reference, I hadn't seen it before.
>
>> Looking at the lattice code (below), a possible implementation might
>> involve ?binning, not so?
>>
>> I see a problematic part here:
>>
>> ? xx <- sort(unique(x))
>>
>> Unique certainly works well with Poisson distributed data, but is
>> essentially a no-op when confronted with continous floating-point
>> numbers.
>
> True, but as Achim said, rootogram() is intended to work with data
> arising from discrete distributions, not continuous ones. I see now
> that this is not as explicit as it could be in the help page (although
> "frequency distribution" gives a hint), which I will try to
improve.
>
> I don't think automatic handling of continuous distributions is simple
> (because it is not clear how you would specify the reference
> distribution). However, a little preliminary work will get you close
> with the current implementation:
>
> xnorm <- rnorm(1000)
>
> ## 'discretize' by binning and replacing data by bin midpoints
> h <- hist(xnorm, plot = FALSE) # add arguments for more control
> xdisc <- with(h, rep(mids, counts))
>
> ## Option 1: Assume bin probabilities proportional to dnorm()
> norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm)))
>
> rootogram(~ xdisc,
> ? ? ? ? ?dfun = function(x) {
> ? ? ? ? ? ? ?dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor
> ? ? ? ? ?})
>
> ## Option 2: Compute probabilities explicitly using pnorm()
>
> ## pdisc <- diff(pnorm(h$breaks)) ## or estimated:
> pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm)))
> pdisc <- pdisc / sum(pdisc)
>
> rootogram(~ xdisc,
> ? ? ? ? ?dfun = function(x) {
> ? ? ? ? ? ? ?f <- factor(x, levels = h$mids)
> ? ? ? ? ? ? ?pdisc[f]
> ? ? ? ? ?})
>
> -Deepayan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> *******************************************************************
> This email and any attachments are confidential. Any u...{{dropped:9}}

Maybe Matching Threads

Search for more seemingly similar threads

R help - Jan 2011 - rootogram for normal distributions

[R] rootogram for normal distributions

[R] rootogram for normal distributions

[R] rootogram for normal distributions

[R] rootogram for normal distributions

Maybe Matching Threads