thr3ads.net - R help - [R] Physical or Statistical Explanation for the "Funnel" Plot? [Mar 2009]

Jason Rupert

2009-Mar-27 02:44 UTC

[R] Physical or Statistical Explanation for the "Funnel" Plot?

The R code below produces (after running for a few minutes on a decent computer)
the plot shown at the following location:

http://n2.nabble.com/Is-there-a-physical-and-quantitative-explanation-for-this-plot--td2542321.html

I'm just taking the mean of a given set of random variables, where the set
size is increased.  There appears to be a quick convergence and then a pretty
steady variance out to a set size of 10,000.

I'm just wondering whether there is a statistical explanation for this
convergence and whether it has been explored further.  Thanks again.

# First case
N<-100000
X<-rnorm(N)
step_size<-1


# Groups
g<-rep(1:(N/step_size),each=step_size)

# The result
tmp_output<-tapply(X[1:length(g)],g,mean)

length_tmp_output<-length(tmp_output)
tmp_x_vals<-rep(step_size,length_tmp_output)
plot(tmp_x_vals, tmp_output, xlim=c(0,10000))
#points(tmp_x_vals, tmp_output)

for(ii in 1:10000)
{   
	step_size<-ii

	# Groups
	g<-rep(1:(N/step_size),each=step_size)

	# The result
	#tmp_output<-tapply(X,g,mean)
	tmp_output<-tapply(X[1:length(g)],g,mean)

	length_tmp_output<-length(tmp_output)
	tmp_x_vals<-rep(step_size,length_tmp_output)
	points(tmp_x_vals, tmp_output)
}

Mike Miller

2009-Mar-27 04:34 UTC

On Thu, 26 Mar 2009, Jason Rupert wrote:
> The R code below produces (after running for a few minutes on a decent
> computer) the plot shown at the following location:
>
> http://n2.nabble.com/Is-there-a-physical-and-quantitative-explanation-for-this-plot--td2542321.html
>
> I'm just taking the mean of a given set of random variables, where the
> set size is increased.  There appears to be a quick convergence and then
> a pretty steady variance out to a set size of 10,000.

I don't have time to study your code, but it sounds like you are taking 
random normal variables with mean 0 and variance 1, but then taking the 
mean for sets of those.  We know exactly the distribution for the mean of 
the "set" (a.k.a., "sample").  The mean has a normal distribution with
mean 0 and variance 1/N where N is the size of the sample.  When you allow 
N to vary, you produce a mixture of random normal variables all having 
mean 0 but with different variances.  The plot you show looks correct -- 
the distributions in the mixture that have small variance pile up in the 
middle, while those with greater variance form the long tails.  You could 
get a lot of different shapes depending on the distribution of N.  But 
save yourself some time.  Instead of making N normal variables and taking 
the mean, just make one and divide it by sqrt(N) -- that will give you a 
draw from *exactly* the same distribution.
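
The shortcut described above can be checked numerically.  The following is
only a minimal sketch: the seed, the sample size N = 25 and the number of
replicates are arbitrary choices for illustration, not values from the
original post.  It compares the standard deviation of means of N standard
normals with that of single standard normals divided by sqrt(N).  The
individual values differ between the two approaches; it is their
distribution that matches.

```r
set.seed(1)       # arbitrary seed, only for reproducibility

n_reps <- 20000   # number of simulated means (assumed for illustration)
N      <- 25      # sample size (assumed for illustration)

# Long way: draw N standard normals per replicate and average them.
means_long <- colMeans(matrix(rnorm(N * n_reps), nrow = N))

# Shortcut: one standard normal per replicate, scaled by 1/sqrt(N).
means_short <- rnorm(n_reps) / sqrt(N)

# Both empirical standard deviations should be close to 1/sqrt(N).
sd_long   <- sd(means_long)
sd_short  <- sd(means_short)
sd_theory <- 1 / sqrt(N)
print(c(long = sd_long, short = sd_short, theory = sd_theory))
```

With these settings both empirical values should land near 1/sqrt(25) = 0.2,
and the shortcut needs N times fewer random draws.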

Your graph looks a little weird - first, why turn it sideways?  We 
normally plot density on the ordinate, not on the abscissa.  Second, there 
is a thick black bar on the left, but that seems to be an artifact because 
at least half of it is below zero -- how can that happen?

Mike

Thomas Lumley

2009-Mar-27 07:55 UTC

On Thu, 26 Mar 2009, Jason Rupert wrote:
>
> The R code below produces (after running for a few minutes on a decent
> computer) the plot shown at the following location:
>
> http://n2.nabble.com/Is-there-a-physical-and-quantitative-explanation-for-this-plot--td2542321.html
>
> I'm just taking the mean of a given set of random variables, where the
> set size is increased.  There appears to be a quick convergence and then
> a pretty steady variance out to a set size of 10,000.

Part of the convergence is just that the standard deviation of a mean of N
observations is proportional to 1/sqrt(N). In your case the distributions are
all exactly Normal; the same convergence would occur with other distributions,
but you would also see the change in shape from left to right as the
distribution converged to Normal.
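
As a rough illustration of the point about other distributions, the sketch
below uses exponential variables with rate 1.  The choice of distribution,
the sample sizes and the number of replicates are assumptions made for this
example, not part of the original thread.  The standard deviation of the
mean should follow 1/sqrt(N), while the skewness of the mean shrinks toward
0 as the distribution approaches Normal.

```r
set.seed(2)       # arbitrary seed, only for reproducibility
n_reps <- 20000   # simulated means per sample size (assumed)

# Sample skewness (third standardized moment), written out by hand
# so that no extra package is needed.
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

sample_sizes <- c(1, 4, 16, 64)   # illustrative values of N

# One row per sample size: empirical sd, theoretical sd, and skewness
# of the mean of N exponential(1) variables.
results <- t(sapply(sample_sizes, function(N) {
  m <- colMeans(matrix(rexp(N * n_reps), nrow = N))
  c(N = N, sd = sd(m), sd_theory = 1 / sqrt(N), skewness = skewness(m))
}))
print(round(results, 3))
```

The sd column should track sd_theory closely at every N, while the skewness
column falls steadily as N grows.  That decline is the change in shape from
left to right that a funnel plot of non-Normal data would show.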

There are also some plotting artifacts due to the size of the points.  The
apparent stabilization at large N (and the wide vertical bar at zero that Marc
Schwartz commented on) are due partly to the slow convergence of 1/sqrt(N) but
largely because the width can't be smaller than the width of a point.

When I draw funnel plots like this for whole-genome association data I use the
'hexbin' package, which doesn't have these artifacts, is much faster, and
produces smaller graphics files.
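
A binned version of the funnel plot might look roughly like the sketch below.
It is not the code used for the genome-wide plots mentioned above; the data
are simulated with the sqrt(N) shortcut from earlier in the thread, and the
set sizes, replicate count and number of bins are arbitrary assumptions.  It
requires the 'hexbin' package to be installed and skips the plot otherwise.

```r
set.seed(3)   # arbitrary seed, only for reproducibility

# 20 simulated means for each set size from 1 to 1000 (assumed values).
sizes <- rep(1:1000, each = 20)

# The mean of N standard normals is Normal(0, 1/N), so simulate it directly.
means <- rnorm(length(sizes)) / sqrt(sizes)

if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  # Count points per hexagonal cell instead of overplotting 20000 symbols.
  hb <- hexbin(sizes, means, xbins = 60)
  plot(hb, xlab = "set size", ylab = "mean")
} else {
  message("hexbin is not installed; skipping the binned plot")
}
```

Because each hexagon is shaded by its count, dense regions no longer merge
into a solid band whose width is set by the size of the plotting symbol.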

     -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
