thr3ads.net - R help - [R] Using R to illustrate the Central Limit Theorem [Apr 2005]

If this information is useful, please help other people find it:
Share via:

Paul Smith

2005-Apr-21 17:06 UTC

[R] Using R to illustrate the Central Limit Theorem

Dear All

I am totally new to R and I would like to know whether R is able and
appropriate to illustrate to my students the Central Limit Theorem,
using for instance 100 independent variables with uniform distribution
and showing that their sum is a variable with an approximated normal
distribution.

Thanks in advance,

Paul

Francisco J. Zagmutt

2005-Apr-21 17:34 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

Hi Paul

This is one of many ways to do it

hist(runif(30))#histogram from 30 random samples from a uniform(0,1)
mm=c(NULL)#creates null vector
for(i in 1:100){mm[i]=mean(runif(30))}#100 sample means of 30 samples each 
from a uniform(0,1)
hist(mm)#the distribution of the sample mean looks normal!

You can even nest this loop within another loop so your student can see 
several histograms showing a "normal" behaviour.

I hope this helps

Francisco

>From: Paul Smith <phhs80 at gmail.com>
>To: r-help at stat.math.ethz.ch
>Subject: [R] Using R to illustrate the Central Limit Theorem
>Date: Thu, 21 Apr 2005 18:06:40 +0100
>
>Dear All
>
>I am totally new to R and I would like to know whether R is able and
>appropriate to illustrate to my students the Central Limit Theorem,
>using for instance 100 independent variables with uniform distribution
>and showing that their sum is a variable with an approximated normal
>distribution.
>
>Thanks in advance,
>
>Paul
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! 
>http://www.R-project.org/posting-guide.html

Vincent Goulet

2005-Apr-21 18:06 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

Hi,

Not exactly what you asked for, but related.

I wrote the following little function to emulate a quincunx (a good 
illustration of the CLT, in my opinion):

quincunx <- function(nb.bins, nb.rows=nb.bins-1, nb.balls=2^nb.bins)
{
    x <- sample(c(0, 1), nb.balls * nb.rows, replace=TRUE)
    dim(x) <- c(nb.rows, nb.balls)
    hist(colSums(x), breaks=0:nb.rows, main="Number of balls per bin")
}

Idea: drop nb.balls in a quincunx with nb.bins bins at the bottom. The bin in 
which a ball ends up is the sum of nb.rows Bernouilli trials (where 0 stands 
for "left" and 1 for "right").

Hope this helps!

Le 21 Avril 2005 13:06, Paul Smith a ??crit??:> Dear All
>
> I am totally new to R and I would like to know whether R is able and
> appropriate to illustrate to my students the Central Limit Theorem,
> using for instance 100 independent variables with uniform distribution
> and showing that their sum is a variable with an approximated normal
> distribution.
>
> Thanks in advance,
>
> Paul
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
-- 
  Vincent Goulet, Professeur agr??g??
  ??cole d'actuariat
  Universit?? Laval, Qu??bec

(Ted Harding)

2005-Apr-21 18:48 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

On 21-Apr-05 Paul Smith wrote:> Dear All
> 
> I am totally new to R and I would like to know whether R is able and
> appropriate to illustrate to my students the Central Limit Theorem,
> using for instance 100 independent variables with uniform distribution
> and showing that their sum is a variable with an approximated normal
> distribution.
> 
> Thanks in advance,
> 
> Paul
Similar to Francisco's suggestion:

  m<-numeric(10000);
  for(k in (1:20)){
    for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)}
    hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k))
  }

(On my slowish laptop, this ticks over at a satidfactory rate,
about 1 plot per second. If your mahine is much faster, then
simply increase 10000 to a larger number.)

The real problem with demos like this, starting with the
uniform distribution, is that the result is, to the eye,
already approximately normal when k=3, and it's only out
in the tails that the improvement shows for larger values
of k.

This was in fact the way we used to simulate a normal
distribution in the old days: look up 3 numbers in
Kendall & Babington-Smith's "Tables of Random Sampling
Numbers", which are in effect pages full of integers
uniform on 00-99, and take their mean.

It's the one book I ever encountered which contained
absolutely no information -- at least, none that I ever
spotted.

A more dramatic illustration of the CLT effect might be
obtained if, instead of runif(k), you used rbinom(k,1,p)
for p > 0.5, say:

  m<-numeric(10000);
  p<-0.75; for(j in (1:50)){ k<-j*j
    for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)}
    hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k))
  }

Cheers,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Apr-05                                       Time: 19:48:05
------------------------------ XFMail ------------------------------

Lucke, Joseph F

2005-Apr-21 18:53 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

I just did this last night for a class. It's very simplistic and could
be improve, but it did the job.  First I did the normal. Of course means
of increasing large samples from a normal stay normal.  This setup the
students.  Then I did means from an exponential.  For n=1 you get the
exponential again, and they of course expected the means with larger n's
to also follow the exponential! Got 'em!
Joe

#Central Limit Theorem
oldpar = par(mfrow=c(2,2))
#Normal
n=64    #Do it repeatedly for n=1 4, 16, 64, 100
d=numeric(1000)
for (k in 1:1000){d[k]=mean(rnorm(n,mean=5,sd=16))}
c(mean=mean(d),sd=sd(d))
plot(density(d),col='blue')
curve(dnorm(x,mean=5,sd=16/sqrt(n)),add=T,col='red')


#Exponential
n=64    #1, 4, 16, 64, 100
d=numeric(1000)
for (k in 1:1000){d[k]=mean(rexp(n,rate=1/4))}
c(mean=mean(d),sd=sd(d))
plot(density(d),col='blue')
curve(dnorm(x,mean=4,sd=4/sqrt(n)),add=T,col='red')

par(oldpar)



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Smith
Sent: Thursday, April 21, 2005 12:07 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Using R to illustrate the Central Limit Theorem


Dear All

I am totally new to R and I would like to know whether R is able and
appropriate to illustrate to my students the Central Limit Theorem,
using for instance 100 independent variables with uniform distribution
and showing that their sum is a variable with an approximated normal
distribution.

Thanks in advance,

Paul

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Bill.Venables@csiro.au

2005-Apr-21 22:49 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

Here's a bit of a refinement on Ted's first suggestion.

 N <- 10000
 graphics.off()
 par(mfrow = c(1,2), pty = "s")
 for(k in 1:20) {
    m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
    pu <- par("usr")[1:2]
    x <- seq(pu[1], pu[2], len = 500)
    lines(x, dnorm(x), col = "red")
    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
    abline(0, 1, col = "red")
    Sys.sleep(1)
  }

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Ted.Harding at nessie.mcc.ac.uk
Sent: Friday, 22 April 2005 4:48 AM
To: Paul Smith
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] Using R to illustrate the Central Limit Theorem

On 21-Apr-05 Paul Smith wrote:> Dear All
> 
> I am totally new to R and I would like to know whether R is able and
> appropriate to illustrate to my students the Central Limit Theorem,
> using for instance 100 independent variables with uniform distribution
> and showing that their sum is a variable with an approximated normal
> distribution.
> 
> Thanks in advance,
> 
> Paul
Similar to Francisco's suggestion:

  m<-numeric(10000);
  for(k in (1:20)){
    for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)}
    hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k))
  }

(On my slowish laptop, this ticks over at a satidfactory rate,
about 1 plot per second. If your mahine is much faster, then
simply increase 10000 to a larger number.)

The real problem with demos like this, starting with the
uniform distribution, is that the result is, to the eye,
already approximately normal when k=3, and it's only out
in the tails that the improvement shows for larger values
of k.

This was in fact the way we used to simulate a normal
distribution in the old days: look up 3 numbers in
Kendall & Babington-Smith's "Tables of Random Sampling
Numbers", which are in effect pages full of integers
uniform on 00-99, and take their mean.

It's the one book I ever encountered which contained
absolutely no information -- at least, none that I ever
spotted.

A more dramatic illustration of the CLT effect might be
obtained if, instead of runif(k), you used rbinom(k,1,p)
for p > 0.5, say:

  m<-numeric(10000);
  p<-0.75; for(j in (1:50)){ k<-j*j
    for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)}
    hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k))
  }

Cheers,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Apr-05                                       Time: 19:48:05
------------------------------ XFMail ------------------------------

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Charles Annis, P.E.

2005-Apr-22 04:23 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

This won't help teach R, but it might illuminate the CLT.  Here are a series
of animated GIFs that begin with different densities, including one that has
a "U" shape, and plots the density of Xbar for n=2,3,4,8,16,32.

http://www.StatisticalEngineering.com/central_limit_theorem.htm

I've also included an explanation of what is happening at each iteration.

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Smith
Sent: Thursday, April 21, 2005 1:07 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Using R to illustrate the Central Limit Theorem

Dear All

I am totally new to R and I would like to know whether R is able and
appropriate to illustrate to my students the Central Limit Theorem,
using for instance 100 independent variables with uniform distribution
and showing that their sum is a variable with an approximated normal
distribution.

Thanks in advance,

Paul

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Kenneth Roy Cabrera Torres

2005-Apr-22 15:16 UTC

head link

[R] Problems with "help.start()"

Hi R users and developers:

I just install the new R version 2.1.0 in a
linux platform.

I get this error when I call the function
> help.start()Making links in per-session dir ...
Error in gsub(pattern, replacement, x, ignore.case, extended, fixed) :
         input string 28 is invalid in this locale

What am I missing? (It works fine with version 2.0.1)

-- 
Kenneth Roy Cabrera Torres
Uninversidad Nacional de Colombia
Sede Medellin
Tel 430 9351
Cel 315 504 9339

Bliese, Paul D LTC USAMH

2005-May-13 09:59 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

Interesting thread. The graphics are great, the only thing that might be
worth doing for teaching purposes would be to illustrate the original
distribution that is being averaged 1000 times.

Below is one option based on Bill Venables code.  Note that to do this I
had to start with a k of 2.

N <- 10000
 for(k in 2:20) {
    graphics.off()
    par(mfrow = c(2,2), pty = "s")
    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1")
    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2")
    m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k))
    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
    pu <- par("usr")[1:2]
    x <- seq(pu[1], pu[2], len = 500)
    lines(x, dnorm(x), col = "red")
    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
    abline(0, 1, col = "red")
    Sys.sleep(3)
  }

By the way, I should probably know this but what is the logic of the
"sqrt(12*k)" part of the example?  Obviously as k increases the mean
will approach .5 in a uniform distribution, so runif(k)-.5 will be close
to zero, and sqrt(12*k) increases as k increases.  Why 12, though?

PB

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kevin E. Thorpe
Sent: Friday, May 13, 2005 12:03 AM
To: Bill.Venables at csiro.au
Cc: ted.harding at nessie.mcc.ac.uk; r-help at stat.math.ethz.ch
Subject: Re: [R] Using R to illustrate the Central Limit Theorem

This thread was very timely for me since I will be teaching an
introductory
biostats course in the fall, including a session on CLT. I have
shamelessly
taken Dr. Venables' slick piece of code and wrapped it in a function so
that
I can vary the maximum sample size at will. I then created functions
based
on the first to sample from the exponential and the
chi-squaredistributions.
Lastly, I created a function to sample from a pdf with a parabolic shape
(confirming in the process just how rusty my calculus is :-) ). My code
is
below for all to do with as they please.

Now, at the risk of asking a really obvious question ...

In my final function, I used the inverse probability integral
transformation
to sample from my parabolic distribution. I create a local function
rparab
shown here:

rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

It seems that in my version of R (2.0.1) on Linux, that calculating the
cube
root of a negative number using ^(1/3) returns NaN. I looked at the help
in
the arithmetic operators and did help.search("cube root"), 
help.search("root")
and help.search("cube") and recognised no alternatives. So I used an 
ifelse() to
deal with the negatives. Have I missed something really elementary?

Thanks

clt.unif <- function(n=20) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Uniform(0,1), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.exp <- function(n=20,theta=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("exp(",theta,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.chisq <- function(n=20,nu=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Chi-Square(",nu,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.parab <- function(n=20) {
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k))
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

Bill.Venables at csiro.au wrote:
>Here's a bit of a refinement on Ted's first suggestion.
>
> 
> N <- 10000
> graphics.off()
> par(mfrow = c(1,2), pty = "s")
> for(k in 1:20) {
>    m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
>    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
>            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
>    pu <- par("usr")[1:2]
>    x <- seq(pu[1], pu[2], len = 500)
>    lines(x, dnorm(x), col = "red")
>    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
>    abline(0, 1, col = "red")
>    Sys.sleep(1)
>  }
>  
>

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.946.8081  Fax: 416.971.2462

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Kevin E. Thorpe

2005-May-13 12:28 UTC

head link

[R] Using R to illustrate the Central Limit Theorem

The variance of U[0,1] is 1/12. So the variance of the mean of uniforms 
is 1/12k.
Rather than dividing by 1/12k he multiplied by 12k.

Kevin

Bliese, Paul D LTC USAMH wrote:
>Interesting thread. The graphics are great, the only thing that might be
>worth doing for teaching purposes would be to illustrate the original
>distribution that is being averaged 1000 times.
>
>Below is one option based on Bill Venables code.  Note that to do this I
>had to start with a k of 2.
>
>N <- 10000
> for(k in 2:20) {
>    graphics.off()
>    par(mfrow = c(2,2), pty = "s")
>    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution
1")
>    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution
2")
>    m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k))
>    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
>            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
>    pu <- par("usr")[1:2]
>    x <- seq(pu[1], pu[2], len = 500)
>    lines(x, dnorm(x), col = "red")
>    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
>    abline(0, 1, col = "red")
>    Sys.sleep(3)
>  }
>
>By the way, I should probably know this but what is the logic of the
>"sqrt(12*k)" part of the example?  Obviously as k increases the
mean
>will approach .5 in a uniform distribution, so runif(k)-.5 will be close
>to zero, and sqrt(12*k) increases as k increases.  Why 12, though?
>
>PB
>  
>

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.946.8081  Fax: 416.971.2462

Possibly Parallel Threads

Search for more reasonably related threads

R help - Apr 2005 - Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

[R] Problems with "help.start()"

[R] Using R to illustrate the Central Limit Theorem

[R] Using R to illustrate the Central Limit Theorem

Possibly Parallel Threads