Dear All I am totally new to R and I would like to know whether R is able and appropriate to illustrate to my students the Central Limit Theorem, using for instance 100 independent variables with uniform distribution and showing that their sum is a variable with an approximated normal distribution. Thanks in advance, Paul
Francisco J. Zagmutt
2005-Apr-21 17:34 UTC
[R] Using R to illustrate the Central Limit Theorem
Hi Paul
This is one of many ways to do it
hist(runif(30))#histogram from 30 random samples from a uniform(0,1)
mm=c(NULL)#creates null vector
for(i in 1:100){mm[i]=mean(runif(30))}#100 sample means of 30 samples each
from a uniform(0,1)
hist(mm)#the distribution of the sample mean looks normal!
You can even nest this loop within another loop so your student can see
several histograms showing a "normal" behaviour.
I hope this helps
Francisco
>From: Paul Smith <phhs80 at gmail.com>
>To: r-help at stat.math.ethz.ch
>Subject: [R] Using R to illustrate the Central Limit Theorem
>Date: Thu, 21 Apr 2005 18:06:40 +0100
>
>Dear All
>
>I am totally new to R and I would like to know whether R is able and
>appropriate to illustrate to my students the Central Limit Theorem,
>using for instance 100 independent variables with uniform distribution
>and showing that their sum is a variable with an approximated normal
>distribution.
>
>Thanks in advance,
>
>Paul
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide!
>http://www.R-project.org/posting-guide.html
Hi,
Not exactly what you asked for, but related.
I wrote the following little function to emulate a quincunx (a good
illustration of the CLT, in my opinion):
quincunx <- function(nb.bins, nb.rows=nb.bins-1, nb.balls=2^nb.bins)
{
x <- sample(c(0, 1), nb.balls * nb.rows, replace=TRUE)
dim(x) <- c(nb.rows, nb.balls)
hist(colSums(x), breaks=0:nb.rows, main="Number of balls per bin")
}
Idea: drop nb.balls in a quincunx with nb.bins bins at the bottom. The bin in
which a ball ends up is the sum of nb.rows Bernouilli trials (where 0 stands
for "left" and 1 for "right").
Hope this helps!
Le 21 Avril 2005 13:06, Paul Smith a ??crit??:> Dear All
>
> I am totally new to R and I would like to know whether R is able and
> appropriate to illustrate to my students the Central Limit Theorem,
> using for instance 100 independent variables with uniform distribution
> and showing that their sum is a variable with an approximated normal
> distribution.
>
> Thanks in advance,
>
> Paul
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
--
Vincent Goulet, Professeur agr??g??
??cole d'actuariat
Universit?? Laval, Qu??bec
On 21-Apr-05 Paul Smith wrote:> Dear All > > I am totally new to R and I would like to know whether R is able and > appropriate to illustrate to my students the Central Limit Theorem, > using for instance 100 independent variables with uniform distribution > and showing that their sum is a variable with an approximated normal > distribution. > > Thanks in advance, > > PaulSimilar to Francisco's suggestion: m<-numeric(10000); for(k in (1:20)){ for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)} hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k)) } (On my slowish laptop, this ticks over at a satidfactory rate, about 1 plot per second. If your mahine is much faster, then simply increase 10000 to a larger number.) The real problem with demos like this, starting with the uniform distribution, is that the result is, to the eye, already approximately normal when k=3, and it's only out in the tails that the improvement shows for larger values of k. This was in fact the way we used to simulate a normal distribution in the old days: look up 3 numbers in Kendall & Babington-Smith's "Tables of Random Sampling Numbers", which are in effect pages full of integers uniform on 00-99, and take their mean. It's the one book I ever encountered which contained absolutely no information -- at least, none that I ever spotted. A more dramatic illustration of the CLT effect might be obtained if, instead of runif(k), you used rbinom(k,1,p) for p > 0.5, say: m<-numeric(10000); p<-0.75; for(j in (1:50)){ k<-j*j for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)} hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k)) } Cheers, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 21-Apr-05 Time: 19:48:05 ------------------------------ XFMail ------------------------------
I just did this last night for a class. It's very simplistic and could
be improve, but it did the job. First I did the normal. Of course means
of increasing large samples from a normal stay normal. This setup the
students. Then I did means from an exponential. For n=1 you get the
exponential again, and they of course expected the means with larger n's
to also follow the exponential! Got 'em!
Joe
#Central Limit Theorem
oldpar = par(mfrow=c(2,2))
#Normal
n=64 #Do it repeatedly for n=1 4, 16, 64, 100
d=numeric(1000)
for (k in 1:1000){d[k]=mean(rnorm(n,mean=5,sd=16))}
c(mean=mean(d),sd=sd(d))
plot(density(d),col='blue')
curve(dnorm(x,mean=5,sd=16/sqrt(n)),add=T,col='red')
#Exponential
n=64 #1, 4, 16, 64, 100
d=numeric(1000)
for (k in 1:1000){d[k]=mean(rexp(n,rate=1/4))}
c(mean=mean(d),sd=sd(d))
plot(density(d),col='blue')
curve(dnorm(x,mean=4,sd=4/sqrt(n)),add=T,col='red')
par(oldpar)
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Smith
Sent: Thursday, April 21, 2005 12:07 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Using R to illustrate the Central Limit Theorem
Dear All
I am totally new to R and I would like to know whether R is able and
appropriate to illustrate to my students the Central Limit Theorem,
using for instance 100 independent variables with uniform distribution
and showing that their sum is a variable with an approximated normal
distribution.
Thanks in advance,
Paul
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
Bill.Venables@csiro.au
2005-Apr-21 22:49 UTC
[R] Using R to illustrate the Central Limit Theorem
Here's a bit of a refinement on Ted's first suggestion.
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:20) {
m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
hist(m, breaks = "FD", xlim = c(-4,4), main = k,
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Ted.Harding at nessie.mcc.ac.uk
Sent: Friday, 22 April 2005 4:48 AM
To: Paul Smith
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] Using R to illustrate the Central Limit Theorem
On 21-Apr-05 Paul Smith wrote:> Dear All
>
> I am totally new to R and I would like to know whether R is able and
> appropriate to illustrate to my students the Central Limit Theorem,
> using for instance 100 independent variables with uniform distribution
> and showing that their sum is a variable with an approximated normal
> distribution.
>
> Thanks in advance,
>
> Paul
Similar to Francisco's suggestion:
m<-numeric(10000);
for(k in (1:20)){
for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)}
hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k))
}
(On my slowish laptop, this ticks over at a satidfactory rate,
about 1 plot per second. If your mahine is much faster, then
simply increase 10000 to a larger number.)
The real problem with demos like this, starting with the
uniform distribution, is that the result is, to the eye,
already approximately normal when k=3, and it's only out
in the tails that the improvement shows for larger values
of k.
This was in fact the way we used to simulate a normal
distribution in the old days: look up 3 numbers in
Kendall & Babington-Smith's "Tables of Random Sampling
Numbers", which are in effect pages full of integers
uniform on 00-99, and take their mean.
It's the one book I ever encountered which contained
absolutely no information -- at least, none that I ever
spotted.
A more dramatic illustration of the CLT effect might be
obtained if, instead of runif(k), you used rbinom(k,1,p)
for p > 0.5, say:
m<-numeric(10000);
p<-0.75; for(j in (1:50)){ k<-j*j
for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)}
hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k))
}
Cheers,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Apr-05 Time: 19:48:05
------------------------------ XFMail ------------------------------
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
Charles Annis, P.E.
2005-Apr-22 04:23 UTC
[R] Using R to illustrate the Central Limit Theorem
This won't help teach R, but it might illuminate the CLT. Here are a series of animated GIFs that begin with different densities, including one that has a "U" shape, and plots the density of Xbar for n=2,3,4,8,16,32. http://www.StatisticalEngineering.com/central_limit_theorem.htm I've also included an explanation of what is happening at each iteration. Charles Annis, P.E. Charles.Annis at StatisticalEngineering.com phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Smith Sent: Thursday, April 21, 2005 1:07 PM To: r-help at stat.math.ethz.ch Subject: [R] Using R to illustrate the Central Limit Theorem Dear All I am totally new to R and I would like to know whether R is able and appropriate to illustrate to my students the Central Limit Theorem, using for instance 100 independent variables with uniform distribution and showing that their sum is a variable with an approximated normal distribution. Thanks in advance, Paul ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hi R users and developers: I just install the new R version 2.1.0 in a linux platform. I get this error when I call the function> help.start()Making links in per-session dir ... Error in gsub(pattern, replacement, x, ignore.case, extended, fixed) : input string 28 is invalid in this locale What am I missing? (It works fine with version 2.0.1) -- Kenneth Roy Cabrera Torres Uninversidad Nacional de Colombia Sede Medellin Tel 430 9351 Cel 315 504 9339
Bliese, Paul D LTC USAMH
2005-May-13 09:59 UTC
[R] Using R to illustrate the Central Limit Theorem
Interesting thread. The graphics are great, the only thing that might be
worth doing for teaching purposes would be to illustrate the original
distribution that is being averaged 1000 times.
Below is one option based on Bill Venables code. Note that to do this I
had to start with a k of 2.
N <- 10000
for(k in 2:20) {
graphics.off()
par(mfrow = c(2,2), pty = "s")
hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1")
hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2")
m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k))
hist(m, breaks = "FD", xlim = c(-4,4), main = k,
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(3)
}
By the way, I should probably know this but what is the logic of the
"sqrt(12*k)" part of the example? Obviously as k increases the mean
will approach .5 in a uniform distribution, so runif(k)-.5 will be close
to zero, and sqrt(12*k) increases as k increases. Why 12, though?
PB
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kevin E. Thorpe
Sent: Friday, May 13, 2005 12:03 AM
To: Bill.Venables at csiro.au
Cc: ted.harding at nessie.mcc.ac.uk; r-help at stat.math.ethz.ch
Subject: Re: [R] Using R to illustrate the Central Limit Theorem
This thread was very timely for me since I will be teaching an
introductory
biostats course in the fall, including a session on CLT. I have
shamelessly
taken Dr. Venables' slick piece of code and wrapped it in a function so
that
I can vary the maximum sample size at will. I then created functions
based
on the first to sample from the exponential and the
chi-squaredistributions.
Lastly, I created a function to sample from a pdf with a parabolic shape
(confirming in the process just how rusty my calculus is :-) ). My code
is
below for all to do with as they please.
Now, at the risk of asking a really obvious question ...
In my final function, I used the inverse probability integral
transformation
to sample from my parabolic distribution. I create a local function
rparab
shown here:
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}
It seems that in my version of R (2.0.1) on Linux, that calculating the
cube
root of a negative number using ^(1/3) returns NaN. I looked at the help
in
the arithmetic operators and did help.search("cube root"),
help.search("root")
and help.search("cube") and recognised no alternatives. So I used an
ifelse() to
deal with the negatives. Have I missed something really elementary?
Thanks
clt.unif <- function(n=20) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Uniform(0,1), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}
clt.exp <- function(n=20,theta=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("exp(",theta,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}
clt.chisq <- function(n=20,nu=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Chi-Square(",nu,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}
clt.parab <- function(n=20) {
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k))
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}
Bill.Venables at csiro.au wrote:
>Here's a bit of a refinement on Ted's first suggestion.
>
>
> N <- 10000
> graphics.off()
> par(mfrow = c(1,2), pty = "s")
> for(k in 1:20) {
> m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
> hist(m, breaks = "FD", xlim = c(-4,4), main = k,
> prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
> pu <- par("usr")[1:2]
> x <- seq(pu[1], pu[2], len = 500)
> lines(x, dnorm(x), col = "red")
> qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col =
"blue")
> abline(0, 1, col = "red")
> Sys.sleep(1)
> }
>
>
--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.946.8081 Fax: 416.971.2462
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
The variance of U[0,1] is 1/12. So the variance of the mean of uniforms is 1/12k. Rather than dividing by 1/12k he multiplied by 12k. Kevin Bliese, Paul D LTC USAMH wrote:>Interesting thread. The graphics are great, the only thing that might be >worth doing for teaching purposes would be to illustrate the original >distribution that is being averaged 1000 times. > >Below is one option based on Bill Venables code. Note that to do this I >had to start with a k of 2. > >N <- 10000 > for(k in 2:20) { > graphics.off() > par(mfrow = c(2,2), pty = "s") > hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1") > hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2") > m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k)) > hist(m, breaks = "FD", xlim = c(-4,4), main = k, > prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") > pu <- par("usr")[1:2] > x <- seq(pu[1], pu[2], len = 500) > lines(x, dnorm(x), col = "red") > qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") > abline(0, 1, col = "red") > Sys.sleep(3) > } > >By the way, I should probably know this but what is the logic of the >"sqrt(12*k)" part of the example? Obviously as k increases the mean >will approach .5 in a uniform distribution, so runif(k)-.5 will be close >to zero, and sqrt(12*k) increases as k increases. Why 12, though? > >PB > >-- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Department of Public Health Sciences Faculty of Medicine, University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.946.8081 Fax: 416.971.2462