Dear All I am totally new to R and I would like to know whether R is able and appropriate to illustrate to my students the Central Limit Theorem, using for instance 100 independent variables with uniform distribution and showing that their sum is a variable with an approximated normal distribution. Thanks in advance, Paul
Francisco J. Zagmutt
2005-Apr-21 17:34 UTC
[R] Using R to illustrate the Central Limit Theorem
Hi Paul This is one of many ways to do it hist(runif(30))#histogram from 30 random samples from a uniform(0,1) mm=c(NULL)#creates null vector for(i in 1:100){mm[i]=mean(runif(30))}#100 sample means of 30 samples each from a uniform(0,1) hist(mm)#the distribution of the sample mean looks normal! You can even nest this loop within another loop so your student can see several histograms showing a "normal" behaviour. I hope this helps Francisco>From: Paul Smith <phhs80 at gmail.com> >To: r-help at stat.math.ethz.ch >Subject: [R] Using R to illustrate the Central Limit Theorem >Date: Thu, 21 Apr 2005 18:06:40 +0100 > >Dear All > >I am totally new to R and I would like to know whether R is able and >appropriate to illustrate to my students the Central Limit Theorem, >using for instance 100 independent variables with uniform distribution >and showing that their sum is a variable with an approximated normal >distribution. > >Thanks in advance, > >Paul > >______________________________________________ >R-help at stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide! >http://www.R-project.org/posting-guide.html
Hi, Not exactly what you asked for, but related. I wrote the following little function to emulate a quincunx (a good illustration of the CLT, in my opinion): quincunx <- function(nb.bins, nb.rows=nb.bins-1, nb.balls=2^nb.bins) { x <- sample(c(0, 1), nb.balls * nb.rows, replace=TRUE) dim(x) <- c(nb.rows, nb.balls) hist(colSums(x), breaks=0:nb.rows, main="Number of balls per bin") } Idea: drop nb.balls in a quincunx with nb.bins bins at the bottom. The bin in which a ball ends up is the sum of nb.rows Bernouilli trials (where 0 stands for "left" and 1 for "right"). Hope this helps! Le 21 Avril 2005 13:06, Paul Smith a ??crit??:> Dear All > > I am totally new to R and I would like to know whether R is able and > appropriate to illustrate to my students the Central Limit Theorem, > using for instance 100 independent variables with uniform distribution > and showing that their sum is a variable with an approximated normal > distribution. > > Thanks in advance, > > Paul > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html-- Vincent Goulet, Professeur agr??g?? ??cole d'actuariat Universit?? Laval, Qu??bec
On 21-Apr-05 Paul Smith wrote:> Dear All > > I am totally new to R and I would like to know whether R is able and > appropriate to illustrate to my students the Central Limit Theorem, > using for instance 100 independent variables with uniform distribution > and showing that their sum is a variable with an approximated normal > distribution. > > Thanks in advance, > > PaulSimilar to Francisco's suggestion: m<-numeric(10000); for(k in (1:20)){ for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)} hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k)) } (On my slowish laptop, this ticks over at a satidfactory rate, about 1 plot per second. If your mahine is much faster, then simply increase 10000 to a larger number.) The real problem with demos like this, starting with the uniform distribution, is that the result is, to the eye, already approximately normal when k=3, and it's only out in the tails that the improvement shows for larger values of k. This was in fact the way we used to simulate a normal distribution in the old days: look up 3 numbers in Kendall & Babington-Smith's "Tables of Random Sampling Numbers", which are in effect pages full of integers uniform on 00-99, and take their mean. It's the one book I ever encountered which contained absolutely no information -- at least, none that I ever spotted. A more dramatic illustration of the CLT effect might be obtained if, instead of runif(k), you used rbinom(k,1,p) for p > 0.5, say: m<-numeric(10000); p<-0.75; for(j in (1:50)){ k<-j*j for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)} hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k)) } Cheers, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 21-Apr-05 Time: 19:48:05 ------------------------------ XFMail ------------------------------
I just did this last night for a class. It's very simplistic and could be improve, but it did the job. First I did the normal. Of course means of increasing large samples from a normal stay normal. This setup the students. Then I did means from an exponential. For n=1 you get the exponential again, and they of course expected the means with larger n's to also follow the exponential! Got 'em! Joe #Central Limit Theorem oldpar = par(mfrow=c(2,2)) #Normal n=64 #Do it repeatedly for n=1 4, 16, 64, 100 d=numeric(1000) for (k in 1:1000){d[k]=mean(rnorm(n,mean=5,sd=16))} c(mean=mean(d),sd=sd(d)) plot(density(d),col='blue') curve(dnorm(x,mean=5,sd=16/sqrt(n)),add=T,col='red') #Exponential n=64 #1, 4, 16, 64, 100 d=numeric(1000) for (k in 1:1000){d[k]=mean(rexp(n,rate=1/4))} c(mean=mean(d),sd=sd(d)) plot(density(d),col='blue') curve(dnorm(x,mean=4,sd=4/sqrt(n)),add=T,col='red') par(oldpar) -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Smith Sent: Thursday, April 21, 2005 12:07 PM To: r-help at stat.math.ethz.ch Subject: [R] Using R to illustrate the Central Limit Theorem Dear All I am totally new to R and I would like to know whether R is able and appropriate to illustrate to my students the Central Limit Theorem, using for instance 100 independent variables with uniform distribution and showing that their sum is a variable with an approximated normal distribution. Thanks in advance, Paul ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Bill.Venables@csiro.au
2005-Apr-21 22:49 UTC
[R] Using R to illustrate the Central Limit Theorem
Here's a bit of a refinement on Ted's first suggestion. N <- 10000 graphics.off() par(mfrow = c(1,2), pty = "s") for(k in 1:20) { m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k) hist(m, breaks = "FD", xlim = c(-4,4), main = k, prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2] x <- seq(pu[1], pu[2], len = 500) lines(x, dnorm(x), col = "red") qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red") Sys.sleep(1) } -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ted.Harding at nessie.mcc.ac.uk Sent: Friday, 22 April 2005 4:48 AM To: Paul Smith Cc: r-help at stat.math.ethz.ch Subject: RE: [R] Using R to illustrate the Central Limit Theorem On 21-Apr-05 Paul Smith wrote:> Dear All > > I am totally new to R and I would like to know whether R is able and > appropriate to illustrate to my students the Central Limit Theorem, > using for instance 100 independent variables with uniform distribution > and showing that their sum is a variable with an approximated normal > distribution. > > Thanks in advance, > > PaulSimilar to Francisco's suggestion: m<-numeric(10000); for(k in (1:20)){ for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)} hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k)) } (On my slowish laptop, this ticks over at a satidfactory rate, about 1 plot per second. If your mahine is much faster, then simply increase 10000 to a larger number.) The real problem with demos like this, starting with the uniform distribution, is that the result is, to the eye, already approximately normal when k=3, and it's only out in the tails that the improvement shows for larger values of k. This was in fact the way we used to simulate a normal distribution in the old days: look up 3 numbers in Kendall & Babington-Smith's "Tables of Random Sampling Numbers", which are in effect pages full of integers uniform on 00-99, and take their mean. It's the one book I ever encountered which contained absolutely no information -- at least, none that I ever spotted. A more dramatic illustration of the CLT effect might be obtained if, instead of runif(k), you used rbinom(k,1,p) for p > 0.5, say: m<-numeric(10000); p<-0.75; for(j in (1:50)){ k<-j*j for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)} hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k)) } Cheers, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 21-Apr-05 Time: 19:48:05 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Charles Annis, P.E.
2005-Apr-22 04:23 UTC
[R] Using R to illustrate the Central Limit Theorem
This won't help teach R, but it might illuminate the CLT. Here are a series of animated GIFs that begin with different densities, including one that has a "U" shape, and plots the density of Xbar for n=2,3,4,8,16,32. http://www.StatisticalEngineering.com/central_limit_theorem.htm I've also included an explanation of what is happening at each iteration. Charles Annis, P.E. Charles.Annis at StatisticalEngineering.com phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Smith Sent: Thursday, April 21, 2005 1:07 PM To: r-help at stat.math.ethz.ch Subject: [R] Using R to illustrate the Central Limit Theorem Dear All I am totally new to R and I would like to know whether R is able and appropriate to illustrate to my students the Central Limit Theorem, using for instance 100 independent variables with uniform distribution and showing that their sum is a variable with an approximated normal distribution. Thanks in advance, Paul ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hi R users and developers: I just install the new R version 2.1.0 in a linux platform. I get this error when I call the function> help.start()Making links in per-session dir ... Error in gsub(pattern, replacement, x, ignore.case, extended, fixed) : input string 28 is invalid in this locale What am I missing? (It works fine with version 2.0.1) -- Kenneth Roy Cabrera Torres Uninversidad Nacional de Colombia Sede Medellin Tel 430 9351 Cel 315 504 9339
Bliese, Paul D LTC USAMH
2005-May-13 09:59 UTC
[R] Using R to illustrate the Central Limit Theorem
Interesting thread. The graphics are great, the only thing that might be worth doing for teaching purposes would be to illustrate the original distribution that is being averaged 1000 times. Below is one option based on Bill Venables code. Note that to do this I had to start with a k of 2. N <- 10000 for(k in 2:20) { graphics.off() par(mfrow = c(2,2), pty = "s") hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1") hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2") m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k)) hist(m, breaks = "FD", xlim = c(-4,4), main = k, prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2] x <- seq(pu[1], pu[2], len = 500) lines(x, dnorm(x), col = "red") qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red") Sys.sleep(3) } By the way, I should probably know this but what is the logic of the "sqrt(12*k)" part of the example? Obviously as k increases the mean will approach .5 in a uniform distribution, so runif(k)-.5 will be close to zero, and sqrt(12*k) increases as k increases. Why 12, though? PB -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kevin E. Thorpe Sent: Friday, May 13, 2005 12:03 AM To: Bill.Venables at csiro.au Cc: ted.harding at nessie.mcc.ac.uk; r-help at stat.math.ethz.ch Subject: Re: [R] Using R to illustrate the Central Limit Theorem This thread was very timely for me since I will be teaching an introductory biostats course in the fall, including a session on CLT. I have shamelessly taken Dr. Venables' slick piece of code and wrapped it in a function so that I can vary the maximum sample size at will. I then created functions based on the first to sample from the exponential and the chi-squaredistributions. Lastly, I created a function to sample from a pdf with a parabolic shape (confirming in the process just how rusty my calculus is :-) ). My code is below for all to do with as they please. Now, at the risk of asking a really obvious question ... In my final function, I used the inverse probability integral transformation to sample from my parabolic distribution. I create a local function rparab shown here: rparab <- function(nn) { u <- 2*runif(nn) - 1 ifelse(u<0,-(abs(u)^(1/3)),u^(1/3)) } It seems that in my version of R (2.0.1) on Linux, that calculating the cube root of a negative number using ^(1/3) returns NaN. I looked at the help in the arithmetic operators and did help.search("cube root"), help.search("root") and help.search("cube") and recognised no alternatives. So I used an ifelse() to deal with the negatives. Have I missed something really elementary? Thanks clt.unif <- function(n=20) { N <- 10000 graphics.off() par(mfrow = c(1,2), pty = "s") for(k in 1:n) { m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k) hist(m, breaks = "FD", xlim = c(-4,4), main = paste("Uniform(0,1), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2] x <- seq(pu[1], pu[2], len = 500) lines(x, dnorm(x), col = "red") qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red") Sys.sleep(1) } } clt.exp <- function(n=20,theta=1) { N <- 10000 graphics.off() par(mfrow = c(1,2), pty = "s") for(k in 1:n) { m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k) hist(m, breaks = "FD", xlim = c(-4,4), main = paste("exp(",theta,"), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2] x <- seq(pu[1], pu[2], len = 500) lines(x, dnorm(x), col = "red") qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red") Sys.sleep(1) } } clt.chisq <- function(n=20,nu=1) { N <- 10000 graphics.off() par(mfrow = c(1,2), pty = "s") for(k in 1:n) { m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k) hist(m, breaks = "FD", xlim = c(-4,4), main = paste("Chi-Square(",nu,"), n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2] x <- seq(pu[1], pu[2], len = 500) lines(x, dnorm(x), col = "red") qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red") Sys.sleep(1) } } clt.parab <- function(n=20) { rparab <- function(nn) { u <- 2*runif(nn) - 1 ifelse(u<0,-(abs(u)^(1/3)),u^(1/3)) } N <- 10000 graphics.off() par(mfrow = c(1,2), pty = "s") for(k in 1:n) { m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k)) hist(m, breaks = "FD", xlim = c(-4,4), main = paste("n = ",k,sep=""), prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") pu <- par("usr")[1:2] x <- seq(pu[1], pu[2], len = 500) lines(x, dnorm(x), col = "red") qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") abline(0, 1, col = "red") Sys.sleep(1) } } Bill.Venables at csiro.au wrote:>Here's a bit of a refinement on Ted's first suggestion. > > > N <- 10000 > graphics.off() > par(mfrow = c(1,2), pty = "s") > for(k in 1:20) { > m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k) > hist(m, breaks = "FD", xlim = c(-4,4), main = k, > prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") > pu <- par("usr")[1:2] > x <- seq(pu[1], pu[2], len = 500) > lines(x, dnorm(x), col = "red") > qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") > abline(0, 1, col = "red") > Sys.sleep(1) > } > >-- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Department of Public Health Sciences Faculty of Medicine, University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.946.8081 Fax: 416.971.2462 ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
The variance of U[0,1] is 1/12. So the variance of the mean of uniforms is 1/12k. Rather than dividing by 1/12k he multiplied by 12k. Kevin Bliese, Paul D LTC USAMH wrote:>Interesting thread. The graphics are great, the only thing that might be >worth doing for teaching purposes would be to illustrate the original >distribution that is being averaged 1000 times. > >Below is one option based on Bill Venables code. Note that to do this I >had to start with a k of 2. > >N <- 10000 > for(k in 2:20) { > graphics.off() > par(mfrow = c(2,2), pty = "s") > hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1") > hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2") > m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k)) > hist(m, breaks = "FD", xlim = c(-4,4), main = k, > prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon") > pu <- par("usr")[1:2] > x <- seq(pu[1], pu[2], len = 500) > lines(x, dnorm(x), col = "red") > qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue") > abline(0, 1, col = "red") > Sys.sleep(3) > } > >By the way, I should probably know this but what is the logic of the >"sqrt(12*k)" part of the example? Obviously as k increases the mean >will approach .5 in a uniform distribution, so runif(k)-.5 will be close >to zero, and sqrt(12*k) increases as k increases. Why 12, though? > >PB > >-- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Department of Public Health Sciences Faculty of Medicine, University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.946.8081 Fax: 416.971.2462