Jose Claudio Faria
2006-Aug-04  20:09 UTC
[R] Doubt about Student t distribution simulation
Dear R list,
I would like to illustrate the origin of the Student t distribution using R.
So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t 
distribution with (sample.size - 1) degree free, what is wrong with the 
simulation below? I think that the theoretical curve should agree with 
the relative frequencies of the t values calculated:
#== begin options====# parameters
   mu    = 10
   sigma =  5
# size of sample
   n = 3
# repetitions
   nsim = 10000
# histogram parameter
   nchist = 150
#== end options======
t   = numeric()
pop = rnorm(10000, mean = mu, sd = sigma)
for (i in 1:nsim) {
   amo.i = sample(pop, n, replace = TRUE)
   t[i]  = (mean(amo.i) - mu) / (sigma / sqrt(n))
}
win.graph(w = 5, h = 7)
split.screen(c(2,1))
screen(1)
hist(t,
      main     = "histogram",
      breaks   = nchist,
      col      = "lightgray",
      xlab     = '', ylab = "Fi",
      font.lab = 2, font = 2)
screen(2)
hist(t,
      probability = T,
      main        = 'f.d.p and histogram',
      breaks      = nchist,
      col         = 'lightgray',
      xlab        = 't', ylab = 'f(t)',
      font.lab    = 2, font = 2)
x = t
curve(dt(x, df = n-1), add = T, col = "red", lwd = 2)
Many thanks for any help,
___
Jose Claudio Faria
Brasil/Bahia/Ilheus/UESC/DCET
Estat?stica Experimental/Prof. Adjunto
mails: joseclaudio.faria em terra.com.br
        jc_faria em uesc.br
        jc_faria em uol.com.br
Dear Jose, The problem is that you're using the population standard deviation (sigma) rather than the sample SD of each sample [i.e., t[i] = (mean(amo.i) - mu) / (sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as they appear to be. A couple of smaller points: (1) Even after this correction, you're sampling from a discrete population (albeit with replacement) and so the values won't be exactly t-distributed. You could draw the samples directly from N(mu, sigma) instead. (2) It would be preferable to make a quantile-comparison plot against the t-distribution, since you'd get a better picture of what's going on in the tails. I hope this helps, John -------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox --------------------------------> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jose > Claudio Faria > Sent: Friday, August 04, 2006 3:09 PM > To: R-help at stat.math.ethz.ch > Subject: [R] Doubt about Student t distribution simulation > > Dear R list, > > I would like to illustrate the origin of the Student t > distribution using R. > > So, if (sample.mean - pop.mean) / standard.error(sample.mean) > has t distribution with (sample.size - 1) degree free, what > is wrong with the simulation below? I think that the > theoretical curve should agree with the relative frequencies > of the t values calculated: > > #== begin options====> # parameters > mu = 10 > sigma = 5 > > # size of sample > n = 3 > > # repetitions > nsim = 10000 > > # histogram parameter > nchist = 150 > #== end options======> > t = numeric() > pop = rnorm(10000, mean = mu, sd = sigma) > > for (i in 1:nsim) { > amo.i = sample(pop, n, replace = TRUE) > t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } > > win.graph(w = 5, h = 7) > split.screen(c(2,1)) > screen(1) > hist(t, > main = "histogram", > breaks = nchist, > col = "lightgray", > xlab = '', ylab = "Fi", > font.lab = 2, font = 2) > > screen(2) > hist(t, > probability = T, > main = 'f.d.p and histogram', > breaks = nchist, > col = 'lightgray', > xlab = 't', ylab = 'f(t)', > font.lab = 2, font = 2) > > x = t > curve(dt(x, df = n-1), add = T, col = "red", lwd = 2) > > Many thanks for any help, > ___ > Jose Claudio Faria > Brasil/Bahia/Ilheus/UESC/DCET > Estat?stica Experimental/Prof. Adjunto > mails: joseclaudio.faria at terra.com.br > jc_faria at uesc.br > jc_faria at uol.com.br > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Jose Claudio Faria <joseclaudio.faria at terra.com.br> writes:> Dear R list, > > I would like to illustrate the origin of the Student t distribution using R. > > So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t > distribution with (sample.size - 1) degree free, what is wrong with the > simulation below? I think that the theoretical curve should agree with > the relative frequencies of the t values calculated: > > #== begin options====> # parameters > mu = 10 > sigma = 5 > > # size of sample > n = 3 > > # repetitions > nsim = 10000 > > # histogram parameter > nchist = 150 > #== end options======> > t = numeric() > pop = rnorm(10000, mean = mu, sd = sigma) > > for (i in 1:nsim) { > amo.i = sample(pop, n, replace = TRUE) > t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n))At the very least, you need a sample-based standard error: sd(amo.i), not sigma. Also, resampling from "pop" is not really what the t-distribution is based on, but I don't think that matters much.> } > > win.graph(w = 5, h = 7) > split.screen(c(2,1)) > screen(1) > hist(t, > main = "histogram", > breaks = nchist, > col = "lightgray", > xlab = '', ylab = "Fi", > font.lab = 2, font = 2) > > screen(2) > hist(t, > probability = T, > main = 'f.d.p and histogram', > breaks = nchist, > col = 'lightgray', > xlab = 't', ylab = 'f(t)', > font.lab = 2, font = 2) > > x = t > curve(dt(x, df = n-1), add = T, col = "red", lwd = 2) > > Many thanks for any help, > ___ > Jose Claudio Faria > Brasil/Bahia/Ilheus/UESC/DCET > Estat?stica Experimental/Prof. Adjunto > mails: joseclaudio.faria at terra.com.br > jc_faria at uesc.br > jc_faria at uol.com.br > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907