Jose Claudio Faria
2006-Aug-04 20:09 UTC
[R] Doubt about Student t distribution simulation
Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options====# parameters mu = 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 10000 # histogram parameter nchist = 150 #== end options====== t = numeric() pop = rnorm(10000, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = "histogram", breaks = nchist, col = "lightgray", xlab = '', ylab = "Fi", font.lab = 2, font = 2) screen(2) hist(t, probability = T, main = 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab = 't', ylab = 'f(t)', font.lab = 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = "red", lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estat?stica Experimental/Prof. Adjunto mails: joseclaudio.faria em terra.com.br jc_faria em uesc.br jc_faria em uol.com.br
Dear Jose, The problem is that you're using the population standard deviation (sigma) rather than the sample SD of each sample [i.e., t[i] = (mean(amo.i) - mu) / (sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as they appear to be. A couple of smaller points: (1) Even after this correction, you're sampling from a discrete population (albeit with replacement) and so the values won't be exactly t-distributed. You could draw the samples directly from N(mu, sigma) instead. (2) It would be preferable to make a quantile-comparison plot against the t-distribution, since you'd get a better picture of what's going on in the tails. I hope this helps, John -------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox --------------------------------> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jose > Claudio Faria > Sent: Friday, August 04, 2006 3:09 PM > To: R-help at stat.math.ethz.ch > Subject: [R] Doubt about Student t distribution simulation > > Dear R list, > > I would like to illustrate the origin of the Student t > distribution using R. > > So, if (sample.mean - pop.mean) / standard.error(sample.mean) > has t distribution with (sample.size - 1) degree free, what > is wrong with the simulation below? I think that the > theoretical curve should agree with the relative frequencies > of the t values calculated: > > #== begin options====> # parameters > mu = 10 > sigma = 5 > > # size of sample > n = 3 > > # repetitions > nsim = 10000 > > # histogram parameter > nchist = 150 > #== end options======> > t = numeric() > pop = rnorm(10000, mean = mu, sd = sigma) > > for (i in 1:nsim) { > amo.i = sample(pop, n, replace = TRUE) > t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } > > win.graph(w = 5, h = 7) > split.screen(c(2,1)) > screen(1) > hist(t, > main = "histogram", > breaks = nchist, > col = "lightgray", > xlab = '', ylab = "Fi", > font.lab = 2, font = 2) > > screen(2) > hist(t, > probability = T, > main = 'f.d.p and histogram', > breaks = nchist, > col = 'lightgray', > xlab = 't', ylab = 'f(t)', > font.lab = 2, font = 2) > > x = t > curve(dt(x, df = n-1), add = T, col = "red", lwd = 2) > > Many thanks for any help, > ___ > Jose Claudio Faria > Brasil/Bahia/Ilheus/UESC/DCET > Estat?stica Experimental/Prof. Adjunto > mails: joseclaudio.faria at terra.com.br > jc_faria at uesc.br > jc_faria at uol.com.br > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Jose Claudio Faria <joseclaudio.faria at terra.com.br> writes:> Dear R list, > > I would like to illustrate the origin of the Student t distribution using R. > > So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t > distribution with (sample.size - 1) degree free, what is wrong with the > simulation below? I think that the theoretical curve should agree with > the relative frequencies of the t values calculated: > > #== begin options====> # parameters > mu = 10 > sigma = 5 > > # size of sample > n = 3 > > # repetitions > nsim = 10000 > > # histogram parameter > nchist = 150 > #== end options======> > t = numeric() > pop = rnorm(10000, mean = mu, sd = sigma) > > for (i in 1:nsim) { > amo.i = sample(pop, n, replace = TRUE) > t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n))At the very least, you need a sample-based standard error: sd(amo.i), not sigma. Also, resampling from "pop" is not really what the t-distribution is based on, but I don't think that matters much.> } > > win.graph(w = 5, h = 7) > split.screen(c(2,1)) > screen(1) > hist(t, > main = "histogram", > breaks = nchist, > col = "lightgray", > xlab = '', ylab = "Fi", > font.lab = 2, font = 2) > > screen(2) > hist(t, > probability = T, > main = 'f.d.p and histogram', > breaks = nchist, > col = 'lightgray', > xlab = 't', ylab = 'f(t)', > font.lab = 2, font = 2) > > x = t > curve(dt(x, df = n-1), add = T, col = "red", lwd = 2) > > Many thanks for any help, > ___ > Jose Claudio Faria > Brasil/Bahia/Ilheus/UESC/DCET > Estat?stica Experimental/Prof. Adjunto > mails: joseclaudio.faria at terra.com.br > jc_faria at uesc.br > jc_faria at uol.com.br > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907