dear helplist, my student has fifty trees, numbered one to fifty, and a vector recording which tree a certain possum slept in on 12 nights. R> c [1] 3 14 17 22 26 26 17 40 43 25 46 46 R> Thus it slept in tree #3 on Monday, then tree #14 on Tues, and so on. I wish to test the null hypothesis that the animal chooses trees randomly; try R> table(c) c 3 14 17 22 25 26 40 43 46 1 1 2 1 1 2 1 1 2>Thus it slept in tree #3 once, tree #14 once, tree #17 twice, etc. Now on the null hypothesis the expected number of sleeps per tree is 12/50=0.24; so how do I carry out a chisquare test on the data, including the trees that it never slept in? chisq.test() doesn't "know" that there are actually fifty distinct trees (most of which were never slept in) and not nine. > chisq.test(table(c)) Chi-squared test for given probabilities data: table(c) X-squared = 1.5, df = 8, p-value = 0.9927 of course this isn't right because chisquared is > 25.8 due to the animal sleeping in tree #17 and tree #46 twice (and of course, df should be 49 because I have 50 trees). help anyone? -- Robin Hankin, Lecturer, School of Geographical and Environmental Science Private Bag 92019 Auckland New Zealand r.hankin at auckland.ac.nz tel 0064-9-373-7599 x6820; FAX 0064-9-373-7042 as of: Wed Jun 12 13:51:00 NZST 2002 This (linux) system up continuously for: 286 days, 20 hours, 33 minutes -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Wed, 12 Jun 2002, Robin Hankin wrote:> > dear helplist, > > my student has fifty trees, numbered one to fifty, and a vector > recording which tree a certain possum slept in on 12 nights. > > R> c > [1] 3 14 17 22 26 26 17 40 43 25 46 46 > R> > > Thus it slept in tree #3 on Monday, then tree #14 on Tues, and so on. > I wish to test the null hypothesis that the animal chooses trees > randomly; try > > R> table(c) > c > 3 14 17 22 25 26 40 43 46 > 1 1 2 1 1 2 1 1 2 > > > > Thus it slept in tree #3 once, tree #14 once, tree #17 twice, etc.Try tabulate(c), which goes to 46. Or, better, tab <- rep(0,50) names(tab) <- 1:50 tab[names(table(c))] <- table(c)> Now on the null hypothesis the expected number of sleeps per tree is > 12/50=0.24; so how do I carry out a chisquare test on the data, > including the trees that it never slept in? > > chisq.test() doesn't "know" that there are actually fifty distinct > trees (most of which were never slept in) and not nine. > > > chisq.test(table(c)) > > Chi-squared test for given probabilities > > data: table(c) > X-squared = 1.5, df = 8, p-value = 0.9927 > > of course this isn't right because chisquared is > 25.8 due to the > animal sleeping in tree #17 and tree #46 twice (and of course, df > should be 49 because I have 50 trees).> chisq.test(tab)Chi-squared test for given probabilities data: tab X-squared = 63, df = 49, p-value = 0.08625 Warning message: Chi-squared approximation may be incorrect in: chisq.test(tab) The warning is serious: the approximation is probably dreadful for data this sparse. In any case, is the null hypothesis plausible: the animal independently and uniform chooses a tree each night to sleep in, from exactly the 50 trees your student labelled? You could easily get a more accurate significance by simulation: doone <- function(...) { c <- sample(1:50, 12, replace = T) tab <- rep(0,50) names(tab) <- 1:50 tab[names(table(c))] <- table(c) chisq.test(tab)$statistic }> table(round(sapply(1:1000, doone),3))38 46.333 54.667 63 71.333 79.667 88 96.333 232 394 229 97 34 6 7 1 (Note how discrete the distribution is.) -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Robin Hankin wrote:> >... I've been advised that the correct distribution for the number of > sleeps in the 50 trees would be multinomial with n=50 and the > p_i=12/50 for i=1 to 50. I'll check out Feller when I get home > tonight, but I suspect that M-B is the limiting case? >I grabbed my thesis and refreshed my memory, and I think it will be what you want. Here is a table for the discrete case with 50 cells and 12 trials. N choices probability cumulative p 1 2.048e-19 2.048e-19 2 2.05421e-14 2.05423e-14 3 4.16787e-11 4.16992e-11 4 1.3844e-08 1.38857e-08 5 1.43652e-06 1.45041e-06 6 6.20311e-05 6.34815e-05 7 0.00129369 0.00135717 8 0.0141003 0.0154574 9 0.0829514 0.0984088 10 0.260324 0.358733 11 0.403082 0.761815 12 0.238185 1 Expected value = 10.76 What this is telling you is that if your possum is going back to the same tree (which the ringtail at my place does) he will have to restrict his choices to 8 or less to avoid a Type I error. Of course, your design probably includes more than 1 possum, and you would have to sort out the alpha for your number of subjects (and repeats?). I've appended the source code for generating tables such as this. Jim -------------- next part -------------- #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <math.h> double Factorial(double n) { if(n > 1) return(n * Factorial(n-1)); return(1); } double NChooseR(double n,double r) { double nfact; double rfact; double nminusrfact; nfact = Factorial(n); rfact = Factorial(r); nminusrfact = Factorial(n-r); return(nfact/(rfact * nminusrfact)); } double PFilled(double n,double r,int f) { double psum=0; double combfact; int v; combfact = NChooseR(n,n-f); for(v=0;v<=f;v++) psum += pow(-1,v) * NChooseR(f,v) * pow((f-v)/n,r); return(combfact*psum); } double MBdist(double ncells,double nballs) { double psum; double ptotal = 0; double ev = 0; int filled = 1; while(filled <= nballs) { psum = PFilled(ncells,nballs,filled); ptotal += psum; fprintf(stdout," %2d %-16g%g\n",filled,psum,ptotal); ev += psum * filled++; } if(ptotal < 0.99 || ptotal > 1.01) fprintf(stdout,"Warning: total p is %f\n",ptotal); return(ev); } int main(int argc,char *argv[]) { if(argc == 3) { fprintf(stdout,"N choices probability cumulative p\n"); printf("\nExpected value = %.2f\n",MBdist(atof(argv[1]),atof(argv[2]))); return(0); } return(1); }
Robin Hankin <r.hankin at auckland.ac.nz> has a vector of trees visited: R> c <- c(3,14,17,22,26,26,17,40,43,25,46,46) and wants a count of visits for trees 1:50. table(c) is inadequate because it omits the zero cases. Brian D. Ripley <ripley at stats.ox.ac.uk> suggested:> Try tabulate(c), which goes to 46. Or, better, > tab <- rep(0,50) > names(tab) <- 1:50 > tab[names(table(c))] <- table(c)But tabulate has a second argument (nbins) which does what you want in 1 line: R> tabulate(c, 50) -- -- David Brahm (brahm at alum.mit.edu) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 13 Jun 2002, David Brahm wrote:> Robin Hankin <r.hankin at auckland.ac.nz> has a vector of trees visited: > R> c <- c(3,14,17,22,26,26,17,40,43,25,46,46) > and wants a count of visits for trees 1:50. table(c) is inadequate because it > omits the zero cases. > > Brian D. Ripley <ripley at stats.ox.ac.uk> suggested: > > Try tabulate(c), which goes to 46. Or, better, > > tab <- rep(0,50) > > names(tab) <- 1:50 > > tab[names(table(c))] <- table(c) > > But tabulate has a second argument (nbins) which does what you want in 1 line: > R> tabulate(c, 50)Yes if you want 1:50. Not for other label sets, which is why I did it the more general way. I did mention tabulate: should have givne more details, I guess. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._