Geertje Van der Heijden
2007-Nov-20 13:00 UTC
[R] Problem with code for bootstrapping chi square test with count data
Hi,

I'd like some advice on bootstrapping in R.

I have a species x with 20 individuals and a factor containing 0's and 1's (in this case 5 zeros and 15 ones). I want to compare the frequency of occurrence of 1 with a probability value. This code seems to work for that in R:

    attach(test)
    p <- c(0.5272, (1-0.5272))
    sp1_1 <- length(subset(x, x==1))
    sp1_0 <- length(subset(x, x==0))
    obs1_1 <- c(sp1_1, sp1_0)
    chisq.test(obs1_1, p=p)

However, I'd like to bootstrap these 20 individuals to produce a whole population of samples, and do a chi-square test on each bootstrap sample to create a distribution of the chi-square statistic.

I have bootstrapped the 0's and 1's of x 20 times using the following code:

    resamples <- lapply(1:20, function(i) sample(x, replace=T))

What I can't get to work is calculating the observed counts of 1's and 0's in each bootstrap sample, which I need in order to do a chi-square test on each sample. The method I used above doesn't seem to work on the results in resamples. Does anyone have an idea how to get this to work? Or is there an easier way to do this? I hope it is clear what I am trying to do!

I have tried looking on the internet and in the R archives, but I can't find what I am looking for.

Thanks very much in advance for your help!

Geertje

~~~~
Geertje van der Heijden
PhD student
Tropical Ecology
School of Geography
University of Leeds
Leeds LS2 9JT
Tel: (+44)(0)113 3433345
Email: g.m.f.vanderheijden04@leeds.ac.uk
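The counting step asked about above can be sketched as follows. Because resamples is a list, the counting has to be applied to each element rather than to the list as a whole. This is only a minimal sketch: the data vector is rebuilt from the counts given in the post, and the names n_boot, count_obs and boot_stats, the seed, and the number of resamples are assumptions made for illustration, not part of the original code.

```r
# Data as described in the post: 5 zeros and 15 ones
x <- c(rep(0, 5), rep(1, 15))
p <- c(0.5272, 1 - 0.5272)    # hypothesised probabilities, ordered (1, 0)

set.seed(1)                   # assumed seed, only for reproducibility
n_boot <- 1000                # assumed number of bootstrap resamples

# Each list element is one resample of x, drawn with replacement
resamples <- lapply(seq_len(n_boot), function(i) sample(x, replace = TRUE))

# Count ones and zeros in a single resample, in the same order as p
count_obs <- function(s) c(ones = sum(s == 1), zeros = sum(s == 0))

# Apply the counting to every resample: a 2 x n_boot matrix of counts
obs <- sapply(resamples, count_obs)

# Chi-square statistic for each column of observed counts
boot_stats <- apply(obs, 2, function(o) unname(chisq.test(o, p = p)$statistic))
```

Something like hist(boot_stats) would then show the resulting distribution. Note that this is the distribution of the statistic under resampling from the observed data, not under the hypothesised probability.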
Prof Brian Ripley
2007-Nov-20 21:01 UTC
[R] Problem with code for bootstrapping chi square test with count data
On Tue, 20 Nov 2007, Geertje Van der Heijden wrote:

> Hi,
>
> I'd like some advice on bootstrapping in R.
>
> I have a species x with 20 individuals and a factor containing 0 and 1's
> (in this case 5 zeros and 15 ones). I want to compare the frequency of
> the occurrence of 1 with a probability value. This code seems to work to
> do this in R.
>
> attach(test)
> p <- c(0.5272, (1-0.5272))
> sp1_1 <- length(subset(x, x==1))
> sp1_0 <- length(subset(x, x==0))
> obs1_1 <- c(sp1_1, sp1_0)
> chisq.test(obs1_1, p=p)
>
> However, I'd like to bootstrap these 20 individuals to produce a whole
> population of samples and I'd like to do a chi-square test for each of
> the bootstrap sample to create a distribution of the chi-square
> statistic.

And what do you want to do with that distribution? It is 'a distribution', but it is not obviously connected with the test you say you want to do. It seems to me that the real issue here is understanding the applicability of the bootstrap.

We can make your example reproducible by

    x <- c(rep(0,5), rep(1, 15))
    N <- sum(x) # 15
    p <- c(0.5272, (1-0.5272))
    chisq.test(c(N, 20-N), p=p)

> I have bootstrapped the 0's and 1's of x 20 times using the following
> code:
>
> resamples <- lapply(1:20, function(i) sample(x, replace=T))
>
> What I can't get to work is how to calculate the observed values for 1's
> and 0's in each of the bootstrap samples, which I need to do a
> chi-square test for each sample. The methd I used above doesn't seem to
> work the results for resamples. Does anyone have an idea on how to get
> this to work? Or is there another easier way to do this? I hope it is
> clear what I am trying to do!

You can make this easier by noticing that the number of ones will be binomial(20, 15/20). So e.g.

    res <- replicate(1000, {N <- rbinom(1, 20, 15/20);
                            chisq.test(c(N, 20-N), p=p)$statistic})

Now what are you going to do with this?
It is not simulation under the null hypothesis, which would be

    res <- replicate(1000, {N <- rbinom(1, 20, 0.5272);
                            chisq.test(c(N, 20-N), p=p)$statistic})
    mean(res > 3.983)

and shows good agreement with the theoretical approximating distribution.

Also, why do you want a chisq test to do this? We can use binom.test(15, 20, 0.5272), which is exact. (The only assumption it makes is iid trials, which the bootstrap methods are also making.)

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
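The three approaches mentioned in the reply (simulation under the null, the chi-square approximation, and the exact binomial test) can be put side by side as below. This is a sketch only: the seed and the names n, obs_stat, sim_p_strict, sim_p_incl, approx_p and exact_p are assumptions introduced for illustration. The observed statistic is computed rather than typed in as 3.983, and two tail estimates are kept because, with discrete counts, whether simulated values equal to the observed statistic are counted (>= versus >) may change the estimate noticeably.

```r
p <- c(0.5272, 1 - 0.5272)    # hypothesised probabilities, ordered (1, 0)
n <- 20                       # number of individuals

# Observed chi-square statistic for 15 ones and 5 zeros (about 3.983)
obs_stat <- unname(chisq.test(c(15, n - 15), p = p)$statistic)

set.seed(1)                   # assumed seed, only for reproducibility

# Simulate the statistic under the null: number of ones ~ binomial(n, p[1])
res <- replicate(1000, {
  N <- rbinom(1, n, p[1])
  unname(chisq.test(c(N, n - N), p = p)$statistic)
})

# Simulated tail probabilities, excluding and including ties
sim_p_strict <- mean(res > obs_stat)
sim_p_incl   <- mean(res >= obs_stat)

# Chi-square approximation with 1 degree of freedom
approx_p <- pchisq(obs_stat, df = 1, lower.tail = FALSE)

# Exact two-sided binomial test
exact_p <- binom.test(15, n, p[1])$p.value
```

Printing c(sim_p_strict, sim_p_incl, approx_p, exact_p) gives the four values for comparison.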