A Likert scale may have produced counts of answers per category. According to theory I may expect equality over the categories. A statistical test shall reveal the actual equality in my sample.

When applying a chi-square test with an increasing number of repetitions (simulate.p.value) over a fixed sample, the p-value decreases dramatically (it looks as if it converges to zero).

(1) Why?
(2) (If this test is wrong), then which test can check what I want to check, that is: are the two distributions of frequencies (observed and expected) in principle the same?
(3) By the way, how to deal with low frequency cells?

  r <- c(10, 100, 500, 1000, 2000, 5000)
  v <- c(35, 40, 45, 45, 40, 35)
  sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value })

Thank you, Sören

-- 
Sören Vogel, PhD-Student, Eawag, Dept. SIAM
http://www.eawag.ch, http://sozmod.eawag.ch
soeren.vogel at eawag.ch wrote:
> A Likert scale may have produced counts of answers per category.
> According to theory I may expect equality over the categories. A
> statistical test shall reveal the actual equality in my sample.
>
> When applying a chi square test with increasing number of repetitions
> (simulate.p.value) over a fixed sample, the p-value decreases
> dramatically (looks as if converge to zero).
>
> (1) Why?
> (2) (If this test is wrong), then which test can check what I want to
> check, that is: are the two distributions of frequencies (observed and
> expected) in principle the same?
> (3) By the way, how to deal with low frequency cells?
>
> r <- c(10, 100, 500, 1000, 2000, 5000)
> v <- c(35, 40, 45, 45, 40, 35)
> sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)),
> rescale.p=T, simulate.p.value=T, B=x)$p.value })

This is a combination of user error and an infelicity in chisq.test. You are sapply'ing over a list with one element, so essentially you are doing

  chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value

Now B is supposed to be a single integer, so the above cannot be expected to do anything sensible, but you might have hoped for an error message. Instead, it seems that you get the count obtained from r[1] replications divided by r+1:

  > chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=r)$p.value
  [1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720
  > 7/(r+1)
  [1] 0.636363636 0.069306931 0.013972056 0.006993007 0.003498251 0.001399720

What you really wanted was

  > sapply(r, function (x) { chisq.test(v, p=c(rep.int(40, 6)), rescale.p=T, simulate.p.value=T, B=x)$p.value })
  [1] 0.9090909 0.8118812 0.7964072 0.7672328 0.8025987 0.7932414

-- 
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
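
The 7/(r+1) pattern can be reproduced by hand. What follows is only a sketch, assuming the simulated p-value has the form (1 + k)/(B + 1), where k counts the simulated statistics that are at least as large as the observed one; that form is consistent with the values printed above. The helper stat() and the names p0, k, p_one, p_vec and p_ok are illustrative and not part of chisq.test, and the seed is arbitrary.

```r
v <- c(35, 40, 45, 45, 40, 35)           # observed counts from the post
r <- c(10, 100, 500, 1000, 2000, 5000)   # replication counts from the post
n <- sum(v)
p0 <- rep(1 / 6, 6)                      # equal expected proportions

# Pearson statistic of a count vector against the expected counts n * p0
# (illustrative helper, not an R built-in)
stat <- function(counts) sum((counts - n * p0)^2 / (n * p0))

set.seed(1)                              # arbitrary, for reproducibility only
B1 <- r[1]
sims <- rmultinom(B1, size = n, prob = p0)   # one simulated table per column
k <- sum(apply(sims, 2, stat) >= stat(v))    # simulated statistics >= observed

# Monte Carlo p-value for a single, scalar B
p_one <- (1 + k) / (B1 + 1)

# Same numerator, but the whole vector r in the denominator: the result
# shrinks towards zero although nothing about the data has changed.
p_vec <- (1 + k) / (r + 1)

# Corrected call: iterate over r itself, so each test gets a scalar B.
# vapply() additionally checks that each call returns one numeric value.
p_ok <- vapply(r, function(b) {
  chisq.test(v, p = p0, simulate.p.value = TRUE, B = b)$p.value
}, numeric(1))

print(rbind(B = r, p_vec = p_vec, p_ok = p_ok))
```

Only the first element of p_vec is a meaningful p-value; the remaining ones come from dividing a count based on r[1] replications by ever larger denominators.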
On Mar 11, 2009, at 6:36 AM, soeren.vogel at eawag.ch wrote:

> A Likert scale may have produced counts of answers per category.
> According to theory I may expect equality over the categories. A
> statistical test shall reveal the actual equality in my sample.
>
> When applying a chi square test with increasing number of
> repetitions (simulate.p.value) over a fixed sample, the p-value
> decreases dramatically (looks as if converge to zero).
>
> (1) Why?

With low numbers of repetitions the test has low power, i.e., it may give you the wrong answer to the question: are those two vectors from the same distribution? As you increase the number of repetitions, the simulated value approaches the "truth".

> (2) (If this test is wrong), then which test can check what I want
> to check, that is: are the two distributions of frequencies
> (observed and expected) in principle the same?

"In principle" they are not the same. Do you want a test that tells you they are?

> (3) By the way, how to deal with low frequency cells?
>
> r <- c(10, 100, 500, 1000, 2000, 5000)
> v <- c(35, 40, 45, 45, 40, 35)
> sapply(list(r), function (x) { chisq.test(v, p=c(rep.int(40, 6)),
> rescale.p=T, simulate.p.value=T, B=x)$p.value })
>
> Thank you, Sören

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
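
How much a simulated p-value moves around for a given B can be checked directly. The sketch below repeats the simulated test many times at a small and a large B and compares the spread; it illustrates Monte Carlo error only. The function sim_p(), the choice of 50 repeats, the two B values and the seed are assumptions made for illustration.

```r
v <- c(35, 40, 45, 45, 40, 35)   # observed counts from the post

set.seed(42)                     # arbitrary, for reproducibility only

# Repeat the simulated goodness-of-fit test `reps` times for one scalar B
# (illustrative helper; equal category probabilities are chisq.test's default)
sim_p <- function(B, reps = 50) {
  replicate(reps, chisq.test(v, simulate.p.value = TRUE, B = B)$p.value)
}

p_small <- sim_p(20)     # few replicates per test
p_large <- sim_p(2000)   # many replicates per test

# Spread of the simulated p-values at each B
print(c(sd_B20 = sd(p_small), sd_B2000 = sd(p_large)))

# Reference value from the chi-square approximation (df = 5)
p_asym <- chisq.test(v)$p.value
print(p_asym)
```

With a correctly supplied scalar B, the simulated p-values scatter around a stable value rather than drifting towards zero, and the scatter narrows as B grows.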
Thanks to Peter Dalgaard for the correct answer. I misinterpreted what R was returning.

On Mar 11, 2009, at 7:32 AM, David Winsemius wrote:

> On Mar 11, 2009, at 6:36 AM, soeren.vogel at eawag.ch wrote:
>
>> (1) Why?
>
> With low numbers of repetitions the test has low power, i.e, it may
> give you the wrong answer to the question: are those two vectors
> from the same distribution? As you increase in number, the simulated
> value approaches the "truth".

David Winsemius, MD
Heritage Laboratories
West Hartford, CT