The standard chisq.test() and fisher.test() functions, when applied to two distributions (to determine whether the same underlying distribution applies to both), require one to pre-bin the distributions.

Is there a library function (either built-in or in a package) that acts more like the ks.test() function, in that one can simply pass the two distributions and have it do the necessary binning as well as the actual statistical test?

(Yes, you can accuse me of laziness: I just don't fancy trying to figure out a routine that would make sure that there are more than 5 samples in each of the expected bins before applying the chi-squared test. It seems too much like re-inventing an elementary wheel that must have been invented by someone else.)
On 10/12/2007 1:16 PM, D. R. Evans wrote:
> The standard chisq.test() and fisher.test() functions, when applied to
> two distributions (to determine whether the same underlying
> distribution applies to both), require one to pre-bin the
> distributions.
>
> Is there a library function (either built-in or in a package) that
> acts more like the ks.test() function, in that one can simply pass the
> two distributions and have it do the necessary binning as well as the
> actual statistical test?
>
> (Yes, you can accuse me of laziness: I just don't fancy trying to
> figure out a routine that would make sure that there are more than 5
> samples in each of the expected bins before applying the chi-squared
> test. It seems too much like re-inventing an elementary wheel that
> must have been invented by someone else.)

If you have a quantile function q() for the distribution, a sample size of N, and want expected counts of at least 5 in each bin, just calculate the cutpoints as

    nbins <- floor(N/5)
    cutpoints <- c(-Inf, q((1:(nbins-1))/nbins), Inf)

This assumes N is at least 10, so that nbins is at least 2.

Duncan Murdoch
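For the two-sample case in the original question, where no theoretical q() is available, one possible sketch is to take the cutpoints from the pooled empirical quantiles instead and then cross-tabulate both samples over the same bins. The function name two_sample_chisq and its min_expected argument are made up for illustration; this is not an existing library function, and the expected counts it produces are only approximately min_expected (chisq.test() will still warn if any fall below 5).

```r
# Hypothetical helper, not part of base R or any package.
two_sample_chisq <- function(x, y, min_expected = 5) {
  pooled <- c(x, y)

  # The smaller sample limits how many bins can keep expected
  # counts near min_expected in both rows of the table.
  nbins <- floor(min(length(x), length(y)) / min_expected)
  if (nbins < 2) stop("samples too small to form at least two bins")

  # Interior cutpoints at equally spaced pooled quantiles, mirroring
  # the q() formula above; unique() guards against tied quantiles.
  probs <- seq_len(nbins - 1) / nbins
  cutpoints <- c(-Inf, unique(quantile(pooled, probs, names = FALSE)), Inf)

  # 2 x k contingency table: one row per sample, one column per bin.
  counts <- rbind(
    x = table(cut(x, cutpoints)),
    y = table(cut(y, cutpoints))
  )
  chisq.test(counts)
}

# Example with simulated data (assumed, not from the thread):
set.seed(1)
x <- rnorm(200)
y <- rnorm(200, mean = 1)
res <- two_sample_chisq(x, y)
```

The returned object is the usual "htest" result from chisq.test(), so res$p.value, res$observed and res$expected can be inspected to check the binning.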
On Fri, 2007-10-12 at 11:16 -0600, D. R. Evans wrote:
> The standard chisq.test() and fisher.test() functions, when applied to
> two distributions (to determine whether the same underlying
> distribution applies to both), require one to pre-bin the
> distributions.
>
> Is there a library function (either built-in or in a package) that
> acts more like the ks.test() function, in that one can simply pass the
> two distributions and have it do the necessary binning as well as the
> actual statistical test?
>
> (Yes, you can accuse me of laziness: I just don't fancy trying to
> figure out a routine that would make sure that there are more than 5
> samples in each of the expected bins before applying the chi-squared
> test. It seems too much like re-inventing an elementary wheel that
> must have been invented by someone else.)

You might want to review the following article:

  Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations
  Ian Campbell
  Stat in Med 26:3661-3675; 2007
  http://www3.interscience.wiley.com/cgi-bin/abstract/114125487/ABSTRACT

Frank Harrell has offered some comments here (bottom of page):

  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/DataAnalysisDisc

HTH,

Marc Schwartz