Mohammad Ehsanul Karim
2005-Jun-26 10:00 UTC
[R] chisq.test using amalgamation automatically (possible ?!?)
Dear List,

If any observed and/or expected frequency is less than 5, then chisq.test (Pearson's Chi-squared Test for Count Data from package:stats) gives a warning message. For example,

x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
y <- c(9.13112391745095, 13.1626482033341, 12.6623267638188, 11.0130706413029, 9.16415925139016,
       7.47441794889028, 6.03743388141852, 4.85350508692505, 3.89248001363859, 3.11803140037476,
       2.49617540962629, 1.99774139023269, 1.5985926374167, 1.27909653584089, 1.02341602646530,
       0.818828097315106, 0.655132353196336, 0.524159229418155, 0.418022824890164, 0.335528136508225,
       0.268448671671046, 0.214779801990545, 0.171840507806838, 0.137485729582785, 0.109999238967747,
       0.0880079144684513, 0.070413150156564)

Chi.Sq <- sum((c(x[1:7], sum(x[8:9]), sum(x[10:11]), sum(x[12:27])) -
               c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27])))^2 /
              c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27])))  # using amalgamation
pchisq(Chi.Sq, df=9, ncp=0, lower.tail = FALSE, log.p = FALSE)  # result being 0.8830207

but chisq.test(x,y) gives the following output with incorrect df:

        Pearson's Chi-squared test

data:  x and y
X-squared = 216, df = 208, p-value = 0.3373

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(x, y)

Is there any way to use chisq.test directly without getting the warning message in such cases (that is, with the amalgamation done automatically, perhaps by means of programming, so that we don't have to check whether each element is less than 5)?

Any hint, help, support, or references will be highly appreciated.
Thank you for your time.

----------------------------------
Mohammad Ehsanul Karim
Web: http://snipurl.com/ehsan
ISRT, University of Dhaka, BD
----------------------------------
Gabor Grothendieck
2005-Jun-26 16:23 UTC
[R] chisq.test using amalgamation automatically (possible ?!?)
On 6/26/05, Mohammad Ehsanul Karim <wildscop at yahoo.com> wrote:
> but chisq.test(x,y) gives the following output with
> incorrect df:
>
> X-squared = 216, df = 208, p-value = 0.3373
>
> Is there any way that we can use directly chisq.test
> without having warning message in such cases (that is,
> using amalgamation conveniently so that we don't have
> to check each elements if they are less than 5 or not
> - the whole process being automatic, may be by means
> of programming)?

Check out ?combine.levels in package Hmisc. Also, in the chisq.test call above perhaps you meant this:

chisq.test(x,p=y/sum(y))
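The difference between the two calls can be checked directly. The following is only a sketch, and it assumes that y is meant to hold expected counts (the original post does not say so explicitly); the names tab, df_indep and gof are illustrative. Given two vectors, chisq.test cross-tabulates them as factors and tests independence, whereas the p= form runs a goodness-of-fit test of x against the supplied probabilities.

```r
x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
y <- c(9.13112391745095, 13.1626482033341, 12.6623267638188,
       11.0130706413029, 9.16415925139016, 7.47441794889028,
       6.03743388141852, 4.85350508692505, 3.89248001363859,
       3.11803140037476, 2.49617540962629, 1.99774139023269,
       1.5985926374167, 1.27909653584089, 1.02341602646530,
       0.818828097315106, 0.655132353196336, 0.524159229418155,
       0.418022824890164, 0.335528136508225, 0.268448671671046,
       0.214779801990545, 0.171840507806838, 0.137485729582785,
       0.109999238967747, 0.0880079144684513, 0.070413150156564)

# chisq.test(x, y) treats both vectors as factors, so the test is run on
# a table with one row per distinct x value and one column per distinct y value.
tab <- table(x, y)
df_indep <- (nrow(tab) - 1) * (ncol(tab) - 1)
print(dim(tab))
print(df_indep)        # 208, the df reported in the original output

# Goodness of fit: observed counts x against probabilities proportional to y.
# suppressWarnings() only hides the small-expected-count warning here;
# it does not make the approximation any better.
gof <- suppressWarnings(chisq.test(x, p = y / sum(y)))
print(gof$parameter)   # df = length(x) - 1 = 26
print(min(gof$expected))
```

Even in the goodness-of-fit form, most of the expected counts remain well below 5, so the warning persists until categories are merged.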
Prof Brian Ripley
2005-Jun-27 07:32 UTC
[R] chisq.test using amalgamation automatically (possible ?!?)
You have actually used chisq.test to test independence of the cross tabulation of x and y as factors, a table with 1 on the diagonal and 0 elsewhere. I doubt this was your intention, but unfortunately you have not told us your actual intention.

Perhaps you intended y to be the expected values, but as they do not have the same sum as x it is not clear what distribution is appropriate. (The standard theory assumes that the total count was used in determining the expected values from the supplied probabilities, which is why df=9 would be used with 10 categories.)

You can use the expected values _if known in advance_ to amalgamate categories, but in most uses of chisq.test they are not known in advance. In any case, without some knowledge of the context, you cannot decide which categories should be merged: your choices are arbitrary unless the categories are ordered. Suppose they applied to types of fruit? If you know that, then certainly you can program R to do the amalgamation for you.

BTW, it is just confusing (at least to your readers) to supply the default values of arguments explicitly. pchisq(Chi.Sq, df=9, lower.tail = FALSE) would suffice.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
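For the case that is described as programmable (ordered categories, expected values known in advance), one possible sketch follows. The function amalgamate and its argument min_expected are hypothetical names introduced here, not part of any package, and the rule used (merge adjacent categories from the left until the pooled expected count reaches the threshold, then fold any undersized remainder into the last group) is just one of several reasonable choices. The df follows the original post's calculation (number of groups minus 1).

```r
x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
y <- c(9.13112391745095, 13.1626482033341, 12.6623267638188,
       11.0130706413029, 9.16415925139016, 7.47441794889028,
       6.03743388141852, 4.85350508692505, 3.89248001363859,
       3.11803140037476, 2.49617540962629, 1.99774139023269,
       1.5985926374167, 1.27909653584089, 1.02341602646530,
       0.818828097315106, 0.655132353196336, 0.524159229418155,
       0.418022824890164, 0.335528136508225, 0.268448671671046,
       0.214779801990545, 0.171840507806838, 0.137485729582785,
       0.109999238967747, 0.0880079144684513, 0.070413150156564)

# Hypothetical helper: pool ADJACENT categories until each pooled
# expected count is at least min_expected. Only sensible when the
# categories have a natural order.
amalgamate <- function(observed, expected, min_expected = 5) {
  stopifnot(length(observed) == length(expected))
  group <- integer(length(expected))
  g <- 1L
  running <- 0
  for (i in seq_along(expected)) {
    group[i] <- g
    running <- running + expected[i]
    if (running >= min_expected) {  # group is large enough: close it
      g <- g + 1L
      running <- 0
    }
  }
  # Anything still labelled g belongs to an unclosed (undersized) group;
  # fold it into the previous group if there is one.
  if (any(group == g) && g > 1L) {
    group[group == g] <- g - 1L
  }
  list(observed = as.vector(tapply(observed, group, sum)),
       expected = as.vector(tapply(expected, group, sum)),
       group    = group)
}

res <- amalgamate(x, y)
chi_sq  <- sum((res$observed - res$expected)^2 / res$expected)
p_value <- pchisq(chi_sq, df = length(res$observed) - 1, lower.tail = FALSE)
print(res$observed)
print(p_value)
```

On these data the rule reproduces the manual grouping from the original post (categories 1 to 7 kept, then 8:9, 10:11 and 12:27 pooled). Passing res$observed and p = res$expected / sum(res$expected) to chisq.test is an alternative, but that rescales the expected counts to the observed total, so the statistic will differ slightly from the hand calculation.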