Hi,

I have calculated a chi-square goodness-of-fit test for a sample assumed to come from a Poisson distribution.
Please copy this script into R and run it.
The R script is as follows:

########################## start #########################################

No_of_Frauds <- c(4,1,6,9,9,10,2,4,8,2,3,0,1,2,3,1,3,4,5,4,4,4,9,5,4,3,11,8,12,3,10,0,7)

lambda <- mean(No_of_Frauds)

# Chi-Squared Goodness of Fit Test
# Ho: The data follow a specified distribution Vs H1: Not Ho

# observed frequencies
variable.cnts <- table(No_of_Frauds)
variable.cnts

variable.cnts.prs <- dpois(as.numeric(names(variable.cnts)), lambda)
variable.cnts.prs

variable.cnts <- c(variable.cnts, 0)
variable.cnts
variable.cnts.prs <- c(variable.cnts.prs, 1-sum(variable.cnts.prs))
variable.cnts.prs

tst <- chisq.test(variable.cnts, p=variable.cnts.prs)
tst

######################### end ########################################

The result of R is as follows:

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(variable.cnts, p = variable.cnts.prs)

> tst

        Chi-squared test for given probabilities

data:  variable.cnts
X-squared = 40.5614, df = 13, p-value = 0.0001122

But I have done the calculations in Excel, and I am getting a different answer.

Observed = 2,3,3,5,7,2,1,1,2,3,2,1,1,0
Expected = 0.251005528,1.224602726,2.987288468,4.85811559,5.925428863,5.781782103,4.701348074,3.276697142,1.998288788,1.083247457,0.528493456,0.234400679,0.095299266,0.035764993

Estimated Parameter = 4.878788

Chi square stat = 0.000113

My Excel answer tallies with the book that I referred to for Excel.
Please tell me the correct calculation in R, and how to interpret the results in R.

Thanks.
Regards,
Priti.

"priti desai" <priti.desai at kalyptorisk.com> writes:

> [...]
> But I have done the calculations in Excel, and I am getting a different answer.
>
> Observed = 2,3,3,5,7,2,1,1,2,3,2,1,1,0
> Expected = 0.251005528,1.224602726,2.987288468,4.85811559,5.925428863,5.781782103,
> 4.701348074,3.276697142,1.998288788,1.083247457,0.528493456,0.234400679,
> 0.095299266,0.035764993
>
> Estimated Parameter = 4.878788
>
> Chi square stat = 0.000113
>
> My Excel answer tallies with the book that I referred to for Excel.
> Please tell me the correct calculation in R, and how to interpret the results in R.

As far as I can see, the "Chi square stat" in Excel is essentially the p-value in R.
The slight difference appears to arise from Excel using the point probability rather than the tail probability in the last cell:

> O <- c(2,3,3,5,7,2,1,1,2,3,2,1,1,0)
> E <- c(0.251005528,1.224602726,2.987288468,4.85811559,5.925428863,
+ 5.781782103,4.701348074,3.276697142,1.998288788,1.083247457,0.528493456,
+ 0.234400679,0.095299266,0.035764993)
> (O-E)^2/E
 [1] 1.218691e+01 2.573925e+00 5.409021e-05 4.143826e-03 1.948725e-01
 [6] 2.473610e+00 2.914053e+00 1.581883e+00 1.465377e-06 3.391598e+00
[11] 4.097178e+00 2.500600e+00 8.588560e+00 3.576499e-02
> sum((O-E)^2/E)
[1] 40.54315
> pchisq(sum((O-E)^2/E), 13, lower.tail=FALSE)
[1] 0.0001129818
> E
 [1] 0.25100553 1.22460273 2.98728847 4.85811559 5.92542886 5.78178210
 [7] 4.70134807 3.27669714 1.99828879 1.08324746 0.52849346 0.23440068
[13] 0.09529927 0.03576499
> sum(E)
[1] 32.98176

Please don't assume that something is correct just because it is Excel output and some book mindlessly copied it...

The calculations are both wrong, because they ignore the fact that lambda has been estimated from the data, and also because they deal with very small expected cell counts.

For a better test, you likely need to simulate the distribution of the chi-square, or, as I'd be inclined to do, go directly for the pretty obvious overdispersion:

> var(X)   # X is the No_of_Frauds vector
[1] 11.17235
> var(X)/mean(X) # expected is ca. 1 in the Poisson distrib.
[1] 2.289984
> r <- replicate(100000,{X <- rpois(33, 4.87879); var(X)/mean(X)})
> sum(r > 2.289984)
[1] 5

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
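
For anyone who wants to try the other option mentioned in the reply, simulating the distribution of the chi-square statistic, here is a minimal sketch. It is not from the original thread. The helper name `pois_chisq`, the cutoff `kmax = 12` (all counts of 12 or more pooled into one tail cell), the seed, and the 2000 replicates are all assumptions made for illustration. Lambda is re-estimated inside every simulated sample, so the reference distribution accounts for the estimation step, and the last cell uses the tail probability rather than the point probability.

```r
# Data from the original post.
x <- c(4,1,6,9,9,10,2,4,8,2,3,0,1,2,3,1,3,4,5,4,4,4,9,5,4,3,11,8,12,3,10,0,7)

# Hypothetical helper: Pearson chi-square statistic for a Poisson fit.
# lambda is estimated from the sample passed in; counts >= kmax are pooled
# into a single tail cell so that the cell probabilities sum to 1.
pois_chisq <- function(x, kmax = 12) {
  lambda <- mean(x)
  # Observed counts for the cells 0, 1, ..., kmax-1, and ">= kmax".
  obs <- tabulate(pmin(x, kmax) + 1, nbins = kmax + 1)
  # Point probabilities for 0..kmax-1, tail probability P(X >= kmax) last.
  p <- c(dpois(0:(kmax - 1), lambda),
         ppois(kmax - 1, lambda, lower.tail = FALSE))
  expd <- length(x) * p
  sum((obs - expd)^2 / expd)
}

set.seed(1)  # arbitrary seed, only for reproducibility

stat_obs <- pois_chisq(x)

# Parametric bootstrap: draw Poisson samples at the fitted lambda and
# recompute the statistic (including re-estimating lambda) each time.
sim <- replicate(2000, pois_chisq(rpois(length(x), mean(x))))

# Monte Carlo p-value with the usual +1 correction.
p_mc <- (1 + sum(sim >= stat_obs)) / (1 + length(sim))

stat_obs
p_mc
```

The resulting p-value is compared against the simulated statistics rather than a chi-square table, so the small expected cell counts no longer matter. The choice of cells still affects power, so treat this as a starting point rather than a definitive test.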