Michael Friendly
2013-Jul-09 18:29 UTC
[Rd] probable bugs in stats::loglin calculation of pearson chisq
In running the following example of a loglinear model for the Titanic data, I was surprised to see NaN reported for the Pearson chisq > loglin(Titanic, margin=list(1:3, 4)) 2 iterations: deviation 2.273737e-13 $lrt [1] 671.9622 $pearson [1] NaN $df [1] 15 $margin $margin[[1]] [1] "Class" "Sex" "Age" $margin[[2]] [1] "Survived" Tracing it back, this occurs because there are zeros in the fitted/expected frequencies for children among the Crew. > # get fitted (expected) values > fitted <- loglin(Titanic, margin=list(1:3, 4), fit=TRUE)$fit 2 iterations: deviation 2.273737e-13 > fitted[Class="Crew",,Age="Child",] Survived Sex No Yes Male 0 0 Female 0 0 I certainly understand the difference between sampling zeros and structural zeros, and this distinction seems properly implemented in loglin() via the start= argument, but only in the calculation of Pearson chisq, not for LRT. I think this is a code bug, but if there is a reason for the difference, it should be documented in the help for loglin. Another probable bug is that the calculation of of the LRT chisq also takes zero fitted values into account, while the calculation of the Pearson chisq does not, and leads to the NaN result for my example. It occurs in the following portion of the code for loglin: fit <- z$fit attributes(fit) <- attributes(table) observed <- as.vector(table[start > 0]) expected <- as.vector(fit[start > 0]) pearson <- sum((observed - expected)^2/expected) observed <- as.vector(table[table * fit > 0]) expected <- as.vector(fit[table * fit > 0]) lrt <- 2 * sum(observed * log(observed/expected)) I don't understand the reasons for the different calculations of observed & expected for pearson & lrt. FWIW, below is how I calculate these in my mosaics.sas program start chisq(obs, fit); *-- Find Pearson and likelihood ratio chisquares; gf = sum ( (obs - fit)##2 / ( fit + (fit=0) ) ); lr = 2 # sum ( obs # log ( (obs+(obs=0)) / (fit + (fit=0)) ) ); return (gf // lr); finish; -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA