rreeve@liposcience.com
2003-Feb-03 18:43 UTC
[Rd] summary.table bug in parameter (and fix) (PR#2526)
I sent this in with an old version, but it's in latest version as well. The fix is simple. In the summary.table function, the parameter is calculated incorrectly for a test of independence among all cells when the table is more than 2-way table. Example: Consider X:> Xa b c 1 A1 B2 C1 2 A3 BA3 C2 3 A2 B1 C4 4 A1 B2 C3 5 A3 BA3 C2 6 A1 BA3 C1 7 A2 BA3 C2 8 A1 BA3 C3 9 A1 B2 C1 10 A1 BA3 C2 11 A2 B1 C2 12 A1 B2 C3 13 A3 BA3 C4 14 A2 B1 C3 15 A2 B1 C2 16 A2 BA3 C4 17 A3 BA3 C3 18 A3 BA3 C4 19 A2 BA3 C4 20 A1 B2 C3 21 A1 BA3 C1 22 A2 B1 C2 23 A2 BA3 C3 24 A2 B1 C2 25 A2 B1 C2 26 A3 B1 C1 27 A2 BA3 C1 28 A2 BA3 C2 29 A2 BA3 C4 30 A3 B1 C2 31 A1 B1 C4 32 A2 B2 C1 33 A1 B2 C3 34 A2 BA3 C3 35 A2 BA3 C1 36 A2 BA3 C1 37 A2 B2 C2 38 A2 B2 C2 39 A2 B2 C4 40 A2 B1 C4 41 A2 B2 C4 42 A1 B1 C3 43 A2 BA3 C4 44 A2 BA3 C2 45 A2 B1 C2 46 A2 B2 C1 47 A3 B1 C3 48 A1 B2 C1 49 A2 B1 C2 50 A1 B1 C1 51 A3 B2 C4 52 A1 B2 C1 53 A3 B2 C1 54 A2 BA3 C1 55 A2 B1 C2 56 A1 BA3 C3 57 A3 B1 C2 58 A1 B1 C3 59 A2 B2 C2 60 A2 BA3 C1 61 A2 B1 C3 62 A3 B2 C2 63 A2 BA3 C2 64 A2 B1 C3 65 A1 BA3 C3 66 A2 B2 C2 67 A2 B1 C4 68 A2 BA3 C2 69 A2 BA3 C2 70 A2 B1 C3 71 A3 BA3 C2 72 A3 BA3 C3 73 A2 B2 C1 74 A2 B1 C3 75 A1 BA3 C1 76 A2 BA3 C3 77 A3 B1 C4 78 A3 BA3 C3 79 A2 B1 C3 80 A2 B1 C3 81 A2 BA3 C4 82 A2 B1 C2 83 A3 BA3 C1 84 A2 BA3 C1 85 A2 BA3 C4 86 A3 BA3 C1 87 A2 BA3 C1 88 A2 B2 C1 89 A3 BA3 C2 90 A2 BA3 C1 91 A1 B1 C4 92 A2 B1 C1 93 A2 B1 C1 94 A2 B1 C3 95 A2 BA3 C1 96 A3 B2 C1 97 A3 BA3 C4 98 A2 B1 C2 99 A2 BA3 C3 100 A2 B1 C2> X.table <- table(X) > X.table, , c = C1 b a B1 B2 B3 A1 3 2 3 A2 6 4 5 A3 3 2 0 , , c = C2 b a B1 B2 B3 A1 0 0 1 A2 7 7 7 A3 3 1 3 , , c = C3 b a B1 B2 B3 A1 3 3 3 A2 6 3 3 A3 1 0 3 , , c = C4 b a B1 B2 B3 A1 0 0 2 A2 2 2 7 A3 4 1 0 Now, we construct a table from X (3x3x4):> X.table <- table(X) > summary(X.table)Number of cases in table: 100 Number of factors: 3 Test for independence of all factors: Chisq = 30.232, df = 12, p-value = 0.002576 Chi-squared approximation may be incorrect The df is incorrect. It is calculated as 2*2*3 = 12, but should be 3*3*4 - 1 - (3-1) - (3-1) - (4-1) = 28 (df of interaction terms not in main effects only model). Now consider equivalent model tested in loglin:> loglin(X.table, list(1,2,3))2 iterations: deviation 3.552714e-15 $lrt [1] 38.32701 $pearson [1] 30.23244 $df [1] 28 $margin $margin[[1]] [1] "a" $margin[[2]] [1] "b" $margin[[3]] [1] "c" The statistic is identical, but df different. The problem is in summary.table: Current version of summary.table (INCORRECT VERSION): function (object, ...) { if (!inherits(object, "table")) stop("object must inherit from class table") n.cases <- sum(object) n.vars <- length(dim(object)) y <- list(n.vars = n.vars, n.cases = n.cases) if (n.vars > 1) { m <- vector("list", length = n.vars) for (k in seq(along = m)) { m[[k]] <- apply(object, k, sum)/n.cases } expected <- apply(do.call("expand.grid", m), 1, prod) * n.cases statistic <- sum((c(object) - expected)^2/expected) parameter <- prod(sapply(m, length) - 1) #### Problem is on this line (works only for 2-way tables) y <- c(y, list(statistic = statistic, parameter = parameter, approx.ok = all(expected >= 5), p.value = pchisq(statistic, parameter, lower.tail = FALSE), call = attr(object, "call"))) } class(y) <- "summary.table" y } summary.table should be (CORRECTED VERSION) function (object, ...) { if (!inherits(object, "table")) stop("object must inherit from class table") n.cases <- sum(object) n.vars <- length(dim(object)) y <- list(n.vars = n.vars, n.cases = n.cases) if (n.vars > 1) { m <- vector("list", length = n.vars) for (k in seq(along = m)) { m[[k]] <- apply(object, k, sum)/n.cases } expected <- apply(do.call("expand.grid", m), 1, prod) * n.cases statistic <- sum((c(object) - expected)^2/expected) parameter <- prod(sapply(m, length)) - (sum(sapply(m, length) - 1) + 1) ### This line changed (works for all multiway tables) y <- c(y, list(statistic = statistic, parameter = parameter, approx.ok = all(expected >= 5), p.value = pchisq(statistic, parameter, lower.tail = FALSE), call = attr(object, "call"))) } class(y) <- "summary.table" y } Using the corrected summary.table (using the correct code above):> summary(X.table)Number of cases in table: 100 Number of factors: 3 Test for independence of all factors: Chisq = 30.232, df = 28, p-value = 0.3522 Chi-squared approximation may be incorrect These are correct. --Russell Reeve rreeve@liposcience.com --please do not edit the information below-- Version: platform = i386-pc-mingw32 arch = i386 os = mingw32 system = i386, mingw32 status = major = 1 minor = 6.2 year = 2003 month = 01 day = 10 language = R Windows 2000 Professional (build 2195) Service Pack 2.0 Search Path: .GlobalEnv, package:ctest, Autoloads, package:base
>>>>> rreeve writes:> I sent this in with an old version, but it's in latest version as > well. The fix is simple. In the summary.table function, the parameter > is calculated incorrectly for a test of independence among all cells > when the table is more than 2-way table.As I said last week, this is already fixed in r-devel. What is the point of sending another bug report? -k