This is a question about array and data frame manipulation and
calculation, in the
context of models for log odds in contingency tables.
I have a data frame representing a 3-way frequency table, of size 5
(litter) x 2 (treatment) x 3 (deaths).
"Freq" is the frequency in each cell, and deaths is the response
variable.
Mice <-
structure(list(litter = c(7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 7L, 7L, 8L,
8L, 9L, 9L, 10L, 10L, 11L, 11L), treatment = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label =
c("A",
"B"), class = "factor"), deaths = structure(c(1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("0",
"1",
"2+"), class = "factor"), Freq = c(58L, 75L, 49L, 58L,
33L, 45L,
15L, 39L, 4L, 5L, 11L, 19L, 14L, 17L, 18L, 22L, 13L, 22L, 12L,
15L, 5L, 7L, 10L, 8L, 15L, 10L, 15L, 18L, 17L, 8L)), .Names =
c("litter",
"treatment", "deaths", "Freq"), row.names =
c(NA, 30L), class =
"data.frame")
From this, I want to calculate the log odds for adjacent categories of
the last variable (deaths)
and have this value in a data frame with factors litter (5), treatment
(2), and contrast (2), as detailed below.
The data can be seen in xtabs() form:
mice.tab <- xtabs(Freq ~ litter + treatment + deaths, data=Mice)
ftable(mice.tab)
deaths 0 1 2+
litter treatment
7 A 58 11 5
B 75 19 7
8 A 49 14 10
B 58 17 8
9 A 33 18 15
B 45 22 10
10 A 15 13 15
B 39 22 18
11 A 4 12 17
B 5 15 8
>
From this, I want to calculate the (adjacent) log odds of 0 vs. 1 and 1
vs.2+ deaths, which is easy in
array format,
odds1 <- log(mice.tab[,,1]/mice.tab[,,2]) # contrast 0:1
odds2 <- log(mice.tab[,,2]/mice.tab[,,3]) # contrast 1:2+
odds1
treatment
litter A B
7 1.6625477 1.3730491
8 1.2527630 1.2272297
9 0.6061358 0.7156200
10 0.1431008 0.5725192
11 -1.0986123 -1.0986123
>
But, for analysis, I want to have these in a data frame, with factors
litter, treatment and contrast
and a column, 'logodds' containing the entries in the odds1 and odds2
tables, suitably strung out.
For this problem, the desired result is given by
> result <- data.frame(expand.grid(litter=factor(7:11),
treatment=c("A","B"), deaths=c("0:1",
"1:2+")),
logodds=c(odds1, odds2))
> result
litter treatment deaths logodds
1 7 A 0:1 1.6625477
2 8 A 0:1 1.2527630
3 9 A 0:1 0.6061358
4 10 A 0:1 0.1431008
5 11 A 0:1 -1.0986123
6 7 B 0:1 1.3730491
7 8 B 0:1 1.2272297
8 9 B 0:1 0.7156200
9 10 B 0:1 0.5725192
10 11 B 0:1 -1.0986123
11 7 A 1:2+ 0.7884574
12 8 A 1:2+ 0.3364722
13 9 A 1:2+ 0.1823216
14 10 A 1:2+ -0.1431008
15 11 A 1:2+ -0.3483067
16 7 B 1:2+ 0.9985288
17 8 B 1:2+ 0.7537718
18 9 B 1:2+ 0.7884574
19 10 B 1:2+ 0.2006707
20 11 B 1:2+ 0.6286087
>
More generally, for an I x J x K table, where the last factor is the
response, my desired result
is a data frame of IJ(K-1) rows, with adjacent log odds in a 'logodds'
column, and ideally, I'd like
to have a general function to do this.
Note that if T is the 10 x 3 matrix of frequencies shown by ftable(),
the calculation is essentially
log(T) %*% matrix(c(1, -1, 0,
0, 1, -1))
followed by reshaping and labeling.
Can anyone help with this?
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web:http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA