I have a large n-way contingency table, constructed as a table object, and
want to pool (collapse) some categories, summing the frequencies in all
collapsed cells. How can I do this?

thx,
-Michael

--
Michael Friendly              friendly at yorku.ca
York University               http://www.math.yorku.ca/SCS/friendly.html
Psychology Department
4700 Keele Street             Tel: (416) 736-5115 x66249
Toronto, Ontario, M3J 1P3     Fax: (416) 736-5814
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Dear Mike,

At 01:53 PM 4/8/2002 -0400, Michael Friendly wrote:
>I have a large n-way contingency table, constructed as a table
>object, and want to pool (collapse) some categories, summing the
>frequencies in all collapsed cells. How can I do this?

Use apply; for example:

> tab <- as.table(array(1:24, c(2,3,4)))
> tab
, , A

  A B C
A 1 3 5
B 2 4 6

, , B

  A  B  C
A 7  9 11
B 8 10 12

, , C

   A  B  C
A 13 15 17
B 14 16 18

, , D

   A  B  C
A 19 21 23
B 20 22 24

> apply(tab, c(2,3), sum) # sums over first coordinate
   A  B  C  D
A  3 15 27 39
B  7 19 31 43
C 11 23 35 47

I hope that this helps,
John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
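A short sketch of what the apply() call above does, using the same toy table. MARGIN lists the dimensions that are kept, and every other dimension is summed out. margin.table() should give the same numbers. Note that this removes a whole dimension; it does not merge individual levels within a dimension. The object names by_apply and by_margin are only illustrative.

```r
# Same toy table as in the reply above
tab <- as.table(array(1:24, c(2, 3, 4)))

# Keep dimensions 2 and 3, sum over dimension 1
by_apply <- apply(tab, c(2, 3), sum)

# margin.table() computes the same marginal sums and keeps the "table" class
by_margin <- margin.table(tab, c(2, 3))

# Compare the two results cell by cell
all(by_apply == by_margin)
```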
Beautiful, I didn't realise you could use "c(2,3)". It's not in "help". I
guess it's so obvious now...
John Strumila
> apply(tab, c(2,3), sum) # sums over first coordinate
Perhaps I should have given an example.
# 5 column matrix: Day, Time, Station, State, Freq
tv.matrix<-read.table("C:/R/mosaics/tv.dat")
> tv.matrix[1:5,]
  V1 V2 V3 V4 V5
1  1  1  1  1  6
2  2  1  1  1 18
3  3  1  1  1  6
4  4  1  1  1  2
5  5  1  1  1 11
tv <- array(tv.matrix[,5], dim=c(5,11,5,3))
dimnames(tv) <-
list(c("Monday","Tuesday","Wednesday","Thursday","Friday"),
c("8:00","8:15","8:30","8:45","9:00","9:15","9:30",
"9:45","10:00","10:15","10:30"),
c("A","C","N","F","Other"),
c("O","S","P"))
names(dimnames(tv))<-c("Day", "Time",
"Station", "State")
Say I want to collapse the 11 time categories to 3. The real problem
is when I have only the 4-way array (or an equivalent table). But
here, I thought I could just collapse the 2nd column to 3 categories:
tv.matrix[,2] <- 1+ as.integer((tv.matrix[,2]-1) /4)
tv2 <- array(tv.matrix[,5],
dim=c(5,3,5,3))
dimnames(tv2) <-
list(c("Monday","Tuesday","Wednesday","Thursday","Friday"),
c("8:00-8:45","9:00-9:45","10:00-10:30"),
c("A","C","N","F","Other"),
c("O","S","P"))
names(dimnames(tv2))<-c("Day", "Time",
"Station", "State")
But something is wrong, because the margins are not the same:
> margin.table(tv,1)
Day
   Monday   Tuesday Wednesday  Thursday    Friday
    21271     20486     19304     19779     17275
> margin.table(tv2,1)
Day
   Monday   Tuesday Wednesday  Thursday    Friday
      996       912       962       898       771
What am I missing?
-Michael
--
Michael Friendly friendly at yorku.ca
York University http://www.math.yorku.ca/SCS/friendly.html
Psychology Department
4700 Keele Street Tel: (416) 736-5115 x66249
Toronto, Ontario, M3J 1P3 Fax: (416) 736-5814
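One possible reading of the mismatch, offered as a sketch rather than a definitive answer: recoding tv.matrix[,2] changes only the Time codes, while column 5 still holds one frequency per original cell (presumably 5*11*5*3 = 825 of them). array() with dim=c(5,3,5,3) then simply takes the first 225 of those values, so nothing is summed. The code below pools the levels of one dimension directly on the 4-way array. pool_levels() is a hypothetical helper, not a base R function, and the counts are made up because tv.dat is not available here.

```r
# Hypothetical helper (not in base R): pool the levels of one dimension of a
# table or array, summing the cells that fall into the same group.
pool_levels <- function(tab, margin, groups) {
  groups <- factor(groups, levels = unique(groups))
  stopifnot(length(groups) == dim(tab)[margin],  # one group label per level
            nlevels(groups) >= 2,                # a single group would drop the dimension
            !is.null(dimnames(tab)))             # assumes the input carries dimnames
  other <- seq_along(dim(tab))[-margin]
  # For each combination of the other dimensions, sum the counts within groups
  pooled <- apply(tab, other, function(v) tapply(v, groups, sum))
  # apply() returns the pooled dimension first; restore the original order
  pooled <- aperm(pooled, order(c(margin, other)))
  dn <- dimnames(tab)
  dn[[margin]] <- levels(groups)
  dimnames(pooled) <- dn
  as.table(pooled)
}

# Stand-in for tv: same shape and labels as in the post, but MADE-UP counts
set.seed(1)
tv <- array(rpois(5 * 11 * 5 * 3, 20), dim = c(5, 11, 5, 3),
            dimnames = list(
              Day     = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
              Time    = c("8:00", "8:15", "8:30", "8:45", "9:00", "9:15", "9:30",
                          "9:45", "10:00", "10:15", "10:30"),
              Station = c("A", "C", "N", "F", "Other"),
              State   = c("O", "S", "P")))

# Same grouping rule as in the post: 4 + 4 + 3 quarter-hour slots
grp <- c("8:00-8:45", "9:00-9:45", "10:00-10:30")[1 + (0:10) %/% 4]
tv2 <- pool_levels(tv, margin = 2, groups = grp)

dim(tv2)
# The Day margins of the pooled and original arrays should now agree
all(margin.table(tv2, 1) == margin.table(tv, 1))
```

If the long-format data frame is still at hand, xtabs(V5 ~ V1 + V2 + V3 + V4, data = tv.matrix), run after the recoding of column 2, should give the same pooled counts, with the integer codes as dimnames.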
You're right as usual. I only looked at the 'arguments' section.

-----Original Message-----
From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
Sent: Friday, 12 April 2002 4:40 AM
To: Strumila, John
Cc: r-help
Subject: RE: [R] pooling categories in a table

On Tue, 9 Apr 2002, Strumila, John wrote:
> beautiful, I didn't realise you could use "c(2,3)". It's not in "help". I
> guess it's so obvious now...

I beg to differ: it is under MARGIN in help(apply) (and in a couple of good
books on S).

Also note that come 1.5.0 colSums() will do this sort of thing a mite more
efficiently.
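For reference, a minimal sketch of the colSums() route mentioned above, on the toy table from earlier in the thread. It relies on the dims argument of colSums() and rowSums() as documented in current versions of R. The object names are only illustrative.

```r
tab <- as.table(array(1:24, c(2, 3, 4)))

# colSums() with the default dims = 1 sums over the first dimension,
# leaving a 3 x 4 array indexed by dimensions 2 and 3
cs <- colSums(tab)
ap <- apply(tab, c(2, 3), sum)
all(cs == ap)

# rowSums() with dims = 2 keeps the first two dimensions and sums over the rest
rs  <- rowSums(tab, dims = 2)
ap2 <- apply(tab, c(1, 2), sum)
all(rs == ap2)
```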