Dear R-help,

First of all, thank you VERY much for any help you have time to offer. I greatly appreciate it.

I would like to write a function that, given an arbitrary number of factors from a data frame, tabulates the number of occurrences of each unique combination of the factors. Clearly, this works:

> table(horse, date, surface)
<SNIP>
, , surface = TURF

                  date
horse              20080404 20080514 20081015 20081025 20081120 20081203 20090319
  Bedevil                 0        0        0        0        0        0        0
  Cut To The Point      227        0        0        0        0        0        0
<SNIP>

But I would prefer output that skips all the zeros, flattens any dimensions greater than 2, and gives the level names rather than codes. I can write code specifically for n factors like this (here 2 factors):

ft <- function(x, y) {
  cbind(levels(x)[unique(cbind(x, y))[, 1]],
        levels(y)[unique(cbind(x, y))[, 2]],
        table(x, y)[unique(cbind(x, y))])
}

which gives the lovely output I'm looking for:

#      [,1]               [,2]       [,3]
# [1,] "Cut To The Point" "20080404" "227"
# [2,] "Prairie Wolf"     "20080404" "364"
# [3,] "Bedevil"          "20080514" "319"
# [4,] "Prairie Wolf"     "20080514" "330"

But my attempts to make this into a function that handles arbitrary numbers of factors as separate input arguments have failed. The closest I can get is:

ft2 <- function(...) {
  cbind(unique(cbind(...)), table(...)[unique(cbind(...))])
}

giving:

> ft2(horse, date)
      horse date
 [1,]     2    1 227
 [2,]     9    1 364
 [3,]     1    2 319
 [4,]     9    2 330
 [5,]     9    3 291
 [6,]    12    3 249
 [7,]    10    3 286
 [8,]     5    4 217
 [9,]     3    4 426
[10,]     8    4 468
[11,]     9    5 319
[12,]    13    5 328
[13,]    12    5 138
[14,]     7    6 375
[15,]    11    6 366
[16,]     4    7 255
[17,]     6    7 517

I would be greatly in debt to anyone willing to show me how to make the above function take arbitrary inputs and still produce output displaying factor level names instead of the underlying coded numbers.

Cheers and thanks for your time!
Andrew Spence
RCUK Academic Research Fellow
Structure and Motion Laboratory
Royal Veterinary College
Hawkshead Lane
North Mymms, Hatfield
Hertfordshire AL9 7TA
+44 (0) 1707 666988
mailto:aspence@rvc.ac.uk
http://www.rvc.ac.uk/sml/People/andrewspence.cfm
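ft2 can be generalized along the same lines as the two-factor ft. The integer codes returned by unique() select the non-empty cells of the n-way table by matrix indexing, and each column of codes is then mapped back through that factor's levels(). Below is a minimal sketch. The name ft3 and the example vectors are hypothetical, and the code assumes every argument is a factor or can be coerced with as.factor().

```r
# Hypothetical generalization of ft2 to any number of factors.
ft3 <- function(...) {
  # Coerce every argument to a factor so levels() is always defined
  facs <- lapply(list(...), as.factor)
  # Integer codes of each observed combination, one column per factor
  codes <- unique(do.call(cbind, lapply(facs, as.integer)))
  # Matrix indexing pulls the matching cell counts out of the n-way table
  counts <- do.call(table, facs)[codes]
  # Translate each column of codes back into that factor's level names
  labs <- matrix(NA_character_, nrow(codes), ncol(codes))
  for (i in seq_along(facs)) labs[, i] <- levels(facs[[i]])[codes[, i]]
  cbind(labs, counts)
}

# Small made-up example in the shape of the horse/date data
horse <- factor(c("Bedevil", "Prairie Wolf", "Bedevil", "Bedevil"))
date <- factor(c("20080404", "20080404", "20080514", "20080404"))
res <- ft3(horse, date)
print(res)
```

Rows come out in order of first appearance, as with ft. Extra factors are simply extra arguments, for example ft3(horse, date, surface).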
Andrew,

Is this what you're looking for? Most likely a more elegant solution exists... but maybe this is good enough.

## BEGIN R SAMPLE CODE

## sample data frame, 3 factors
tmp <- data.frame(f1 = sample(gl(2, 50, labels = c("Male", "Female"))),
                  f2 = sample(gl(4, 25, labels = c("White", "Black", "Hispanic", "Other"))),
                  f3 = sample(gl(4, 25, labels = c("0-20", "21-40", "41-60", "61-80"))))
summary(tmp)

## the function
test <- function(...) {
  ## collapse all factors into one, with levels such as "Male!White!0-20"
  tbl <- table(interaction(..., sep = "!"))
  ## keep only the combinations that actually occur
  tbl.nozero <- tbl[tbl > 0]
  ## split the combined names back into their component level names
  nms <- strsplit(names(tbl.nozero), "!")
  cb <- cbind(t(do.call(data.frame, nms)), tbl.nozero)
  dimnames(cb) <- NULL
  cb
}

## test calling the function, does this produce what you want?
with(tmp, test(f1, f2, f3))

## END R SAMPLE CODE

Best Regards,
Erik Iverson

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Andrew Spence
> Sent: Friday, October 02, 2009 1:15 PM
> To: r-help at r-project.org
> Subject: [R] Tabulating using arbitrary numbers of factors
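For comparison, here is a base-R sketch of the same tabulation that avoids pasting level names together and splitting them again. as.data.frame() on a table returns one row per cell, with the level names in one column per factor and the counts in a Freq column, so dropping the empty combinations takes a single subset. The function name count_combos and the example data are hypothetical, and this is not Erik's code.

```r
# Sketch: non-empty cells of an n-way table, labelled with level names.
count_combos <- function(...) {
  # Full cross-tabulation of all supplied factors, zero cells included
  tab <- table(...)
  # Flatten to one row per cell: one column per factor plus "Freq"
  df <- as.data.frame(tab, stringsAsFactors = FALSE)
  # Drop the combinations that never occur
  df <- df[df$Freq > 0, , drop = FALSE]
  rownames(df) <- NULL
  df
}

horse <- factor(c("Bedevil", "Prairie Wolf", "Bedevil", "Bedevil"))
date <- factor(c("20080404", "20080404", "20080514", "20080404"))
res <- count_combos(horse = horse, date = date)
print(res)
```

Because the labels are never joined with a separator, level labels that happen to contain "!" cause no trouble.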
try 'reshape':

> require(reshape)
> # add a column to accumulate on
> tmp$inc <- 1
> recast(tmp, f1 + f2 + f3 ~ ., sum)
Using f1, f2, f3 as id variables
       f1       f2    f3 (all)
1    Male    White  0-20     3
2    Male    White 21-40     4
3    Male    White 41-60     2
4    Male    White 61-80     3
5    Male    Black  0-20     3
6    Male    Black 21-40     4
7    Male    Black 41-60     2
8    Male    Black 61-80     3
9    Male Hispanic  0-20     4
10   Male Hispanic 21-40     4
11   Male Hispanic 41-60     4
12   Male Hispanic 61-80     3
13   Male    Other  0-20     3
14   Male    Other 21-40     2
15   Male    Other 41-60     2
16   Male    Other 61-80     4
17 Female    White  0-20     2
18 Female    White 21-40     4
19 Female    White 41-60     4
20 Female    White 61-80     3
21 Female    Black  0-20     5
22 Female    Black 21-40     3
23 Female    Black 41-60     4
24 Female    Black 61-80     1
25 Female Hispanic  0-20     1
26 Female Hispanic 21-40     2
27 Female Hispanic 41-60     4
28 Female Hispanic 61-80     3
29 Female    Other  0-20     4
30 Female    Other 21-40     2
31 Female    Other 41-60     3
32 Female    Other 61-80     5
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
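The same counter-column idea can be sketched without the reshape package by summing the counter with base R's aggregate(). Its formula interface takes one grouping factor per term on the right-hand side. The sketch rebuilds a tmp data frame like Erik's, and the fixed seed is an assumption added for reproducibility. Unlike the full table, aggregate() returns only the combinations that actually occur.

```r
# Base-R analogue of the recast() call above (a sketch, not reshape code).
set.seed(1)
tmp <- data.frame(
  f1 = sample(gl(2, 50, labels = c("Male", "Female"))),
  f2 = sample(gl(4, 25, labels = c("White", "Black", "Hispanic", "Other"))),
  f3 = sample(gl(4, 25, labels = c("0-20", "21-40", "41-60", "61-80")))
)
# Counter column: summing it within a group counts that group's rows
tmp$inc <- 1
# One row per observed (f1, f2, f3) combination; empty ones are omitted
res <- aggregate(inc ~ f1 + f2 + f3, data = tmp, FUN = sum)
print(res)
```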
Hi Andrew,

The sizetree function in plotrix does what you want graphically, I think. Perhaps if each invocation returned its vector of counts, the deepest level of counts would be returned at the final exit, along with the factor levels.

Jim
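Here is a hypothetical sketch of that recursive idea, written in plain base R rather than taken from plotrix. Each call splits the rows on the first remaining factor, recurses on the other columns, and returns a data frame of counts. The deepest call does the counting, and the level names are attached as the results are passed back up. The name count_tree and the example vectors are illustrative assumptions.

```r
# Hypothetical recursive counter (not plotrix code).
count_tree <- function(df) {
  if (ncol(df) == 1) {
    # Deepest level: tabulate the last factor and keep non-zero counts
    tab <- table(df[[1]])
    tab <- tab[tab > 0]
    out <- data.frame(names(tab), as.vector(tab), stringsAsFactors = FALSE)
    names(out) <- c(names(df), "count")
    return(out)
  }
  # Split the remaining columns by the observed levels of the first factor
  pieces <- lapply(split(df[-1], df[[1]], drop = TRUE), count_tree)
  # Prefix each sub-result with the level it came from, then stack them
  out <- do.call(rbind, lapply(names(pieces), function(lev) {
    sub <- pieces[[lev]]
    first <- data.frame(rep(lev, nrow(sub)), stringsAsFactors = FALSE)
    names(first) <- names(df)[1]
    cbind(first, sub)
  }))
  rownames(out) <- NULL
  out
}

horse <- factor(c("Bedevil", "Prairie Wolf", "Bedevil", "Bedevil"))
date <- factor(c("20080404", "20080404", "20080514", "20080404"))
res <- count_tree(data.frame(horse, date))
print(res)
```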