In a manuscript, I have the following code to illustrate dummy coding of two factors in a contingency table. It works, but is surely obscured by the method I used, involving outer() to find equalities and 0+outer() to convert to numeric. Can someone help simplify this code to be more comprehensible and give the *same* result? I'd prefer a solution that uses base R. haireye <- margin.table(HairEyeColor, 1:2) haireye.df <- as.data.frame(haireye) dummy.hair <- 0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`) colnames(dummy.hair) <- paste0('h', 1:4) dummy.eye <- 0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`) colnames(dummy.eye) <- paste0('e', 1:4) haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye) haireye.df > haireye.df Hair Eye Freq h1 h2 h3 h4 e1 e2 e3 e4 1 Black Brown 68 1 0 0 0 1 0 0 0 2 Brown Brown 119 0 1 0 0 1 0 0 0 3 Red Brown 26 0 0 1 0 1 0 0 0 4 Blond Brown 7 0 0 0 1 1 0 0 0 5 Black Blue 20 1 0 0 0 0 1 0 0 6 Brown Blue 84 0 1 0 0 0 1 0 0 7 Red Blue 17 0 0 1 0 0 1 0 0 8 Blond Blue 94 0 0 0 1 0 1 0 0 9 Black Hazel 15 1 0 0 0 0 0 1 0 10 Brown Hazel 54 0 1 0 0 0 0 1 0 11 Red Hazel 14 0 0 1 0 0 0 1 0 12 Blond Hazel 10 0 0 0 1 0 0 1 0 13 Black Green 5 1 0 0 0 0 0 0 1 14 Brown Green 29 0 1 0 0 0 0 0 1 15 Red Green 14 0 0 1 0 0 0 0 1 16 Blond Green 16 0 0 0 1 0 0 0 1 > -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web:http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Hi Michael, At first I thought that as.numeric() would do it, but that loses the matrix structure. Here are two solutions; I think that I prefer the second. ----------- snip --------------------> (dummy.hair <- outer(haireye.df$Hair,+ levels(haireye.df$Hair), function(x, y) as.numeric(x == y))) [,1] [,2] [,3] [,4] [1,] 1 0 0 0 [2,] 0 1 0 0 [3,] 0 0 1 0 [4,] 0 0 0 1 [5,] 1 0 0 0 [6,] 0 1 0 0 [7,] 0 0 1 0 [8,] 0 0 0 1 [9,] 1 0 0 0 [10,] 0 1 0 0 [11,] 0 0 1 0 [12,] 0 0 0 1 [13,] 1 0 0 0 [14,] 0 1 0 0 [15,] 0 0 1 0 [16,] 0 0 0 1> (dummy.hair <- model.matrix(~ -1 + Hair, data=haireye.df))HairBlack HairBrown HairRed HairBlond 1 1 0 0 0 2 0 1 0 0 3 0 0 1 0 4 0 0 0 1 5 1 0 0 0 6 0 1 0 0 7 0 0 1 0 8 0 0 0 1 9 1 0 0 0 10 0 1 0 0 11 0 0 1 0 12 0 0 0 1 13 1 0 0 0 14 0 1 0 0 15 0 0 1 0 16 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$Hair [1] "contr.treatment" ----------- snip -------------------- I hope this helps, John> -----Original Message----- > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael > Friendly > Sent: Tuesday, December 30, 2014 6:05 PM > To: R-help > Subject: [R] simplify code for dummy coding of factors > > In a manuscript, I have the following code to illustrate dummy coding of > two factors in a contingency table. > > It works, but is surely obscured by the method I used, involving outer() > to find equalities and 0+outer() > to convert to numeric. Can someone help simplify this code to be more > comprehensible and give the > *same* result? I'd prefer a solution that uses base R. > > haireye <- margin.table(HairEyeColor, 1:2) > > haireye.df <- as.data.frame(haireye) > dummy.hair <- 0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`) > colnames(dummy.hair) <- paste0('h', 1:4) > dummy.eye <- 0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`) > colnames(dummy.eye) <- paste0('e', 1:4) > > haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye) > haireye.df > > > haireye.df > Hair Eye Freq h1 h2 h3 h4 e1 e2 e3 e4 > 1 Black Brown 68 1 0 0 0 1 0 0 0 > 2 Brown Brown 119 0 1 0 0 1 0 0 0 > 3 Red Brown 26 0 0 1 0 1 0 0 0 > 4 Blond Brown 7 0 0 0 1 1 0 0 0 > 5 Black Blue 20 1 0 0 0 0 1 0 0 > 6 Brown Blue 84 0 1 0 0 0 1 0 0 > 7 Red Blue 17 0 0 1 0 0 1 0 0 > 8 Blond Blue 94 0 0 0 1 0 1 0 0 > 9 Black Hazel 15 1 0 0 0 0 0 1 0 > 10 Brown Hazel 54 0 1 0 0 0 0 1 0 > 11 Red Hazel 14 0 0 1 0 0 0 1 0 > 12 Blond Hazel 10 0 0 0 1 0 0 1 0 > 13 Black Green 5 1 0 0 0 0 0 0 1 > 14 Brown Green 29 0 1 0 0 0 0 0 1 > 15 Red Green 14 0 0 1 0 0 0 0 1 > 16 Blond Green 16 0 0 0 1 0 0 0 1 > > > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele Street Web:http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
On Dec 30, 2014, at 3:05 PM, Michael Friendly wrote:> In a manuscript, I have the following code to illustrate dummy coding of two factors in a contingency table. > > It works, but is surely obscured by the method I used, involving outer() to find equalities and 0+outer() > to convert to numeric. Can someone help simplify this code to be more comprehensible and give the > *same* result? I'd prefer a solution that uses base R. > > haireye <- margin.table(HairEyeColor, 1:2) > > haireye.df <- as.data.frame(haireye) > dummy.hair <- 0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`) > colnames(dummy.hair) <- paste0('h', 1:4) > dummy.eye <- 0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`) > colnames(dummy.eye) <- paste0('e', 1:4) > > haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye) > haireye.df > > > haireye.df > Hair Eye Freq h1 h2 h3 h4 e1 e2 e3 e4 > 1 Black Brown 68 1 0 0 0 1 0 0 0 > 2 Brown Brown 119 0 1 0 0 1 0 0 0 > 3 Red Brown 26 0 0 1 0 1 0 0 0 > 4 Blond Brown 7 0 0 0 1 1 0 0 0 > 5 Black Blue 20 1 0 0 0 0 1 0 0 > 6 Brown Blue 84 0 1 0 0 0 1 0 0 > 7 Red Blue 17 0 0 1 0 0 1 0 0 > 8 Blond Blue 94 0 0 0 1 0 1 0 0 > 9 Black Hazel 15 1 0 0 0 0 0 1 0 > 10 Brown Hazel 54 0 1 0 0 0 0 1 0 > 11 Red Hazel 14 0 0 1 0 0 0 1 0 > 12 Blond Hazel 10 0 0 0 1 0 0 1 0 > 13 Black Green 5 1 0 0 0 0 0 0 1 > 14 Brown Green 29 0 1 0 0 0 0 0 1 > 15 Red Green 14 0 0 1 0 0 0 0 1 > 16 Blond Green 16 0 0 0 1 0 0 0 1I think the world would be a better place if you illustrated model.matrix: haireye.mtx <- cbind( model.matrix(~0+Hair, as.data.frame(haireye) ), model.matrix(~0+Eye, as.data.frame(haireye) ) ) colnames(haireye.mtx) <- gsub("[a-z]+", "", colnames(haireye.mtx) )> haireye.df2HB HB HR HB EB EB EH EG 1 1 0 0 0 1 0 0 0 2 0 1 0 0 1 0 0 0 3 0 0 1 0 1 0 0 0 4 0 0 0 1 1 0 0 0 5 1 0 0 0 0 1 0 0 6 0 1 0 0 0 1 0 0 7 0 0 1 0 0 1 0 0 8 0 0 0 1 0 1 0 0 9 1 0 0 0 0 0 1 0 10 0 1 0 0 0 0 1 0 11 0 0 1 0 0 0 1 0 12 0 0 0 1 0 0 1 0 13 1 0 0 0 0 0 0 1 14 0 1 0 0 0 0 0 1 15 0 0 1 0 0 0 0 1 16 0 0 0 1 0 0 0 1 -- David.> > > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele Street Web:http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
I like this very simple version. Note that you don't need as.data.frame(). model.matrix(Freq ~ Hair + Eye, data=haireye, contrasts.arg=list(Hair=diag(4), Eye=diag(4))) On Tue, Dec 30, 2014 at 6:05 PM, Michael Friendly <friendly at yorku.ca> wrote:> In a manuscript, I have the following code to illustrate dummy coding of two > factors in a contingency table. > > It works, but is surely obscured by the method I used, involving outer() to > find equalities and 0+outer() > to convert to numeric. Can someone help simplify this code to be more > comprehensible and give the > *same* result? I'd prefer a solution that uses base R. > > haireye <- margin.table(HairEyeColor, 1:2) > > haireye.df <- as.data.frame(haireye) > dummy.hair <- 0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`) > colnames(dummy.hair) <- paste0('h', 1:4) > dummy.eye <- 0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`) > colnames(dummy.eye) <- paste0('e', 1:4) > > haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye) > haireye.df > >> haireye.df > Hair Eye Freq h1 h2 h3 h4 e1 e2 e3 e4 > 1 Black Brown 68 1 0 0 0 1 0 0 0 > 2 Brown Brown 119 0 1 0 0 1 0 0 0 > 3 Red Brown 26 0 0 1 0 1 0 0 0 > 4 Blond Brown 7 0 0 0 1 1 0 0 0 > 5 Black Blue 20 1 0 0 0 0 1 0 0 > 6 Brown Blue 84 0 1 0 0 0 1 0 0 > 7 Red Blue 17 0 0 1 0 0 1 0 0 > 8 Blond Blue 94 0 0 0 1 0 1 0 0 > 9 Black Hazel 15 1 0 0 0 0 0 1 0 > 10 Brown Hazel 54 0 1 0 0 0 0 1 0 > 11 Red Hazel 14 0 0 1 0 0 0 1 0 > 12 Blond Hazel 10 0 0 0 1 0 0 1 0 > 13 Black Green 5 1 0 0 0 0 0 0 1 > 14 Brown Green 29 0 1 0 0 0 0 0 1 > 15 Red Green 14 0 0 1 0 0 0 0 1 > 16 Blond Green 16 0 0 0 1 0 0 0 1 >> > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele Street Web:http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
"More comprehensible" depends on context, which we don't have. You could be simply trying to illustrate the logic of the transformation (solution 1 below) or providing a recipe which by its brevity is (perhaps?) memorable (solution 2 below). Solution 1: appendDummys <- function( DF, keycol, d.base ) { # get levels of key column lvls <- levels( DF[[ keycol ]] ) # for each level in the key column keyno <- 1L for ( keylvl in lvls ) { # name for new column dname <- paste0( d.base, keyno ) # make the new column, filled with default value DF[[ dname ]] <- 0L # change those values in the new column where the value matches the # current level DF[ keylvl == DF[[ keycol ]], dname ] <- 1L # prepare for next loop keyno <- keyno + 1L } # return modified data frame DF } haireye <- margin.table(HairEyeColor, 1:2) haireye.df <- as.data.frame(haireye) haireye.df <- appendDummys( haireye.df, "Hair", "h" ) haireye.df <- appendDummys( haireye.df, "Eye", "e" ) ### Solution 2 haireye <- margin.table(HairEyeColor, 1:2) haireye.df <- as.data.frame(haireye) dummykeys <- data.frame( h = factor( as.integer( haireye.df$Hair ) ) , e = factor( as.integer( haireye.df$Eye ) ) ) dummy.hair <- as.data.frame( model.matrix( ~ h - 1 ), data=dummykeys ) dummy.eye <- as.data.frame( model.matrix( ~ e - 1 ), data=dummykeys ) haireye.df <- data.frame( haireye.df, dummy.hair, dummy.eye ) ### FWIW I am not a fan of mixing the model matrix columns in with the original data... the column names can (in general) clash. On Tue, 30 Dec 2014, Michael Friendly wrote:> In a manuscript, I have the following code to illustrate dummy coding of two > factors in a contingency table. > > It works, but is surely obscured by the method I used, involving outer() to > find equalities and 0+outer() > to convert to numeric. Can someone help simplify this code to be more > comprehensible and give the > *same* result? I'd prefer a solution that uses base R. > > haireye <- margin.table(HairEyeColor, 1:2) > > haireye.df <- as.data.frame(haireye) > dummy.hair <- 0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`) > colnames(dummy.hair) <- paste0('h', 1:4) > dummy.eye <- 0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`) > colnames(dummy.eye) <- paste0('e', 1:4) > > haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye) > haireye.df > >> haireye.df > Hair Eye Freq h1 h2 h3 h4 e1 e2 e3 e4 > 1 Black Brown 68 1 0 0 0 1 0 0 0 > 2 Brown Brown 119 0 1 0 0 1 0 0 0 > 3 Red Brown 26 0 0 1 0 1 0 0 0 > 4 Blond Brown 7 0 0 0 1 1 0 0 0 > 5 Black Blue 20 1 0 0 0 0 1 0 0 > 6 Brown Blue 84 0 1 0 0 0 1 0 0 > 7 Red Blue 17 0 0 1 0 0 1 0 0 > 8 Blond Blue 94 0 0 0 1 0 1 0 0 > 9 Black Hazel 15 1 0 0 0 0 0 1 0 > 10 Brown Hazel 54 0 1 0 0 0 0 1 0 > 11 Red Hazel 14 0 0 1 0 0 0 1 0 > 12 Blond Hazel 10 0 0 0 1 0 0 1 0 > 13 Black Green 5 1 0 0 0 0 0 0 1 > 14 Brown Green 29 0 1 0 0 0 0 0 1 > 15 Red Green 14 0 0 1 0 0 0 0 1 > 16 Blond Green 16 0 0 0 1 0 0 0 1 >> > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele Street Web:http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >--------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
On 12/30/14 7:16 PM, Richard M. Heiberger wrote:> I like this very simple version. Note that you don't need as.data.frame(). > > model.matrix(Freq ~ Hair + Eye, data=haireye, > contrasts.arg=list(Hair=diag(4), Eye=diag(4)))Thanks to all who replied to this question. model.matrix() was what I had missed, but to get the result in the form I wanted, required a bit more work. These solutions give me what I asked for. haireye <- margin.table(HairEyeColor, 1:2) # Jeff Newmiller, solution 2, corrected haireye.df <- as.data.frame(haireye) dummykeys <- data.frame( h = factor( as.integer( haireye.df$Hair ) ) , e = factor( as.integer( haireye.df$Eye ) ) ) dummy.hair <- as.data.frame( model.matrix( ~ h - 1, data=dummykeys )) dummy.eye <- as.data.frame( model.matrix( ~ e - 1, data=dummykeys )) haireye.df <- data.frame( haireye.df, dummy.hair, dummy.eye ) # Rich Heiberger, removing intercept, including haireye data haireye.df <- as.data.frame(haireye) haireye.df <- cbind( haireye.df, model.matrix(Freq ~ Hair + Eye, data=haireye, contrasts.arg=list(Hair=diag(4), Eye=diag(4)))[,-1] ) haireye.df -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA