soeren.vogel at eawag.ch
2009-Mar-07 13:39 UTC
[R] Recode factor into binary factor-level vars
How do I "recode" a factor into a binary data frame according to the factor levels?

### example:start
set.seed(20)
l <- sample(rep.int(c("locA", "locB", "locC", "locD"), 100), 10, replace = TRUE)
# [1] "locD" "locD" "locD" "locD" "locB" "locA" "locA" "locA" "locD" "locA"
### example:end

What I want in the end is the following:

m$locA: 0, 0, 0, 0, 0, 1, 1, 1, 0, 1
m$locB: 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
m$locC: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
m$locD: 1, 1, 1, 1, 0, 0, 0, 0, 1, 0

Instead of 0, NAs would also be fine.

Thanks, Sören

-- 
Sören Vogel, PhD-Student, Eawag, Dept. SIAM
http://www.eawag.ch, http://sozmod.eawag.ch
One way is:

set.seed(20)
l <- sample(rep.int(c("locA", "locB", "locC", "locD"), 100), 10, replace = TRUE)
# name all four levels explicitly so that a level missing from the sample is kept
f <- factor(l, levels = paste("loc", LETTERS[1:4], sep = ""))
# "- 1" drops the intercept, leaving one 0/1 indicator column per level
m <- as.data.frame(model.matrix(~ f - 1))
names(m) <- levels(f)
m

I hope it helps.

Best,
Dimitris

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
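A note on why the explicit levels= argument matters in the reply above: factor(l) on its own only knows the values that actually occur in l, so a location that was never drawn (locC in the posted sample) would get no column at all. The following is a minimal sketch, not part of the original reply. It hard-codes the ten values shown in the original post (an assumption made here, since newer R versions may draw a different sample for the same seed), and all_levels, f_obs, m_obs, f_all and m_all are illustrative names.

```r
# The ten values shown in the original post, hard-coded for reproducibility
l <- c("locD", "locD", "locD", "locD", "locB",
       "locA", "locA", "locA", "locD", "locA")
all_levels <- c("locA", "locB", "locC", "locD")  # illustrative name

# Without explicit levels, the unobserved "locC" is silently dropped
f_obs <- factor(l)
m_obs <- as.data.frame(model.matrix(~ f_obs - 1))
ncol(m_obs)    # 3 columns: no indicator for locC

# With explicit levels, every location gets an indicator column
f_all <- factor(l, levels = all_levels)
m_all <- as.data.frame(model.matrix(~ f_all - 1))
names(m_all) <- levels(f_all)
m_all$locC     # all zeros, as requested in the question
```

If the set of possible locations is known in advance, passing it through levels= seems the safer habit, because the resulting data frame then has the same columns regardless of which values happen to be sampled.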
Sören;

You need to somehow add back to the information in "l" the fact that it was sampled from a set with 4 elements. Since you didn't sample from a factor, the level information was lost. Otherwise, you could create that list with unique(l), which in this case only returns 3 elements:

set.l <- c("locA", "locB", "locC", "locD")
sapply(set.l, function(x) l == x)
       locA  locB  locC  locD
 [1,] FALSE FALSE FALSE  TRUE
 [2,] FALSE FALSE FALSE  TRUE
 [3,] FALSE FALSE FALSE  TRUE
 [4,] FALSE FALSE FALSE  TRUE
 [5,] FALSE  TRUE FALSE FALSE
 [6,]  TRUE FALSE FALSE FALSE
 [7,]  TRUE FALSE FALSE FALSE
 [8,]  TRUE FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE  TRUE
[10,]  TRUE FALSE FALSE FALSE

It's in the wrong orientation because "l" is actually a column vector, so t() fixes that, and adding 0 to TRUE/FALSE returns 0/1:

t(sapply(set.l, function(x) x == l))+0
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
locA    0    0    0    0    0    1    1    1    0     1
locB    0    0    0    0    1    0    0    0    0     0
locC    0    0    0    0    0    0    0    0    0     0
locD    1    1    1    1    0    0    0    0    1     0

m <- as.data.frame(t(sapply(set.l, function(x) l == x))+0)
m

The one-liner would be:

m <- as.data.frame(t(sapply(c("locA", "locB", "locC", "locD"), function(x) l == x))+0)

You can also use mapply, but the result does not have the desired row names, and the column names are the result of the sampling, which seems to me potentially confusing:

> mapply(function(x) x==set.l, l)+0
     locD locD locD locD locB locA locA locA locD locA
[1,]    0    0    0    0    0    1    1    1    0    1
[2,]    0    0    0    0    1    0    0    0    0    0
[3,]    0    0    0    0    0    0    0    0    0    0
[4,]    1    1    1    1    0    0    0    0    1    0

I see that Dimitris has already given you a perfectly workable solution, but these seem to tackle the problem from a different angle.
-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
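A follow-up on the orientation of the sapply() result in the reply above: the question asks for columns that can be reached as m$locA, m$locB and so on, and for that purpose the untransposed matrix (one column per location) may be the more convenient shape, since as.data.frame() turns matrix columns into data frame columns. The sketch below is an editorial illustration rather than part of the original reply. It hard-codes the ten values from the original post, and m_na is an illustrative name for the variant with NA in place of 0 that the question mentions as acceptable.

```r
# The ten values shown in the original post, hard-coded for reproducibility
l <- c("locD", "locD", "locD", "locD", "locB",
       "locA", "locA", "locA", "locD", "locA")
set.l <- c("locA", "locB", "locC", "locD")

# sapply() over a character vector names the result columns after its elements,
# so this is a 10 x 4 matrix with columns locA..locD; "+ 0" turns TRUE/FALSE into 1/0
m <- as.data.frame(sapply(set.l, function(x) l == x) + 0)
m$locA    # 0 0 0 0 0 1 1 1 0 1

# Variant with NA instead of 0 (m_na is an illustrative name)
m_na <- m
m_na[m_na == 0] <- NA
m_na$locD # 1 1 1 1 NA NA NA NA 1 NA
```

The transposed form shown in the reply is still handy for printing, because it matches the row-wise layout used in the question; only the `$` access depends on keeping the locations as columns.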