soeren.vogel at eawag.ch
2009-Mar-07 13:39 UTC
[R] Recode factor into binary factor-level vars
How do I "recode" a factor into a binary data frame according to the
factor levels:
### example:start
set.seed(20)
l <- sample(rep.int(c("locA", "locB", "locC", "locD"), 100), 10,
            replace=TRUE)
# [1] "locD" "locD" "locD" "locD" "locB" "locA" "locA" "locA" "locD" "locA"
### example:end
What I want in the end is the following:
m$locA: 0, 0, 0, 0, 0, 1, 1, 1, 0, 1
m$locB: 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
m$locC: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
m$locD: 1, 1, 1, 1, 0, 0, 0, 0, 1, 0
Instead of 0, NA's would also be fine.
Thanks, Sören
--
Sören Vogel, PhD-Student, Eawag, Dept. SIAM
http://www.eawag.ch, http://sozmod.eawag.ch
One way is:
set.seed(20)
l <- sample(rep.int(c("locA", "locB", "locC", "locD"), 100), 10, replace=TRUE)
f <- factor(l, levels = paste("loc", LETTERS[1:4], sep = ""))
m <- as.data.frame(model.matrix(~ f - 1))
names(m) <- levels(f)
m
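
A small sketch of what the model.matrix() call above produces, in case it helps. The ten values are typed in literally from the original post (an assumption made here so that the result does not depend on the random number generator of a particular R version); everything else follows the code above.

```r
# The ten values shown in the original post, entered by hand.
l <- c("locD", "locD", "locD", "locD", "locB",
       "locA", "locA", "locA", "locD", "locA")

# Declaring all four levels keeps "locC" even though it was never drawn.
f <- factor(l, levels = c("locA", "locB", "locC", "locD"))

# "~ f - 1" drops the intercept, so every level gets its own 0/1 column.
mm <- model.matrix(~ f - 1)
colnames(mm)            # names carry the variable prefix: "flocA", "flocB", ...

m <- as.data.frame(mm)
names(m) <- levels(f)   # replace the prefixed names with the plain levels
m$locC                  # the unused level is a column of zeros
```

The columns come back as doubles; if integers are preferred, something like m[] <- lapply(m, as.integer) should convert them in place.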
I hope it helps.
Best,
Dimitris
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Sören,
You need to somehow add back to the information in "l" the fact
that it was sampled from a set with 4 elements. Since you didn't
sample from a factor, the level information was lost. Otherwise, you
could create that list with unique(l), which in this case returns only
3 elements, so the full set is spelled out here:
set.l <- c("locA", "locB", "locC", "locD")
sapply(set.l, function(x) l == x)
       locA  locB  locC  locD
 [1,] FALSE FALSE FALSE  TRUE
 [2,] FALSE FALSE FALSE  TRUE
 [3,] FALSE FALSE FALSE  TRUE
 [4,] FALSE FALSE FALSE  TRUE
 [5,] FALSE  TRUE FALSE FALSE
 [6,]  TRUE FALSE FALSE FALSE
 [7,]  TRUE FALSE FALSE FALSE
 [8,]  TRUE FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE  TRUE
[10,]  TRUE FALSE FALSE FALSE
It's in the wrong orientation compared with your listing because
sapply() binds the result for each element of set.l as a column, so t()
fixes that, and adding 0 to TRUE/FALSE returns 0/1:
t(sapply(set.l, function(x) x == l))+0
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
locA    0    0    0    0    0    1    1    1    0     1
locB    0    0    0    0    1    0    0    0    0     0
locC    0    0    0    0    0    0    0    0    0     0
locD    1    1    1    1    0    0    0    0    1     0
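
A short sketch of the TRUE/FALSE to 0/1 step, plus the NA variant the original question said would also be acceptable. The vector l is typed in from the post rather than sampled (an assumption for reproducibility), and the names hit, num, int and na are only illustrative.

```r
l <- c("locD", "locD", "locD", "locD", "locB",
       "locA", "locA", "locA", "locD", "locA")
set.l <- c("locA", "locB", "locC", "locD")

# Logical matrix: one row per observation, one column per element of set.l.
hit <- sapply(set.l, function(x) l == x)

# Arithmetic coerces logicals, so adding 0 yields a double 0/1 matrix.
num <- hit + 0

# Alternative: change the storage mode to get integers, keeping the dimnames.
int <- hit
storage.mode(int) <- "integer"

# NA instead of 0: ifelse() keeps the shape and dimnames of its test argument.
na <- ifelse(hit, 1, NA)
na[, "locC"]   # all NA, because "locC" never occurs in l
```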
m <- as.data.frame(t(sapply(set.l, function(x) l == x))+0)
m
The one-liner would be:
m <- as.data.frame(t(sapply(c("locA", "locB", "locC", "locD"),
                            function(x) l == x))+0)
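
One thing that may be worth checking before settling on either orientation, sketched below under the assumption that the goal is the m$locA style of access from the question. The vector l is typed in from the post, and the names wide and long are made up for this illustration. As far as I can tell, the untransposed matrix is the one whose columns are named after the locations; the transposed one becomes a data frame with 4 rows and 10 automatically named columns.

```r
l <- c("locD", "locD", "locD", "locD", "locB",
       "locA", "locA", "locA", "locD", "locA")
set.l <- c("locA", "locB", "locC", "locD")

# unique() only sees the values that were actually drawn.
unique(l)       # three values; "locC" is missing

# Untransposed: 10 rows, columns named after set.l.
wide <- as.data.frame(sapply(set.l, function(x) l == x) + 0)
names(wide)
wide$locA

# Transposed: 4 rows named after set.l, columns V1 ... V10.
long <- as.data.frame(t(sapply(set.l, function(x) l == x)) + 0)
dim(long)
long["locA", ]  # row access works here, long$locA does not
```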
You can also use mapply, but the result does not have the desired row
names, and the column names are the result of the sampling, which seems
to me potentially confusing:
> mapply(function(x) x==set.l, l)+0
     locD locD locD locD locB locA locA locA locD locA
[1,]    0    0    0    0    0    1    1    1    0    1
[2,]    0    0    0    0    1    0    0    0    0    0
[3,]    0    0    0    0    0    0    0    0    0    0
[4,]    1    1    1    1    0    0    0    0    1    0
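
If the mapply route is preferred anyway, the naming can probably be tidied up along these lines. This is only a sketch: l is typed in from the post, res is an illustrative name, and USE.NAMES = FALSE is the standard mapply argument that suppresses the names taken from the sampled values.

```r
l <- c("locD", "locD", "locD", "locD", "locB",
       "locA", "locA", "locA", "locD", "locA")
set.l <- c("locA", "locB", "locC", "locD")

# USE.NAMES = FALSE stops mapply from labelling columns with the values of l.
res <- mapply(function(x) x == set.l, l, USE.NAMES = FALSE) + 0

# Each row corresponds to one element of set.l, in order, so label it that way.
rownames(res) <- set.l
res["locD", ]
```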
I see that Dimitris has already given you a perfectly workable
solution, but these seem to tackle the problem from a different angle.
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.