thr3ads.net - R help - [R] recode categorial vars into binary data [May 2013]


D. Alain

2013-May-07 16:20 UTC

[R] recode categorial vars into binary data

Dear R-List, 

I would like to recode categorical variables into binary data, so that all
values above the median are coded 1 and all values below it 0, separating each
variable into two equally large groups (e.g. good performers = 0 vs. bad
performers = 1).

I have not yet found a nice way to do this in R. I thought there might be a
better way than ordering each column and recoding the first 50% as 0 and the
second 50% as 1. If I use ifelse, I have a problem with tied cases that all
fall on the median.

e.g.
df<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(1,1,4,2,3,2,2,5,2,2),k2=c(1,2,3,2,1,2,1,3,3,2),result=c(4,3,5,4,2,6,4,4,2,3)))

Now I want to recode k1 and k2 so that half of the values are recoded 0 and
half recoded 1, split around the median. The median of k1 is 2, which would
lead to unequal group sizes if 2 were used as the cutoff, so the values k1 = 2
should be recoded 1 or 0 randomly until both categories have the same size.

something like

df.rec<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(0,0,1,0,1,1,0,1,0,1),k2=c(0,1,1,0,0,1,0,1,1,0),result=c(4,3,5,4,2,6,4,4,2,3)))

Can anyone help?

Thank you in advance.

Best wishes.
Alain  

Rui Barradas

2013-May-07 16:51 UTC


Hello,

First of all, you don't need as.data.frame(cbind(...)). It's much better
to simply do data.frame(...).
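As for the conversion, the following function doesn't use randomness but
gets the job done: values tied at the median are assigned to 0 in order of
appearance until that group holds half of the cases.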



df <- data.frame(snr=c(1,2,3,4,5,6,7,8,9,10),
	k1=c(1,1,4,2,3,2,2,5,2,2),
	k2=c(1,2,3,2,1,2,1,3,3,2),
	result=c(4,3,5,4,2,6,4,4,2,3))

fun <- function(x){
	n <- length(x)
	y <- rep(NA, n)
	y[x < median(x)] <- 0
	y[x > median(x)] <- 1
	w <- which(x == median(x))
	y[w[seq_len(n/2 - length(which(x < median(x))))]] <- 0
	y[is.na(y)] <- 1
	y
}

fun(df$k1)
fun(df$k2)
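
The original question asked for ties at the median to be assigned randomly. A minimal sketch of one way to do that, assuming base R's rank() with ties.method = "random"; the helper name median_split_random is hypothetical and not from the thread:

```r
# Hypothetical helper (not from the thread): median split with random tie-breaking
median_split_random <- function(x) {
  n <- length(x)
  # rank() with ties.method = "random" gives every value a distinct rank,
  # ordering tied values randomly
  r <- rank(x, ties.method = "random")
  # the upper half of the ranks is coded 1, the rest 0
  as.integer(r > n / 2)
}

set.seed(1)  # only to make the random tie-breaking reproducible
k1 <- c(1, 1, 4, 2, 3, 2, 2, 5, 2, 2)
rec <- median_split_random(k1)
table(rec)   # five 0s and five 1s for this even-length vector
```

Values strictly below the median always end up as 0 and values strictly above it as 1; only the tied values are distributed at random. With an odd number of cases the groups cannot be equal, and this sketch puts the extra case into group 1.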



Hope this helps,

Rui Barradas


David Winsemius

2013-May-07 17:00 UTC


On May 7, 2013, at 9:20 AM, D. Alain wrote:
> df<-as.data.frame(cbind(snr=c(1,2,3,4,5,6,7,8,9,10),k1=c(1,1,4,2,3,2,2,5,2,2),k2=c(1,2,3,2,1,2,1,3,3,2),result=c(4,3,5,4,2,6,4,4,2,3)))

First off, stop using cbind() when it is not needed. You will not see the reason
when the columns are all numeric but you will start experiencing pain and
puzzlement when the arguments are of mixed classes. The data.frame function will
do what you want. (Where do people pick up this practice anyway?)
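
The point about mixed classes can be illustrated with a small sketch. The toy columns id and grp are made up for illustration and are not from the thread:

```r
# cbind() builds a matrix, and a matrix holds a single type,
# so mixing numbers and strings turns everything into character
m <- cbind(id = 1:3, grp = c("a", "b", "c"))
typeof(m)        # "character"

bad <- as.data.frame(m)
class(bad$id)    # no longer numeric: "character" or "factor", depending on the R version

# data.frame() keeps each column's own class
good <- data.frame(id = 1:3, grp = c("a", "b", "c"))
class(good$id)   # "integer"
```

With all-numeric columns, as in the posted example, both constructions give the same result, which is why the problem tends to go unnoticed at first.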



# rank() with ties broken by position; the upper half of the ranks is coded 1
df[,2] <- as.numeric( rank(df[,2], ties.method = "first") > length(df[,2])/2 )
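
To recode several columns in one step, a rank-based split can be applied column by column. This is only a sketch under stated assumptions: the helper name split_half is hypothetical, and ties are broken by order of appearance via ties.method = "first" rather than randomly.

```r
df <- data.frame(snr = 1:10,
                 k1 = c(1, 1, 4, 2, 3, 2, 2, 5, 2, 2),
                 k2 = c(1, 2, 3, 2, 1, 2, 1, 3, 3, 2),
                 result = c(4, 3, 5, 4, 2, 6, 4, 4, 2, 3))

# Hypothetical helper: 1 for the upper half of the ranks, 0 for the lower half;
# tied values are ranked in order of appearance
split_half <- function(x) {
  as.numeric(rank(x, ties.method = "first") > length(x) / 2)
}

cols <- c("k1", "k2")
df.rec <- df                                  # leave the original data untouched
df.rec[cols] <- lapply(df[cols], split_half)  # recode only the selected columns
colSums(df.rec[cols])                         # 5 ones in each column
```

Swapping ties.method = "first" for "random" would give the random assignment of tied values described in the question, at the cost of results that change from run to run unless a seed is set.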



David Winsemius
Alameda, CA, USA
