thr3ads.net - R help - [R] Lapply to create sub categories based on categorical data [Feb 2014]


arun

2014-Feb-02 22:40 UTC

[R] Lapply to create sub categories based on categorical data

Hi,
Try:
x <-
c(rep("A",0.1*10000),rep("B",0.2*10000),rep("C",0.65*10000),rep("D",0.05*10000))
set.seed(24)
categorical_data <- sample(x,10000)
set.seed(49)
p_val <- runif(10000,0,1) 

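# stringsAsFactors=TRUE keeps V1 a factor (no longer the default since R 4.0.0)
combi <- data.frame(V1=categorical_data,V2=p_val,stringsAsFactors=TRUE)
variables <- unique(combi$V1)
# one copy of combi per level, with NEWVAR = 1 where V1 equals that level
res <- lapply(levels(variables),function(x){
combi$NEWVAR<-(combi$V1==x)*1; combi})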
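
As a possible follow-up, here is a minimal sketch (not part of the reply's own code) of naming each table in the list after its category so it can be looked up directly. The data frame `toy` and the names `lev` and `res_named` are made up for illustration, and the V2 values are rounded placeholders.

```r
# Toy data standing in for combi (hypothetical values, for illustration only)
toy <- data.frame(V1 = c("A", "C", "C", "B", "C", "D"),
                  V2 = c(0.48, 0.48, 0.23, 0.22, 0.52, 0.36),
                  stringsAsFactors = TRUE)

lev <- levels(toy$V1)

# One copy of toy per category; NEWVAR is 1 where V1 matches, 0 otherwise
res_named <- lapply(lev, function(x) {
  toy$NEWVAR <- as.integer(toy$V1 == x)
  toy   # return the modified copy, not the value of the assignment
})
names(res_named) <- lev

res_named[["C"]]                               # the table with C coded as 1
sapply(res_named, function(d) sum(d$NEWVAR))   # number of 1s in each table
```

Returning the data frame as the last expression of the function is what makes lapply collect whole tables rather than single values.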


A.K.


I was wondering if you kind folks could answer a question I have. In the sample
data I've provided below, in column 1 I have a categorical
variable labeled A,B,C and D, and in column 2 simulated p-values. 

x <-
c(rep("A",0.1*10000),rep("B",0.2*10000),rep("C",0.65*10000),rep("D",0.05*10000))
categorical_data=as.matrix(sample(x,10000)) 
p_val=as.matrix(runif(10000,0,1)) 
combi=as.data.frame(cbind(categorical_data,p_val)) 

This is simulated data, but my example comes out as 

head(combi) 
  V1                V2
1  A 0.484525170875713
2  C  0.48046557046473
3  C 0.228440979029983
4  B 0.216991128632799
5  C 0.521497668232769
6  D 0.358560319757089

I now want to take one of the categories, let's say "C", and create
another variable (coded as 1 if the row is C or 0 if it isn't).

combi$NEWVAR[combi$V1=="C"] <-1
combi$NEWVAR[combi$V1!="C"] <-0

  V1                V2 NEWVAR
1  A 0.484525170875713      0
2  C  0.48046557046473      1
3  C 0.228440979029983      1
4  B 0.216991128632799      0
5  C 0.521497668232769      1
6  D 0.358560319757089      0

I'd like to do this for each of the variables in V1, creating a new table
each time, by looping over using lapply:

variables=unique(combi$V1) 

loopeddata=lapply(variables,function(x){ 
combi$NEWVAR[combi$V1==x] <-1 
combi$NEWVAR[combi$V1!=x]<-0 
} 
) 

My output however looks like this: 

[[1]] 
[1] 0 

[[2]] 
[1] 0 

[[3]] 
[1] 0 

[[4]] 
[1] 0 
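
A likely reason for this output, sketched on a made-up three-row data frame (`toy`, `f` and `out` are illustrative names, not from the original code): an R function returns the value of its last expression, and the value of an assignment is its right-hand side, here 0. The assignments also act on a copy local to the function, so the data frame outside is left untouched.

```r
toy <- data.frame(V1 = c("A", "B", "A"))

f <- function(x) {
  toy$NEWVAR[toy$V1 == x] <- 1
  toy$NEWVAR[toy$V1 != x] <- 0   # last expression: its value is 0
}

out <- f("A")
out                        # 0, like each [1] 0 entry in the list above
"NEWVAR" %in% names(toy)   # FALSE: the toy outside f was never modified
```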

My desired output would look like the table after the second block of
code, but when looping the third column would be 1 for A and 0 for
B, C and D in the first table, then 1 for B and 0 for A, C and D in the
next, and so on for each table created.

Any help would be very much appreciated.
