Hello:
This seems like an obvious question, but I am having trouble answering it.
I am new to R, so I apologize if it's too simple to post. I have
searched for solutions to no avail.
I have data that I am trying to set up for further analysis ("training
data"). What I need is 12 groups based on patterns of 4 variables. The
complication comes in when missing data is present. Let me describe with
an example - focusing on just 3 of the 12 groups:
> vec=c(1,1,1,1,1,1,NA,NA,1,1,0,0,1,NA,1,1,1,NA,0,0,1,NA,1,0,0,0,0,1,0,0,0,0,NA,NA,NA,NA,1,NA,0,NA,1,NA,1,NA)
> a=matrix(vec, ncol=4,nrow=11, byrow=T)
> edit(a)
      col1 col2 col3 col4
 [1,]    1    1    1    1
 [2,]    1    1   NA   NA
 [3,]    1    1    0    0
 [4,]    1   NA    1    1
 [5,]    1   NA    0    0
 [6,]    1   NA    1    0
 [7,]    0    0    0    1
 [8,]    0    0    0    0
 [9,]   NA   NA   NA   NA
[10,]    1   NA    0   NA
[11,]    1   NA    1   NA
Here are 11 individuals. I want the following groups (coded as three
separate binary variables):
Group1 - scored a 1 on col1 and a 1 on at least one other column
         (a 1 multiple times)
Group2 - scored a 1 on col1 but only once (no 1 on any other column)
Group3 - did not score a 1 on col1
This seems straightforward, except that missingness complicates it. Take
individual 5 - this person should be placed in Groups 1 AND 2 because we
don't know the score on col2. The same goes for individual 10, though the
response pattern differs.
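
One way to state the intended rule in code, as a minimal sketch rather than a
finished solution: count the observed 1s and the missing values in col2-col4,
and place a person in every group their observed scores do not rule out. The
names first, others, n_one, n_na and group1-group3 are illustrative. Treating
a missing col1 (individual 9) as compatible with all three groups is an
assumption; the description above does not say how that case should be
handled.

```r
# Example matrix from above (11 individuals, 4 columns)
vec <- c(1,1,1,1, 1,1,NA,NA, 1,1,0,0, 1,NA,1,1, 1,NA,0,0, 1,NA,1,0,
         0,0,0,1, 0,0,0,0, NA,NA,NA,NA, 1,NA,0,NA, 1,NA,1,NA)
a <- matrix(vec, ncol = 4, byrow = TRUE)

first  <- a[, 1]
others <- a[, -1, drop = FALSE]

n_one <- rowSums(others == 1, na.rm = TRUE)  # observed 1s in col2..col4
n_na  <- rowSums(is.na(others))              # missing values in col2..col4

# Assumption: a missing col1 could be either a 1 or not a 1
first_may_be_one     <- is.na(first) | first == 1
first_may_not_be_one <- is.na(first) | first != 1

# Group 1 is possible if some other column is, or could be, a 1
group1 <- as.integer(first_may_be_one & (n_one + n_na) >= 1)
# Group 2 is possible if no other column is an observed 1
group2 <- as.integer(first_may_be_one & n_one == 0)
# Group 3 is possible if col1 is not an observed 1
group3 <- as.integer(first_may_not_be_one)

cbind(a, group1, group2, group3)
```

Under these assumptions individuals 5 and 10 land in both Group1 and Group2,
as described above, and no pattern has to be listed by hand.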
I tried using if statements, but am running into the problem that if is not
vectorized, and I can't seem to make if run with apply. I can use ifelse,
but it's very clunky and inefficient to list all possible patterns:
(Note this is not a complete list of all patterns; it's just an example of
what I've been doing)
dd$TEST1 = ifelse(is.na(d$C8W1raw), 1,
  ifelse(d$C8W1raw==1 & is.na(d$C9W1raw) & is.na(d$C11AW1raw) &
         is.na(d$C12AW1rraw), 777899,
  ifelse((d$C8W1raw==1 & d$C9W1raw==1) | (d$C8W1raw==1 & d$C11AW1raw==1) |
         (d$C8W1raw==1 & d$C12AW1rraw==1), 1,
  ifelse(d$C8W1raw==1 & ((is.na(d$C9W1raw) | d$C9W1raw==0) &
         (is.na(d$C11AW1raw) | d$C11AW1raw==0) &
         (is.na(d$C12AW1rraw) | d$C12AW1rraw==0)), 777899,
  0))))
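
On the if/apply problem specifically: if needs a single TRUE or FALSE, so a
comparison that returns NA (or a whole vector) makes it fail. A row-wise
function can work with apply if each row is first reduced to single non-NA
logical values. The following is only a sketch of that idea; classify_row is
a hypothetical helper name, and the handling of a missing first score is an
assumption.

```r
# Hypothetical helper: classify one individual (one row of scores)
classify_row <- function(x) {
  first  <- x[1]
  others <- x[-1]
  # Reduce the row to single TRUE/FALSE values so that 'if' never sees NA
  first_may_be_one      <- is.na(first) || first == 1
  any_other_may_be_one  <- any(is.na(others) | others == 1)
  no_other_observed_one <- !any(others == 1, na.rm = TRUE)

  out <- c(group1 = 0, group2 = 0, group3 = 0)
  if (first_may_be_one && any_other_may_be_one)  out["group1"] <- 1
  if (first_may_be_one && no_other_observed_one) out["group2"] <- 1
  if (is.na(first) || first != 1)                out["group3"] <- 1
  out
}

# Three of the example rows (individuals 5, 7 and 2)
a <- matrix(c(1, NA,  0,  0,
              0,  0,  0,  1,
              1,  1, NA, NA), ncol = 4, byrow = TRUE)

# apply() over rows returns one column per row, so transpose the result
groups <- t(apply(a, 1, classify_row))
groups
```

This runs, but it still loops over rows one at a time, so it is likely to be
slower on a large data set than an approach based on whole columns.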
Any ideas on how to approach this efficiently?
Thanks,
Andrea