Anthony Staines
2011-Nov-19 23:31 UTC
[R] Advice on recoding a variable depending on another which contains NAs
Dear colleagues,

I would be very grateful for your help with the following. I have banged my head off this question several times in the past, and repeatedly over the last week. I have looked in the usual places and found no obvious solution. I fear that this just means I didn't recognize it, but I'd be very grateful for your help.

I am scoring 8000 psychometric tests - the SCQ, if you have heard of it. On this test the scoring rules depend on one variable, SCQ1 - if this is answered yes, the final score is a function of 39 variables, and if no, of 31 variables.

I've calculated both of these scores (SCQScore1 and SCQScore2) for all the children in my study, and I wish to create a final score, which is SCQScore1 when SCQ1 is 1, and SCQScore2 when SCQ1 is 2. There are also missing values for SCQ1, and I have chosen, for the moment, to set the final score to SCQScore1 for these. [[This is a debatable choice, but I am not asking your advice on that choice!]]

d$SCQScore <- 99
## Distinct value for any other values I've missed

d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences, so sum SCQ2:SCQ40

d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40

d$SCQScore[is.na(d$SCQ1)] <- d$SCQScore1[is.na(d$SCQ1)]
## SCQ1 is missing

This fails on line 2
(d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1])
with the error message
"NAs are not allowed in subscripted assignments",
presumably because SCQ1 does indeed contain missing values.

This can be fixed, got around, or otherwise bypassed, by creating a new variable SCQ1, with no missing values, as shown:

SCQ1 <- d$SCQ1
SCQ1[is.na(SCQ1)] <- 3

d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences, so sum SCQ2:SCQ40
d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[SCQ1 == 3] <- d$SCQScore1[SCQ1 == 3]
## We don't know if he/she can talk, so guess - sum S2:S40

This type of thing is a common problem in my little world. Is there a better/less klutzy/smarter way of solving it than creating a new variable each time? Please bear in mind that it is critical, for later analysis, to keep the missing values in SCQ1.

Best wishes,
Anthony Staines
--
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9, Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
http://astaines.eu/
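A minimal sketch of the indexing problem described above, using a small made-up data frame (the values are invented for illustration; only the column names come from the post). Wrapping each logical test in which() keeps only the positions where the test is TRUE, so NA results are dropped from the index and the subscripted assignment goes through without a recoded copy of SCQ1. The missing values in d$SCQ1 are left untouched.

```r
# Toy data: values are invented, column names follow the post
d <- data.frame(
  SCQ1      = c(1, 2, NA, 1, 2),
  SCQScore1 = c(10, 11, 12, 13, 14),
  SCQScore2 = c(20, 21, 22, 23, 24)
)

# Start from NA so any case not covered below stays visibly missing
d$SCQScore <- NA_real_

# which() returns integer positions where the comparison is TRUE;
# positions where the comparison is NA are simply left out
yes <- which(d$SCQ1 == 1)
no  <- which(d$SCQ1 == 2)
unk <- which(is.na(d$SCQ1))

d$SCQScore[yes] <- d$SCQScore1[yes]
d$SCQScore[no]  <- d$SCQScore2[no]
d$SCQScore[unk] <- d$SCQScore1[unk]  # the post's interim rule for missing SCQ1

print(d)
```

Starting from NA rather than 99 is a choice made for this sketch, not something stated in the post at this point.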
David Winsemius
2011-Nov-19 23:55 UTC
[R] Advice on recoding a variable depending on another which contains NAs
On Nov 19, 2011, at 6:31 PM, Anthony Staines wrote:

> I am scoring 8000 psychometric tests - the SCQ, if you have heard of
> it. On this test the scoring rules depends on one variable SCQ1 - if
> this is answered yes, the final score is a function of 39 variables,
> and if no, of 31 variables.
>
> I've calculated both of these scores (SCQScore1 and SCQScore2)for
> all the children in my study, and I wish to create a final score,
> which is SCQScore1 when SCQ1 is 1, and SCQScore2 when SCQ1 is 2.
> There are also missing values for SCQ1, and I have chosen, for the
> moment, to set the final score to SCQScore1 for these. [[This is a
> debatable choice, but I am not asking your advice on that choice!]]

This would seem to be an obvious task for ifelse()

SCQScore <- NA
d$SCQScore <- ifelse( SCQ1 == 1, d$SCQScore1, d$SCQScore2)

(And don't use 99 for missing. Use NA. It will protect you better than "99".)

I suppose you could enforce the two level testing with:

d$SCQScore <- ifelse( SCQ1 == 1, d$SCQScore1,
                      ifelse(SCQ1 == 2, d$SCQScore2, NA))

David Winsemius, MD
West Hartford, CT
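One point about the ifelse() suggestion above: where SCQ1 is NA, the test SCQ1 == 1 is itself NA, so ifelse() returns NA for those rows. The following is a hedged sketch, not part of the original reply, of how the original poster's interim rule (use SCQScore1 when SCQ1 is missing) could be folded into the test with is.na(). The data frame is a toy with invented values, and the columns are referenced through d$ on the assumption that SCQ1 lives in the data frame d.

```r
# Toy data: values are invented, column names follow the thread
d <- data.frame(
  SCQ1      = c(1, 2, NA, 1, 2),
  SCQScore1 = c(10, 11, 12, 13, 14),
  SCQScore2 = c(20, 21, 22, 23, 24)
)

# Nested ifelse() as suggested: an NA in SCQ1 propagates to the result
plain <- ifelse(d$SCQ1 == 1, d$SCQScore1,
                ifelse(d$SCQ1 == 2, d$SCQScore2, NA))

# Variant: treat a missing SCQ1 like an answer of 1.
# TRUE | NA evaluates to TRUE, so the NA rows take the first branch.
d$SCQScore <- ifelse(is.na(d$SCQ1) | d$SCQ1 == 1, d$SCQScore1,
                     ifelse(d$SCQ1 == 2, d$SCQScore2, NA))

print(plain)
print(d)
```

Any SCQ1 value other than 1, 2, or NA still ends up as NA in SCQScore, which keeps unexpected codes visible. d$SCQ1 itself is not modified.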