Here's an example of how I think you can do what you want. As written,
highest.use() returns NA when two or more drugs tie for the maximum; change
its definition if you would rather pick one of the tied drugs at random.
> drug.names <- c("marijuana", "crack", "cocaine", "heroin")
> drugs <- factor(drug.names, levels=drug.names)
> drugs
[1] marijuana crack cocaine heroin
Levels: marijuana crack cocaine heroin
> as.numeric(drugs)
[1] 1 2 3 4
> N <- 20
> set.seed(1)
> primary.drug <- sample(drugs, N, replace=TRUE)
> primary.drug[sample(1:N, 10)] <- NA
> primary.drug
 [1] <NA>      crack     <NA>      <NA>      <NA>      <NA>      heroin
 [8] cocaine   cocaine   marijuana <NA>      <NA>      cocaine   crack
[15] heroin    <NA>      cocaine   heroin    <NA>      <NA>
Levels: marijuana crack cocaine heroin
> # usage frequencies
> marijuana <- sample(1:3, N, replace=TRUE)
> crack <- sample(1:3, N, replace=TRUE)
> cocaine <- sample(1:3, N, replace=TRUE)
> heroin <- sample(1:3, N, replace=TRUE)
> cbind(marijuana, crack, cocaine, heroin)
      marijuana crack cocaine heroin
 [1,]         2     2       2      1
 [2,]         2     3       3      1
 [3,]         2     2       2      2
 [4,]         1     1       2      3
 [5,]         3     1       2      3
 [6,]         3     1       3      3
 [7,]         3     1       3      2
 [8,]         1     2       2      2
 [9,]         3     2       3      3
[10,]         2     2       3      2
[11,]         3     3       2      2
[12,]         2     1       3      2
[13,]         3     2       2      1
[14,]         2     1       1      3
[15,]         2     2       3      2
[16,]         3     1       1      1
[17,]         1     2       3      1
[18,]         2     3       1      2
[19,]         3     1       1      3
[20,]         3     3       1      2
> # index of the single largest value in x; NA if the maximum is tied
> highest.use <- function(x) {
    y <- which(x == max(x, na.rm=TRUE))
    if (length(y) == 1) return(y) else return(NA)
  }
> apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use)
[1] NA NA NA 4 NA NA NA NA NA 3 NA 3 1 4 3 1 3 2 NA NA
> impute.primary.drug <- drugs[ifelse(is.na(primary.drug),
    apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use),
    as.numeric(primary.drug))]
> data.frame(primary.drug, impute.primary.drug)
   primary.drug impute.primary.drug
1          <NA>                <NA>
2         crack               crack
3          <NA>                <NA>
4          <NA>              heroin
5          <NA>                <NA>
6          <NA>                <NA>
7        heroin              heroin
8       cocaine             cocaine
9       cocaine             cocaine
10    marijuana           marijuana
11         <NA>                <NA>
12         <NA>             cocaine
13      cocaine             cocaine
14        crack               crack
15       heroin              heroin
16         <NA>           marijuana
17      cocaine             cocaine
18       heroin              heroin
19         <NA>                <NA>
20         <NA>                <NA>
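
For the random-selection case, here is a rough sketch of one way the
function could be changed. The name highest.use.random and the small "use"
matrix are made up for illustration and are not part of the session above.
It assumes that a row with no usable frequencies should stay NA.

```r
# Hypothetical variant of highest.use(): break ties at random instead of
# returning NA.
highest.use.random <- function(x) {
  if (all(is.na(x))) return(NA_integer_)   # nothing to compare, stay NA
  y <- which(x == max(x, na.rm = TRUE))    # positions of every maximum
  # Index into y rather than calling sample(y, 1): when y is a single
  # number k > 1, sample(y, 1) draws from 1:k instead of returning k.
  y[sample.int(length(y), 1)]
}

# Small made-up usage matrix (rows = subjects, columns = drugs)
use <- cbind(marijuana = c(2, 1, 3),
             crack     = c(2, 1, 1),
             cocaine   = c(2, 2, 2),
             heroin    = c(1, 3, 3))

set.seed(42)
picked <- apply(use, 1, highest.use.random)
colnames(use)[picked]   # row 2 is always "heroin"; rows 1 and 3 are random
```

Passing highest.use.random to apply() in place of highest.use in the
imputation step should then fill tied rows with a randomly chosen drug.
Base R's max.col() also has a ties.method argument that may be worth a
look, but check how it treats NA before relying on it.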

Brian Perron wrote:
> Hello R users,
>
> I am relatively new to R and cannot seem to crack a coding problem. I
> am working with substance abuse data, and I have a variable called
> "primary.drug" which is considered the drug of choice for each
> subject. I have just a few missing values on that variable. Instead
> of using a multiple imputation method like chained equations, I would
> prefer to derive these values from other survey responses.
> Specifically, I have a frequency of use (in days) for each of the major
> drugs, so I would like the missing values to be replaced by that drug
> with the highest level of use. I am starting with the "ifelse" and
> "max" statements, but I know it is wrong:
>
> impute.primary.drug <- ifelse(is.na(primary.drug), max(marijuana,
> crack, cocaine, heroin), primary.drug)
>
> Here are the problems. First, the max statement (should it be "pmax"?),
> returns the highest numeric quantity rather than the variable itself.
> In other words, I want to test which drug has the highest value, but
> return the variable name rather than the observed value. Second, if
> ties are observed, how can I specify the value to be NA? Or, how can I
> specify one of the values to be randomly selected?
>
> Thanks in advance for your assistance.
>
> Regards,
> Brian
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>