Delphine COURVOISIER
2008-Nov-11 14:17 UTC
[R] simulate data with binary outcome and correlated predictors
Hi,

I would like to simulate data with a binary outcome and a set of correlated predictors. I want to be able to fix the number of events (Y=1) vs. non-events (Y=0), so I fix this first and then simulate the predictors. I have 2 questions:

1. When the predictors are all continuous, I can use mvrnorm(). However, if I have a mix of continuous, ordinal and binary predictors, I'm not sure how to accurately simulate the relationships between predictors.

2. To specify the coefficients of the regression of Y on the predictors, I must specify the predictors separately for Y=1 and Y=0, varying the means and the variances/covariances of the predictors in each group. With this approach, however, it is harder to determine the coefficients of the predictors precisely. Any help on how to be as precise as possible on the betas would be nice.

Here is the code I wrote, where all predictors are continuous with variance = 1 and the correlations between predictors differ for each condition of Y.

_________
library(MASS)
N<-1000

nbX<-3
propSick<-0.2
corrSick<-.8
corrHealthy<-.9

# correlation matrices (unit variances) for each outcome group
sigma0<-matrix(corrHealthy,nbX,nbX)
diag(sigma0)<-1
sigma1<-matrix(corrSick,nbX,nbX)
diag(sigma1)<-1
# predictors: mean 0 for healthy (Y=0), mean 1 for sick (Y=1)
dataHealthy<-mvrnorm(N*(1-propSick),c(0,0,0),sigma0)
dataSick<-mvrnorm(N*propSick,c(1,1,1),sigma1)

# assemble the data set: sick cases first, then healthy
dataS<-as.data.frame(matrix(0,ncol=4,nrow=N))
dimnames(dataS)[[2]]<-c("IV1","IV2","IV3","DV")
dataS$DV[1:(N*propSick)]<-1
dataS$DV<-factor(dataS$DV)
dataS[1:(N*propSick),1:3]<-dataSick
dataS[(N*propSick+1):N,1:3]<-dataHealthy
_____________

Thanks in advance for any suggestions,

************************************
Delphine Courvoisier
Clinical Epidemiology Division
University of Geneva Hospital
+4122 37 29029
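On question 2, one standard result may be useful as a starting point. If both groups are multivariate normal with the same covariance matrix Sigma, the log-odds of Y=1 is exactly linear in the predictors, with slopes solve(Sigma) %*% (mu1 - mu0) and intercept log(p/(1-p)) - 0.5 * t(mu1 + mu0) %*% solve(Sigma) %*% (mu1 - mu0), where p is the proportion of events. The sketch below assumes a shared correlation of 0.8, which is not what the posted code does: with different correlations per group (0.8 and 0.9), the log-odds picks up quadratic terms and these formulas no longer hold exactly. The names rho, beta and beta0 are illustrative.

```r
library(MASS)

N <- 1000
propSick <- 0.2
rho <- 0.8                 # assumption: SAME correlation in both groups
mu0 <- c(0, 0, 0)          # predictor means for Y = 0
mu1 <- c(1, 1, 1)          # predictor means for Y = 1

Sigma <- matrix(rho, 3, 3)
diag(Sigma) <- 1

# Logistic coefficients implied by the shared-covariance normal model
beta  <- solve(Sigma, mu1 - mu0)
beta0 <- log(propSick / (1 - propSick)) - 0.5 * sum((mu1 + mu0) * beta)

# Simulate with fixed group sizes and fit a logistic regression for comparison
set.seed(1)
nSick <- round(N * propSick)
X <- rbind(mvrnorm(nSick, mu1, Sigma), mvrnorm(N - nSick, mu0, Sigma))
dat <- data.frame(X, DV = rep(c(1, 0), c(nSick, N - nSick)))
names(dat)[1:3] <- c("IV1", "IV2", "IV3")
fit <- glm(DV ~ IV1 + IV2 + IV3, family = binomial, data = dat)

# Implied vs. estimated coefficients (intercept first)
print(round(rbind(implied = c(beta0, beta), fitted = coef(fit)), 3))
```

To target particular betas under this assumption, one could invert the relationship and set mu1 - mu0 to Sigma %*% beta for the desired beta.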
Greg Snow
2008-Nov-11 17:50 UTC
[R] simulate data with binary outcome and correlated predictors
You could generate all the data as continuous variables with mvrnorm(), as in your code, then use the cut() function to convert some of the normal variables into binary or ordinal ones. It is less clear exactly what the correlation means in that case, but the variables will still be related.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
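The cut() suggestion might look roughly like the sketch below. The cut points (-0.5, 0.5 and 0), the level labels and the single correlation of 0.8 are arbitrary choices made for illustration; they do not come from the thread.

```r
library(MASS)

set.seed(42)
n <- 1000
Sigma <- matrix(0.8, 3, 3)
diag(Sigma) <- 1

# Step 1: draw correlated continuous (latent) normal variables
Z <- mvrnorm(n, c(0, 0, 0), Sigma)

# Step 2: keep one as is, cut the others into ordinal and binary variables
sim <- data.frame(
  IV1 = Z[, 1],                                        # continuous
  IV2 = cut(Z[, 2], breaks = c(-Inf, -0.5, 0.5, Inf),
            labels = c("low", "mid", "high"),
            ordered_result = TRUE),                    # ordinal, 3 levels
  IV3 = cut(Z[, 3], breaks = c(-Inf, 0, Inf),
            labels = c("0", "1"))                      # binary, split at 0
)

# Correlations on the observed scale, using integer codes for the factors
print(cor(sim$IV1, as.integer(sim$IV2)))
print(cor(sim$IV1, as.integer(sim$IV3)))
```

The observed correlations will typically be weaker than the latent 0.8, which is one concrete sense in which the meaning of the correlation becomes less clear after cutting. The same two steps can be applied separately within the Y=1 and Y=0 groups.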