Hi,

I wrote a script to simulate data, which I will use to evaluate missing data and imputation. However, I'm having trouble with the last part of my script, in which a data frame is constructed without missing values. This is my script:

y1 <- rnorm(10,0,3)
y2 <- rnorm(10,3,3)
y3 <- rnorm(10,3,3)
y4 <- rnorm(10,6,3)
y <- c(y1,y2,y3,y4)

a1 <- rep(1,20)
a2 <- rep(2,20)
a <- c(a1,a2)

b1 <- gl(2,10,20)
b2 <- gl(2,10,20)
b <- c(b1,b2)

x1 <- 1+2*y1+ rnorm(10,0,8)
x2 <- 1+2*y2+ rnorm(10,0,8)
x3 <- 1+2*y3+ rnorm(10,0,8)
x4 <- 1+2*y4+ rnorm(10,0,8)
x <- c(x1,x2,x3,x4)

# Create missing data dependent on factor A:
mar.y <- rep(NA,40)
df <- data.frame(y=y, mar.y=mar.y, a=a, b=b, x=x)

for (j in 1:40) {
  # Create missingness at random dependent on A:
  df$mar.y[which(df$a==1)] <- replicate(length(which(df$a==1)), rbinom(1,1,0.20))
  df$mar.y[which(df$a==2)] <- replicate(length(which(df$a==2)), rbinom(1,1,0.10))
}

if (length(which(df$mar.y==0))>34) {
  df <- df[sample(which(df$mar.y==0),34), ]
} else {
  df <- df[c(which(df$mar.y==0), sample(which(df$mar.y==1),34-length(which(df$mar.y==0)))), ]
}

(I would like the total number of randomly removed values to be 15% of the total sample size, which in this case is 6 values. In other scripts I'm using different values.)

At this point, I would like to impute missing values. However, my data frame only contains the 34 'observed' values (which seemed okay at the beginning of my study). Now I would like my data frame to contain the 34 observed values (mar.y=0) AND the 6 'missing' or deleted values (mar.y=1). Unfortunately, the missing rows are dropped from the data set by the subsetting with 'sample', so imputation is not possible at the moment (i.e., there are no NAs to impute).

Does anyone know how to rewrite the last bit of the script (the if...else part) so that the 6 'deleted/missing' values are kept in the data set and given the value mar.y=1 (or NA, or any other value), together with the 34 'observed' ones (mar.y=0)?
That way, I can impute the missing values in my data set.

Thanks in advance,
Sarah

--
View this message in context: http://r.789695.n4.nabble.com/Simulating-data-and-imputation-tp3167119p3167119.html
Sent from the R help mailing list archive at Nabble.com.
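One possible way to keep all 40 rows is to let the if...else part choose only the row indices that stay observed, and then recode mar.y instead of subsetting df. The following is a minimal sketch under some assumptions, not a definitive solution: the fixed seed, the simplified toy data frame, and the names n.obs, keep and y.mis are made up for illustration and do not appear in the original script.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Simplified toy data (assumption): only the columns y and a from the script
n  <- 40
df <- data.frame(y = rnorm(n, 3, 3), a = rep(1:2, each = 20))

n.obs <- 34  # number of values that stay observed (85% of 40)

# Bernoulli missingness indicator: P(1) = 0.20 if a == 1, 0.10 if a == 2
df$mar.y <- rbinom(n, 1, ifelse(df$a == 1, 0.20, 0.10))

zeros <- which(df$mar.y == 0)
ones  <- which(df$mar.y == 1)

# Same if/else rule as in the script, but it only selects row indices
if (length(zeros) > n.obs) {
  keep <- zeros[sample.int(length(zeros), n.obs)]
} else {
  keep <- c(zeros, ones[sample.int(length(ones), n.obs - length(zeros))])
}

# Recode the indicator instead of dropping rows, so all 40 rows survive
df$mar.y <- 1
df$mar.y[keep] <- 0

# y.mis (hypothetical name): copy of y with NA in the 6 'missing' rows
df$y.mis <- ifelse(df$mar.y == 1, NA, df$y)

table(df$mar.y)
```

The original y column is left untouched here, so imputed values of y.mis can later be compared with the true values. Drawing the indicator with one vectorised rbinom call replaces the for loop, which in the script redraws the same indicators 40 times without changing the outcome. Indexing with sample.int avoids the behaviour of sample(x, ...) when x happens to have length one.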