Dear R users,

I'm trying to randomly recreate a real dataset that contains missing data, and I'm not sure whether I can use the sample() function for this. I think it might be better to do it in two steps: first sample the values, then randomly replace some of the sampled values with missing data. So something like this:

x <- sample(10000:20000, 100)  # without replacement

Now I want x to contain 20% missing data (NA). Could anyone help me with how to do this?

Thanks,
Joanne

--
Joanne Demmler Ph.D.
Research Assistant
School of Medicine, Swansea University
Singleton Park, Swansea SA2 8PP, UK
tel: +44 (0)1792 295674
fax: +44 (0)1792 513430
email: j.demmler at swansea.ac.uk
DECIPHer: www.decipher.uk.net
If you want 20% missing values on average, you could do it in one step, viz:

sample(c(10000:20000, rep(NA, 2500)), 100)

(10000:20000 contains 10001 values, so 2500 NAs make up about 20% of the pool; with rep(NA, 2000) the expected share would only be about 16.7%.)

If you need exactly 20% missing, two steps are preferable. Use code as below:

kk <- sample(10000:20000, 100)   # 100 distinct values
kk[sample(1:100, 20)] <- NA      # overwrite 20 random positions with NA

Paul
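A quick simulation makes the difference between the one-step and two-step approaches concrete. This is a minimal sketch in base R; the object names (pool, one_step, two_step) and the seed are illustrative choices, not part of the replies. Because 10000:20000 holds 10001 values, a pool with 2500 NAs has an NA share of 2500/12501, about 20%. The sketch counts the NAs produced by each approach over 1000 repetitions: the one-step count varies around 20 from draw to draw, while the two-step count is always exactly 20.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# One step: draw 100 values from a pool in which 2500 of 12501 entries are NA,
# so on average about 20 of the 100 drawn values are NA
pool <- c(10000:20000, rep(NA, 2500))
one_step <- replicate(1000, sum(is.na(sample(pool, 100))))

# Two steps: draw 100 values, then overwrite exactly 20 randomly chosen positions
two_step <- replicate(1000, {
  kk <- sample(10000:20000, 100)
  kk[sample(1:100, 20)] <- NA
  sum(is.na(kk))
})

mean(one_step)   # close to 20 on average
range(one_step)  # but individual draws vary
table(two_step)  # always exactly 20
```

Which one to use depends on what the real data look like: if the missingness rate is itself random, the one-step version mimics that; if you want to match an observed count, use the two-step version.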
Joanne,

===================
[...snip...]
x <- sample(10000:20000, 100) #without replacement
Now I want x to contain 20% missing data (NA). Could anyone help me with how to do this?
===================

See if this helps:

n <- length(x)
x[sample(n, round(0.2 * n))] <- NA   # round() keeps the count whole when n is not a multiple of 5

cheers,
-Girish
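Since the goal is to recreate a whole dataset, the length-based approach generalizes naturally to a small helper that can be applied column by column. This is a sketch under stated assumptions: make_missing() is a hypothetical name, not an existing R function, and the example data frame is made up purely for illustration. Like the reply, it picks positions with sample() without replacement, so each column gets exactly round(prop * n) NAs.

```r
# Hypothetical helper: set a fixed proportion of a vector's entries to NA.
# Positions are drawn with sample() without replacement;
# round() keeps the count an integer when length(x) * prop is not whole.
make_missing <- function(x, prop = 0.2) {
  n <- length(x)
  x[sample(n, round(prop * n))] <- NA
  x
}

set.seed(42)  # arbitrary seed, only for reproducibility
x <- make_missing(sample(10000:20000, 100), prop = 0.2)
sum(is.na(x))  # 20

# Apply the helper to selected columns of a made-up data frame
df <- data.frame(id = 1:50, a = rnorm(50), b = sample(letters, 50, replace = TRUE))
df[c("a", "b")] <- lapply(df[c("a", "b")], make_missing, prop = 0.1)
colSums(is.na(df))  # id: 0, a: 5, b: 5
```

Leaving the id column untouched keeps rows identifiable, which is usually what you want when mimicking missingness in real records.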