I'm trying to figure out whether there is a simple one- or two-pass approach to randomly creating missing values in a set of existing (complete) data. For example, I want to randomly make 10% of the entries in the Iris dataset missing (i.e. NA). I don't want any case to have all of its values missing, and I don't want any case to be missing the classification variable. I can do this in about three passes, but I haven't figured out whether there is an efficient way to do it in one or two passes through the data.

My approach involves creating a dummy vector with a length equal to the full length of the Iris data (750 elements):

> sample(1:10, 750, replace=TRUE)

I then set all values of 2 to 0 and all others to 1. This left me with approximately 10% of the entries flagged as "missing". I reshaped this into a 150 x 5 matrix. From here, things were pretty straightforward.

Is there any way to bypass the dummy vector and operate directly on a copy of the original Iris matrix, and get to the point above without the intermediate steps?

Thanks.

====================
Dr. Marc R. Feldesman
Professor and Chairman
Anthropology Department
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email: feldesmanm at pdx.edu
phone: 503-725-3081
fax: 503-725-3905
http://web.pdx.edu/~h1mf
PGP Key Available On Request
=====================
"Beyond every credibility gap lies a gullibility fill"
Powered by Latochoerus and Windows 2000, SP1
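
One possible way to do what the question describes without the dummy vector is sketched below. It is only a sketch under some assumptions that are not in the original post: "10% of the entries" is taken to mean 10% of the 600 measurement cells (the Species column is excluded up front, so it can never become NA), and the names `iris.na`, `num`, `n.miss` and `all.na` are made up for illustration. The first pass samples cell positions directly in a copy of the data; the second pass restores one original value in any row that happened to lose all four measurements, so the final count of NAs can fall slightly below the target.

```r
data(iris)
set.seed(1)                          # only so the sketch is reproducible

iris.na <- iris                      # work on a copy of the original data
num <- as.matrix(iris.na[, 1:4])     # measurement columns only; Species is never touched

# Pass 1: set 10% of the 600 measurement cells to NA (assumed target).
n.miss <- round(0.10 * length(num))
num[sample(length(num), n.miss)] <- NA

# Pass 2: repair any case that lost all four measurements by
# restoring one randomly chosen original value in that row.
all.na <- which(rowSums(is.na(num)) == ncol(num))
for (i in all.na) {
  j <- sample(ncol(num), 1)
  num[i, j] <- iris[i, j]
}

iris.na[, 1:4] <- num                # put the measurements back into the copy
```

If 10% of all 750 entries is what is wanted, `n.miss` could be set to 75 instead. With so few cells removed, the repair loop will usually have nothing to do, but it guarantees that no case ends up entirely missing.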