Robert A. LaBudde
2008-Mar-08 16:48 UTC
[R] How to do multi-factor stratified sampling in R
Given a set of data with a number of variables plus a response, I'd like to obtain a randomized subset of the rows such that the marginal proportions of each variable in the subset closely match those of the full dataset, and possibly such that the two-factor interaction marginal proportions are maintained as well for some pairs.

This must be a common problem in data mining, but I don't seem to be able to locate the proper library or function for doing this in R.

Thanks for any help.

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
"Robert A. LaBudde" <ral at lcfltd.com> wrote in news:0JXF00LSO864ATE0 at vms040.mailsrvcs.net:> Given a set of data with a number of variables plus a response, I'd > like to obtain a randomized subset of the rows such that the > marginal proportions of each variable are maintained closely in the > subset to that of the dataset, and possibly maintaining as well the > two-factor interaction marginal proportions as well for some pairs. > > This must be a common problem in data mining, but I don't seem to be > able to locate the proper library or function for doing this in R. > > Thanks for any help.Have you looked at the "sampling" package? I have never used it, but the strata() function appears to be capable. -- David Winsemius