Hi R-helpers,

I have a dataframe (called data) with trees in rows (n=100) and insect species (n=10) in columns. My tree IDs are in a column called TREE, and each species has its own column, labeled SPEC1, SPEC2, SPEC3, etc.

I wish to randomize the values in my dataframe such that row and column totals are held constant, i.e. in my randomized data each tree will have the same number of individual insects as in the real data (constant row totals) and each species will have the same number of individuals as in the real data (constant column totals).

I will eventually want to do this many times, but I would appreciate help getting started with the randomization.

Thank you,
Mark Na

On Wed, Jul 8, 2009 at 8:54 AM, Mark Na <mtb954 at gmail.com> wrote:
> I wish to randomize the values in my dataframe such that row and column
> totals are held constant, i.e. in my randomized data each tree will have the
> same number of individual insects as in the real data (constant row totals)
> and each species will have the same number of individuals as in the real
> data (constant column totals).

Sounds like you may be looking for some form of Monte Carlo experiment in R, which is on my to-do list for the next month. I need to do something similar: rearrange the dates in one database, Monte Carlo style, and then rearrange all my other databases so that the dates still match up. It just hasn't bubbled to the top of the list yet.

I took a quick look on Google and found MCMCpack pretty quickly. There is some documentation out there which is easy to find if it's of interest.

Good luck, and I'll be following the thread.

cheers,
Mark

Here is one approach (there are others, some probably better, but this can get you started):

1. Rearrange your data so that every insect is a single row with 2 columns: the tree ID and the species (this new dataset will have as many rows as the sum of the values in the old dataset). The reshape package may be able to help with this step (you may also need the rep function).
2. Randomly permute one of the 2 columns (see ?sample).
3. Restructure the permuted data back to the original layout (the table function may be enough here; the reshape package will give more options).

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mark Na
> Sent: Wednesday, July 08, 2009 9:54 AM
> To: r-help at r-project.org
> Subject: [R] Randomizing a dataframe
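
The three steps in Greg's reply can be sketched in base R roughly as follows. This is only a minimal sketch, not code from the thread: the toy `data` frame is made up (random Poisson counts standing in for the real insect counts), `randomize_counts` is a hypothetical helper name, and it uses rep() and table() rather than the reshape package.

```r
# Toy stand-in for the poster's data: 100 trees, 10 species.
# The counts are invented for illustration only.
set.seed(1)
species <- paste0("SPEC", 1:10)
data <- data.frame(
  TREE = 1:100,
  matrix(rpois(100 * 10, lambda = 2), nrow = 100,
         dimnames = list(NULL, species))
)

# Hypothetical helper: one randomization with fixed row and column totals.
randomize_counts <- function(df, id = "TREE") {
  spec_cols <- setdiff(names(df), id)
  counts <- as.matrix(df[spec_cols])

  # Step 1: one entry per individual insect (its tree index and species index).
  tree_idx <- rep(as.vector(row(counts)), times = as.vector(counts))
  spec_idx <- rep(as.vector(col(counts)), times = as.vector(counts))

  # Step 2: randomly permute one of the two columns (here, the species labels).
  spec_idx <- spec_idx[sample.int(length(spec_idx))]

  # Step 3: tabulate back to a trees-by-species layout. The factor levels
  # keep trees or species that end up with zero insects in the table.
  tab <- table(factor(tree_idx, levels = seq_len(nrow(counts))),
               factor(spec_idx, levels = seq_len(ncol(counts))))
  new_counts <- matrix(as.integer(tab), nrow = nrow(counts),
                       dimnames = list(NULL, spec_cols))

  out <- df
  out[spec_cols] <- as.data.frame(new_counts)
  out
}

rand <- randomize_counts(data)

# Both sets of totals should match the original data.
stopifnot(all(rowSums(rand[species]) == rowSums(data[species])))
stopifnot(all(colSums(rand[species]) == colSums(data[species])))

# "Many times": a list of 99 independently randomized data frames.
sims <- replicate(99, randomize_counts(data), simplify = FALSE)
```

Permuting only the species labels leaves each tree's number of insects untouched and keeps the same pool of species labels, which is why both margins are preserved. Base R's r2dtable() also generates random two-way tables with given row and column totals and may be worth a look as an alternative.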