mariasve
2012-May-09 14:34 UTC
[R] Random resampling of columns in species association matrices
I have a host-parasite association matrix in which parasite species are rows and host species columns and cells contain the frequency of interactions. Some parasites are associated with many hosts, and some hosts harbor several parasites, and I want to repeatedly select only one single representative host per "generalized" (multi-host) parasite to create a new matrix in which no hosts are repeated. That is, I want multiple randomly generated symmetric matrices in which a host and a parasite species appear only once. Furthermore, I want to weight the probability of selecting a particular host for a parasite by the frequency of interactions between the two. Finally, a handful of parasites associate with only one single host. I do not want to lose these from the matrix, but rather fix these associations and only randomly select hosts for the generalized parasite species.

My goal is to eventually perform generalized least squares regressions between a parasite trait and several host traits, but the first major hurdle for me to get over is how to randomly select only one host per parasite with no repetition of species in the matrix. I am also generally interested in how to resample columns instead of rows (in the package boot, for instance) because of another analysis I'm working on, and I have been unable to find a solution to this when searching the R help site.

Any suggestions would be most welcomed.

Maria

--
View this message in context: http://r.789695.n4.nabble.com/Random-resampling-of-columns-in-species-association-matrices-tp4620618.html
Sent from the R help mailing list archive at Nabble.com.
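On the side question of resampling columns rather than rows, one possible sketch follows. The matrix `assoc` and the statistic `stat_fun` are made-up placeholders, not data or code from this thread. The idea is that sampling column indices in base R resamples columns directly, and that boot() resamples the rows of whatever it is given, so passing the transposed matrix makes the original columns the resampling units.

```r
# Hypothetical toy matrix (made-up counts): parasites in rows, hosts in columns.
set.seed(1)
assoc <- matrix(c(5, 0, 2,
                  1, 3, 0,
                  0, 4, 6),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("par1", "par2", "par3"),
                                c("hostA", "hostB", "hostC")))

# Base R: resample columns with replacement by sampling column indices.
col_idx <- sample(ncol(assoc), replace = TRUE)
assoc_boot <- assoc[, col_idx, drop = FALSE]

# boot() resamples rows of `data`, so transpose to make hosts the rows.
# The statistic (total interactions per parasite) is only a placeholder.
if (requireNamespace("boot", quietly = TRUE)) {
  stat_fun <- function(d, i) colSums(d[i, , drop = FALSE])
  b <- boot::boot(data = t(assoc), statistic = stat_fun, R = 200)
}
```

Here `b$t` would hold one row per bootstrap replicate and one column per parasite. Whether resampling hosts with replacement is appropriate depends on the analysis; this only shows the mechanics.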
David L Carlson
2012-May-09 16:01 UTC
[R] Random resampling of columns in species association matrices
Sample data would make it possible to explore the options in more detail, but here are two possibilities:

1. Convert each row of the matrix to row proportions and then take the cumulative sum. Now draw a random uniform number between 0 and 1 and find the first column whose cumulative sum is larger than the random number. That column is your randomly selected host. If a parasite has only one host, the cumulative sums will be zero until you reach that column and then jump to 1, so you will always select that host.

2. For each parasite, create a vector of host names with each host repeated by the number of interactions with that host. Use sample() to randomly draw a host. You'll probably want to combine the vectors into a list to automate the process over all parasites.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
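The two possibilities above might be sketched as follows. The toy matrix `assoc` is made up (no sample data had been posted at this point), and the helper names `pick_cumsum` and `pick_rep` are hypothetical. Parasites are rows and hosts are columns, as in the original question.

```r
# Hypothetical toy matrix (made-up counts): parasites in rows, hosts in columns.
assoc <- matrix(c(4, 1, 5,
                  0, 7, 0,
                  2, 2, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("par1", "par2", "par3"),
                                c("hostA", "hostB", "hostC")))
set.seed(42)

# Option 1: cumulative row proportions plus one uniform draw per parasite.
pick_cumsum <- function(counts) {
  cum <- cumsum(counts / sum(counts))
  cum[length(cum)] <- 1               # guard against floating-point shortfall
  names(counts)[which(cum > runif(1))[1]]
}

# Option 2: repeat each host name by its count, then sample one name.
pick_rep <- function(counts) {
  pool <- rep(names(counts), times = counts)
  sample(pool, 1)
}

# apply() passes each row as a vector named by the host columns.
hosts1 <- apply(assoc, 1, pick_cumsum)
hosts2 <- apply(assoc, 1, pick_rep)
```

Both helpers draw a host with probability proportional to its interaction count, and `par2`, which has a single host, always gets `hostB`. A one-line equivalent would be `sample(names(counts), 1, prob = counts)`. Note that neither version prevents the same host from being drawn for two parasites.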
mariasve
2012-May-10 13:46 UTC
[R] Random resampling of columns in species association matrices
Hi David,

Thank you for your suggestions. I am quite the beginner at R and don't understand how to actually implement your suggestion and am hoping for some further advice on that, if possible.

This is a subset of my data. Rows are host species, and columns parasite species. Three of the parasites are generalists, but P4L is a strict specialist on FORCOL (27 individuals have this parasite).

        H17L  P25L  P41L  P4L
AUTINF    39     0     0    0
GLYSPI    16     2    15    0
FORCOL     1     0     0   27
HYLPOE     3     0     2    0
HYLNAE     1     4     2    0
MYRMYO     2     5     2    0
THAARD     0     8     0    0

This is a list of host trait values for each of the hosts:

        abundance  weight  survival
AUTINF        488    38.0      0.48
GLYSPI        827    14.1      0.59
FORCOL        156    44.3      0.55
HYLPOE        322    17.5      0.54
HYLNAE        309    14.5      0.73
MYRMYO        475    20.8      0.59
THAARD        429    18.4      0.67

And this is an estimate of host specificity of the parasites, incorporating prevalence and phylogeny:

      Specificity
H17L         2.08
P25L         1.72
P41L         2.19
P4L          0

I want to determine whether specificity of the parasites relates to any of the host traits. For this, I would like to do a multiple regression. To avoid pseudoreplication, I want to include a host species only once in the matrix. So, for H17L, I could pick any of its hosts (that is, any host except THAARD), etc., but once a host is picked for one parasite, it cannot be picked for another. For example, if I pick GLYSPI for H17L, GLYSPI has to be removed as a choice for P25L and P41L. Thus, I also have to randomize which parasite has its host picked first. In all cases, I want to lock FORCOL and P4L, so FORCOL will not be an option for H17L anymore. This last part I'm still uncertain about; I might just randomly pick hosts for all parasites and then risk losing the strict host species specialists from some matrices.
If I make 2 random selections I might end up with:

      Random1  Random2
H17L  AUTINF   GLYSPI
P25L  GLYSPI   HYLNAE
P41L  HYLPOE   MYRMYO
P4L   FORCOL   FORCOL

For the first random table I would then do a multiple regression with specificity as the dependent variable and the host trait values as independent variables:

Specificity  abundance  weight  survival
       2.08        488    38.0      0.48
       1.72        827    14.1      0.59
       2.19        322    17.5      0.54
       0           156    44.3      0.55

If I generate 1000 randomly selected host-parasite combinations, I would have 1000 such tables, on which I would have to run 1000 independent regressions. Since I'm using model selection and multimodel inference to estimate parameter values, I will end up doing the model selection 1000 times.

Your second suggestion makes most sense to me, but I don't understand how to implement it. Would you (or someone else) please give me some advice on that? Also, once I have the 1000 random host-parasite matrices, how do I link these to the tables of actual values (host traits and parasite specificity)?

Thanks so much!

Maria
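One way the whole procedure could look in R is sketched below, using the subset of data posted in this message. The function names `draw_hosts` and `make_table` are hypothetical. The sketch assumes that no two strict specialists share a host, and it uses a single-predictor lm() as a stand-in for the regression step, because four parasites are too few to fit all three host traits at once.

```r
# Data copied from the message above: hosts in rows, parasites in columns.
assoc <- matrix(c(39, 0,  0,  0,
                  16, 2, 15,  0,
                   1, 0,  0, 27,
                   3, 0,  2,  0,
                   1, 4,  2,  0,
                   2, 5,  2,  0,
                   0, 8,  0,  0),
                ncol = 4, byrow = TRUE,
                dimnames = list(c("AUTINF", "GLYSPI", "FORCOL", "HYLPOE",
                                  "HYLNAE", "MYRMYO", "THAARD"),
                                c("H17L", "P25L", "P41L", "P4L")))
traits <- data.frame(abundance = c(488, 827, 156, 322, 309, 475, 429),
                     weight    = c(38, 14.1, 44.3, 17.5, 14.5, 20.8, 18.4),
                     survival  = c(0.48, 0.59, 0.55, 0.54, 0.73, 0.59, 0.67),
                     row.names = rownames(assoc))
specificity <- c(H17L = 2.08, P25L = 1.72, P41L = 2.19, P4L = 0)

# Draw one host per parasite, weighted by counts, with no host used twice.
draw_hosts <- function(assoc) {
  picked <- setNames(rep(NA_character_, ncol(assoc)), colnames(assoc))
  n_hosts <- colSums(assoc > 0)
  # Lock strict specialists (exactly one host with a nonzero count) first.
  for (p in names(n_hosts)[n_hosts == 1]) {
    picked[p] <- rownames(assoc)[assoc[, p] > 0]
  }
  # Visit the generalists in random order.
  for (p in sample(names(n_hosts)[n_hosts > 1])) {
    w <- assoc[, p]
    w[rownames(assoc) %in% picked] <- 0   # hosts already used are unavailable
    if (sum(w) == 0) return(NULL)         # dead end: this draw is discarded
    picked[p] <- sample(rownames(assoc), 1, prob = w)
  }
  picked
}

# Link one draw to the specificity and host trait tables by name.
make_table <- function(picked) {
  data.frame(parasite = names(picked),
             host = unname(picked),
             specificity = unname(specificity[names(picked)]),
             traits[unname(picked), ],
             row.names = NULL)
}

set.seed(2012)
draws <- replicate(1000, draw_hosts(assoc), simplify = FALSE)
draws <- Filter(Negate(is.null), draws)   # drop any dead-end draws
tables <- lapply(draws, make_table)

# Stand-in regression: one model per table, then collect the slopes.
fits <- lapply(tables, function(d) lm(specificity ~ weight, data = d))
slopes <- vapply(fits, function(f) coef(f)[["weight"]], numeric(1))
```

Each element of `tables` has the same layout as the regression table shown above, plus the parasite and host names. Because hosts already taken are removed before each draw, the effective selection probabilities depend on the random visiting order, which may or may not be the weighting intended. With the full data set, the lm() call could be replaced by the generalized least squares and model selection step.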