Hi,

I've spent the past few days modelling data from my work, which involves repeatedly challenging microbes with a given concentration of cleaner until the concentration required to inhibit or kill them increases, at which point they are challenged with a slightly higher concentration each day. I'm doing this for two different cleaners, and I'm recording the concentration required to kill them (as a percentage), the challenge number, the cleaner (a two-level variable), and the lineage they're in, because I have several different lineages.

I expected the values to rise for one cleaner but not the other, as the microbes acquire resistance to one but not the other. That has happened, but there is wide variation: one lineage acquired a very dramatic change that has made it immune to 50%, whereas the others have shown a much more gradual increase. As a result I have very weak p-values for the cleaner variable, because it is secondary to the challenge number, which has the most explanatory power; without time and these challenges, the selection would not happen.

I was using two bacterial species, but one kept giving highly erratic results and kept becoming cross-contaminated. BUT if I include its data, it shoves cleaner over the p = 0.05 threshold, so I may just have a problem with lack of data.

So I've been asking about bootstrapping, which I plan to apply to my cases, and then fit a model to see what the confidence is like. I assume that bootstrapping will re-select whole cases rather than jumble everything up; otherwise a microbe with (to take the most extreme value as an example) 50% concentration tolerance at the beginning would make no sense at all. I'm also planning to fit models lineage by lineage, rather than putting them into one whole, just to have a look at what happens.
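The case resampling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python with the standard library rather than R, the `lineages` data are invented, and the statistic (`rise_difference`, the mean rise under cleaner A minus that under cleaner B) is a hypothetical stand-in for whatever model is actually fitted. The point it shows is that each lineage's whole series is drawn as one unit, so a resampled lineage never starts at 50%.

```python
import random

# Hypothetical toy data: each case is one whole lineage, i.e. its cleaner plus
# the inhibitory concentration (%) recorded at each successive challenge.
lineages = {
    "A1": {"cleaner": "A", "mic": [3.125, 3.125, 6.25, 12.5, 50.0]},
    "A2": {"cleaner": "A", "mic": [3.125, 3.125, 3.125, 6.25, 6.25]},
    "A3": {"cleaner": "A", "mic": [3.125, 6.25, 6.25, 6.25, 12.5]},
    "B1": {"cleaner": "B", "mic": [3.125, 3.125, 3.125, 3.125, 3.125]},
    "B2": {"cleaner": "B", "mic": [3.125, 3.125, 3.125, 3.125, 6.25]},
    "B3": {"cleaner": "B", "mic": [3.125, 3.125, 3.125, 3.125, 3.125]},
}


def resample_cases(lineages, rng):
    """Draw whole lineages with replacement, separately within each cleaner."""
    sample = []
    for cleaner in sorted({rec["cleaner"] for rec in lineages.values()}):
        pool = [rec for rec in lineages.values() if rec["cleaner"] == cleaner]
        sample.extend(rng.choices(pool, k=len(pool)))
    return sample


def rise_difference(sample):
    """Mean rise (last minus first value) under cleaner A minus that under B."""
    def mean_rise(cleaner):
        rises = [rec["mic"][-1] - rec["mic"][0]
                 for rec in sample if rec["cleaner"] == cleaner]
        return sum(rises) / len(rises)
    return mean_rise("A") - mean_rise("B")


def bootstrap_ci(lineages, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for rise_difference."""
    rng = random.Random(seed)
    stats = sorted(rise_difference(resample_cases(lineages, rng))
                   for _ in range(n_boot))
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


print(bootstrap_ci(lineages))
```

Resampling within each cleaner is a design choice made here so that every bootstrap sample contains both cleaners; in R the same idea amounts to calling `sample()` with `replace = TRUE` on the lineage labels and rebuilding the data frame from the selected lineages.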
But what I really wanted to know from this email is whether there's a package or function for natural selection simulation that I could use to see if I can simulate the experiment.

I want to start with a distribution of concentration tolerance values, taken from the inhibitory concentration values of my first lot of microbes, back when term began, and draw 3000 from it. Values in that draw that fall below the exposure concentration I used in my experiment are then removed, or have a high chance of being removed. From what is left, a draw is made again, or perhaps a copy operation (rather than a random draw), until I have 3000 again. So that they don't all have exactly the same concentration, a value can then be added to some of them that increases their concentration tolerance slightly, but not by a great deal, except in a few individuals, where it may be increased dramatically (some sort of exponential distribution perhaps). When the distribution of this simulated population of microbes has reached the next concentration (judged possibly by the mean or mode of the distribution; I have a series of 1 in 2 dilutions, so 100%, 50%, 25% and so on), they move on to the next concentration.

I know it's probably quite a heavy thing, and it was just a thought that came to me. If anybody has any experience in this area of R, or knows of something that allows this to be done, please let me know.

Thanks,
Ben.
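The steps described above can be sketched without any special package. The following is a minimal illustration only, written in Python with the standard library rather than R; every name and parameter value (`simulate`, `kill_prob`, `p_small`, `small_step`, `p_big`, `big_mean`, and the `baseline` and `doses` values) is an assumption made for the example, not something taken from the experiment. It uses the population mean as the criterion for moving to the next concentration, which is one of the options mentioned.

```python
import random
import statistics


def simulate(baseline, doses, pop_size=3000, n_challenges=60,
             kill_prob=0.95, p_small=0.10, small_step=0.2,
             p_big=0.001, big_mean=10.0, seed=1):
    """Return a list of (exposure concentration, mean tolerance) per challenge."""
    rng = random.Random(seed)
    # Starting population: draw tolerances from the observed baseline values.
    pop = rng.choices(baseline, k=pop_size)
    level = 0
    history = []
    for _ in range(n_challenges):
        dose = doses[level]
        # Selection: individuals below the dose are removed with high probability.
        survivors = [t for t in pop if t >= dose or rng.random() > kill_prob]
        if not survivors:
            break  # the simulated population went extinct
        # Regrowth: copy survivors at random until the population is full again.
        pop = rng.choices(survivors, k=pop_size)
        # Mutation: mostly small gains, rarely a large exponential jump.
        mutated = []
        for t in pop:
            u = rng.random()
            if u < p_big:
                t += rng.expovariate(1.0 / big_mean)
            elif u < p_big + p_small:
                t += rng.uniform(0.0, small_step)
            mutated.append(min(t, 100.0))  # tolerance cannot exceed 100%
        pop = mutated
        mean_tol = statistics.mean(pop)
        history.append((dose, mean_tol))
        # Step up once the mean tolerance reaches the next dilution in the series.
        if level + 1 < len(doses) and mean_tol >= doses[level + 1]:
            level += 1
    return history


# Hypothetical baseline tolerances (%) and a 1 in 2 dilution series, low to high.
baseline = [1.5625, 3.125, 3.125, 3.125, 6.25]
doses = [3.125, 6.25, 12.5, 25.0, 50.0, 100.0]

for dose, mean_tol in simulate(baseline, doses, n_challenges=10):
    print(dose, round(mean_tol, 3))
```

In R the same loop would rest on `sample()` for the draws and `rexp()` for the rare large jumps. Running it over many seeds gives a spread of trajectories that could be compared informally with the observed lineages.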
> Date: Wed, 5 Jan 2011 15:48:46 +0000
> From: benjamin.ward at bathspa.org
> To: r-help at r-project.org
> Subject: [R] Simulation - Natrual Selection
>
> Hi,
>
> I've been modelling some data over the past few days, of my work,
> repeatedly challenging microbes to a certain concentration of cleaner,
> until the required concentration to inhibit or kill them increases, at
> which point they are challenged to a slightly higher concentration each
> day. I'm doing this for two different cleaners and I'm collecting the
> required concentration to kill them as a percentage, the challenge
> number, the cleaner as a two level variable, and the lineage they're in,
> because I have several different lineages. I'm expecting the values to
> rise for one cleaner but not the other as they acquire resistance for
> one but not the other. Which has happened, but I have wide variation
> because one lineage acquired a very dramatic change which has made it
> immune to 50%, whereas the others have exhibited a much more gradual
> increase, and so I have very weak p values for the cleaner variable,
> because it is secondary to the challenge vector, which has the most
> explanatory power, because without time and these challenges, the
> selection would not happen. I was using two bacterium species, but one
> was keen on giving highly erratic results, and insisted on becoming cross
> contaminated, BUT if I include its data, it shoves cleaner over the
> p0.05 threshold, so I may just be having a problem with lack of data. So
> I've been asking about bootstrapping, which I plan to do to my cases,
> and then fit a model to see what the confidence is like then. I assume if
> I bootstrap then it will re-select whole cases, and not jumble
> everything up, otherwise a microbe (to take the most extreme value as an
> example) with 50% concentration tolerance at the beginning, would make
> no sense at all.
> I'm also planning on doing models lineage by lineage,
> rather than putting them into one whole, just to have a look at what
> happens.
>

You can't really have a p-value without a specific hypothesis to test; if you have that, then all your other questions are probably easy to answer. Generally you want to sample from things that are "iid", or maybe you want to test the "identical" part of that. Generally you want to have done a lit search ahead of time and had some idea of the likely evolution dynamics of your system given your design and things like your forcing functions etc. Most statisticians would not take a posteriori designs seriously as anything other than exploratory or hypothesis generating, and indeed it can be hard to avoid rationalization and selection bias (problems that always and only affect people who disagree with me, LOL); you are looking for predictive value. While it is not always worthwhile doing blind tests, it may be something worth considering (do you know which group gets what thing?).

> But what I really wanted to know from this email, was if there's a
> package or function for natural selection simulation I could make use
> of, to see if I can simulate the experiment. I want to start with a

http://www.google.com/#sclient=psy&hl=en&q=%22R+package%22+natural+selection

but as implied above, R has lots of analysis stuff and maybe you would find something more useful that is not linked to the keywords you suggest. You may find, for whatever reason, that you could write a differential equation to express your results, but that isn't often used with "natural selection."

> distribution of concentration tolerance values, taken from the
> inhibitory concentration values from my first lot of microbes, back when
> term began. Draw 3000 from this. Then values in that draw that fall
> below the exposure concentration I did in my experiment, are removed, or
> have a high chance of being removed.
> Then, from what is left, a draw is
> made again - or perhaps a copy operation (rather than a random draw)
> until I have 3000 again, rather than have all exactly the same
> concentration, then a value can be added to some of them, that increases
> their concentration tolerance slightly, but not by a great deal, except
> in a few individuals, where it may be increased dramatically (some sort
> of exponential distribution perhaps). Then when the distribution of this
> simulated population of microbes has reached the next concentration
> (possibly the mean or mode of the distribution) (I have a series of 1 in
> 2 dilutions, so 100%, 50%, 25% and so on), then they move on to the next
> concentration.
>
> I know it's probably quite a heavy thing, it was just a thought that
> came to me, if anybody has any experience in this area of R or knows of
> something that allows this to be done, please let me know.
>
> Thanks,
> Ben.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On 05/01/2011 17:40, Bert Gunter wrote:
>> My hypothesis was specified before I did my experiment. Whilst far from
>> perfect, I've tried to do the best I can to assess rise in resistance,
>> without going into genetics as it's not possible. (Although may be at the
>> next institution I've applied for MSc).
>>
>> With my hypothesis (I mentioned it below), I was of the frame of mind that a
>> nonsignificant p-value on the cleaner variable (for now - experiment is far
>> from over), indicated a lack of evidence for rejecting the null. And so at
>> the minute, it looks like the type of cleaner makes no difference.
> I have no fundamental objection, but be careful. I would simply
> qualify your last sentence by saying that it means that the
> experimental noise is too great to precisely determine the size of the
> cleaner effect. Scientific reality tells us that it is never exactly
> 0; what your results show is that your uncertainty about the value of
> the difference encompasses both positive and negative values. This
> does NOT mean that the difference might not be scientifically large
> enough to be of interest -- a confidence interval for the difference
> (MUCH better than a P value) would help you determine that. If the
> interval is narrow enough that the difference, positive or negative,
> is too small to be of scientific interest, then you're done. If the
> interval is large, then it tells you that you need more data, a
> better experiment (less noisy) etc.
>
> -- Bert
>

At the moment I wouldn't call the confidence interval small; it's definitely wide, and at the minute it covers zero. My R-squared at the minute is also 0.5. This is mostly due to the few extreme cases of adaptation I mentioned before, but I'm hesitant to remove them, as papers in my literature study that also evolve bacteria show that there is often (sometimes wide) variation in the paths populations take.
So whilst it is mathematically a bit undesirable, and makes both me and the model uncertain, it does fall into place with what is known, or has previously been shown, about the reality of selection. Again, if I include the data from the bacteria dropped from the study, all of that "improves" and the uncertainty is reduced.

It may also be worth mentioning that I am also taking a more traditional approach (by that I mean a more "Statistics 101" approach; indeed that is all the stats tuition covered as a taught element in my course), in case what I've described above did not work or was not ideal, because we (my supervisor and I) did foresee that a model report might contain a lot of uncertainty. Indeed we did expect some populations to adapt and some not to.

So I've also been collecting data on the width of the zones of inhibition, by putting disks of cleaner on plates of growth and measuring the dead zone that results. I can get lots of data from this with only a few plates, and doing this at the start of the study, a few times in the middle, and at the end will allow me to do more traditional analysis, for example a t.test on the dead zone widths at the end of the study between cleaner a and cleaner b, or a non-parametric equivalent, maybe even a permutation test. The modelling stuff is already beyond what my supervisor expects of me, but I felt it would add value and a lot more insight to the study, allowing more variables to be accounted for than a more short-sighted traditional "test".
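For the permutation test mentioned above, a minimal sketch might look like the following. It is written in Python with the standard library rather than R purely for illustration, the zone widths are invented numbers, and `permutation_test` is a hypothetical helper name. It shuffles the pooled widths, reassigns them to the two cleaners, and counts how often the shuffled difference in means is at least as large as the observed one.

```python
import random


def permutation_test(a, b, n_perm=5000, seed=1):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Reassign the pooled widths to the two cleaners at random.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical dead zone widths (mm) at the end of the study.
zones_a = [4.1, 3.8, 2.9, 3.5, 1.2, 3.9]
zones_b = [6.0, 5.7, 6.3, 5.9, 6.1, 5.5]

print(permutation_test(zones_a, zones_b))
```

In R the shuffling step corresponds to `sample()` applied to the pooled widths, and the result can be read alongside the `t.test` on the same data.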