Sbarrato, Thomas
2010-Aug-20 11:38 UTC
[R] Assign statistically relevant groups following multimodal distribution of data
Dear R mailing list members,

I am trying to find a way to assign samples into groups according to the distribution of their values, computationally (that is, in a loop run many times). My input is a matrix with n rows, each row holding 34 sample values. For each row, the goal is to estimate the distribution of its values and assign that row's samples to bins that match this distribution. Ideally, a statistical test would also give a p-value for the fit. For example, if the distribution is unimodal, the samples go into one group; if it is bimodal, into two groups; and so on. The p-value would help me rule out rows for which the fit is not good enough. Could you point me to a function or package that could do this?

Because I am considering only one row at a time, I don't think I can use the available clustering tools, as they take a matrix as input and cluster rows together.

I have tried to use the shapiro.test() function to rank my rows by the p-value of the normality test, hoping that multimodal distributions would stand out, but without success. I have also tried the hist() function and its $counts output. I was thinking of looping over those counts to identify whether more than one maximum exists, and then splitting the data around those maxima. Would that be a workable ("brute-force") solution? Even then, I would still lack a p-value to check the fit...

Thanks in advance for your time.

Kind regards,

Thomas Sbarrato
PhD Student/Part-Time Researcher
Medical Research Council
Toxicology Unit
University of Leicester
Lancaster Road, Leicester
LE1 9HN, UK
Tel: +44 (0)116 252 5591
Email: ts165 at leicester.ac.uk, paxtaos at nottingham.ac.uk
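The brute-force idea from the question works in three steps: count the histogram bins, find the local maxima, and split the data between neighbouring maxima. Below is a minimal sketch of that idea. It is written in plain Python (standard library only) rather than R, so it is an illustration of the logic, not an R function or package. The equal-width binning only approximates what R's hist() does by default. The names histogram_counts, find_peaks and split_by_modes are hypothetical. So is the choice to cut at the centre of the emptiest bin between two peaks.

```python
import bisect


def histogram_counts(values, n_bins):
    """Equal-width histogram counts (similar in spirit to R's hist()$counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins if hi > lo else 1.0
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, n_bins - 1)] += 1  # the maximum value falls in the last bin
    return counts, lo, width


def find_peaks(counts):
    """Indices of non-empty bins that are local maxima.

    A bin is a peak if it is >= its left neighbour and > its right neighbour
    (bins outside the range count as 0), so a flat-topped peak is reported once.
    """
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c > 0 and c >= left and c > right:
            peaks.append(i)
    return peaks


def split_by_modes(values, n_bins=10):
    """Hypothetical helper: count histogram peaks and split the data at the
    emptiest bin between each pair of neighbouring peaks.

    Returns (number_of_modes, group label per value in input order).
    """
    counts, lo, width = histogram_counts(values, n_bins)
    peaks = find_peaks(counts)
    cuts = []
    for left, right in zip(peaks, peaks[1:]):
        # emptiest bin between two neighbouring peaks (first one on ties)
        valley = min(range(left, right + 1), key=lambda i: counts[i])
        cuts.append(lo + (valley + 0.5) * width)  # cut at the valley bin's centre
    # group label = number of cut points lying at or below the value
    groups = [bisect.bisect_right(cuts, v) for v in values]
    return len(peaks), groups


# Made-up toy row with two well-separated clusters (not real data).
row = [1.0, 1.1, 1.2, 0.9, 1.05, 5.0, 5.1, 4.9, 5.2, 4.95]
n_modes, groups = split_by_modes(row, n_bins=10)
print(n_modes, groups)  # 2 [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The result depends strongly on n_bins. With only 34 values per row, a noisy histogram can easily show spurious peaks, so some smoothing or a minimum peak height would probably be needed in practice. As noted in the question, this approach on its own gives no p-value for the fit.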