Sbarrato, Thomas
2010-Aug-20 11:38 UTC
[R] Assign statistically relevant groups following multimodal distribution of data
Dear R mailing list members,

I am trying to find a way to assign samples to groups according to the
distribution of their values, programmatically (that is, in a loop run many
times). My input is organised by row: n rows, each holding a series of 34
samples. The goal is to estimate the distribution of each row and, for that
row, assign the samples to bins that match the distribution of the values.
Finally, a statistical test yielding a p-value for the fit would be great. For
example, if the distribution is unimodal the samples form 1 group, if it is
bimodal they form 2 groups, and so on. The p-value would help to rule out those rows for
which the fit was not accurate enough. Please can you guide me to a function or
package that could do the trick?

Because I am considering only one row at a time, I don't think I can really
use the clustering tools available, as they take a matrix as input and cluster
rows together. I have tried the shapiro.test() function to rank my rows by
p-value from the normality test, hoping that multimodal distributions would
stand out, without any success. I have also tried the hist() function and its
$counts output. I was thinking of looping over those counts to identify whether
more than one maximum exists, and then splitting the data around those maxima.
Would that be a workable (if "brute-force") solution? But I would still lack a
p-value to check the fit...
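
To make the two attempts described above concrete, here is a minimal sketch in base R. The matrix x is simulated and the helper name group_by_hist_peaks is made up for illustration; neither comes from my real data. It counts local maxima in the hist() counts of one row and cuts the row at the lowest bin between neighbouring maxima. The result depends heavily on the bin width, and with only 34 samples the counts are noisy, so spurious peaks are likely.

```r
# Hypothetical data: 3 rows of 34 samples each (simulated, for illustration).
set.seed(1)
x <- rbind(
  rnorm(34, mean = 0),                            # one mode
  c(rnorm(17, mean = 0), rnorm(17, mean = 8)),    # two modes
  c(rnorm(12, 0), rnorm(11, 8), rnorm(11, 16))    # three modes
)

# Normality p-value per row. A small value signals departure from
# normality, which need not mean the row is multimodal.
shapiro_p <- apply(x, 1, function(v) shapiro.test(v)$p.value)

# Hypothetical helper: group one row by the peaks of its histogram counts.
group_by_hist_peaks <- function(v, breaks = "Sturges") {
  h <- hist(v, breaks = breaks, plot = FALSE)
  counts <- h$counts

  # Collapse runs of equal counts so that a plateau is one candidate peak.
  r <- rle(counts)
  k <- length(r$values)
  vals <- c(-Inf, r$values, -Inf)
  is_peak <- vals[2:(k + 1)] > vals[1:k] & vals[2:(k + 1)] > vals[3:(k + 2)]

  # Bin index at the middle of each peak run.
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  peak_bins <- floor((starts + ends) / 2)[is_peak]

  # One cut per pair of neighbouring peaks, at the lowest bin between them.
  cuts <- vapply(seq_len(length(peak_bins) - 1), function(i) {
    lo <- peak_bins[i]
    hi <- peak_bins[i + 1]
    h$mids[lo + which.min(counts[lo:hi]) - 1]
  }, numeric(1))

  groups <- if (length(cuts) == 0) {
    rep(1L, length(v))
  } else {
    findInterval(v, cuts) + 1L
  }
  list(n_modes = sum(is_peak), cuts = cuts, groups = groups)
}

# Number of histogram peaks found in each row.
n_modes <- apply(x, 1, function(v) group_by_hist_peaks(v)$n_modes)
```

Calling group_by_hist_peaks(x[2, ])$groups returns one group label per sample of that row. This only does the splitting; it gives no p-value for the quality of the fit, which is the part I am still missing.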

Thanks in advance for your time.

Kind regards.

Thomas Sbarrato
PhD Student/Part-Time Researcher
Medical Research Council
Toxicology Unit
University of Leicester
Lancaster Road, Leicester
LE1 9HN, UK
Tel: +44 (0)116 252 5591
Email: ts165 at leicester.ac.uk
paxtaos at nottingham.ac.uk
