Thomas Groen
2009-Aug-28 10:34 UTC
[R] breaking multi-modal histograms down into combinations of unimodal distributions
Dear All,

Does anybody know whether there is functionality in R to break a histogram that shows a clear bimodal (or multimodal) distribution into a series of unimodal histograms that, added together, reproduce the original histogram?

I was thinking of using QQ-plots (for which tools are available in R) and counting the number of times the observed quantiles cross the 1:1 line, but this only gives an indication of how many "peaks" the histogram has. Therefore I was wondering whether other approaches exist.

Thanks in advance for any suggestions.

p.s. Thanks also to those who helped me with my previous question on modelling different combinations of explanatory variables. The leaps package and the regsubsets command worked really well!
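The QQ-plot idea above only estimates how many peaks there are. An even cruder way to get that number is to bin the raw data and count the local maxima of the bin counts. The sketch below does this in plain Python rather than R, and the names `histogram_counts` and `count_peaks` are hypothetical, not functions from any R package. Note the assumption it rests on: the answer depends heavily on the number of bins, and sampling noise easily produces spurious one-bin peaks, so the count is a hint rather than a test.

```python
def histogram_counts(data, n_bins):
    """Bin raw data into n_bins equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        k = int((x - lo) / width) if width > 0 else 0
        counts[min(k, n_bins - 1)] += 1  # the maximum value falls in the last bin
    return counts


def count_peaks(counts, min_height=0):
    """Count local maxima in a sequence of bin counts.

    A peak is a bin, or a flat run of equal bins, that is strictly higher
    than the bins on either side; positions beyond the ends count as lower.
    Peaks lower than min_height are ignored.
    """
    n = len(counts)
    peaks = 0
    i = 0
    while i < n:
        # Extend j over a plateau of bins equal to counts[i].
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[j + 1] if j + 1 < n else float("-inf")
        if counts[i] > left and counts[i] > right and counts[i] >= min_height:
            peaks += 1
        i = j + 1  # continue after the plateau
    return peaks


if __name__ == "__main__":
    data = [1, 1, 2, 2, 2, 2, 2, 3, 3, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9]
    counts = histogram_counts(data, 8)
    print(counts, count_peaks(counts))  # -> [2, 5, 2, 0, 0, 0, 3, 9] 2
```

Like the QQ-plot approach, this says nothing about the shape of each component; it only suggests how many components to look for.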
(Ted Harding)
2009-Aug-28 11:12 UTC
[R] breaking multi-modal histograms down into combinations of unimodal distributions
On 28-Aug-09 10:34:46, Thomas Groen wrote:
> Dear All,
>
> Does anybody know if there is a functionality in R to break histograms
> that show a clear bi-modal (or multi-modal) distribution into a series
> of unimodal histograms that added up result in the original histogram?
> I was thinking of using QQ-plots (for which tools are available in R),
> and then observing the number of times the observed quantiles cross
> the 1:1 line, but this only gives an indication of how many "peaks"
> the current histogram has. Therefore I was wondering whether other
> approaches exist.

There are several pieces of information that would help us be more specific in our suggestions.

1: Do you have the raw data from which the histogram was constructed? Decomposing a multimodal sample into constituent unimodal components is best done by adopting a generic distribution type (e.g. Normal) for each component, and then estimating the parameters of each component from the data. There is more information (and therefore better estimation) in the raw data than in the histogram.

2: Do you have a preferred generic distribution type (e.g. Normal) for the component distributions? (If not, and you don't care what distribution you adopt, then what is to stop you drawing arbitrary dividing lines between the peaks, and asserting that what lies between two consecutive divisions is one component of the mixture? You would then end up with a set of disjoint histograms, one for each component, chosen in a somewhat arbitrary way. Since you presumably don't intend that to happen, you presumably have reasons why it should not happen, and those reasons would amount to a preference for a generic distribution type.)

Once the generic type is chosen, a specific estimation method follows from it.
For example, do an R Site Search on "normal mixture" in "Functions" at:

  finzi.psych.upenn.edu/nmz.html

You may want to look at

  finzi.psych.upenn.edu/R/library/mclust/html/00Index.html

("Model-Based Clustering / Normal Mixture Modeling").

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-09                                       Time: 12:12:36
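Ted's point 1 and the mclust pointer amount to fitting a finite Normal mixture to the raw data. As a rough illustration of what that involves, here is a plain-Python sketch (not R, and not how mclust is implemented) of the EM algorithm for a univariate mixture of k Normal components, assuming k is already known. It then turns the fitted components into expected bin counts: one "unimodal histogram" per component, whose rows add up to the fitted mixture's histogram (which only approximates the observed one). The names `fit_normal_mixture` and `component_bin_counts` are hypothetical, and the quantile-based starting values are a simple choice that can fail on harder data.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_normal_mixture(data, k, max_iter=500, tol=1e-8):
    """Fit a k-component univariate Normal mixture to raw data by EM.

    Returns (weights, means, sds). Means start at spread-out sample
    quantiles; all SDs start at the overall sample SD.
    """
    n = len(data)
    xs = sorted(data)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    sds = [overall_sd] * k
    weights = [1.0 / k] * k
    prev_loglik = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each point came from each component.
        resp = []
        loglik = 0.0
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300  # guard against underflow far in the tails
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted proportions, means and SDs.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = math.sqrt(max(var, 1e-12))
        if abs(loglik - prev_loglik) < tol:  # stop when the log-likelihood settles
            break
        prev_loglik = loglik
    return weights, means, sds


def component_bin_counts(edges, n, weights, means, sds):
    """Expected count of each fitted component in each bin [edges[i], edges[i+1]).

    Row j is the 'unimodal histogram' of component j; summing the rows
    gives the fitted mixture's histogram over the same bins.
    """
    return [
        [n * w * (normal_cdf(b, m, s) - normal_cdf(a, m, s))
         for a, b in zip(edges[:-1], edges[1:])]
        for w, m, s in zip(weights, means, sds)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated bimodal sample: 300 draws from N(0, 1) and 200 from N(6, 1).
    sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
    sample += [rng.gauss(6.0, 1.0) for _ in range(200)]
    w, m, s = fit_normal_mixture(sample, k=2)
    for j in sorted(range(2), key=lambda c: m[c]):
        print(f"component {j}: weight={w[j]:.3f} mean={m[j]:.3f} sd={s[j]:.3f}")
    edges = [-4.0 + i for i in range(15)]  # bins of width 1 from -4 to 10
    rows = component_bin_counts(edges, len(sample), w, m, s)
    for a, b, *per_comp in zip(edges[:-1], edges[1:], *rows):
        print(f"[{a:5.1f}, {b:5.1f}): " + "  ".join(f"{c:6.1f}" for c in per_comp))
```

EM does not return components in any guaranteed order, which is why the demo sorts them by mean before printing. In practice you would also need to choose k and try several starting values, which is the kind of work the R packages Ted mentions take care of.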