Hi,

I have two questions, the first perhaps dumber than the second.

Firstly, I have a data set, and when I plot a histogram it looks like a normal distribution. So I want to overlay a bell-shaped normal curve on top of it, to demonstrate how similar it is to the normal distribution. I have read the help on dnorm(), rnorm(), pnorm() etc. but still can't figure out how to plot a normal distribution. Any code would be appreciated.

Secondly, and perhaps more difficult, is a second data set. This, when plotted as a histogram, has two clear peaks, perhaps even three, all of which look as though they are normally distributed. So the theory is that my data set is actually made up of two, possibly three, underlying sub-sets of data which are normally distributed, but with different means and standard deviations. So 1) how do I test for this? And 2) how can I estimate the parameters (mean and SD) for the underlying distributions?

Thanks in advance for your help

Mick
Hi Mick,

Regarding your first question, try the following:

    # if `x' is your data vector, then
    y <- seq(min(x), max(x), length.out = 200)  # grid spanning the range of the data
    hist(x, prob = TRUE)                        # density scale, so the curve is comparable
    lines(y, dnorm(y, mean(x), sd(x)))          # normal density at the sample mean and SD

Regarding your second question, you'd probably want to fit a mixture model; look at package `mclust':

    help(package = "mclust")

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
To: <r-help at stat.math.ethz.ch>
Sent: Monday, October 04, 2004 12:04 PM
Subject: [R] Help with normal distributions

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
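As a rough illustration of the mixture-model suggestion above, the sketch below fits a two-component normal mixture and reads off the estimated means and SDs. The simulated data, the seed, and the variable names are made up for illustration. It assumes a reasonably recent version of mclust, where `Mclust()` is the main fitting function, and it asks for the unequal-variance model `"V"` with exactly two components, which is a choice made here rather than something stated in the thread.

```r
library(mclust)  # assumes the mclust package is installed from CRAN

set.seed(1)
# Hypothetical data: two normal sub-populations with different means and SDs
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(200, mean = 6, sd = 1.5))

# Fit a univariate two-component mixture with unequal variances ("V")
fit <- Mclust(x, G = 2, modelNames = "V")

# Collect the estimated mixing proportions, means and SDs
est <- data.frame(
  proportion = fit$parameters$pro,
  mean       = fit$parameters$mean,
  sd         = sqrt(fit$parameters$variance$sigmasq)
)
print(est)

# Overlay the fitted mixture density on the histogram
y <- seq(min(x), max(x), length.out = 200)
dens <- sapply(y, function(v) sum(est$proportion * dnorm(v, est$mean, est$sd)))
hist(x, prob = TRUE, breaks = 30)
lines(y, dens)
```

With real data, replace the simulated `x` by your own vector; the estimates are only as good as the assumption that the components really are normal.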
Hi Michael,

> Secondly, and perhaps more difficult, is a second data set. This, when
> plotted as a histogram, has two clear peaks, perhaps even three, all of
> which look as though they are normally distributed. So the theory is
> that my data set is actually made up of two, possibly three, underlying
> sub-sets of data which are normally distributed, but with different
> means and standard deviations. So 1) how do I test for this? And 2) how
> can I estimate the parameters (mean and SD) for the underlying
> distributions?

The answer to 2, as pointed out already, is to use EMclust in package mclust.

Testing for the presence of a mixture is difficult from a theoretical point of view, and as far as I know, nothing is already implemented in R. What you can do is:

a) Let EMclust estimate the number of mixture components by BIC (it can also decide for only one component).

b) Use a standard normality test such as shapiro.test to exclude homogeneous normality. This tells you that you have to fit something more complex than a single normal, but it does not tell you what.

Christian

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de
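Steps a) and b) above could be sketched roughly as follows. The data are simulated and all variable names are hypothetical. The sketch uses `Mclust()`, which in recent mclust versions performs the BIC-based selection described for EMclust; whether EMclust itself is still available depends on the installed version, so treat the function choice as an assumption.

```r
library(mclust)  # assumes the mclust package is installed from CRAN

set.seed(2)
# Hypothetical bimodal sample
x <- c(rnorm(250, mean = 10, sd = 1), rnorm(250, mean = 15, sd = 1))

# b) Shapiro-Wilk test: a small p-value argues against a single normal,
#    but says nothing about what the alternative is
sw <- shapiro.test(x)
print(sw$p.value)

# a) Let BIC choose among 1, 2 or 3 mixture components
fit <- Mclust(x, G = 1:3)
print(fit$G)          # number of components selected by BIC
print(fit$modelName)  # variance structure of the selected model
print(fit$BIC)        # BIC values for every candidate model
```

Note that `shapiro.test()` accepts between 3 and 5000 observations, so larger samples need to be subsampled or tested another way.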
On Mon, 4 Oct 2004, Christian Hennig wrote:

> The answer to 2, as pointed out already, is to use EMclust in package
> mclust.
> Testing for the presence of a mixture is difficult from a theoretical
> point of view, and as far as I know, nothing is already implemented in R.

For testing for a mixture of two random variables there is the dip test of Hartigan; see diptest on CRAN.

David Scott

_________________________________________________________________
David Scott
Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland NEW ZEALAND
Phone: +64 9 373 7599 ext 86830
Fax: +64 9 373 7000
Email: d.scott at auckland.ac.nz

Graduate Officer, Department of Statistics
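A minimal sketch of the dip test suggestion, assuming the diptest package is installed and using its `dip.test()` function. Strictly speaking, the dip test is a test of unimodality: a small p-value suggests more than one mode, but it does not by itself say how many components there are or that they are normal. The two samples below are simulated purely for illustration.

```r
library(diptest)  # assumes the diptest package is installed from CRAN

set.seed(3)
# Hypothetical samples: one unimodal, one clearly bimodal
unimodal <- rnorm(500)
bimodal  <- c(rnorm(250, mean = 0), rnorm(250, mean = 5))

p_uni <- dip.test(unimodal)$p.value
p_bi  <- dip.test(bimodal)$p.value

print(p_uni)  # unimodality should not be rejected here
print(p_bi)   # unimodality should be rejected here
```

If the dip test rejects unimodality, a mixture fit is a natural next step for estimating the component means and SDs.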