Hi,

I have a question (sorry if it is a dumb one), which I will phrase using the following R code:

    group1 <- rnorm(n=50, mean=0, sd=1)
    group2 <- rnorm(n=20, mean=1, sd=1.5)
    group3 <- c(group1, group2)

Now, if I am given a dataset like group3, which method (discriminant analysis, clustering, or something else) is best for separating it into clusters in R? What is known: there are 2 clusters, each normally distributed, but the parameters are unknown.

Thanks,

Ed
Cluster analysis should be able to handle that. I think if you know how many clusters you have, "kmeans" is OK, or the EM algorithm can also do that.

On Thu, Jan 27, 2005 at 03:44:42PM -0500, WeiWei Shi wrote:
> Now, if I am given a dataset from group3, what method (discriminant
> analysis, clustering, maybe) is the best to cluster them by using R.
> The known info includes: 2 clusters, normal distribution (but the
> parameters are unknown).

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
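A rough sketch of the two suggestions above, run on the simulated data from the question. The fixed seed, the `nstart` value, the starting values, and the helper name `em_mix2` are all assumptions made for illustration; the EM loop is written by hand and is not taken from any package.

```r
# Simulated data from the original question.
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
group1 <- rnorm(n = 50, mean = 0, sd = 1)
group2 <- rnorm(n = 20, mean = 1, sd = 1.5)
group3 <- c(group1, group2)

# k-means with 2 clusters; several random starts guard against a poor start.
km <- kmeans(group3, centers = 2, nstart = 25)
# In one dimension the k-means boundary is the midpoint of the two centers.
boundary <- mean(km$centers)

# Hand-written EM for a two-component univariate Gaussian mixture.
# em_mix2 is a hypothetical helper, not a function from any package.
em_mix2 <- function(x, iters = 100) {
  p  <- 0.5                                       # mixing weight of component 1
  mu <- as.numeric(quantile(x, c(0.25, 0.75)))    # crude starting means
  s  <- rep(sd(x), 2)                             # crude starting sds
  for (i in seq_len(iters)) {
    # E-step: posterior probability that each point belongs to component 1
    d1 <- p * dnorm(x, mu[1], s[1])
    d2 <- (1 - p) * dnorm(x, mu[2], s[2])
    r  <- d1 / (d1 + d2)
    # M-step: weighted updates of weight, means and sds
    p  <- mean(r)
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
    s  <- c(sqrt(sum(r * (x - mu[1])^2) / sum(r)),
            sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r)))
  }
  list(p = p, mu = mu, sd = s, posterior = r)
}

em <- em_mix2(group3)
em_cluster <- ifelse(em$posterior > 0.5, 1, 2)

# Cross-tabulate the two assignments (cluster labels are arbitrary).
table(kmeans = km$cluster, em = em_cluster)
```

k-means always cuts at the midpoint between the two centers, while the mixture model lets each component have its own spread, so the two methods will generally not agree on where the boundary lies when the groups overlap this much.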
Hi,

Thanks for the reply. In fact, I tried both of them, as well as another method, and each gave me different boundaries on my real datasets. I am thinking about k-median, but I am hoping to get more suggestions from everyone in this forum.

Cheers,

Ed

On Thu, 27 Jan 2005 15:37:16 -0600, msck9 at mizzou.edu <msck9 at mizzou.edu> wrote:
> The cluster analysis should be able to handle that. I think if you
> know how many clusters you have, "kmeans" is ok, or the EM algorithm
> can also do that.
It depends a lot on what you know or don't know about the data, and on what problem you're trying to solve. If you know for sure it's a mixture of Gaussians, likelihood-based approaches might be better. MASS (the book) has an example of fitting a univariate mixture of Gaussians using various optimizers. The code is even in $R_HOME/library/MASS/scripts/ch16.R.

Andy

> From: WeiWei Shi
>
> thanks for reply. In fact, I tried both of them and I also tried the
> other method and I found all of them gave me different boundaries (to
> my real datasets). I am thinking about k-median but hoping to get more
> suggestions from all of you in this forum.
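As an illustration of the likelihood-based route, here is a minimal sketch that fits a two-component Gaussian mixture by direct maximisation with optim(). This is not the code from the MASS script; the parameterisation, the starting values, the fixed seed, and the name `negloglik` are assumptions chosen for this example.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
group3 <- c(rnorm(n = 50, mean = 0, sd = 1),
            rnorm(n = 20, mean = 1, sd = 1.5))

# Negative log-likelihood of a two-component Gaussian mixture.
# par = (logit of mixing weight, mean 1, log sd 1, mean 2, log sd 2), so the
# optimiser works on an unconstrained scale. negloglik is a hypothetical name.
negloglik <- function(par, x) {
  p  <- plogis(par[1])
  s1 <- exp(par[3])
  s2 <- exp(par[5])
  nll <- -sum(log(p * dnorm(x, par[2], s1) + (1 - p) * dnorm(x, par[4], s2)))
  # Return a large finite value if the likelihood degenerates numerically.
  if (is.finite(nll)) nll else 1e10
}

# Crude starting values: equal weights, quartiles as means, pooled sd.
start <- c(0,
           as.numeric(quantile(group3, 0.25)), log(sd(group3)),
           as.numeric(quantile(group3, 0.75)), log(sd(group3)))

fit <- optim(start, negloglik, x = group3, method = "BFGS")

# Back-transform to the natural scale.
est <- list(p    = plogis(fit$par[1]),
            mean = fit$par[c(2, 4)],
            sd   = exp(fit$par[c(3, 5)]))

# Posterior probability of component 1 for each observation, and the
# resulting hard assignment.
d1 <- est$p * dnorm(group3, est$mean[1], est$sd[1])
d2 <- (1 - est$p) * dnorm(group3, est$mean[2], est$sd[2])
posterior1 <- d1 / (d1 + d2)
cluster <- ifelse(posterior1 > 0.5, 1, 2)
table(cluster)
```

With components that overlap as heavily as these, the estimates can be sensitive to the starting values, so it is worth trying a few different starts and checking fit$convergence before trusting the result.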