eewwaaww at interia.pl
2007-Jul-05 13:54 UTC
[R] model-based question-better readable version
This is probably an easy question for you. I have recently become interested in
model-based clustering.
Adrian E. Raftery, in "Recent Advances in Model-Based Clustering: Image
Segmentation and Variable Selection" (www.stat.washington.edu/Raftery), showed
that different classification methods can be compared using the BIC statistic.
For the diabetes dataset the best model is the VVV model with 3 classes: for
this model the BIC curve reaches its highest value and the error rate is 12%.
The BIC curve for the EII model (roughly k-means) lies well below the VVV curve
and its error rate is 18%, so k-means (EII) is worse than VVV, which is clear
to me.
I would like to apply model-based clustering to an economic data set (GDP and
life expectancy of EU countries), because I am a PhD student at the University
of Economics in Poland.
Using this data (standardized), the best model I get is EEV (2 classes). The EII
(k-means) curve lies below the EEV curve, which suggests that k-means is worse
than EEV, but the class error for EII is 0 and for EEV it is 6% (and more for
other variables). Why?
Even with the iris data we get a lower class error for the EII model (10%) than
for VEV (33%) with 2 classes, even though the curves for VEV and the other
models lie above the EII curve in the BIC plot.
For this data BIC chooses VEV with 2 clusters, while the right number of
classes, given in the column "Species", is 3.
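For reference, here is a minimal sketch of how this kind of comparison can be
run on iris. It assumes the mclust package and its Mclust(), mclustBIC() and
classError() functions; the range G = 1:5 is only an illustrative choice, and
the exact calls behind the numbers above may have differed.

```r
library(mclust)  # assumption: the mclust package is installed

x     <- iris[, 1:4]   # the four measurements
truth <- iris$Species  # known classes, used only to compute the class error

# BIC for the EII and VEV models over 1 to 5 components (illustrative range)
bic <- mclustBIC(x, G = 1:5, modelNames = c("EII", "VEV"))
print(summary(bic))

# Fit each model with 2 components and compare against the known classes
fit_eii <- Mclust(x, G = 2, modelNames = "EII")
fit_vev <- Mclust(x, G = 2, modelNames = "VEV")

err_eii <- classError(fit_eii$classification, truth)$errorRate
err_vev <- classError(fit_vev$classification, truth)$errorRate

cat("EII: BIC =", fit_eii$bic, " class error =", err_eii, "\n")
cat("VEV: BIC =", fit_vev$bic, " class error =", err_vev, "\n")
```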
My second question is: when (for which data sets, or for which special types of
data) is model-based clustering better than k-means (kmeans) or hierarchical
clustering (hclust)?
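To make the comparison concrete, here is a rough sketch of the kmeans and
hclust side using only base R, with iris standing in for my economic data. The
choice of 3 clusters, Ward linkage, and the helper majority_error() are
assumptions for illustration; the helper is hypothetical and assigns each
cluster to its majority class, which is not necessarily the same matching that
mclust's classError() uses.

```r
x     <- scale(iris[, 1:4])  # standardized measurements
truth <- iris$Species

set.seed(1)  # kmeans uses random starts
km    <- kmeans(x, centers = 3, nstart = 25)
hc    <- hclust(dist(x), method = "ward.D2")
hc_cl <- cutree(hc, k = 3)

# Hypothetical helper: share of points that fall outside the majority class
# of their cluster
majority_error <- function(cluster, class) {
  tab <- table(cluster, class)
  1 - sum(apply(tab, 1, max)) / length(class)
}

err_km <- majority_error(km$cluster, truth)
err_hc <- majority_error(hc_cl, truth)

cat("kmeans error:", err_km, "\n")
cat("hclust error:", err_hc, "\n")
```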
I am looking forward to hearing from you.
Best regards,
Ewa
