Stefan Habermehl
2008-Jan-14 09:07 UTC
[R] clusterwise regression from fpc (fixed point clustering) package
Hi there,

whenever I try the clusterwise regression from the fpc package, the following problem occurs: the first cluster is always formed in such a way that, when I run an ordinary linear regression of the dependent variable on the independent variables (using only the respondents in the first cluster), the regression uses only one independent variable, which explains the whole variance of the dependent variable. This seems interesting, but it doesn't make sense. (It seems as if the algorithm searches for a variable whose values are identical to those of the dependent variable and gives that variable all of the explanatory power.)

Some words on clusterwise regression: it seems to me a very interesting method that combines clustering and regression. It splits the respondents into n clusters in such a way that the n regressions (one separate regression for each cluster) have high explanatory power.

My data look like this (just an example; it wouldn't make sense to post the whole table):

  dep  i1  i2  i3  i4  ...
    3   2   1   5   2
    2   2   3   1   1
    5   5   3   4   4
    1   1   2   1   2

In other words: one dependent variable (values 1-5) and a couple of independent variables (values 1-5), so pretty standard, with about 4000 respondents.

When I use it, the result of the clusterwise regression seems to be the following: for the first cluster it just looks for an independent variable that is identical to the dependent variable for a lot of respondents. In this case it would choose i1, since i1 is approximately identical to dep (except for the first respondent). So the result gives me this clustering:

  dep  i1  i2  i3  i4  cl
    3   2   1   5   2   2
    2   2   3   1   1   1
    5   5   3   4   4   1
    1   1   2   1   2   1

When I run a linear regression for cluster 1, explaining dep by i1-i4, it uses almost only i1 to explain dep, with 100% explained variance. But the model is often insignificant, and it is not the kind of cluster I am looking for.

This is my code:

  library(fpc)
  # unabh.csv: independent variables; abh.csv: dependent variable (column VVV)
  unabh <- read.table("G:/Data_Files/SPSS Files/unabh.csv", sep=";", header=TRUE)
  abh <- read.table("G:/Data_Files/SPSS Files/abh.csv", sep=";", header=TRUE)
  m <- as.matrix(unabh)
  attach(abh)
  rmt1 <- regmix(m, VVV, ir=1, nclus=1:2, icrit=1.e-5, minsig=1.e-6, warning=TRUE)
  write.table(rmt1$g)

This is just one example. I have also tried other numbers of clusters (nclus), without warning=TRUE, and without any arguments beyond the variables. But when I try more clusters, the calculation never stops (maybe there are too many respondents, about 4000?).

My objective: I want a couple of meaningful clusters with meaningful regressions. Does anybody know what to change?

Thanks a lot for any kind of recommendation,
Stefan
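The post describes clusterwise regression as splitting the respondents into n clusters so that each cluster's own regression explains the dependent variable well. As a rough illustration of that idea only (not of what fpc's regmix does internally; as far as I know regmix fits a mixture of regressions by maximum likelihood), here is a minimal base-R sketch of a simpler hard-assignment variant, often called k-regressions. It alternately fits one least-squares regression per cluster and reassigns each respondent to the cluster whose regression gives the smallest squared residual, keeping the best of several random starts. The function name kregress, its arguments n_start and max_iter, and the simulated data are hypothetical, made up for this sketch.

```r
# Hypothetical k-regressions sketch (hard-assignment clusterwise regression).
# NOT fpc::regmix; only illustrates "one regression per cluster".
kregress <- function(X, y, k, n_start = 10, max_iter = 100) {
  Xd <- cbind(1, as.matrix(X))            # design matrix with intercept
  n <- nrow(Xd); p <- ncol(Xd)
  # least-squares coefficients for each cluster
  fit_clusters <- function(cl) {
    lapply(seq_len(k), function(j) {
      b <- lm.fit(Xd[cl == j, , drop = FALSE], y[cl == j])$coefficients
      b[is.na(b)] <- 0                    # aliased columns contribute nothing
      b
    })
  }
  # n x k matrix of squared residuals under each cluster's regression
  sq_resid <- function(coefs) sapply(coefs, function(b) (y - Xd %*% b)^2)
  best <- NULL
  for (s in seq_len(n_start)) {
    cl <- sample(rep_len(seq_len(k), n))  # random balanced initial split
    for (iter in seq_len(max_iter)) {
      coefs <- fit_clusters(cl)
      new_cl <- max.col(-sq_resid(coefs), ties.method = "first")
      # stop on convergence, or if a cluster would become too small to fit
      if (all(new_cl == cl) || any(tabulate(new_cl, k) < p)) break
      cl <- new_cl
    }
    coefs <- fit_clusters(cl)
    sse <- sum(sq_resid(coefs)[cbind(seq_len(n), cl)])
    if (is.null(best) || sse < best$sse)
      best <- list(cluster = cl, coefficients = coefs, sse = sse)
  }
  best
}

# Usage on simulated (made-up) data with two hidden groups
set.seed(42)
n <- 200                                  # simulated respondents per group
X <- matrix(rnorm(2 * n * 2), ncol = 2, dimnames = list(NULL, c("i1", "i2")))
grp <- rep(1:2, each = n)                 # true (hidden) group membership
y <- ifelse(grp == 1, 1 + 2 * X[, "i1"], -1 - 2 * X[, "i2"]) + rnorm(2 * n, sd = 0.1)
fit <- kregress(X, y, k = 2)
print(table(fit$cluster, grp))            # found clusters vs. true groups
# per-cluster regression, like the post's check of cluster 1
print(summary(lm(y ~ i1 + i2, data = data.frame(y, X), subset = fit$cluster == 1)))
```

Because this criterion simply minimises squared residuals, it also rewards a cluster in which dep equals i1 exactly, which can easily happen when all scores are integers from 1 to 5; that may be related to the degenerate first cluster described above.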