Stefan Habermehl
2008-Jan-14 09:07 UTC
[R] clusterwise regression from fpc (fixed point clustering) package
Hi there,

whenever I try the clusterwise regression from the fpc package, the following
problem occurs:
the first cluster is always formed in such a way that, when I run an ordinary
linear regression of the dependent variable on the independent variables (using
only the respondents from the first cluster), the regression relies on a single
independent variable that explains the whole variance of the dependent variable.
This seems interesting, but it doesn't make sense. (It looks as if the algorithm
searches for a variable that has the same values as the dependent variable and
gives that variable all the explanatory power.)
Some words on clusterwise regression:
it seems to me to be a very interesting method that combines clustering and
regression: it groups the respondents into n clusters in such a way that the n
regressions (a separate one for every cluster) have high explanatory power.
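To make the idea concrete, here is a minimal base-R sketch of a hard-assignment
version of clusterwise regression: fit one regression per cluster, reassign each
respondent to the regression with the smallest squared residual, and repeat.
This is only an illustration under stated assumptions. It is not what regmix
does internally (regmix fits a mixture of regressions by maximum likelihood),
and the simulated data and the names d, k and cl are hypothetical.

```r
set.seed(42)

# Hypothetical data: two groups of respondents that follow different lines
n <- 100
d <- data.frame(x = runif(2 * n, 0, 10))
grp <- rep(1:2, each = n)
d$y <- ifelse(grp == 1, 1 + 2 * d$x, 20 - 1.5 * d$x) + rnorm(2 * n, sd = 0.5)

k <- 2                                       # number of clusters (assumed known)
cl <- sample(rep(1:k, length.out = nrow(d)))  # random starting assignment

for (iter in 1:50) {
  # One regression per cluster, fitted on that cluster's respondents only
  fits <- lapply(1:k, function(j) lm(y ~ x, data = d[cl == j, ]))
  # Squared residual of every respondent under every cluster's regression
  res <- sapply(fits, function(f) (d$y - predict(f, newdata = d))^2)
  # Reassign each respondent to the regression that fits it best
  new_cl <- max.col(-res, ties.method = "first")
  # Stop if a cluster would become too small to fit, or if nothing changed
  if (min(tabulate(new_cl, k)) < 3 || all(new_cl == cl)) break
  cl <- new_cl
}

table(cl)                                    # cluster sizes
lapply(split(d, cl), function(s) coef(lm(y ~ x, data = s)))
```

The result depends on the random start and can end in a local optimum, so in
practice one would try several starting assignments and compare the fits.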
My data look like this (just an example; it wouldn't make sense to post the
whole table):

dep  i1  i2  i3  i4  ...
3    2   1   5   2
2    2   3   1   1
5    5   3   4   4
1    1   2   1   2

In other words: one dependent variable (values 1-5) and a couple of independent
variables (values 1-5), so pretty standard, with about 4000 respondents.
The result of the clusterwise regression seems to be the following (when I use
it):
for the first cluster, it just looks for an independent variable that is
identical to the dependent variable for a lot of respondents.
In this example it would choose i1, since i1 is identical to the dependent
variable for every respondent except the first.
So it gives me this clustering (cl is the cluster number):

dep  i1  i2  i3  i4  cl
3    2   1   5   2   2
2    2   3   1   1   1
5    5   3   4   4   1
1    1   2   1   2   1

When I run a linear regression for cluster one, explaining dep by i1-i4, it uses
almost only i1 and reaches 100% explained variance (but often the model is
insignificant, and it is not the kind of cluster I am looking for).
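The per-cluster check described above can be sketched in base R as follows.
This is only an illustration on made-up data: the data frame d, the random 1-5
ratings and the hand-made cluster labels are hypothetical and do not come from
regmix. Cluster 1 is built on purpose from the respondents whose dep equals i1,
which reproduces the symptom: i1 gets a coefficient of about 1 and the
regression explains all of the variance.

```r
set.seed(1)

# Hypothetical 1-5 ratings for 200 respondents
n <- 200
d <- data.frame(
  i1 = sample(1:5, n, replace = TRUE),
  i2 = sample(1:5, n, replace = TRUE),
  i3 = sample(1:5, n, replace = TRUE),
  i4 = sample(1:5, n, replace = TRUE)
)
d$dep <- sample(1:5, n, replace = TRUE)

# Hand-made labels (an assumption, not regmix output):
# cluster 1 = respondents whose dep is identical to i1, cluster 2 = the rest
d$cl <- ifelse(d$dep == d$i1, 1, 2)

# One ordinary linear regression per cluster
fits <- lapply(split(d, d$cl), function(s) lm(dep ~ i1 + i2 + i3 + i4, data = s))

# Explained variance (R squared) per cluster, computed from the residuals
r2 <- sapply(fits, function(f) {
  y <- model.response(model.frame(f))
  1 - sum(resid(f)^2) / sum((y - mean(y))^2)
})

print(table(d$cl))                      # cluster sizes
print(round(r2, 4))                     # R squared per cluster
print(round(coef(fits[["1"]]), 4))      # coefficients in cluster 1
```

With real output, d$cl would be replaced by the cluster vector returned by the
clustering (rmt1$g in the code of this post).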
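This is my code:

library(fpc)

# Independent variables (unabh) and dependent variable (abh)
unabh <- read.table("G:/Data_Files/SPSS Files/unabh.csv", sep=";", header=TRUE)
abh <- read.table("G:/Data_Files/SPSS Files/abh.csv", sep=";", header=TRUE)
m <- as.matrix(unabh)
attach(abh)  # makes the column VVV of abh visible by name
rmt1 <- regmix(m, VVV, ir=1, nclust=1:2, icrit=1.e-5, minsig=1.e-6,
               warnings=TRUE)
write.table(rmt1$g)  # cluster number assigned to each respondent
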
This is just an example. I have tried other numbers of clusters, both without
the warning option and without any arguments beyond the variables, but when I
try more clusters it never stops calculating (maybe there are too many
respondents, roughly 4000?).

My objective:
I want a couple of meaningful clusters with meaningful regressions.
Does anybody know what to change?

Thanks a lot for any kind of recommendation.
Stefan
