eugen pircalabelu
2008-Feb-28 20:02 UTC
[R] question regarding using weights in the hierarchical/kmeans clustering process
Hi R users!

I have a bit of a problem with using a hierarchical clustering algorithm. Here is some example data:

    a <- 1:15
    b <- rep(1:3, 5)
    c <- rnorm(15, 0, 1)
    d <- sample(1:100, 15, replace = TRUE)
    e <- sample(1:100, 15, replace = TRUE)
    f <- sample(1:100, 15, replace = TRUE)
    data <- data.frame(a, b, c, d, e, f)
    q <- data.frame(data$d, data$e, data$f)
    q <- scale(q)

What I want to do is run a hierarchical cluster analysis on the q data frame, but using data$c as a weighting variable. Can this be done? Or is there a package that would let me use my weights in a hierarchical clustering process?

Another question: say I wanted to t.test data$d against data$e, again with data$c as weights. How could that be done?

And the last two questions:

1. How can I weight a whole data frame so that the weights are kept for a specific analysis, such as a cluster analysis, a t.test, or any other analysis that does not let me pass a "weight" option? I am looking for something like SPSS, where I can weight a whole data frame and use it for subsequent analyses, or something like the survey package in R, but one that lets me use any analysis I want (I saw that the survey package offers only limited connectivity to such analyses).

2. Why does a k-means cluster analysis give so many different results? I ran both of these several times:

    cclust(scale(q), 3, verbose = TRUE)
    kmeans(scale(q), 3)

and both seem very unstable with respect to the cluster sizes, even on this small data frame, and I don't know why. Does it always behave like this?

Thank you and have a great day!!
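On the first question (weights in hierarchical clustering), one option that stays in base R is the `members` argument of `stats::hclust()`. Its documented purpose is to give the number of observations behind each row of the dissimilarity object, so passing case weights there treats row i as a cluster of total weight `w[i]`. With average linkage this amounts to weighted average-linkage clustering: the distance between two clusters becomes the weight-weighted mean of their pairwise distances. Single and complete linkage ignore `members` entirely. This is a sketch, not an established weighted-clustering API. The weights `w` are a hypothetical stand-in for `data$c`, drawn with `runif()` because the `rnorm()` values in the question can be negative, and negative weights have no meaning here.

```r
# Sketch: case weights in hierarchical clustering via hclust(members = ...).
# Assumption: weights are strictly positive (hypothetical stand-in for data$c).
set.seed(42)
n <- 15
q <- scale(data.frame(d = sample(1:100, n, replace = TRUE),
                      e = sample(1:100, n, replace = TRUE),
                      f = sample(1:100, n, replace = TRUE)))
w <- runif(n, 0.5, 2)

dq <- dist(q)  # Euclidean distances between the scaled rows

# Each row is treated as a "cluster" of size w[i]; with average linkage the
# Lance-Williams update then weights every pairwise distance by w[a] * w[b].
hc_w <- hclust(dq, method = "average", members = w)

groups <- cutree(hc_w, k = 3)  # cut the weighted tree into 3 groups
table(groups)
```

With `members = rep(1, n)` the result is identical to the ordinary unweighted `hclust(dq, method = "average")`, which is a quick sanity check that the weights are the only thing changing the tree.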
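For the weighted t-test of `data$d` against `data$e`, a minimal base-R sketch treats the two variables as paired (they are measured on the same rows) and fits an intercept-only weighted least-squares model to the differences. The intercept is then the weighted mean difference, and its t statistic tests whether that difference is zero. This assumes the weights act as precision weights. If `data$c` holds survey sampling weights, `svyttest()` on a `svydesign()` object from the survey package is the more appropriate route. `weighted_paired_t()` is a hypothetical helper name, and `w` is again a positive stand-in for `data$c`.

```r
# Sketch: weighted paired t-test via weighted least squares.
# Assumption: w are positive precision weights, not survey sampling weights.
weighted_paired_t <- function(x, y, w) {
  stopifnot(length(x) == length(y), length(y) == length(w), all(w > 0))
  diffs <- x - y
  # Intercept-only WLS: the intercept estimates the weighted mean difference,
  # and its t value tests H0: weighted mean difference = 0.
  fit <- lm(diffs ~ 1, weights = w)
  coef(summary(fit))["(Intercept)", ]
}

set.seed(42)
n <- 15
d <- sample(1:100, n, replace = TRUE)
e <- sample(1:100, n, replace = TRUE)
w <- runif(n, 0.5, 2)  # hypothetical stand-in for data$c

res <- weighted_paired_t(d, e, w)
res  # Estimate, Std. Error, t value, Pr(>|t|)
```

With all weights equal to 1 this reproduces `t.test(d, e, paired = TRUE)`. For an unpaired comparison, the analogous model stacks both variables into one column and regresses it on a group indicator, `lm(value ~ group, weights = ...)`, which gives a pooled-variance weighted t-test.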
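For weighting a whole data frame the way SPSS's WEIGHT BY does, one common approach assumes the weights are integer frequency counts and repeats each row that many times. Any function without a `weights` argument then sees the weighted data. `expand_by_weight()` is a hypothetical helper. The caveat matters: unless the weights really are frequency counts, the expanded data overstate the sample size, so standard errors and p-values computed on it (for example from `t.test()`) will be too optimistic. Point estimates such as means are what this trick preserves.

```r
# Sketch: SPSS-style frequency weighting by row replication.
# Assumption: w holds non-negative integer case counts (hypothetical data).
expand_by_weight <- function(df, w) {
  stopifnot(nrow(df) == length(w), all(w >= 0), all(w == round(w)))
  # Repeat row i exactly w[i] times; drop = FALSE keeps a data frame.
  df[rep(seq_len(nrow(df)), times = w), , drop = FALSE]
}

set.seed(42)
df <- data.frame(d = sample(1:100, 15, replace = TRUE),
                 e = sample(1:100, 15, replace = TRUE))
w <- sample(1:4, 15, replace = TRUE)  # hypothetical integer counts

expanded <- expand_by_weight(df, w)
nrow(expanded)     # equals sum(w)
mean(expanded$d)   # equals weighted.mean(df$d, w)
```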
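On the k-means instability: when `centers` is a number, `kmeans()` picks a random set of distinct rows as the initial centres, and the algorithm only reaches a local optimum of the total within-cluster sum of squares. Different runs can therefore end in different partitions with different cluster sizes. The cluster numbers are also arbitrary, so two runs with identical groupings can still label them differently. The usual remedies are `set.seed()` for reproducibility and `nstart`, which tries several random starts and keeps the one with the lowest `tot.withinss`. The data below are a stand-in for `q`.

```r
# Sketch: why kmeans() results vary, and how nstart stabilises them.
set.seed(42)
q <- scale(data.frame(d = sample(1:100, 15, replace = TRUE),
                      e = sample(1:100, 15, replace = TRUE),
                      f = sample(1:100, 15, replace = TRUE)))

# Ten single-start runs: their objective values can differ because each
# run begins from different random centres.
single <- sapply(1:10, function(i) kmeans(q, centers = 3)$tot.withinss)
range(single)

# 25 random starts; kmeans() returns the run with the lowest tot.withinss.
set.seed(1)
fit <- kmeans(q, centers = 3, nstart = 25)
fit$size
```

To check whether two runs really disagree, cross-tabulate their assignments with `table(run1$cluster, run2$cluster)`; if each row has a single non-zero cell, only the labels changed.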