Thorsten Biegner
2011-Jan-02 01:28 UTC
[R] Clusteranalysis Chi-square test and SingleLinkage
Hi

The short version of my question is this:

How can I run a chi-square test over a matrix (table) to get the distances between rows and then run a single-linkage (or other fusion) algorithm over the resulting table?

------------

The long version of my question:

My data consists of various figures for different countries, for example how many people can read and write in countries X, Y, Z, plus percentages for each country. I want to find out which countries might be similar by doing a cluster analysis.

So first I want to take the data, which would look something like this:

              Plastikbecher  Kartonbox  Papier
Rama                     24         65      12
Homa                     83         30      21
Flora                    75         28      22
SB                       35         55      21
Holl. Butter             20         40      75

And then run a chi-square test over it (I think that makes the most sense, or does anybody think differently?).

For that I will put each row together with every other row into a separate two-row matrix (mat1) and use chisq.test on it.

So mat1 would, for example, look like this:

              Plastikbecher  Kartonbox  Papier
Rama                     24         65      12
Flora                    75         28      22

And then I would run

matResult[1,3] <- sqrt(chisq.test(mat1)[[1]])

So in the end I would get a matrix like this:

             Rama   Homa  Flora     SB  HollButter
Rama        0.000  6.642  6.470  2.209       6.931
Homa        6.642  0.000  0.430  4.994       8.387
Flora       6.470  0.430  0.000  4.754       7.941
SB          2.209  4.994  4.754  0.000       5.901
HollButter  6.931  8.387  7.941  5.901       0.000

So here is my question:
How can I run a single linkage algorithm over this matrix?

I thought a good starting point might be "hclust":

hclust(d, method = "complete", members=NULL)

But the R reference says d must be "a dissimilarity structure as produced by dist."

But the dist function does not have a chi-squared method or anything similar.

So does anybody have an idea how I can do a cluster analysis with a chi-squared test and then use a fusion algorithm to join the clusters?

Thanks

Thorsten
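The pairwise procedure described in the post (take two rows, run chisq.test, store the square root of the statistic) can be sketched as a double loop. This is only a minimal sketch: the object name `counts` is an assumption introduced here, the row label "Holl. Butter" is shortened to "HollButter" to match the result matrix in the post, and `$statistic` is used in place of `[[1]]`, which refers to the same list element.

```r
# Counts copied from the example table in the post.
# `counts` is a name chosen for this sketch, not from the original code.
counts <- matrix(
  c(24, 65, 12,
    83, 30, 21,
    75, 28, 22,
    35, 55, 21,
    20, 40, 75),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Rama", "Homa", "Flora", "SB", "HollButter"),
    c("Plastikbecher", "Kartonbox", "Papier")
  )
)

n <- nrow(counts)

# Square matrix of pairwise distances; the diagonal stays at 0.
matResult <- matrix(0, n, n,
                    dimnames = list(rownames(counts), rownames(counts)))

for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # Chi-square test on the 2 x 3 table formed by rows i and j
    stat <- unname(chisq.test(counts[c(i, j), ])$statistic)
    # Store the square root of the statistic symmetrically
    matResult[i, j] <- sqrt(stat)
    matResult[j, i] <- sqrt(stat)
  }
}

print(round(matResult, 3))
```

Filling both triangles keeps the matrix symmetric, which is the form as.dist expects when it converts a full matrix into a "dist" object.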
On 02.01.2011 02:28, Thorsten Biegner wrote:
> [...]
> I thought a good starting point might be "hclust"
>
> hclust(d, method = "complete", members=NULL)
>
> But the R reference says d must be "a dissimilarity structure as produced by
> dist."
>
> But the dist function does not have a method chisquared-test or something
> similar.

Well, there is as.dist, so just use:

hclust(as.dist(matResult), .......)

Uwe Ligges
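To make the as.dist suggestion concrete, here is a minimal sketch that rebuilds the distance matrix from the values quoted in the original post and runs single linkage on it. The names `labs`, `d`, `hc` and `groups`, the choice of method = "single", and the cut into k = 3 groups are assumptions made for illustration; they are not part of the reply.

```r
# Distance matrix typed in from the values shown in the original post
labs <- c("Rama", "Homa", "Flora", "SB", "HollButter")
matResult <- matrix(
  c(0.000, 6.642, 6.470, 2.209, 6.931,
    6.642, 0.000, 0.430, 4.994, 8.387,
    6.470, 0.430, 0.000, 4.754, 7.941,
    2.209, 4.994, 4.754, 0.000, 5.901,
    6.931, 8.387, 7.941, 5.901, 0.000),
  nrow = 5, byrow = TRUE,
  dimnames = list(labs, labs)
)

# as.dist() turns a symmetric matrix into the "dist" object hclust() wants
d <- as.dist(matResult)

# Single linkage: clusters are merged at the smallest pairwise distance
hc <- hclust(d, method = "single")

# Merge heights, in the order the clusters are joined
print(hc$height)

# Cut the tree into three groups (k = 3 is an arbitrary choice here)
groups <- cutree(hc, k = 3)
print(groups)

# plot(hc)  # draws the dendrogram in an interactive session
```

With these values Homa and Flora are joined first, then Rama and SB, and HollButter is attached last. Other fusion algorithms can be tried by changing the method argument, for example "complete" or "average".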