I am doing a cluster analysis with hclust. I want to get hclust to output the Hotelling's T squared statistic for each cluster so I can evaluate if data points should be in a cluster or not. My research to answer this question has been unsuccessful. Does anyone know how to get hclust to output the Hotelling's T squared statistic for each cluster?

Mike
Bert Gunter
2016-Apr-08 14:48 UTC
[R] Generating Hotelling's T squared statistic with hclust
1. Where did you get the idea that this was a good thing to do?
2. Don't do it.
3. Do not reply to r-help: my comments are about statistics, not R, and so further discussion is off topic here. Either ignore me or post a follow-up to a statistics list like stats.stackexchange.com.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Apr 8, 2016 at 6:54 AM, Michael <elopomorph at hotmail.com> wrote:
> I am doing a cluster analysis with hclust. I want to get hclust to output the Hotelling's T squared statistic for each cluster so I can evaluate if data points should be in a cluster or not. My research to answer this question has been unsuccessful. Does anyone know how to get hclust to output the Hotelling's T squared statistic for each cluster?
>
> Mike
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
I believe the package "rattle" can do this. If not, see page 5 of the PDF below.

https://cran.r-project.org/web/packages/Hotelling/Hotelling.pdf
David L Carlson
2016-Apr-08 18:13 UTC
[R] Generating Hotelling's T squared statistic with hclust
As Bert pointed out, your plan is not advisable (that is putting it diplomatically) and not about R, but we can use R to show you why it is not advisable. What you are doing is inherently circular. You use the data to create groups and then you test the groups against the data you used to create them. The null hypothesis in Hotelling's T2 test is that the group mean vectors are equal, and the p-value is only meaningful if the groups were defined independently of the data.

> set.seed(42)
> x <- matrix(rnorm(25*4), 25, 4)
> x.hcl <- hclust(dist(x), method="ward.D2")
> plot(x.hcl)

Now you have a dendrogram showing three nice looking clusters that are based on completely random numbers. Unless the pseudo random number function is flawed, there is no structure in these data, but the dendrogram looks plausible. We need 2 groups for Hotelling's T2:

> grps <- cutree(x.hcl, 2)
> library(DescTools)
> HotellingsT2Test(x~grps)

        Hotelling's two sample T2-test

data:  x by grps
T.2 = 8.3476, df1 = 4, df2 = 20, p-value = 0.0003947
alternative hypothesis: true location difference is not equal to c(0,0,0,0)

No surprise. There is a significant difference between the groups. That just tells us that hclust() is working properly. It tells us exactly nothing about any structure or pattern in the data (there is none). An equally bad (but surprisingly common) approach is to use linear discriminant analysis. Here we will use 3 groups:

> grps <- cutree(x.hcl, 3)
> library(MASS)
> x.lda <- lda(x, grps)
> x.pre <- predict(x.lda)
> # group centroids in discriminant space: columns are group, LD1, LD2
> centers <- aggregate(x.pre$x, list(grps), mean)
> plot(x.lda)
> for (i in 1:3) {
+   segments(centers[i, 2], centers[i, 3],
+     x.pre$x[grps==i, 1], x.pre$x[grps==i, 2], lty=2)
+ }

Now we have 3 well-separated clusters created from completely random data. Hierarchical clustering always creates clusters. It does not question the data you provide and it does not stop and refuse to continue if there are no clusters in the data.
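To see that the single result above is not a fluke of one seed, the same experiment can be repeated many times. The following is only a rough sketch in base R, under stated assumptions: hotelling_p() and one_run() are hypothetical helper names introduced here (they are not from DescTools or any other package), the two-sample statistic is computed by hand with a pooled covariance matrix, and the choices of 500 replicates, 25 points, 4 variables and a 0.05 cutoff are arbitrary. It compares how often the test rejects when the groups come from hclust() with how often it rejects when the groups are assigned without looking at the data.

```r
# Hypothetical helper: p-value of the two-sample Hotelling's T2 test,
# computed in base R with a pooled covariance matrix.
hotelling_p <- function(x, g) {
  a <- x[g == 1, , drop = FALSE]
  b <- x[g == 2, , drop = FALSE]
  n1 <- nrow(a); n2 <- nrow(b); p <- ncol(x)
  # A covariance matrix needs at least 2 observations per group
  if (min(n1, n2) < 2) return(NA_real_)
  S <- ((n1 - 1) * cov(a) + (n2 - 1) * cov(b)) / (n1 + n2 - 2)
  d <- colMeans(a) - colMeans(b)
  T2 <- (n1 * n2 / (n1 + n2)) * sum(d * solve(S, d))
  # Convert T2 to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom
  Fstat <- T2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
  pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE)
}

# Hypothetical helper: one data set of pure noise, split into 2 groups
# either by clustering the data or independently of the data.
one_run <- function(n = 25, p = 4, clustered = TRUE) {
  x <- matrix(rnorm(n * p), n, p)
  g <- if (clustered) {
    cutree(hclust(dist(x), method = "ward.D2"), 2)
  } else {
    sample(rep(1:2, length.out = n))
  }
  hotelling_p(x, g)
}

set.seed(1)
p_clustered <- replicate(500, one_run(clustered = TRUE))
p_random    <- replicate(500, one_run(clustered = FALSE))

# Proportion of noise data sets declared "significant" at the 0.05 level
reject_clustered <- mean(p_clustered < 0.05, na.rm = TRUE)
reject_random    <- mean(p_random < 0.05, na.rm = TRUE)
cat("groups from hclust:       ", reject_clustered, "\n")
cat("groups assigned at random:", reject_random, "\n")
```

If the argument above holds, the rejection rate for the randomly assigned groups should sit near the nominal 0.05, while the rate for the hclust() groups should be far higher, even though every data set is noise.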
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352