Mike White
2007-Feb-20 11:18 UTC
[R] Mahalanobis distance and probability of group membership using Hotelling's T2 distribution
I want to calculate the probability that a group will include a particular point, using the squared Mahalanobis distance to the group centroid. I understand that the squared Mahalanobis distance is distributed as chi-squared (with degrees of freedom equal to the number of variables). For a small number of random samples from a multivariate normal population, however, Hotelling's T2 (T squared) distribution should be used instead. I cannot find a function for Hotelling's T2 distribution in R (although in response to a previous post I have been provided with functions for the Hotelling test).

My understanding is that Hotelling's T2 distribution is related to the F distribution by the equation

T2(u,v) = F(u, v-u+1) * v*u/(v-u+1)

where u is the number of variables and v the number of group members.

I have written the R code below to compare the chi-squared and Hotelling's T2 distributions for the probability of a member being included within a group. Could anyone please confirm whether or not this is the correct way to use Hotelling's T2 distribution for the probability of group membership?

Also, when testing a particular group member, is it preferable to leave that member out when calculating the centre and covariance of the group for the Mahalanobis distances?

Thanks
Mike White

################################################################################
## Hotelling T^2 distribution function
ph <- function(q, u, v, ...) {
  # q  vector of quantiles, as in function pf
  # u  number of independent variables
  # v  number of observations
  if (!v > u + 1) stop("v must be greater than u + 1")
  df1 <- u
  df2 <- v - u + 1
  pf(q * df2 / (v * u), df1, df2, ...)
}

# compare chi-squared and Hotelling T^2 distributions for a group member
u <- 3
v <- 10
set.seed(1)
mat <- matrix(rnorm(v * u), nrow = v, ncol = u)
MD2 <- mahalanobis(mat, center = colMeans(mat), cov = cov(mat))
d <- MD2[order(MD2)]
# select the point ranked midway between nearest and furthest from the centroid
dm <- d[length(d) / 2]
1 - ph(dm, u, v)    # probability using Hotelling T^2 distribution
# [1] 0.6577069
1 - pchisq(dm, u)   # probability using chi-squared distribution
# [1] 0.5538466
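The F relation for T2 can also be inverted to give quantiles. This is convenient when a cut-off distance is wanted rather than a probability. Below is a minimal sketch that keeps the post's own convention for u and v. The name qh is hypothetical, chosen to mirror ph. The sketch only checks the algebra of the T2/F relation; it does not settle which value of v is appropriate for group membership.

```r
## Minimal sketch: Hotelling T^2 distribution and quantile functions built
## from the relation T2(u,v) = F(u, v-u+1) * v*u/(v-u+1).
## u and v follow the convention used in the post; qh() is a hypothetical name.

# distribution function (same formula as ph() in the post)
ph <- function(q, u, v, ...) {
  if (!v > u + 1) stop("v must be greater than u + 1")
  df2 <- v - u + 1
  pf(q * df2 / (v * u), u, df2, ...)
}

# quantile function: take the F quantile and scale it back up to the T^2 scale
qh <- function(p, u, v, ...) {
  if (!v > u + 1) stop("v must be greater than u + 1")
  df2 <- v - u + 1
  qf(p, u, df2, ...) * v * u / df2
}

u <- 3
v <- 10

# 95% cut-offs for the squared Mahalanobis distance under each distribution
cut_T2   <- qh(0.95, u, v)
cut_chi2 <- qchisq(0.95, u)
print(c(T2 = cut_T2, chisq = cut_chi2))

# round trip: ph() should undo qh()
p <- c(0.5, 0.9, 0.95, 0.99)
print(all.equal(ph(qh(p, u, v), u, v), p))

# for very large v the T^2 cut-off approaches the chi-squared one
print(qh(0.95, u, 1e6) - qchisq(0.95, u))
```

For u = 3 and v = 10 the T2 cut-off is well above the chi-squared one. The gap shrinks as v grows, because v*u/(v-u+1) tends to u and F(u, v-u+1) tends to chi-squared(u)/u.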
Mike White
2007-Feb-20 16:14 UTC
[R] Mahalanobis distance and probability of group membership using Hotelling's T2 distribution