Liaw, Andy
2004-Jan-20 21:55 UTC
[R] evaluation of discriminant functions + multivariate homoscedasticity
While I don't know anything about Box's M test, I googled around and found a Matlab M-file that computes it. Below is my straightforward translation of the code, without knowing Matlab or the formula (and done in a few minutes). I hope this demonstrates one of Prof. Ripley's points: if you really want to shoot yourself in the foot, you can probably program R to do that for you.

[BTW: I left the original comments largely intact. The output I get from R for the example is the same as what is shown in the comments.]

[BTW #2: I'd imagine the original Matlab code could probably be improved a bit...]

Andy

======================================================================

BoxMTest <- function(X, cl, alpha=0.05) {
    ## Multivariate Statistical Testing for the Homogeneity of Covariance
    ## Matrices by the Box's M.
    ##
    ## Syntax: function [MBox] = BoxMTest(X,alpha)
    ##
    ## Inputs:
    ##      X - data matrix (Size of matrix must be n-by-(1+p); sample=column 1,
    ##          variables=column 2:p).
    ##  alpha - significance level (default = 0.05).
    ## Output:
    ##  MBox - the Box's M statistic.
    ##  Chi-sqr. or F - the approximation statistic test.
    ##  df's - degrees of freedom of the approximation statistic test.
    ##  P - observed significance level.
    ##
    ## If the groups sample-size is at least 20 (sufficiently large),
    ## Box's M test takes a Chi-square approximation; otherwise it takes
    ## an F approximation.
    ##
    ## Example: For two groups (g = 2) with three independent variables
    ## (p = 3), we are interested in testing the homogeneity of covariance
    ## matrices with a significance level = 0.05. The two groups have the
    ## same sample-size n1 = n2 = 5.
    ##
    ##                        Group
    ##         ---------------------------------------
    ##                 1                     2
    ##         ---------------------------------------
    ##          x1   x2   x3         x1   x2   x3
    ##         ---------------------------------------
    ##          23   45   15        277  230   63
    ##          40   85   18        153   80   29
    ##         215  307   60        306  440  105
    ##         110  110   50        252  350  175
    ##          65  105   24        143  205   42
    ##         ---------------------------------------
    ##
    ## Total data matrix must be:
    ##  X=[1 23 45 15;1 40 85 18;1 215 307 60;1 110 110 50;1 65 105 24;
    ##     2 277 230 63;2 153 80 29;2 306 440 105;2 252 350 175;2 143 205 42];
    ##
    ## Calling on Matlab the function:
    ##    MBoxtest(X,0.05)
    ##
    ## Answer is:
    ##
    ## ------------------------------------------------------------
    ##     MBox         F        df1         df2          P
    ## ------------------------------------------------------------
    ##  27.1622     2.6293        6          463       0.0162
    ## ------------------------------------------------------------
    ## Covariance matrices are significantly different.
    ##
    ## Created by A. Trujillo-Ortiz and R. Hernandez-Walls
    ## Facultad de Ciencias Marinas
    ## Universidad Autonoma de Baja California
    ## Apdo. Postal 453
    ## Ensenada, Baja California
    ## Mexico.
    ## atrujo at uabc.mx
    ## And the special collaboration of the post-graduate students of the 2002:2
    ## Multivariate Statistics Course: Karel Castro-Morales,
    ## Alejandro Espinoza-Tenorio, Andrea Guia-Ramirez, Raquel Muniz-Salazar,
    ## Jose Luis Sanchez-Osorio and Roberto Carmona-Pina.
    ## November 2002.
    ##
    ## To cite this file, this would be an appropriate format:
    ## Trujillo-Ortiz, A., R. Hernandez-Walls, K. Castro-Morales,
    ## A. Espinoza-Tenorio, A. Guia-Ramirez and R. Carmona-Pina. (2002).
    ## MBoxtest: Multivariate Statistical Testing for the Homogeneity of
    ## Covariance Matrices by the Box's M. A MATLAB file. [WWW document].
    ## URL http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=2733&objectType=FILE
    ##
    ## References:
    ##
    ## Stevens, J. (1992), Applied Multivariate Statistics for the Social
    ## Sciences. 2nd ed., New Jersey: Lawrence Erlbaum Associates Publishers.
    ## pp. 260-269.

    if (alpha <= 0 || alpha >= 1)
        stop('significance level must be between 0 and 1')

    g <- nlevels(cl)  ## Number of groups.
    n <- table(cl)    ## Vector of group sizes.

    N <- nrow(X)
    p <- ncol(X)

    ## bandera ("flag"): 1 = use the Chi-square approximation, 2 = use F.
    bandera <- 2
    if (any(n >= 20)) bandera <- 1

    ## Covariance matrix of each group: split the data (column-major) by
    ## group and reshape each piece back into an n_k-by-p matrix.
    covList <- tapply(X, rep(cl, ncol(X)),
                      function(x, nc) cov(matrix(x, nc=nc)), ncol(X))

    deno <- sum(n) - g
    suma <- array(0, dim=dim(covList[[1]]))
    for (k in 1:g)
        suma <- suma + (n[k] - 1) * covList[[k]]
    Sp <- suma / deno  ## Pooled covariance matrix.

    Falta <- 0
    for (k in 1:g)
        Falta <- Falta + ((n[k] - 1) * log(det(covList[[k]])))

    MB <- (sum(n) - g) * log(det(Sp)) - Falta  ## Box's M statistic.
    suma1 <- sum(1 / (n[1:g] - 1))
    suma2 <- sum(1 / ((n[1:g] - 1)^2))
    C <- (((2 * p^2) + (3 * p) - 1) / (6 * (p + 1) * (g - 1))) *
        (suma1 - (1 / deno))  ## Correction factor.

    if (bandera == 1) {
        X2 <- MB * (1 - C)  ## Chi-square approximation.
        v <- as.integer((p * (p + 1) * (g - 1)) / 2)  ## Degrees of freedom.
        ## Significance value associated with the observed Chi-square
        ## statistic (upper tail).
        P <- pchisq(X2, v, lower.tail=FALSE)
        cat('------------------------------------------------\n')
        cat('      MBox   Chi-sqr.          df            P\n')
        cat('------------------------------------------------\n')
        cat(sprintf("%10.4f%11.4f%12i%13.4f\n", MB, X2, v, P))
        cat('------------------------------------------------\n')
        if (P >= alpha) {
            cat('Covariance matrices are not significantly different.\n')
        } else {
            cat('Covariance matrices are significantly different.\n')
        }
        return(list(MBox=MB, ChiSq=X2, df=v, pValue=P))
    } else {
        ## To obtain the F approximation we first define Co, which combined
        ## with the C value above is used to estimate the denominator degrees
        ## of freedom (v2); two cases are possible.
        Co <- (((p - 1) * (p + 2)) / (6 * (g - 1))) * (suma2 - (1 / (deno^2)))
        if (Co - (C^2) >= 0) {
            v1 <- as.integer((p * (p + 1) * (g - 1)) / 2)      ## Numerator df.
            v21 <- as.integer(trunc((v1 + 2) / (Co - (C^2))))  ## Denominator df.
            F1 <- MB * ((1 - C - (v1 / v21)) / v1)             ## F approximation.
            ## Significance value associated with the observed F statistic.
            P1 <- pf(F1, v1, v21, lower.tail=FALSE)
            cat('\n------------------------------------------------------------\n')
            cat('      MBox          F       df1           df2            P\n')
            cat('------------------------------------------------------------\n')
            cat(sprintf("%10.4f%11.4f%11i%14i%13.4f\n", MB, F1, v1, v21, P1))
            cat('------------------------------------------------------------\n')
            if (P1 >= alpha) {
                cat('Covariance matrices are not significantly different.\n')
            } else {
                cat('Covariance matrices are significantly different.\n')
            }
            return(list(MBox=MB, F=F1, df1=v1, df2=v21, pValue=P1))
        } else {
            v1 <- as.integer((p * (p + 1) * (g - 1)) / 2)      ## Numerator df.
            v22 <- as.integer(trunc((v1 + 2) / ((C^2) - Co)))  ## Denominator df.
            b <- v22 / (1 - C - (2 / v22))
            F2 <- (v22 * MB) / (v1 * (b - MB))                 ## F approximation.
            ## Significance value associated with the observed F statistic.
            P2 <- pf(F2, v1, v22, lower.tail=FALSE)
            cat('\n------------------------------------------------------------\n')
            cat('      MBox          F       df1           df2            P\n')
            cat('------------------------------------------------------------\n')
            cat(sprintf('%10.4f%11.4f%11i%14i%13.4f\n', MB, F2, v1, v22, P2))
            cat('------------------------------------------------------------\n')
            if (P2 >= alpha) {
                cat('Covariance matrices are not significantly different.\n')
            } else {
                cat('Covariance matrices are significantly different.\n')
            }
            return(list(MBox=MB, F=F2, df1=v1, df2=v22, pValue=P2))
        }
    }
}
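For reference, here is roughly how the example from the comment header can be run through this translation (a quick sketch only; note that, unlike the Matlab version, this function takes the variables as the matrix X and the group labels as a separate factor cl, so the construction of X and cl below is my own, using the two 5-by-3 groups listed in the comments):

## Example data from the comment header: two groups of five observations
## on three variables (group 1 first, then group 2).
X <- matrix(c( 23,  45,  15,
               40,  85,  18,
              215, 307,  60,
              110, 110,  50,
               65, 105,  24,
              277, 230,  63,
              153,  80,  29,
              306, 440, 105,
              252, 350, 175,
              143, 205,  42), ncol = 3, byrow = TRUE)
cl <- factor(rep(1:2, each = 5))
BoxMTest(X, cl, 0.05)
## Both groups have n < 20, so the F approximation is used; the comment
## header reports MBox = 27.1622, F = 2.6293, df1 = 6, df2 = 463, P = 0.0162,
## which matches the R output mentioned above.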
> From: Janke ten Holt
>
> Hello,
>
> I am switching from SPSS-Windows to R-Linux. My university is very
> SPSS-oriented, so maybe that's the cause of my problems. I am a beginner
> in R and my assignments are SPSS-oriented, so I hope I don't annoy
> anyone with my questions...
>
> Right now I have two problems:
>
> - I have to evaluate discriminant functions I have calculated with
>   lda() from MASS. I can't find a measure that evaluates their
>   significance (Wilks' lambda in my textbook (Stevens (2002), "Applied
>   multivariate statistics for the social sciences") and in SPSS).
>   Is there a Wilks' lambda for discriminant functions in R? Or can I use
>   an alternative measure? Or am I thinking in the wrong direction?
>   I have searched the help archive for similar questions to mine, but
>   found no answer to them.
>
> - My second problem: to check the assumption of multivariate
>   homoscedasticity I have to test whether the variance-covariance
>   matrices for my variables are homogeneous. My textbook suggests
>   Box's M test. I can't find this statistic in R. Again I have found
>   similar questions in the help archives, but no answers. Is there a
>   way to calculate Box's M in R? Or is there an alternative way to
>   check for multivariate homoscedasticity?
>
> Any suggestion would be greatly appreciated!
>
> Cheers,
> Janke ten Holt
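As for the Wilks' lambda part of the question: one possible direction (a minimal sketch only, assuming a data frame dat with numeric variables x1, x2, x3 and a grouping factor grp, i.e. not something lda() itself reports) is to fit the corresponding MANOVA and ask summary() for the Wilks test. If I remember the SPSS output correctly, this is the same lambda SPSS reports for the full set of discriminant functions taken together, rather than for each function separately.

## Hypothetical data frame `dat` with numeric x1, x2, x3 and a factor grp.
fit <- manova(cbind(x1, x2, x3) ~ grp, data = dat)
summary(fit, test = "Wilks")   ## Wilks' lambda, F approximation and p-value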
Dave Andrae
2004-Jan-21 15:58 UTC
[R] evaluation of discriminant functions + multivariate homoscedasticity
I seem to remember, from a course in which I used SPSS for LDA, that Box's M is an ultra-sensitive test as well, and that in almost all practical applications it is not useful, so Prof. Ripley's comments apply to that test too.

HTH,
Dave

--- "Liaw, Andy" <andy_liaw at merck.com> wrote:

> While I don't know anything about Box's M test, I googled around and
> found a Matlab M-file that computes it. Below is my straightforward
> translation of the code, without knowing Matlab or the formula (and
> done in a few minutes).
> [...]

=== message truncated ===