Roland Goecke
2002-Dec-04 20:46 UTC
[R] Interpreting canonical correlation (cancor) results
Hi,

from what I understand about the canonical correlation function 'cancor', it looks for correlations in two sets of variables, each represented in matrix form. Right? Sounds exactly like what I need.

I have tried the following but I am not sure how to interpret the results.

AudioPCs <- c(ArTHarF0PCA$x[,2], ArTHarF1PCA$x[,2],
              ArTHarF2PCA$x[,2], ArTHarF3PCA$x[,2], ArTHarRMSPCA$x[,2])
VideoPCs <- c(ArTHarHeightPCA$x[,2], ArTHarWidthPCA$x[,2],
              ArTHarProUpperPCA$x[,2], ArTHarProLowerPCA$x[,2],
              ArTHarRelTeethPCA$x[,2])

AudioMatrix <- matrix(AudioPCs, nrow=20, ncol=5)
VideoMatrix <- matrix(VideoPCs, nrow=20, ncol=5)

ArTHarCCA <- cancor(AudioMatrix, VideoMatrix)
ArTHarCCA
$cor
[1] 0.852092 0.833079 0.467436 0.279688 0.026228

$xcoef
           [,1]       [,2]      [,3]       [,4]      [,5]
[1,] -0.0118794  0.0305097 -0.058891 -0.0601489  0.029186
[2,] -0.0350698  0.0163593  0.086743  0.0642735  0.100922
[3,]  0.1228351  0.0035069 -0.061669 -0.0019221  0.047723
[4,] -0.0461149  0.0186040  0.057543 -0.0649049 -0.132400
[5,] -0.0021663 -0.0624439  0.071591 -0.0457682  0.029516

$ycoef
          [,1]      [,2]      [,3]       [,4]      [,5]
[1,] -0.018006 -0.074138 -0.038670  0.0072364  0.082370
[2,] -0.293414 -0.176453 -0.015322 -0.0111357 -0.072555
[3,]  0.179000  0.048471 -0.103974  0.3313531 -0.049797
[4,] -0.126606 -0.088371  0.214449 -0.2998246  0.063524
[5,]  0.133073  0.011817 -0.073828 -0.0278944 -0.081489

$xcenter
[1]  1.9984e-16  2.2177e-15 -7.5495e-16 -2.6312e-15  1.5543e-16

$ycenter
[1] -5.5511e-17  1.4683e-15 -3.1086e-16 -1.9984e-16 -3.5527e-16

So in this example, I took the second principal components each from a bunch of variables, stuck them together in matrices and then performed CCA on it.

The results tell me that the correlation for two variables was quite high, 0.85 and 0.83, but how do I know which variables these actually are? I mean the correlation values are always given in order from highest to lowest, so that is not much help.

How can I find something like that? Or is all I can get out of this that there is a linear combination of the parameters of set 1 that is well correlated to the parameters of set 2?

Cheers
Roland
Stephane Dray
2002-Dec-05 10:32 UTC
[R] Interpreting canonical correlation (cancor) results
Consider your two matrices X and Y. cancor finds the linear combination of the variables of X and the linear combination of the variables of Y that have maximal correlation. These linear combinations are called canonical variates:

max cor(a1X1 + a2X2 + ..., b1Y1 + b2Y2 + ...)

The correlations are given by $cor and the coefficients by $xcoef and $ycoef. The coefficients are used to interpret the analysis.

However, cancor needs many individuals relative to the number of variables (which is not your case: 20 individuals for 5 and 5 variables), because it is based on two multivariate regressions (two Mahalanobis metrics) and is very sensitive to collinearity among the variables within each data set. So you must compute the correlations between the canonical variates and the original variables, and check that they are consistent with the coefficients (same sign, same order of magnitude). You can also check whether the intraset correlations (i.e. correlations among variables of the same matrix) are consistent with the coefficients. For example, if the second and third variables of Y are positively correlated, there is a collinearity problem, because their coefficients on the first canonical variate are -0.29 and 0.18, and your analysis is very unstable.

In this case, an alternative is co-inertia analysis, which maximizes the covariance rather than the correlation. This method will soon be available for R in the ADE4 package, available at http://pbil.univ-lyon1.fr/R/rplus/. For the moment the documentation is in French, but it will soon be available in English and submitted to CRAN.

References that can help you:

Gittins, R. (1985) Canonical analysis, a review with applications in ecology. Springer-Verlag, Berlin. 1-351.
Ter Braak, C.J.F. (1990) Interpreting canonical correlation analysis through biplots of structure correlations and weights. Psychometrika 55, 519-531.
Doledec, S. & Chessel, D. (1994) Co-inertia analysis: an alternative method for studying species-environment relationships.
Freshwater Biology 31, 277-294.

--
Stéphane DRAY
Biométrie et Biologie évolutive - Equipe "Écologie Statistique"
Universite Lyon 1 - Bat 711 - 69622 Villeurbanne CEDEX - France
Tel : 04 72 43 27 56 Fax : 04 78 89 27 19 04 72 43 27 57
E-mail : dray@biomserv.univ-lyon1.fr
ADE-4 http://pbil.univ-lyon1.fr/ADE-4/ADE-4F.html
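The consistency check described in the reply (correlating the canonical variates with the original variables, and looking at the intraset correlations) might be sketched in R roughly as follows. The data are simulated stand-ins, not the PCA scores from the question, and the names X, Y, audio1..audio5 and video1..video5 are hypothetical; treat this as a minimal sketch under those assumptions rather than a definitive recipe.

```r
# Simulated stand-in data (assumption): 20 observations, 5 variables per set.
set.seed(1)
n <- 20
X <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("audio", 1:5)))  # hypothetical names
Y <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("video", 1:5)))  # hypothetical names
Y[, 1] <- X[, 1] + rnorm(n, sd = 0.5)  # inject some cross-set association

cca <- cancor(X, Y)

# Canonical variates: centred data multiplied by the coefficient matrices.
U <- scale(X, center = cca$xcenter, scale = FALSE) %*% cca$xcoef
V <- scale(Y, center = cca$ycenter, scale = FALSE) %*% cca$ycoef

# The correlation within each pair of variates should reproduce cca$cor.
pair_cor <- diag(cor(U, V))

# Correlations between the original variables and the canonical variates
# (rows = variables, columns = canonical variates).
x_struct <- cor(X, U)
y_struct <- cor(Y, V)

# Intraset correlations, to look for collinearity within each set.
x_intra <- cor(X)
y_intra <- cor(Y)

# Compare signs of coefficients and correlations on the first variate.
sign_agree_x <- sign(cca$xcoef[, 1]) == sign(x_struct[, 1])

print(round(x_struct, 2))
print(round(y_struct, 2))
print(sign_agree_x)
```

Variables whose coefficient and correlation disagree in sign, or that are strongly correlated with another variable in the same set, are the ones whose coefficients are least safe to interpret.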