Hi,

I have a dataset which has around 138 variables and 30,000 cases. I am
trying to calculate a Mahalanobis distance matrix for them, and my
procedure is like this:

Suppose my data is stored in mymatrix

> S <- cov(mymatrix)   # this is fine
> D <- sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally singular:
reciprocal condition number = 1.09501e-25

I understand the error message, but I don't know how to track down
which variables cause it so that I can "sacrifice" them if there
are not too many. Then again, I am not sure whether it is due to
particular variables, or whether dropping variables is a good idea at all.

Thanks for your help,

weiwei

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
I once had a situation where the reason was that the variables were
scaled to extremely different magnitudes. 1e-25 is a *very* small
number, but there is still some chance that it helps to look at the
standard deviations and to multiply the variable with the smallest
standard deviation by 1e20 or something similar.

Best,
Christian

On Mon, 8 Aug 2005, Weiwei Shi wrote:

> [...]
> Error in solve.default(cov, ...) : system is computationally singular:
> reciprocal condition number = 1.09501e-25
> [...]
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

*** NEW ADDRESS! ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
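A minimal sketch of the scaling check suggested above, not taken from the thread itself. The data here are simulated as a stand-in for mymatrix (the column names a, b, c and the scale factors are assumptions chosen only to provoke the problem). It compares the reciprocal condition number of the covariance matrix before and after standardising the columns with scale().

```r
# Simulated stand-in for mymatrix: three columns on wildly different scales
# (hypothetical data, only meant to reproduce the symptom).
set.seed(1)
mymatrix <- cbind(a = rnorm(200),
                  b = rnorm(200) * 1e-12,
                  c = rnorm(200) * 1e6)

# Standard deviation of every variable; a huge max/min ratio hints at a
# scaling problem rather than a true linear dependency.
sds <- apply(mymatrix, 2, sd)
print(sort(sds))
print(max(sds) / min(sds))

# Reciprocal condition number of the raw covariance matrix
# (the quantity reported in the solve() error message).
rc_raw <- rcond(cov(mymatrix))

# Standardise each column to unit variance and check again.
scaled <- scale(mymatrix)
rc_scaled <- rcond(cov(scaled))
print(c(raw = rc_raw, scaled = rc_scaled))

# Squared Mahalanobis distances from the centre, computed on the rescaled data.
d2 <- mahalanobis(scaled, colMeans(scaled), cov(scaled))
```

In exact arithmetic the Mahalanobis distance does not change when columns are rescaled, so if standardising alone repairs the condition number, the distances computed from the rescaled data should be the ones wanted. If rc_scaled is still tiny, the cause is more likely a (near) linear dependency among the variables.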
More ideas:

You can also perform an eigenvalue decomposition of the covariance
matrix and see along which directions the singularity occurs and how
strong it is. Consequences could be: rescaling (or omission) of
variables that are strong in these directions, taking principal
components, or a linear transformation of the whole data in order to
attain less extreme ratios between the eigenvalues of the covariance
matrix.

Generally I would say that information reduction (principal components
or leaving out variables) should only be done if "small variance along
a direction" means that "this direction is not important" in terms of
the subject matter problem. Otherwise a transformation could help.

(Perhaps my guess in the first mail was wrong: you may not have to
multiply anything by 1e20 to repair a 1e-25 condition number, and a
more moderate transformation may suffice.)

Best,
Christian
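The eigenvalue diagnosis described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the poster's code: the data are simulated (x4 is deliberately built as x1 + x2), and the cut-off tol and the 0.1 loading threshold are hypothetical values that would need adjusting for real data. The last lines show the "principal components" option, computing squared distances from the centre only along directions with non-negligible variance.

```r
# Simulated stand-in for mymatrix: x4 is an exact linear combination of x1 and x2
# (hypothetical data, chosen so that the covariance matrix is singular).
set.seed(42)
x1 <- rnorm(500); x2 <- rnorm(500); x3 <- rnorm(500)
mymatrix <- cbind(x1 = x1, x2 = x2, x3 = x3, x4 = x1 + x2)

S <- cov(mymatrix)
e <- eigen(S, symmetric = TRUE)    # eigenvalues come back in decreasing order

# Each eigenvalue relative to the largest; values near machine precision
# mark the directions along which S is (numerically) singular.
rel <- e$values / e$values[1]
print(signif(rel, 3))

tol <- 1e-10                       # hypothetical cut-off, adjust to the data
null_dirs <- which(rel < tol)

# Variables with large absolute loadings on the near-null eigenvectors
# are the ones taking part in the dependency.
loadings <- e$vectors[, null_dirs, drop = FALSE]
rownames(loadings) <- colnames(mymatrix)
print(round(loadings, 3))
involved <- rownames(loadings)[apply(abs(loadings), 1, max) > 0.1]
print(involved)

# Principal-component alternative: squared Mahalanobis-type distances from
# the centre, using only the directions that carry variance.
keep <- which(rel >= tol)
centred <- scale(mymatrix, center = TRUE, scale = FALSE)
scores <- centred %*% e$vectors[, keep, drop = FALSE]
d2 <- rowSums(sweep(scores^2, 2, e$values[keep], "/"))
```

In this toy example the near-null direction should load on x1, x2 and x4 and not on x3, which is how one would see which variables are candidates for dropping or recombining. Whether discarding the low-variance directions is acceptable is the subject-matter question raised above; the code only shows where they are.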