Hello R helpers,

I'm trying to use the Mahalanobis distance to calculate the distance between two time series, so that I can compare it with Euclidean distance, DTW, etc., but I'm having some difficulties.

I have, for example, two objects:

s.1 <- c( 5.6324702, 1.3994353, -3.2572327, -3.8311846, -1.2248719,
0.9894694, -2.2835332, -5.1969285, -5.2823988, -3.1499400, -1.7307950,
2.8221209, 0.7005370, 4.9601216, 9.4527303)

s.2 <- c(-1.000489e-03, -8.577807e-04, -7.150633e-04, -5.716564e-04,
-4.280622e-04, -2.860101e-04, -1.451796e-04, -2.202688e-06, 1.441569e-04,
2.891237e-04, 4.280430e-04, 5.652797e-04, 7.100960e-04, 8.619236e-04,
1.007821e-03)

When I try to calculate the distance with the dist() function from the proxy package like this:

library(proxy)
dist(rbind(s.1, s.2), method="mahalanobis")

I get the following error:

system is computationally singular: reciprocal condition number 3.84863e-020

If I try the mahalanobis() function instead, I have the same problem:

test <- rbind(s.1, s.2)
mahalanobis(test, center=colMeans(test), cov=var(test))

And with different series I get the following error:

"Lapack routine dgesv: system is exactly singular"

I found some similar errors on the mailing list, but couldn't find anything that helps in my case.

Am I doing something wrong? Is it not possible to use the Mahalanobis distance with this kind of data?

Thank you very much for your help.

--
View this message in context: http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3844960.html
Sent from the R help mailing list archive at Nabble.com.
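A minimal sketch of why the calls in the question are likely to fail, using base R only. With rbind(), each series becomes one row, so mahalanobis() sees 2 observations of 15 variables, and a 15 x 15 sample covariance matrix estimated from two rows cannot have full rank. The names test and S follow the question; res is a hypothetical helper name introduced here.

```r
s.1 <- c( 5.6324702,  1.3994353, -3.2572327, -3.8311846, -1.2248719,
          0.9894694, -2.2835332, -5.1969285, -5.2823988, -3.1499400,
         -1.7307950,  2.8221209,  0.7005370,  4.9601216,  9.4527303)
s.2 <- c(-1.000489e-03, -8.577807e-04, -7.150633e-04, -5.716564e-04,
         -4.280622e-04, -2.860101e-04, -1.451796e-04, -2.202688e-06,
          1.441569e-04,  2.891237e-04,  4.280430e-04,  5.652797e-04,
          7.100960e-04,  8.619236e-04,  1.007821e-03)

# Each series is one row: 2 observations, 15 variables.
test <- rbind(s.1, s.2)
print(dim(test))

# The sample covariance is 15 x 15, but two observations can
# support a rank of at most 1, so it is not invertible.
S <- var(test)
print(dim(S))
print(qr(S)$rank)

# Capture the inversion error instead of stopping the script.
res <- tryCatch(solve(S), error = function(e) conditionMessage(e))
print(res)
```

The exact wording of the error may differ between platforms and R versions; the point is only that the covariance matrix has far fewer independent directions than columns.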
When I first saw your question, I thought the problem might have something to do with inverting the variance-covariance matrix, S, but I don't think that is the case.

S for s.1 and s.2:

> S
             [,1]         [,2]
[1,] 1.835044e+01 8.392485e-04
[2,] 8.392485e-04 4.093558e-07

inverse(S):

> solve(S)
              [,1]         [,2]
[1,]    0.06013287    -123.2825
[2,] -123.28254430 2695612.8008

So I am not sure what the difficulty is with your calculations. However, I wonder how much value there is in computing the Mahalanobis distance with two variables that are measured on such different scales:

> summary(s.1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -5.282  -3.204  -1.225   0.000   2.111   9.453

> summary(s.2)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-1.000e-03 -4.999e-04 -2.203e-06  0.000e+00  4.967e-04  1.008e-03

How would you interpret such a distance?

David Cross
d.cross at tcu.edu
www.davidcross.us

On Sep 26, 2011, at 2:05 PM, jorgeA wrote:

> Hello R helpers,
>
> I'm trying to use Mahalanobis distance to calculate distance of two time
> series, to make some comparisons with Euclidean distance, DTW, etc, but I'm
> having some difficulties.
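The 2 x 2 matrices in the reply above appear to come from treating the two series as two columns; this is an assumption, since the reply does not show the call that produced S. Under that assumption, a rough sketch of the column-wise calculation follows. It also rescales s.2 by an arbitrary factor to illustrate that the Mahalanobis distance is unchanged by rescaling a variable, which bears on the question of differing scales. X, S_inv, d2, X_scaled and d2_scaled are names introduced here for illustration.

```r
s.1 <- c( 5.6324702,  1.3994353, -3.2572327, -3.8311846, -1.2248719,
          0.9894694, -2.2835332, -5.1969285, -5.2823988, -3.1499400,
         -1.7307950,  2.8221209,  0.7005370,  4.9601216,  9.4527303)
s.2 <- c(-1.000489e-03, -8.577807e-04, -7.150633e-04, -5.716564e-04,
         -4.280622e-04, -2.860101e-04, -1.451796e-04, -2.202688e-06,
          1.441569e-04,  2.891237e-04,  4.280430e-04,  5.652797e-04,
          7.100960e-04,  8.619236e-04,  1.007821e-03)

# Each series is one column: 15 observations, 2 variables.
X <- cbind(s.1, s.2)
S <- var(X)          # 2 x 2 covariance matrix
S_inv <- solve(S)    # invertible in this layout
print(S)
print(S_inv)

# One squared Mahalanobis distance per row (time point).
d2 <- mahalanobis(X, center = colMeans(X), cov = S)
print(length(d2))

# Rescale s.2 by an arbitrary factor; the covariance rescales with
# it, so the distances should agree up to numerical tolerance.
X_scaled <- cbind(s.1, s.2 * 1e4)
d2_scaled <- mahalanobis(X_scaled, center = colMeans(X_scaled),
                         cov = var(X_scaled))
print(all.equal(d2, d2_scaled, tolerance = 1e-6))
```

Note that these 15 values measure how far each time point lies from the joint mean of the two series, not how far one series lies from the other.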
Since you are only looking at the distance between two points, they must fall on a line, so no matter how many values you have for each point, their dimension is still 1. Mahalanobis distance is a way of measuring distance in multivariate space when the variables (columns) are correlated with one another. In that situation, Euclidean distance (which assumes each dimension is orthogonal to all the others) is inappropriate. With two points and one dimension, all distance measures are effectively equivalent, since they can be converted to one another by multiplying by an appropriate constant.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jorgeA
Sent: Tuesday, September 27, 2011 12:08 PM
To: r-help at r-project.org
Subject: Re: [R] Mahalanobis Distance

Hello David(s),

First of all, thank you for your help. I was running some tests, and I would like to know whether I have understood your explanation correctly.

When I use rbind(), the variables are bound by row, and when I use cbind() they are bound by column.

The dist() function, as the help says, "computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix", so in that case I use rbind() (as the help example does).

The mahalanobis() function help says "returns the squared Mahalanobis distance of all rows in x and the vector mu = center with respect to Sigma = cov.", so here again the calculations are done by row.

Using cbind(), I get one result for each row, like this:

mahalanobis(testeCbind, center = colMeans(testeCbind), cov=var(testeCbind))

The result is 15 values (the number of rows). With dist(), using euclidean and rbind(), I get only one value (because it is calculated by row).

Thinking about it that way, the Mahalanobis distance is not so appropriate for my kind of input data. Am I correct?
Or is there a way to calculate the Mahalanobis distance over all points and get only one value as the result of how "distant" the variables (subseries) are?

Thank you all again.

Best regards,
Jorge Aikes Junior

--
View this message in context: http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3848247.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
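The row-versus-column behaviour described in the last message can be checked with a short sketch. It uses stats::dist() from base R rather than the proxy package, which is a substitution made here to keep the example free of extra dependencies; by_row, by_col, d_euclid and d2 are hypothetical names. It only illustrates the shapes of the results and does not settle whether a single Mahalanobis value between two series is meaningful.

```r
s.1 <- c( 5.6324702,  1.3994353, -3.2572327, -3.8311846, -1.2248719,
          0.9894694, -2.2835332, -5.1969285, -5.2823988, -3.1499400,
         -1.7307950,  2.8221209,  0.7005370,  4.9601216,  9.4527303)
s.2 <- c(-1.000489e-03, -8.577807e-04, -7.150633e-04, -5.716564e-04,
         -4.280622e-04, -2.860101e-04, -1.451796e-04, -2.202688e-06,
          1.441569e-04,  2.891237e-04,  4.280430e-04,  5.652797e-04,
          7.100960e-04,  8.619236e-04,  1.007821e-03)

# Rows are the objects being compared: two rows give one pair,
# so dist() returns a single value for the two series.
by_row <- rbind(s.1, s.2)
d_euclid <- stats::dist(by_row, method = "euclidean")
print(length(d_euclid))

# The same value computed by hand from the definition.
print(sqrt(sum((s.1 - s.2)^2)))

# Columns are the variables: mahalanobis() then returns one squared
# distance per row, i.e. one per time point.
by_col <- cbind(s.1, s.2)
d2 <- mahalanobis(by_col, center = colMeans(by_col), cov = var(by_col))
print(length(d2))
```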