Hello,

I am a new R user trying to learn how to use the mahalanobis() function to measure the distance between two population centroids. I have used STATISTICA to calculate these distances, but was hoping to learn to do the analysis in R. I have implemented the code below, but my results are very different from those of STATISTICA, and I believe I may have misread the help page and implemented the code incorrectly.

Though I am not certain, I believe that my error may be in calculating the common covariance matrix (the third argument supplied to the mahalanobis function).

Any help or guidance would be greatly appreciated.

Thank you!
RL

CODE

fit<-lda(pop~v1 + v2 + v3 +...+vn, data=my.data)

x1<-subset(my.data, pop==1)
x2<-subset(my.data, pop==2)

#Save covariance matrices for each group
cov1<-cov(x1)
cov2<-cov(x2)

#Determine number of rows in each matrix
n1<-nrow(x1); n2<-nrow(x2);
n.rows<-c(n1,n2)

#Store mean vectors from lda object
mu1<-fit$means[1,]
mu2<-fit$means[2,]

#Calculate the common covariance matrix
S<-(((n.rows[1]-1)*cov1)+((n.rows[2]-1)*cov2)/ (sum(n.rows[1:2])-1))

#Calculate the Mahalanobis distance
mahalanobis(mu1, mu2, S)
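For reference, the quantity the code above is aiming at is usually written as follows. This is the standard textbook pooled-covariance form; whether STATISTICA uses exactly this divisor is an assumption, and the symbols (S_p, S_1, S_2, n_1, n_2) are introduced here only for illustration.

```latex
S_p = \frac{(n_1 - 1)\,S_1 + (n_2 - 1)\,S_2}{n_1 + n_2 - 2},
\qquad
D^2 = (\bar{x}_1 - \bar{x}_2)^{\top} S_p^{-1} (\bar{x}_1 - \bar{x}_2)
```

Note that in the posted line for S the division applies only to the second term, because the closing parenthesis of the sum comes after the divisor rather than before it. The divisor there is also n1 + n2 - 1 rather than n1 + n2 - 2.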
If the goal is to *use* the Mahalanobis distance, rather than to learn how to write your own code, there are several existing implementations. rseek.org is a good place to find functions.

Sarah

On Fri, Jan 29, 2010 at 9:48 PM, Robert Lonsinger <rob.lonsinger at gmail.com> wrote:
> Hello,
> I am a new R user and trying to learn how to implement the mahalanobis
> function to measure the distance between 2 population centroids.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Sarah Goslee
http://www.functionaldiversity.org
I have been able to run the mahalanobis() function, and I have also written code that computes the Mahalanobis distance directly. Both give the same result, though that result differs from the true result. I believe my problem is in the formulation of 'S' (see below), but I am not sure how to correct it. Would anybody who has successfully used mahalanobis() please provide some guidance on what I am doing wrong?

#Both of the following two methods for computing this distance give the same result

#To calculate Mahalanobis distance for populations 1 and 2
m.x1<-mean(subset(my.data, pop==1))
m.x2<-mean(subset(my.data, pop==2))
s1<-cov(subset(my.data, pop==1))
s2<-cov(subset(my.data, pop==2))

#I believe I am doing something wrong with the calculation of 'S'
S<-((((n.rows[1]-1)*s1) + ((n.rows[2]-1)*s2)) / ((n.rows[1]+n.rows[2])-1))
Si<-ginv(S)
d2<-t(m.x1-m.x2) %*% Si %*% (m.x1-m.x2)
d2

#or using the mahalanobis() function
mahalanobis(m.x1,m.x2,S)
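A minimal sketch of one way the calculation might be set up, under stated assumptions. The built-in iris data stands in for my.data (the poster's data are not shown), and the names dat, vars, m1, m2, delta, d2 and d2_manual are hypothetical. It differs from the posted code in two places: the grouping column is dropped before the means and covariances are taken, and the pooled matrix is divided by n1 + n2 - 2 rather than n1 + n2 - 1. Whether either point explains the gap with STATISTICA cannot be confirmed from the thread.

```r
# Illustrative data only: two iris species stand in for the two populations.
dat <- subset(iris, Species %in% c("setosa", "versicolor"))
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Keep only the measurement columns, so the grouping variable does not
# enter the means or the covariance matrices.
x1 <- dat[dat$Species == "setosa", vars]
x2 <- dat[dat$Species == "versicolor", vars]

n1 <- nrow(x1)
n2 <- nrow(x2)

# Group centroids
m1 <- colMeans(x1)
m2 <- colMeans(x2)

# Pooled within-group covariance: the whole weighted sum is divided
# by n1 + n2 - 2.
S <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)

# mahalanobis() returns the SQUARED distance between the two centroids.
d2 <- mahalanobis(m1, m2, S)

# The same quantity written out as a quadratic form.
delta <- m1 - m2
d2_manual <- drop(t(delta) %*% solve(S) %*% delta)

d2
sqrt(d2)
```

Because mahalanobis() returns the squared distance, it is worth checking whether the figure being compared against is D^2 or D before concluding that the two programs disagree.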