Dear all,

I sent this question two days ago and got a response from Sarah. Thanks for that, but unfortunately it did not really solve our problem. The main issue is that we want to use our own (manipulated) covariance matrix in the calculation of the Mahalanobis distance. Does anyone know how to vectorize the code below instead of using a loop (which slows it down)? I'd really appreciate any help on this. Thank you all in advance!

Cheers,
Frank

This is what I posted two days ago:

We have a data frame x with n people as rows and k variables as columns. For each person (i.e., each row) we want to calculate the distance between him/her and EVERY other person in x. In other words, we want to create an n x n matrix of distances (with zeros on the diagonal). However, we do not want Euclidean distances. We want Mahalanobis distances, which take into account the covariance among variables. Below is the code we wrote ("covmat" is the variance-covariance matrix among the variables in the data, which is passed to the mahalanobis() function):

mahadist = function(x, covmat) {
  dismat = matrix(0, ncol = nrow(x), nrow = nrow(x))
  for (i in 1:nrow(x)) {
    # mahalanobis() returns squared distances, so take the square root
    dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
  }
  return(dismat)
}

This code works, but it is very slow. We were wondering whether it is at all possible to vectorize this function. Any help would be greatly appreciated.

Thanks,
Frank
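For reference, here is the matrix the loop fills, written out as a formula (notation is ours, not the poster's): x_i is the i-th row of x as a column vector and Sigma is covmat, assumed symmetric positive definite. The second line is a standard identity: if Sigma = R^T R is its Cholesky factorization (R upper triangular), each Mahalanobis distance is a Euclidean distance between transformed rows, which is one route to avoiding the loop.

```latex
D_{ij} = \sqrt{(x_i - x_j)^\top \Sigma^{-1} (x_i - x_j)}, \qquad i, j = 1, \dots, n,
\quad\text{so } D_{ii} = 0,\; D_{ij} = D_{ji};
\\[4pt]
\Sigma = R^\top R \;\Rightarrow\; \Sigma^{-1} = R^{-1} R^{-\top}
\;\Rightarrow\; D_{ij} = \bigl\lVert R^{-\top} (x_i - x_j) \bigr\rVert_2 .
```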
One thing that would speed it up is to invert 'covmat' once and then use 'inverted=TRUE' in the call to 'mahalanobis'.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Frank Hedler wrote:
> Does anyone know how to vectorize the below code instead of using a loop
> (which slows it down)?
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
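A minimal sketch of Patrick's suggestion, assuming covmat is invertible. The function name mahadist_inv is hypothetical (not from the thread), and the sketch also converts x to a matrix once. When inverted = FALSE (the default), mahalanobis() calls solve() on its cov argument, so the original loop repeats that inversion n times. Here it happens once.

```r
# Hypothetical helper: invert covmat once, then tell mahalanobis() that the
# matrix it receives is already the inverse (inverted = TRUE).
mahadist_inv <- function(x, covmat) {
  matx <- as.matrix(x)        # convert the data frame once, not on every pass
  icov <- solve(covmat)       # single inversion of the covariance matrix
  n <- nrow(matx)
  dismat <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    # squared distances from row i to every row, then the square root
    dismat[i, ] <- sqrt(mahalanobis(matx, matx[i, ], icov, inverted = TRUE))
  }
  dismat
}

# Example on simulated data
set.seed(1)
x <- data.frame(a = runif(200), b = runif(200), c = runif(200))
d <- mahadist_inv(x, cov(x))
```

This still loops over rows; it only removes the repeated inversion, so the saving grows with the number of rows.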
Richard.Cotton at hsl.gov.uk
2008-Oct-09 16:11 UTC
[R] vectorization instead of using loop
> Does anyone know how to vectorize the below code instead of using a loop
> (which slows it down)?

You can save a substantial amount of time by calling as.matrix once, before the loop, rather than on every iteration, e.g.
x <- data.frame(runif(1000), runif(1000), runif(1000))
covmat <- cov(x)

mahadist = function(x, covmat) # yours
{
  dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
  for (i in 1:nrow(x))
  {
    # as.matrix() is called twice on every iteration here
    dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
  }
  return(dismat)
}

mahadist2 <- function(x, covmat) # my modification
{
  n <- nrow(x)
  dismat <- matrix(0,ncol=n,nrow=n)
  matx <- as.matrix(x)   # convert once, outside the loop
  for (i in 1:n)
  {
    dismat[i,] <- mahalanobis(matx, matx[i,], covmat)^.5
  }
  dismat
}

system.time(mahadist(x, covmat))
#   user  system elapsed
#   2.82    0.06    2.95
system.time(mahadist2(x, covmat))
#   user  system elapsed
#   1.39    0.04    1.45

Regards,
Richie.

Mathematical Sciences Unit
HSL
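Neither reply removes the loop itself. One loop-free option, which is not from this thread and uses a swapped-in technique (Cholesky "whitening"), is sketched below. It assumes covmat, manipulated or not, is symmetric positive definite; chol() stops with an error if it is not. If covmat = t(R) %*% R with R upper triangular, Mahalanobis distances between rows of x equal ordinary Euclidean distances between rows of x %*% solve(R), so the whole matrix comes from one call to dist(). The name mahadist_chol is hypothetical.

```r
# Hypothetical loop-free version via Cholesky whitening.
mahadist_chol <- function(x, covmat) {
  matx <- as.matrix(x)
  R <- chol(covmat)                       # covmat = t(R) %*% R, R upper triangular
  Rinv <- backsolve(R, diag(ncol(matx)))  # inverse of the triangular factor
  z <- matx %*% Rinv                      # whitened data, one row per person
  unname(as.matrix(dist(z)))              # n x n Euclidean distances, zero diagonal
}

# Same setup as the timing example above
set.seed(1)
x <- data.frame(runif(1000), runif(1000), runif(1000))
covmat <- cov(x)
d <- mahadist_chol(x, covmat)
```

dist() fills only the lower triangle (n(n-1)/2 values) before as.matrix() expands it, so memory is of the same order as the n x n result the loop versions build.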