I wrote a function to calculate cosine distances between rows of a matrix. It uses two loops and is slow. Any suggestions to speed this up?

Thanks in advance.

theta.dist <- function(x){
  res <- matrix(NA, nrow(x), nrow(x))
  for (i in 1:nrow(x)){
    for(j in 1:nrow(x)){
      if (i > j)
        res[i, j] <- res[j, i]
      else {
        v1 <- x[i,]
        v2 <- x[j,]
        good <- !is.na(v1) & !is.na(v2)
        v1 <- v1[good]
        v2 <- v2[good]
        theta <- acos(v1%*%v2 / sqrt(v1%*%v1 * v2%*%v2 )) / pi * 180
        res[i,j] <- theta
      }
    }
  }
  as.dist(res)
}
I think this will do what you want, though there may be ways of speeding it up further.

theta.dist <- function(x)
  as.dist(acos(crossprod(t(x))/sqrt(crossprod(t(rowSums(x*x)))))/pi*180)

***********************************
Simon Gatehouse
CSIRO Exploration and Mining,
Newbigin Close off Julius Ave
North Ryde, NSW
Mail: PO Box 136, North Ryde NSW 1670, Australia
Phone: 61 (2) 9490 8677
Fax: 61 (2) 9490 8921
Mobile: 61 0407 130 635
E-mail: simon.gatehouse@csiro.au
Web Page: http://www.csiro.au/

-----Original Message-----
From: Xiao-Jun Ma [mailto:xma@arcturusag.com]
Sent: Friday, November 28, 2003 10:02 AM
To: 'r-help@stat.math.ethz.ch'
Subject: [R] Getting rid of loops?

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
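The one-liner in this reply rests on two identities: `crossprod(t(x))` equals `x %*% t(x)`, whose (i, j) entry is the dot product of rows i and j, and `crossprod(t(rowSums(x*x)))` is the outer product of the squared row norms. A minimal sketch that spells this out step by step, assuming a numeric matrix with no NAs and no all-zero rows. The name `theta.dist.vec` and the clamping step are illustrative additions, not part of the posted code:

```r
# Sketch: vectorised angular distance (in degrees) between the rows of x.
# Assumes x is a numeric matrix without NAs and without all-zero rows.
theta.dist.vec <- function(x) {
  dots   <- tcrossprod(x)           # x %*% t(x): all pairwise row dot products
  norm2  <- rowSums(x * x)          # squared Euclidean norm of each row
  cosine <- dots / sqrt(outer(norm2, norm2))
  # Rounding error can push values slightly outside [-1, 1],
  # where acos() would return NaN, so clamp before taking the angle.
  cosine <- pmin(pmax(cosine, -1), 1)
  as.dist(acos(cosine) / pi * 180)
}

# Small example: two orthogonal unit vectors and their diagonal
x <- rbind(c(1, 0), c(0, 1), c(1, 1))
theta.dist.vec(x)
```

Unlike the original loop, this sketch does not drop NA columns pair by pair; any row containing an NA would yield NA distances.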
Simon and Peter,

Thanks for your help. Peter's function speeds it up 25x vs. my naive code!

XiaoJun

-----Original Message-----
From: Peter Dalgaard
To: Simon.Gatehouse at csiro.au
Cc: r-help at stat.math.ethz.ch; Xiao-Jun Ma
Sent: 02-12-03 15.57
Subject: Re: [R] Getting rid of loops?

Simon.Gatehouse at csiro.au writes:

> I think this will do what you want, though there may be ways of speeding it
> up further.
>
> theta.dist2 <- function(x)
>   as.dist(acos(crossprod(t(x))/sqrt(crossprod(t(rowSums(x*x)))))/pi*180)

Or,

theta.dist <- function(x)
  as.dist(acos(cov2cor(crossprod(t(x))))/pi*180)

Now, if only there was a way to tell cor() not to center the variables, we'd have

as.dist(acos(cor(t(x),center=F))/pi*180)

Unfortunately there's no such argument.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
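One difference worth keeping in mind: the original loop drops, for each pair of rows, the columns where either value is NA, whereas the vectorised versions in this thread propagate NAs. One possible way to keep the pairwise behaviour without loops is to zero-fill the NAs and use an indicator matrix of observed values. This is only a sketch, and `theta.dist.na` is a hypothetical name not used by any of the posters:

```r
# Sketch: pairwise-complete angular distance (in degrees) between rows of x.
# For each pair of rows, only columns observed in both rows contribute.
theta.dist.na <- function(x) {
  obs <- !is.na(x)                  # TRUE where a value is observed
  x0  <- x
  x0[!obs] <- 0                     # zero-filled copy: NAs add nothing to sums
  dots <- tcrossprod(x0)            # dot products over jointly observed columns
  # ss[i, j]: sum of squares of row i over the columns observed in row j
  ss <- tcrossprod(x0 * x0, obs * 1)
  cosine <- dots / sqrt(ss * t(ss))
  cosine <- pmin(pmax(cosine, -1), 1)   # guard acos() against rounding error
  as.dist(acos(cosine) / pi * 180)
}

# Row 1 has a missing third value, so it is compared on columns 1 and 2 only
x <- rbind(c(1, 0, NA), c(0, 1, 5), c(1, 1, 0))
theta.dist.na(x)
```

The design choice here is to trade the explicit `good` mask of the loop for two matrix products, which keeps the work inside compiled linear algebra routines. Pairs of rows with no jointly observed columns come out as NaN.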