Hello,

I have a self-defined function to be computed on each pair of columns in a matrix. The basic idea is to ignore the elements that have a value of 0 during the computation.

I should be able to write my own function, but it could be computationally expensive, so I'd love to ask if anyone has suggestions on how to implement it more efficiently. Thanks in advance.

For example, there are three vectors in the matrix, which are

     A   B   C
     1   0   1
    -1   1   1
    -1  -1   1
     1   0  -1

Distance(AB) = ((-1)*1 + (-1)*(-1)) / de(AB), and
de(AB) = sqrt((-1)^2 + (-1)^2) * sqrt(1^2 + (-1)^2)

Distance(BC) = (1*1 + (-1)*1) / de(BC), and
de(BC) = sqrt(1^2 + (-1)^2) * sqrt(1^2 + 1^2)

Distance(AC) = (1*1 + (-1)*1 + (-1)*1 + 1*(-1)) / de(AC), and
de(AC) = sqrt(1^2 + (-1)^2 + (-1)^2 + 1^2) * sqrt(1^2 + 1^2 + 1^2 + (-1)^2)

As you may see, the numerator is basically the dot product of the two vectors, taken only over the rows where neither vector is 0; this function is more like a cosine function, but with some variations.

I need to compute the distance between any two vectors in a matrix. It would be ideal if the result had the same form as the output produced by an R distance function.

Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/self-defined-distance-function-to-be-computed-on-matrix-tp4641860.html
Sent from the R help mailing list archive at Nabble.com.
R. Michael Weylandt
2012-Aug-30 19:13 UTC
[R] self-defined distance function to be computed on matrix
Hi zz,

The help file for the dist() function (?dist) says that "Missing values are allowed, and are excluded from all computations involving the rows within which they occur", so if you can cajole this into any of the standard distance metrics, you could do something like:

    x[x == 0] <- NA

before

    dist(x, method = ??)

It's not quite clear to me how your distance metric is defined (perhaps give it symbolically, or at least on an output that's not all +/- 1, where it's hard to see what comes from where), but if you clarify, I can help you think through that.

Cheers,
Michael
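A minimal sketch of the NA approach described above, assuming the matrix from the question is stored under the name x (an assumed name). Since dist() compares rows, the matrix is transposed so that the columns A, B and C are the objects being compared. The method "euclidean" is only a placeholder to show the mechanics; it is not the metric asked about in the question.

```r
# Matrix from the question; the variable name x is an assumption.
x <- cbind(A = c(1, -1, -1, 1),
           B = c(0,  1, -1, 0),
           C = c(1,  1,  1, -1))

x[x == 0] <- NA                         # mark zeros as missing

# dist() compares rows, so transpose to compare the columns A, B, C.
d <- dist(t(x), method = "euclidean")   # placeholder metric, not the asker's
d
```

Note that ?dist also says that when some columns are excluded from a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used, so the result is not simply the distance over the remaining entries.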
Peter Langfelder
2012-Aug-30 19:17 UTC
[R] self-defined distance function to be computed on matrix
On Thu, Aug 30, 2012 at 10:48 AM, zz <czhang at uams.edu> wrote:
> As you may see, the numerator is basically the dot product of the two
> vectors; this function actually is more like the cosine function in R, but
> with some variations.

If I understand it correctly, you are trying to calculate the "cosine correlation" while excluding all rows where one of the two columns has a zero?

There may be other ways to do it, but (shameless plug) my package WGCNA defines a replacement for the usual correlation function cor() that lets you specify the argument cosine = TRUE to calculate the cosine correlation (i.e., Pearson correlation without centering). To ignore the zeroes, turn them into NA and pass the argument use = "pairwise.complete.obs" (or just use = "p") to cor().

So define a matrix (say ABC) and set all zero values to NA:

    ABC[ABC==0] = NA

then issue

    library(WGCNA)
    sim = cor(ABC, cosine = TRUE, use = 'p')

Note that the correlation gives you a similarity; to turn it into a dissimilarity or distance you have to subtract it from 1:

    dissim = 1-sim

HTH,
Peter
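For comparison, the same quantity can be sketched in base R with matrix products, without any package. This is only an illustration under stated assumptions: the helper name zero_excl_cosine is hypothetical, and the code follows the formulas in the original question (dot product over rows where both columns are nonzero, divided by the product of the norms over those same rows). It is not claimed to reproduce the WGCNA implementation.

```r
# Example matrix from the question; columns are the vectors A, B, C.
ABC <- cbind(A = c(1, -1, -1, 1),
             B = c(0,  1, -1, 0),
             C = c(1,  1,  1, -1))

# Hypothetical helper (not from any package): cosine similarity between
# columns, using only the rows where both columns are nonzero.
zero_excl_cosine <- function(x) {
  nonzero <- (x != 0) * 1          # 1 where an entry is nonzero, else 0
  num <- crossprod(x)              # dot products; zero entries add nothing
  # ss[i, j] = sum of x[, i]^2 over the rows where x[, j] is nonzero
  ss <- crossprod(x^2, nonzero)
  num / sqrt(ss * t(ss))           # NaN if two columns share no nonzero row
}

sim <- zero_excl_cosine(ABC)       # similarity matrix
dissim <- as.dist(1 - sim)         # same object type that dist() returns
```

On the example matrix this gives a similarity of 0 for the pairs AB and BC and -0.5 for AC, which matches the hand calculation in the question. Because everything is done with two calls to crossprod(), the cost is dominated by matrix multiplication rather than an explicit loop over column pairs.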