Hi,
I am using R 2.9.0. It seems the documentation for the calculation of
Canberra distance using stats::dist is ambiguous. Does anyone have the
original definition given in the Lance & Williams paper from Aust. Comput.
J. 1, 15-20, 1967?
When both vectors have a zero at the same position, those terms are not
simply omitted as the documentation says (see below). Instead, the Canberra
distance is calculated as described in Frédéric Chiroleu's post
(http://tolstoy.newcastle.edu.au/R/e3/help/07/10/1370.html):
d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y)), where NZ is the number of
positions that are not zero in both vectors. This can also be seen from the
example given in the documentation for stats::dist (see below).
However, when there is no position where the values are zero in both
vectors, the Canberra distance is calculated using the formula given in the
documentation.
Examples:
> dist(rbind(c(1,2,3,4), c(2,3,4,5)), method='canberra')
          1
2 0.7873016
> dist(rbind(c(1,2,3,4,0), c(2,3,4,5,0)), method='canberra')
         1
2 0.984127
> help(dist)
dist                  package:stats                  R Documentation
Distance Matrix Computation
......
     'canberra': sum(|x_i - y_i| / |x_i + y_i|).  Terms with zero
          numerator and denominator are omitted from the sum and
          treated as if the values were missing.
## example of binary and canberra distances.
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x,y), method= "binary")
## answer 0.4 = 2/5
dist(rbind(x,y), method= "canberra")
## answer 2 * (6/5)
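The three results above seem consistent with the following sketch. It
assumes that the 0/0 terms are dropped from the sum and that the sum is then
scaled up by (total number of positions) / (number of positions kept), which
reduces to (NZ + 1)/NZ when exactly one position is zero in both vectors.
The name canberra_rescaled is only for illustration and is not part of
stats; this is a guess at the behaviour, not the actual implementation.

```r
# Hypothetical helper (not part of stats): Canberra distance with the
# rescaling that appears to explain the dist() outputs shown above.
canberra_rescaled <- function(x, y) {
  terms <- abs(x - y) / abs(x + y)  # 0/0 yields NaN where both values are zero
  keep  <- !is.nan(terms)           # positions that contribute to the sum
  # Sum the remaining terms, then scale by (all positions / positions kept)
  sum(terms[keep]) * length(x) / sum(keep)
}

canberra_rescaled(c(1, 2, 3, 4), c(2, 3, 4, 5))        # 0.7873016
canberra_rescaled(c(1, 2, 3, 4, 0), c(2, 3, 4, 5, 0))  # 0.984127

x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
canberra_rescaled(x, y)                                # 2.4 = 2 * (6/5)

# Compare against stats::dist for the documentation example
all.equal(canberra_rescaled(x, y),
          as.numeric(dist(rbind(x, y), method = "canberra")))
```

If the last line returns TRUE, the rescaling assumption matches what
stats::dist does for this input.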
Thanks!
--
Hongbo