Hi,

I am confused by the definition of the Canberra distance in R.

On the Internet and in the help pages of dist() from stats and Dist() from amap, the Canberra distance between vectors x and y, d(x,y), is:

d(x,y) = sum(abs(x-y)/(x+y))

But in practice, on simple examples, we find that the formula is:

d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y))

with NZ = number of pairs of coordinates that are different from (0,0) (Non Zeros).

The functions vegdist() from vegan and gdist() from mvpart, like the documentation of the ADE4 software, use (for positive variables):

d(x,y) = 1/NZ * sum(abs(x-y)/(x+y))

Can someone help me understand why these formulas differ, and why what dist() computes differs from what its documentation says?

Thank you for your help.

Best regards,

Fred

PS: Be careful with the function dudi.pca() from ade4: among the returned values, "norm" does not give what is written in the help page. "norm" returns the vector of standard deviations of the initial variables when you choose a "normed" PCA, and the vector of standard deviations of the normed variables, i.e. 1, when you choose a non-"normed" PCA. We contacted the authors of the package to have the information corrected, without success.

-- 
Dr. Frédéric Chiroleu
Biometrician
CIRAD-Systèmes Biologiques (Cirad-Bios)
UMR 53 PVBMT (Peuplements Végétaux et Bio-agresseurs en Milieu Tropical)
Laboratoire d'Ecologie Terrestre et de Lutte Intégrée (LETLI)
Pôle de Protection des Plantes (3P)
7, chemin de l'IRAT, Ligne Paradis
97410 Saint-Pierre, Île de la Réunion - France
Tel.: +262 (0)262 499 230
Switchboard: +262 (0)262 499 200
Fax: +262 (0)262 499 293
Email: frederic.chiroleu at cirad.fr
Frédéric Chiroleu <frederic.chiroleu <at> cirad.fr> writes:

> Hi,
>
> I am confused by the definition of the Canberra distance in R.
>
> On the Internet and in the help pages of dist() from stats and
> Dist() from amap, the Canberra distance between vectors x and y, d(x,y), is:
>
> d(x,y) = sum(abs(x-y)/(x+y))
>
> But in practice, on simple examples, we find that the formula is:
>
> d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y))
>
> with NZ = number of pairs of coordinates that are different from (0,0)
> (Non Zeros).
>

I think you must try another example. In my simple experiments, at least, the multiplier appeared to be NZ/NZ, i.e. one, rather than your "almost one", and this is also the documented case. I could not find any difference from the documentation. However, there is a note about "double zeros" (zero denominator and numerator) in the dist documentation. Could that cause the difference?

If you really want to know how the distance is calculated, download the R source and look there. If you want to know how the index was originally proposed, you must find the Lance & Williams paper in Aust. Comput. J. 1, 15-20, 1967 (I haven't found it, but would be curious to see it).

Cheers,
jari oksanen
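The "double zeros" note can be checked directly. The sketch below uses two made-up vectors (x and y are hypothetical, not taken from the thread) in which exactly one coordinate pair is (0, 0). It assumes the reading of the dist help page that 0/0 terms are dropped and the remaining sum is scaled up by N/NZ, as for missing values, where N is the total number of coordinates. With a single double zero, N/NZ is the same as (NZ + 1)/NZ.

```r
# Hypothetical example vectors; coordinate 2 is a "double zero" (0, 0).
x <- c(1, 0, 3, 0)
y <- c(2, 0, 1, 4)

# Per-coordinate Canberra terms; for non-negative data |x| + |y| equals x + y.
terms <- abs(x - y) / (abs(x) + abs(y))   # 0/0 gives NaN for the double zero
keep  <- !is.nan(terms)                   # drop the double-zero term

N  <- length(x)   # total number of coordinates
NZ <- sum(keep)   # coordinate pairs that are not (0, 0)

plain_sum  <- sum(terms[keep])     # sum(abs(x-y)/(x+y)) over the kept terms
scaled_sum <- plain_sum * N / NZ   # assumption: sum scaled as if 0/0 were missing
mean_term  <- plain_sum / NZ       # the 1/NZ variant mentioned in the thread

d <- as.numeric(dist(rbind(x, y), method = "canberra"))

# Show dist() next to the three candidate formulas
print(c(dist = d, plain = plain_sum, scaled = scaled_sum, mean = mean_term))
print(isTRUE(all.equal(d, scaled_sum)))
```

If the last line prints TRUE, dist() is applying the N/NZ scaling; with vectors that contain no double zeros, the plain and scaled sums coincide and the multiplier is one.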
Frédéric Chiroleu wrote:

> PS: Be careful with the function dudi.pca() from ade4: among the returned
> values, "norm" does not give what is written in the help page. "norm"
> returns the vector of standard deviations of the initial variables when you
> choose a "normed" PCA, and the vector of standard deviations of the normed
> variables, i.e. 1, when you choose a non-"normed" PCA. We contacted the
> authors of the package to have the information corrected, without success.
>

Dear Frederic,

AB Dufour told me that you had emailed about this, but it would be better to write to the adelist, as suggested in the DESCRIPTION file of the package. BTW, this is a problem in the documentation, which we will correct. If you consider:

dudi1 <- dudi.pca(X)

then dudi1$norm is a vector of values such that

dudi1$tab[i,j] = (X[i,j]-dudi1$center[j])/dudi1$norm[j]

There is no bug, just a problem in the documentation. Your "be careful with function dudi.pca" is a little exaggerated, I think. We will correct it, but there is sometimes a delay between a user request and its implementation in the package. Note that we have activities other than the development and maintenance of ade4 (other scientific projects, teaching, writing papers, reviewing... like everyone else), and sometimes we forget to answer an email... sorry.

Our group has provided free software, free documentation and teaching resources for ecologists for a long time. This is a free contribution, and I think it merits more consideration than your email suggests.

Cheers,

-- 
Stéphane DRAY (dray at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57
Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/