On Tue, 2004-11-09 at 12:59, Alessio Boattini wrote:
> Dear All,
>
> I would like to ask for clarification on the Gower distance matrix
> calculated by the function gdist in the library mvpart.
> Here is a dummy example:
>
> > library(mvpart)
> Loading required package: survival
> Loading required package: splines
> mvpart package loaded: extends rpart to include
> multivariate and distance-based partitioning
> > x=matrix(1:6, byrow=T, ncol=2)
> > x
>      [,1] [,2]
> [1,]    1    2
> [2,]    3    4
> [3,]    5    6
> > gdist(x, method="euclid")
>          1        2
> 2 2.828427
> 3 5.656854 2.828427
>
> ##########################
> Doing the calculations by hand according to the formula on the gdist
> help page, I get the same results. The formula given is:
> 'euclidean' d[jk] = sqrt(sum (x[ij]-x[ik])^2)
> #################################
>
> > sqrt(8)
> [1] 2.828427
> > gdist(x, method="gower")
>           1         2
> 2 0.7071068
> 3 1.4142136 0.7071068
>
> #######################################
> Doing the calculations by hand according to the formula on the gdist
> help page, I cannot reproduce the same results. The formula given is:
> 'gower' d[jk] = sum (abs(x[ij]-x[ik])/(max(i)-min(i))
> ##########################################
>
> Could anybody please shed some light?
>
There seems to be a bug in the documentation: the function uses a
different calculation than the help page specifies. Look at the 'gdist'
code. Just to make things easier: in the function body, Gower is
method 6, and Euclidean distance is method 2.
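
For what it is worth, the numbers in the example can be checked in base R
without mvpart. The sketch below is only a guess at what the function does,
inferred from the output quoted above and not from reading the gdist source:
it divides each column by its range, then compares the documented sum of
absolute differences with a Euclidean distance on the same scaled data. The
names rng, xs, d_doc and d_obs are made up for this illustration.

```r
# Same dummy data as in the original question
x <- matrix(1:6, byrow = TRUE, ncol = 2)

# Range (max - min) of each column; both are 4 here
rng <- apply(x, 2, max) - apply(x, 2, min)

# Divide every column by its range
xs <- sweep(x, 2, rng, "/")

# Help-page formula: sum of range-scaled absolute differences
d_doc <- dist(xs, method = "manhattan")

# Assumption: Euclidean distance on the range-scaled columns
d_obs <- dist(xs, method = "euclidean")

print(d_doc)  # 1, 2, 1
print(d_obs)  # 0.7071068, 1.4142136, 0.7071068
```

The second result agrees with the gdist(x, method="gower") output quoted
above, while the help-page formula gives 1, 2, 1. That is consistent with the
function taking a square root of summed squares of scaled differences, but
only the function body can confirm it.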
Gower's original paper is available through http://www.jstor.org/
(Biometrics Vol. 27, No. 4, p. 857-871; 1971).
cheers, jari oksanen
--
Jari Oksanen <jarioksa at sun3.oulu.fi>