Martin Maechler
2022-Apr-08 11:55 UTC
[R] pam() with more general dissimilarity / distance
I was asked in private, but reply in public, so others can also find this answer in the future:

On Fri, Apr 8, 2022 at 1:11 PM ..... wrote:

> Hello
> dear Dr. Maechler
> I have a question about "pam" function in the cluster package. In this
> function, we choose one of the euclidean or manhattan distances to
> calculate dissimilarity but in the mixed typed data sets the true index may
> be jaccard or other indicators.
> How can we allocate the "true" metric for each variable?
> Best regards
>

Yes, you can use pam() in two ways; see this part of the help page:

Arguments:

       x: data matrix or data frame, or dissimilarity matrix or object,
          depending on the value of the 'diss' argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be numeric. Missing values (NAs) _are_
          allowed, as long as every pair of observations has at least
          one case not missing.

          In case of a dissimilarity matrix, 'x' is typically the output
          of daisy or dist. Also a vector of length n*(n-1)/2 is allowed
          (where n is the number of observations), and will be
          interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs) are _not_
          allowed.

So, you can first use

    dx <- daisy(x, ...)

and choose the correct distance between your observational units there. After that, you can use the computed distance / dissimilarity matrix (the `dx`) in your call to pam():

    px <- pam(dx, k=., ....)

I hope this helps you.

With best regards,
Martin

--
Martin Maechler
ETH Zurich
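
The two-step approach from the reply (first daisy(), then pam() on the resulting dissimilarities) could look roughly like the sketch below. The toy data frame, its column names, the choice of metric = "gower" with an asymmetric binary column, and k = 2 are all assumptions made for illustration; none of them come from the original thread.

```r
library(cluster)

# Hypothetical toy data with mixed variable types (illustration only)
set.seed(1)
x <- data.frame(
  height  = c(rnorm(10, 170, 5), rnorm(10, 185, 5)),  # numeric
  smoker  = factor(rep(c("no", "yes"), each = 10)),   # nominal factor
  has_pet = rep(c(0, 1), times = 10)                  # 0/1, declared asymmetric binary below
)

# Step 1: compute dissimilarities between observations.
# metric = "gower" handles mixed types; 'type' says how to treat given columns.
dx <- daisy(x, metric = "gower", type = list(asymm = "has_pet"))

# Step 2: pass the dissimilarity object to pam(); k = 2 is an arbitrary choice.
px <- pam(dx, k = 2)

px$medoids            # the medoid observations
table(px$clustering)  # cluster sizes
```

With daisy(), an asymmetric binary variable does not contribute to a pair of observations where both values are 0, so if all columns are declared asymmetric binary the result corresponds to a Jaccard-type dissimilarity. Any other dissimilarity object of the same form, for example the output of dist(), can be passed to pam() in the same way.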