Dear R-helpers,

First of all, a happy new year to everyone!

I successfully used the daisy function (from package cluster) to find which pairs of rows in a data frame differ by only one value, and I now want a simpler way to find _which_ value makes the difference within any such pair.

Consider a very small example (the actual data has thousands of rows):

input <- matrix(letters[c(1,2,1,2,2,3,2,1,1,2,2,2)], ncol=3)
> as.data.frame(input)
  V1 V2 V3
1  a  b  a
2  b  c  b
3  a  b  b
4  b  a  b

I am interested in the rows which differ by one value only; I can easily find them with:

library(cluster)
# daisy returns the proportion of differing variables; multiply by ncol to get counts
distance <- daisy(as.data.frame(input))*ncol(input)
> distance
Dissimilarities :
  1 2 3
2 3
3 1 2
4 3 1 2

Metric :  mixed ;  Types = N, N, N
Number of objects : 4

The first and the third rows differ only with respect to variable V3, and the second and the fourth rows differ only with respect to variable V2.

Now I want to replace the differing values with an "x"; currently my code is:

distance <- as.matrix(distance)
# keep each pair only once (upper triangle)
distance[!upper.tri(distance)] <- NA
# row indices of the pairs that differ by exactly one value
to.be.compared <- as.matrix(which(distance == 1, arr.ind=TRUE))
# for each pair, TRUE where the two rows agree
logical.result <- t(apply(to.be.compared, 1,
                          function(idx) {input[idx[1], ] == input[idx[2], ]}))
# take the first row of each pair as the template
result <- t(sapply(1:nrow(to.be.compared),
                   function(idx) {input[to.be.compared[idx, 1], ]}))
result[!logical.result] <- "x"
> as.data.frame(result)
  V1 V2 V3
1  a  b  x
2  b  x  b

I wonder if the daisy function could be persuaded to output an object similar to the dissimilarities one; it would be fantastic to also get something like:

First.difference.found:
  1 2 3
2 1
3 3 1
4 1 2 1

Here, the 3 in the first column means that the third variable (V3) is the first one on which rows 1 and 3 differ.

I could try to do that myself, but I don't know where to find the Fortran code daisy uses.

Thanks for any hint,
Adrian

--
Adrian DUSA
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101
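
For illustration, the "first difference" table asked about above can be sketched in plain base R, without touching daisy's Fortran code. This is only a minimal sketch under stated assumptions: first_difference() is a hypothetical helper name (it is not part of package cluster), it takes the character matrix from the toy example, and the double loop is written for clarity rather than for speed on thousands of rows.

```r
# Same toy data as in the example above
input <- matrix(letters[c(1,2,1,2,2,3,2,1,1,2,2,2)], ncol = 3)

# Hypothetical helper (not from package cluster): for every pair of rows,
# record the index of the first column in which the two rows differ.
first_difference <- function(x) {
  n <- nrow(x)
  out <- matrix(NA_integer_, n, n,
                dimnames = list(seq_len(n), seq_len(n)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      differs <- which(x[i, ] != x[j, ])
      # differs[1] is NA when the two rows are identical
      out[j, i] <- differs[1]
    }
  }
  out
}

# Lower triangle holds the result; the diagonal and upper triangle stay NA
first.diff <- first_difference(input)

# Print in the same lower-triangle layout as a dissimilarity object
as.dist(first.diff)
```

For the pairs that differ by exactly one value, the entry in first.diff is also the only differing column, so it could be used directly to place the "x".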