thr3ads.net - R help - [R] more on the daisy function [Jan 2006]

Adrian DUSA

2006-Jan-05 11:07 UTC

[R] more on the daisy function

Dear R-helpers,

First of all, a happy new year to everyone!

I successfully used the daisy function (from package cluster) to find which 
pairs of rows in a data frame differ by only one value, and I now want a 
simpler way to find _which_ value makes the difference within any such pair.
Consider a very small example (the actual data counts thousands of rows):

   input <- matrix(letters[c(1,2,1,2,2,3,2,1,1,2,2,2)], ncol=3)

   > input
     X1 X2 X3
   1  a  b  a
   2  b  c  b
   3  a  b  b
   4  b  a  b

I am interested in the rows which differ by one value only; I easily find 
them with:

   library(cluster)
   distance <- daisy(as.data.frame(input))*ncol(input)

   > distance
   Dissimilarities :
     1 2 3
   2 3
   3 1 2
   4 3 1 2

   Metric :  mixed ;  Types = N, N, N
   Number of objects : 4


The first and the third rows differ only with respect to variable V3, and the 
second and the fourth rows differ only with respect to variable V2.
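
As a cross-check that does not depend on the cluster package, the same counts 
can be sketched in base R. With only nominal variables and no missing values, 
daisy's dissimilarity is the share of variables on which two rows disagree, 
which is why multiplying by ncol(input) turns it into a count. The name 
`mismatches` is introduced here for illustration only.

```r
input <- matrix(letters[c(1,2,1,2,2,3,2,1,1,2,2,2)], ncol = 3)

n <- nrow(input)
mismatches <- matrix(0L, n, n)   # hypothetical name, not part of cluster

# For each variable, add 1 to every pair of rows that disagree on it
for (j in seq_len(ncol(input))) {
  mismatches <- mismatches + outer(input[, j], input[, j], "!=")
}

# Lower triangle, laid out like the daisy output
print(as.dist(mismatches))
```

This builds one n x n matrix per variable, so it is only meant for checking 
small examples, not for thousands of rows.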


Now I want to replace the differing values by an "x"; currently my
code is:

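   distance <- as.matrix(distance)
   # keep each pair once: blank out the diagonal and the lower triangle
   distance[!upper.tri(distance)] <- NA
   # (row, col) indices of the pairs that differ by exactly one value
   to.be.compared <- as.matrix(which(distance == 1, arr.ind=TRUE))
   # TRUE where the two rows of a pair agree
   logical.result <- t(apply(to.be.compared, 1,
                             function(idx) {input[idx[1], ] == input[idx[2], ]}))
   # start from the first row of each pair...
   result <- t(sapply(1:nrow(to.be.compared),
                      function(idx) {input[to.be.compared[idx, 1], ]}))
   # ...and mark the value on which the pair differs
   result[!logical.result] <- "x"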

   > as.data.frame(result)
     V1 V2 V3
   1  a  b  x
   2  b  x  b

I wonder if the daisy function could be persuaded to output an object similar 
to the dissimilarities one; it would be fantastic to also get something like:

   First.difference.found:
     1 2 3
   2 1
   3 3 1
   4 1 2 1

Here, 3 means that the third variable (V3) is the first one on which the first and third rows differ. 
I could try to do that myself, but I don't know where to find the Fortran 
code daisy uses.
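
For what it is worth, such a matrix can be sketched in base R without touching 
daisy's internals. This is not daisy's own code; the name `first.diff` is 
hypothetical, and the loop runs over the variables in reverse so that the 
lowest-numbered differing variable is the one that remains.

```r
input <- matrix(letters[c(1,2,1,2,2,3,2,1,1,2,2,2)], ncol = 3)

n <- nrow(input)
first.diff <- matrix(NA_integer_, n, n)   # NA where two rows never differ

# Visit the variables last to first; an earlier variable overwrites a later one
for (j in rev(seq_len(ncol(input)))) {
  differs <- outer(input[, j], input[, j], "!=")
  first.diff[differs] <- j
}

# Lower triangle, laid out like the dissimilarities
print(as.dist(first.diff))
```

Identical rows (and the diagonal) stay NA, which distinguishes "no difference" 
from "differs on variable 1".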

Thanks for any hint,
Adrian

-- 
Adrian DUSA
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101
