thr3ads.net - R help - [R] Correct use of the cluster::daisy function [Jan 2013]

Stefan Petersson

2013-Jan-08 22:39 UTC

[R] Correct use of the cluster::daisy function

Hi,

I have two groups, and I want to find the dissimilarity between the members
of the two groups. Since the members are described by variables of mixed
types, I opt for the daisy function in the cluster package.

Let's pretend that the following represent my groups:

x <- data.frame(
  sex = factor(c(1,0,0,1,0,1),
               levels = 0:1,
               labels = c('Male','Female'),
               ordered = FALSE),
  age = c(31,28,25,30,29,28),
  x = c(1,0,0,1,0,1)
)
y <- data.frame(
  sex = factor(c(1,0,0),
               levels = 0:1,
               labels = c('Male','Female'),
               ordered = FALSE),
  age = c(27,30,30),
  x = c(0,1,0)
)

Now I'm thinking that I take the first member of group x and compare it to
the first member of group y. I bind the two members together since daisy
computes dissimilarities between the rows of its input. Like this:

library(cluster)

daisy(rbind(x[1, ], y[1, ]))

Is this correct thinking? If so, I can build a nested loop to calculate
the full between-group dissimilarity matrix. Like this (ignoring the
performance issues of the for loop):

m <- matrix(nrow=nrow(x), ncol=nrow(y))
for(i in 1:nrow(x)){
  for(j in 1:nrow(y)){
    m[i,j]<- daisy(rbind(x[i, ], y[j, ]))[1]
  }
}

I get a matrix with group x along the rows and group y along the columns.

m
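
One point that may matter for the loop above: for mixed data daisy uses Gower's coefficient, which scales each numeric variable by its range in the data it is given. With only two rows per call, that range comes from the pair itself. The following is a minimal sketch of an alternative, assuming the scaling should instead be taken over both groups together; the names xy, d, ix, iy and m2 are introduced here for illustration only.

```r
library(cluster)

x <- data.frame(
  sex = factor(c(1,0,0,1,0,1), levels = 0:1, labels = c('Male','Female')),
  age = c(31,28,25,30,29,28),
  x = c(1,0,0,1,0,1)
)
y <- data.frame(
  sex = factor(c(1,0,0), levels = 0:1, labels = c('Male','Female')),
  age = c(27,30,30),
  x = c(0,1,0)
)

# Stack both groups so that daisy() sees all nine rows at once
xy <- rbind(x, y)

# Full 9 x 9 dissimilarity matrix (Gower, because the columns are mixed)
d <- as.matrix(daisy(xy, metric = "gower"))

# Rows 1..nrow(x) belong to group x, the remaining rows to group y
ix <- seq_len(nrow(x))
iy <- nrow(x) + seq_len(nrow(y))

# Between-group block: group x along the rows, group y along the columns
m2 <- d[ix, iy, drop = FALSE]
dimnames(m2) <- list(paste0("x", ix), paste0("y", seq_len(nrow(y))))
m2
```

Whether m or m2 is the appropriate matrix depends on which scaling is wanted; the two generally differ in the contribution of the numeric columns.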

Now, looking at the first row of the result matrix: is it meaningful to say
that subject x1 is 'closest' to subjects y1 and y3, as they have the lowest
dissimilarity coefficient? And that x2 is closest to y3, by the same logic?
Etc...
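
The row-by-row reading described above can be sketched as follows. This is only an illustration built on the pairwise loop from this post: the helper name nearest and the tolerance 1e-12 are assumptions made here, and ties are kept rather than broken.

```r
library(cluster)

x <- data.frame(
  sex = factor(c(1,0,0,1,0,1), levels = 0:1, labels = c('Male','Female')),
  age = c(31,28,25,30,29,28),
  x = c(1,0,0,1,0,1)
)
y <- data.frame(
  sex = factor(c(1,0,0), levels = 0:1, labels = c('Male','Female')),
  age = c(27,30,30),
  x = c(0,1,0)
)

# Pairwise between-group matrix, as in the nested loop above.
# daisy() may warn about two-valued numeric columns; the warnings
# are suppressed here only to keep the output short.
m <- matrix(nrow = nrow(x), ncol = nrow(y))
for (i in seq_len(nrow(x))) {
  for (j in seq_len(nrow(y))) {
    m[i, j] <- suppressWarnings(daisy(rbind(x[i, ], y[j, ])))[1]
  }
}

# For each member of x, the column indices of all y members that
# attain the row minimum (more than one index means a tie)
nearest <- lapply(seq_len(nrow(m)), function(i) {
  which(abs(m[i, ] - min(m[i, ])) < 1e-12)
})
names(nearest) <- paste0("x", seq_len(nrow(m)))
nearest
```

Listing all indices at the row minimum, rather than calling which.min, makes ties visible instead of silently reporting only the first one.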

Thanks a lot in advance!
