Hi folks,

Here is the problem, with an example. I want a measure of similarity or dissimilarity between the rankings that two judges give to the students of one class (of size, say, 50). If I had the ranks of all 50 students from each judge, I could use the usual rank correlation measures. What I actually have is two lists of the top 20 students, one chosen by each judge, so the lists need not contain the same students.

The following paper gives a few measures for this problem:

www.almaden.ibm.com/cs/people/fagin/topk.pdf

I have written code for the Kendall distance measure (with penalty parameter p). Here is the code:

library(gtools)  # for combinations()

topklist <- function(df1, df2, matchby = "name", rankby = "pat", p = 0.5,
                     normalize = TRUE) {
  # Rank each list separately; a larger value of rankby means a better rank
  df1$rank <- rank(-df1[, rankby], ties.method = "first")
  df2$rank <- rank(-df2[, rankby], ties.method = "first")

  # Outer join: an item missing from one list gets NA for that list's rank
  dftmp <- merge(df1, df2, matchby, all = TRUE)
  rownames(dftmp) <- dftmp[, matchby]

  # All unordered pairs of items appearing in either list
  df <- combinations(length(dftmp[, matchby]), 2, as.character(dftmp[, matchby]))

  # Penalty for one pair: 0 if concordant, 1 if discordant,
  # p if both items appear in one list and neither appears in the other
  concor <- function(x, dftmp, p) {
    # Placeholder rank for a missing item: worse than any observed rank.
    # (Taken over both lists so it also works when the lists differ in length.)
    n <- max(dftmp$rank.x, dftmp$rank.y, na.rm = TRUE)
    x <- dftmp[c(x[1], x[2]), c("rank.x", "rank.y")]
    if (all(!is.na(x$rank.x)) && all(is.na(x$rank.y))) {
      a <- p
    } else if (all(is.na(x$rank.x)) && all(!is.na(x$rank.y))) {
      a <- p
    } else {
      x[is.na(x)] <- n + 1
      a <- 1
      if ((x$rank.x[1] > x$rank.x[2] && x$rank.y[1] > x$rank.y[2]) ||
          (x$rank.x[1] < x$rank.x[2] && x$rank.y[1] < x$rank.y[2])) {
        a <- 0
      }
    }
    a
  }

  corr <- sum(apply(df, 1, function(x) concor(x, dftmp, p)))

  if (normalize) {
    # Largest possible value, reached when the two lists are disjoint
    dn <- p * choose(nrow(df1), 2) + p * choose(nrow(df2), 2) +
      nrow(df1) * nrow(df2)
    corr <- corr / dn
  }
  corr
}

Here is some sample data to try it on:

df1 <- structure(list(name = structure(c(21L, 12L, 3L, 16L, 15L, 5L,
8L, 23L, 7L, 18L, 4L, 17L, 2L, 6L, 22L, 20L, 10L, 1L, 19L, 14L
), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W",
"X"), class = "factor"), pat = c(2051.55, 679.2, 502.77, 408.14,
278.62, 236.05, 232.44, 215.65, 202.92, 180.13, 172.82, 166.69,
152.82, 150.69, 130.69, 127.81, 121.59, 120.59, 120.42, 120.17
)), .Names = c("name", "pat"), row.names = c(NA, -20L), class = "data.frame")

df2 <- structure(list(name = structure(c(21L, 12L, 16L, 7L, 5L, 3L,
8L, 4L, 23L, 9L, 15L, 10L, 17L, 14L, 11L, 22L, 24L, 20L, 1L,
13L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I",
"J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
"W", "X"), class = "factor"), pat = c(1604.25, 690.97, 463.64,
285.23, 280.3, 274.66, 261.84, 251.88, 234.94, 210.12, 202.89,
200.89, 185.43, 167.56, 161.1, 161.1, 155.47, 150.22, 121.19,
115.93)), .Names = c("name", "pat"), row.names = c(NA, -20L), class = "data.frame")

Now we get the result:

topklist(df1, df2, matchby = "name", rankby = "pat", p = 0.5)

Note that the measure gives 0 for two identical lists, as a distance should:

topklist(df1, df1, matchby = "name", rankby = "pat", p = 0.5)

So what do you think about this?

Thanks and regards,
Sayan Dasgupta
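
One way to check the result by hand is to compare it against a smaller, independent version of the same calculation. The following is only a sketch under some assumptions: kendall_topk and its arguments a and b are hypothetical names that do not appear in the code above, each list is passed as a character vector of names already sorted best first, and base R's combn() is used in place of gtools::combinations(). It applies the same four pair rules: penalty 0 for a concordant pair, 1 for a discordant pair or for a pair split across the two lists, and p when both items appear in one list and neither appears in the other.

```r
# Hypothetical helper (not part of the original post): Kendall distance with
# penalty p between two top-k lists given as character vectors, best first.
kendall_topk <- function(a, b, p = 0.5, normalize = TRUE) {
  items <- union(a, b)
  ra <- match(items, a)            # rank in list a, NA if absent
  rb <- match(items, b)            # rank in list b, NA if absent

  pairs <- combn(length(items), 2) # all unordered pairs, one per column
  i <- pairs[1, ]
  j <- pairs[2, ]

  # Both items appear in one list and neither appears in the other
  only_a <- is.na(rb[i]) & is.na(rb[j])
  only_b <- is.na(ra[i]) & is.na(ra[j])

  # Treat a missing item as ranked below every item that is present
  big <- length(items) + 1
  ra[is.na(ra)] <- big
  rb[is.na(rb)] <- big
  discordant <- (ra[i] - ra[j]) * (rb[i] - rb[j]) < 0

  d <- sum(ifelse(only_a | only_b, p, as.numeric(discordant)))
  if (normalize) {
    # Same normalizer as above: the value reached by two disjoint lists
    d <- d / (p * choose(length(a), 2) + p * choose(length(b), 2) +
                length(a) * length(b))
  }
  d
}

kendall_topk(c("A", "B", "C"), c("A", "B", "C"))                     # identical lists
kendall_topk(c("A", "B"), c("C", "D"))                               # disjoint lists
kendall_topk(c("A", "B", "C"), c("C", "B", "A"), normalize = FALSE)  # reversed list
```

With this sketch, identical lists should give 0, disjoint lists should give 1 after normalization, and a fully reversed list of three items should give 3 unnormalized (one penalty per pair). If topklist() disagrees on small cases like these, the pair rules are the first place to look.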