I'm sure this is a simple task, but how to do it has escaped me.

I have imported data from two separate files (each file contains the
results from an information retrieval algorithm) organized into a list.
They are organized by Doc, Query, and Rank (in that order):

[[1]]
  Doc Query Rank
    5     1    1
    9     1    2
    7     1    3
    5     2    1
    7     2    2
    9     2    3

[[2]]
  Doc Query Rank
    4     1    1
    5     1    2
    9     1    3
    8     2    1
    5     2    2
    7     2    3

I need to rearrange the data so that it is sorted by Query and Document,
with columns for rank1 and rank2 (from files 1 and 2, respectively). For
example:

[[1]]
  Doc Query Rank1 Rank2
    4     1    NA     1
    5     1     1     2
    7     1     3    NA
    9     1     2     3
    5     2     1     2
    7     2     2     3
    8     2    NA     1
    9     2     3    NA

My goal is to perform a Spearman/Kendall test to check the correlation
between the rankings.

Any help would be appreciated.

Andrew Noyes
Use merge:

# test data
both <- list(
  structure(list(Doc   = c(5, 9, 7, 5, 7, 9),
                 Query = c(1, 1, 1, 2, 2, 2),
                 Rank  = c(1, 2, 3, 1, 2, 3)),
            .Names = c("Doc", "Query", "Rank"),
            class = "data.frame",
            row.names = c("1", "2", "3", "4", "5", "6")),
  structure(list(Doc   = c(4, 5, 9, 8, 5, 7),
                 Query = c(1, 1, 1, 2, 2, 2),
                 Rank  = c(1, 2, 3, 1, 2, 3)),
            .Names = c("Doc", "Query", "Rank"),
            class = "data.frame",
            row.names = c("1", "2", "3", "4", "5", "6")))

merge(both[[1]], both[[2]], all = TRUE, by = 1:2)

On 4/23/06, kewley at eden.rutgers.edu <kewley at eden.rutgers.edu> wrote:
> I need to rearrange the data so that it is sorted by Query and Document,
> with columns for rank1 and rank2 (from files 1 and 2, respectively).
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
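By default merge names the two rank columns Rank.x and Rank.y. A minimal sketch of getting the Rank1/Rank2 names and the Query/Doc ordering asked for, assuming the `suffixes` argument of merge is used for the renaming; the names `file1`, `file2` and `ranks` are only illustrative and do not come from the thread:

```r
# Illustrative copies of the two result tables from the question.
file1 <- data.frame(Doc   = c(5, 9, 7, 5, 7, 9),
                    Query = c(1, 1, 1, 2, 2, 2),
                    Rank  = c(1, 2, 3, 1, 2, 3))
file2 <- data.frame(Doc   = c(4, 5, 9, 8, 5, 7),
                    Query = c(1, 1, 1, 2, 2, 2),
                    Rank  = c(1, 2, 3, 1, 2, 3))

# Full outer join on (Doc, Query); a document missing from one file gets NA.
# The suffixes are appended to the clashing "Rank" column: Rank1, Rank2.
ranks <- merge(file1, file2, by = c("Doc", "Query"), all = TRUE,
               suffixes = c("1", "2"))

# Sort by Query, then Doc, and reset the row names.
ranks <- ranks[order(ranks$Query, ranks$Doc), ]
rownames(ranks) <- NULL

print(ranks)
```

With more than two files the same call could be applied repeatedly, but the suffixes would then need to be chosen per step.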
Suppose your data frames are A and B:

AB <- merge(A, B, c("Doc", "Query"), all=TRUE)
AB[order(AB$Query, AB$Doc),]

gets the answer you are asking for. (Not sure why you want to sort it
to use a correlation test, as those are indifferent to ordering.)

On Sun, 23 Apr 2006, kewley at eden.rutgers.edu wrote:
> My goal is to perform a Spearman/Kendall test to check the correlation
> between the rankings.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
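As a rough sketch of the correlation step itself, assuming that documents ranked by only one file are simply dropped (that is one possible treatment of the NAs, not something the thread settles), the rank correlations can be computed on the merged columns. The names `file1`, `file2` and `ranks` are illustrative only, and with this little data the numbers carry no statistical weight:

```r
# Illustrative copies of the two result tables from the question.
file1 <- data.frame(Doc   = c(5, 9, 7, 5, 7, 9),
                    Query = c(1, 1, 1, 2, 2, 2),
                    Rank  = c(1, 2, 3, 1, 2, 3))
file2 <- data.frame(Doc   = c(4, 5, 9, 8, 5, 7),
                    Query = c(1, 1, 1, 2, 2, 2),
                    Rank  = c(1, 2, 3, 1, 2, 3))

ranks <- merge(file1, file2, by = c("Doc", "Query"), all = TRUE,
               suffixes = c("1", "2"))

# Number of (Doc, Query) pairs ranked by both files.
n_pairs <- sum(complete.cases(ranks$Rank1, ranks$Rank2))

# Rank correlations over the complete pairs only (assumption: NAs dropped).
rho <- cor(ranks$Rank1, ranks$Rank2, method = "spearman",
           use = "complete.obs")
tau <- cor(ranks$Rank1, ranks$Rank2, method = "kendall",
           use = "complete.obs")

# The same Spearman coefficient computed separately for each query.
by_query <- sapply(split(ranks, ranks$Query), function(d)
  cor(d$Rank1, d$Rank2, method = "spearman", use = "complete.obs"))

print(c(pairs = n_pairs, spearman = rho, kendall = tau))
print(by_query)
```

No sorting is done here, in line with the remark that the correlation is indifferent to row order. For an actual test with a p-value, cor.test takes the same `method` argument.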