I'm sure this is a simple task, but how to do it has escaped me.

I have imported data from two separate files (each file contains the
results from an information retrieval algorithm) organized into a list.
They are organized by Doc, Query, and Rank (in that order):

[[1]]
Doc   Query   Rank
5     1       1
9     1       2
7     1       3
5     2       1
7     2       2
9     2       3

[[2]]
Doc   Query   Rank
4     1       1
5     1       2
9     1       3
8     2       1
5     2       2
7     2       3

I need to rearrange the data so that it is sorted by Query and Document,
with columns for Rank1 and Rank2 (from files 1 and 2, respectively). For
example:

[[1]]
Doc   Query   Rank1   Rank2
4     1       NA      1
5     1       1       2
7     1       3       NA
9     1       2       3
5     2       1       2
7     2       2       3
8     2       NA      1
9     2       3       NA

My goal is to perform a Spearman/Kendall test to check the correlation
between the rankings.

Any help would be appreciated.

Andrew Noyes
Use merge:

# test data
both <- list(
  data.frame(Doc   = c(5, 9, 7, 5, 7, 9),
             Query = c(1, 1, 1, 2, 2, 2),
             Rank  = c(1, 2, 3, 1, 2, 3)),
  data.frame(Doc   = c(4, 5, 9, 8, 5, 7),
             Query = c(1, 1, 1, 2, 2, 2),
             Rank  = c(1, 2, 3, 1, 2, 3))
)

# outer join on the first two columns (Doc and Query);
# the two rank columns come back as Rank.x and Rank.y
merge(both[[1]], both[[2]], all = TRUE, by = 1:2)
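If the columns should be named Rank1 and Rank2 as in the question, one possible variation on the call above is to pass suffixes to merge and then order the rows. This is only a sketch: the variable name merged is made up for illustration, and the test data is repeated so the snippet runs on its own.

```r
# Test data repeated from the reply above so this snippet is self-contained.
both <- list(
  data.frame(Doc   = c(5, 9, 7, 5, 7, 9),
             Query = c(1, 1, 1, 2, 2, 2),
             Rank  = c(1, 2, 3, 1, 2, 3)),
  data.frame(Doc   = c(4, 5, 9, 8, 5, 7),
             Query = c(1, 1, 1, 2, 2, 2),
             Rank  = c(1, 2, 3, 1, 2, 3))
)

# Outer join on Doc and Query. The suffixes are appended to the clashing
# column name "Rank", giving Rank1 (file 1) and Rank2 (file 2).
merged <- merge(both[[1]], both[[2]],
                by = c("Doc", "Query"), all = TRUE,
                suffixes = c("1", "2"))

# Sort by Query, then Doc, and reset the row names.
merged <- merged[order(merged$Query, merged$Doc), ]
rownames(merged) <- NULL

print(merged)
```

Documents that appear in only one of the two files get NA in the other rank column, which matches the layout asked for in the question.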
On 4/23/06, kewley at eden.rutgers.edu <kewley at eden.rutgers.edu>
wrote:
> I'm sure this is a simple task, but how to do it has escaped me.
> [...]
Suppose your data frames are A and B:
AB <- merge(A, B, c("Doc", "Query"), all=TRUE)
AB[order(AB$Query, AB$Doc),]
gets the answer you are asking for.  (Not sure why you want to sort it to
use a correlation test, as those are indifferent to row order.)
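For the correlation step itself, a rough sketch using the same A and B naming might look like the following. It assumes that documents ranked by only one algorithm should simply be dropped (use = "complete.obs"), which may or may not be the right treatment for the real data; the helper rank_cor and the per-query split are illustrative choices, not something from the thread.

```r
# A and B stand in for the two imported data frames (same toy data as the question).
A <- data.frame(Doc   = c(5, 9, 7, 5, 7, 9),
                Query = c(1, 1, 1, 2, 2, 2),
                Rank  = c(1, 2, 3, 1, 2, 3))
B <- data.frame(Doc   = c(4, 5, 9, 8, 5, 7),
                Query = c(1, 1, 1, 2, 2, 2),
                Rank  = c(1, 2, 3, 1, 2, 3))

# Outer join; the rank columns are named Rank.x (from A) and Rank.y (from B).
AB <- merge(A, B, by = c("Doc", "Query"), all = TRUE)

# Hypothetical helper: rank correlation within one query, ignoring rows
# where either ranking is NA. No sorting is needed for this.
rank_cor <- function(d, method) {
  cor(d$Rank.x, d$Rank.y, use = "complete.obs", method = method)
}

by_query <- split(AB, AB$Query)
spearman <- sapply(by_query, rank_cor, method = "spearman")
kendall  <- sapply(by_query, rank_cor, method = "kendall")

print(spearman)
print(kendall)
```

With the toy data only two documents per query appear in both files, so the coefficients are not informative here. On realistic data, cor.test with method = "spearman" or method = "kendall" would additionally give a p-value.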
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595