Let's suppose I have userids and associated attributes... columns a and b

a <- c(1,1,1,2,2,3,3,3,3)
b <- c("a","b","c","a","d","a", "b", "e", "f")

so a unique list of a would be

id <- unique(a)

I want a matrix like this...

     [,1] [,2] [,3]
[1,]    3    1    2
[2,]    1    2    1
[3,]    2    1    4

Where element i,j is the number of items in b that id[i] and id[j] share...

So for example, in element [1,3] of the result matrix, I want to see 2. That is, id's 1 and 3 share two common elements in b, namely "a" and "b".

This is hard to articulate, so sorry for the terrible description here. The way I have solved it is to do a double loop, looping over every member of the id column and comparing it to every other member of id to see how many elements of b they share. This takes forever.

Thanks

cn
Here's one possibility:

> library(ecodist)
> a <- c(1,1,1,2,2,3,3,3,3)
> b <- c("a","b","c","a","d","a", "b", "e", "f")
>
> x <- crosstab(a, b, rep(1, length(a)))
> x
  a b c d e f
1 1 1 1 0 0 0
2 1 0 0 1 0 0
3 1 1 0 0 1 1
> x %*% t(x)
  1 2 3
1 3 1 2
2 1 2 1
3 2 1 4

Sarah

On Thu, Nov 4, 2010 at 3:42 PM, cory n <corynissen at gmail.com> wrote:
> Let's suppose I have userids and associated attributes... columns a and b
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
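A note on why the product works: row i of x is a 0/1 indicator of which values of b belong to id i, so element [i, j] of x %*% t(x) is the dot product of two indicator rows, which counts the attributes both ids have. Below is a minimal base R sketch of that reasoning, checked against a direct set intersection. The names inc, shared, ids and brute are illustrative and not from the thread, and table() stands in here for ecodist's crosstab().

```r
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")

# Indicator matrix: one row per id, one column per attribute.
# Every (id, attribute) pair occurs at most once in this data,
# so each cell is 0 or 1.
inc <- unclass(table(a, b))

# Dot product of rows i and j = number of attributes they share.
shared <- inc %*% t(inc)

# Brute-force check with explicit set intersections.
ids <- sort(unique(a))
brute <- sapply(ids, function(j) {
  sapply(ids, function(i) length(intersect(b[a == i], b[a == j])))
})

all(shared == brute)
```

The diagonal of the result is simply the number of attributes each id has.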
Hi:

To mimic Sarah Goslee's reply within base R, either of these works:

crossprod(t(as.matrix(xtabs( ~ a + b))))
crossprod(t(as.matrix(table(a, b))))

HTH,
Dennis
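As a possible shortcut, base R also provides tcrossprod(), which computes x %*% t(x) directly, so the transpose inside crossprod() can be dropped. A small sketch, assuming the a and b vectors from the original question; the variable names x, shared_t and shared_c are illustrative only.

```r
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")

# unclass() turns the table into a plain integer matrix.
x <- unclass(table(a, b))

# tcrossprod(x) computes x %*% t(x).
shared_t <- tcrossprod(x)

# The crossprod(t(...)) form from this message, for comparison.
shared_c <- crossprod(t(as.matrix(table(a, b))))

all(shared_t == shared_c)
```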
And to wrap it up and help you choose, here are four functions based on these emails (the first one is my own slight variant):

library(ecodist)

a <- sample(1:1000, 10^4, replace = TRUE)
b <- sample(letters[1:6], 10^4, replace = TRUE)

## base R table() plus matrix product
foo1 <- function() {
  x <- table(a, b)
  return(x %*% t(x))
}

## ecodist crosstab() plus matrix product
foo2 <- function() {
  x <- crosstab(a, b, rep(1, length(a)))
  return(x %*% t(x))
}

## double loop over all pairs of ids
foo3 <- function() {
  sapply(1:1000, function(y) {
    sapply(1:1000, function(x) {
      length(intersect(b[a==y], b[a==x]))
    })
  })
}

## base R table() plus crossprod()
foo4 <- function() {crossprod(t(as.matrix(table(a, b))))}

> system.time(x1 <- foo1())
   user  system elapsed
  0.028   0.008   0.038
> system.time(x2 <- foo2())
   user  system elapsed
  0.076   0.008   0.087
## I got tired of waiting
> system.time(x3 <- foo3())
<menu-bar> <signals> <break>
Timing stopped at: 104.951 1.336 110.909
> system.time(x4 <- foo4())
   user  system elapsed
  0.024   0.020   0.043
> all.equal(x1, x2, check.attributes = FALSE)
[1] TRUE
> all.equal(x1, x4, check.attributes = FALSE)
[1] TRUE

This suggests the ordering by elapsed time, from fastest to slowest, is: foo1, foo4, foo2, foo3 (foo3 was interrupted after about 110 seconds, so its result was never compared with the others).

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
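One caveat worth checking with the simulated data: each id is drawn about ten times over only six letters, so the same (a, b) pair can occur more than once. In that case table() holds counts greater than 1 and the matrix product weights shared attributes by those counts, whereas intersect() counts each shared attribute once, so foo3 would generally not return the same matrix as foo1, foo2 and foo4 on that data. The sketch below illustrates this on a tiny hand-made example (not from the thread) and shows one possible fix, capping the table at 1 before multiplying. All names are illustrative.

```r
# Tiny example with repeated (id, attribute) pairs:
# id 1 has "a" twice, id 2 has "a" twice.
a <- c(1, 1, 1, 2, 2)
b <- c("a", "a", "b", "a", "a")

counts <- unclass(table(a, b))     # cells hold counts, here up to 2
weighted <- counts %*% t(counts)   # product weighted by those counts

inc <- (counts > 0) * 1            # cap every cell at 1
shared <- inc %*% t(inc)           # distinct shared attributes

# Brute-force reference using set intersections.
ids <- sort(unique(a))
brute <- sapply(ids, function(j) {
  sapply(ids, function(i) length(intersect(b[a == i], b[a == j])))
})

weighted               # differs from the intersection counts
all(shared == brute)   # the capped version agrees with them
```

If every (a, b) pair is known to be unique, as in the original question, the capping step changes nothing and can be skipped.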