thr3ads.net - R help - [R] avoid a loop [Nov 2010]

cory n

2010-Nov-04 19:42 UTC

[R] avoid a loop

Let's suppose I have userids and associated attributes...  columns a and b

a <- c(1,1,1,2,2,3,3,3,3)
b <- c("a","b","c","a","d","a","b","e","f")

so a unique list of a would be

id <- unique(a)

I want a matrix like this...

     [,1] [,2] [,3]
[1,]    3    1    2
[2,]    1    2    1
[3,]    2    1    4

Where element i,j is the number of items in b that id[i] and id[j] share...

So for example, in element [1,3] of the result matrix, I want to see
2.  That is, id's 1 and 3 share two common elements in b, namely "a"
and "b".

This is hard to articulate, so sorry for the terrible description
here.  The way I have solved it is to do a double loop, looping over
every member of the id column and comparing it to every other member
of id to see how many elements of b they share.  This takes forever.
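
For concreteness, the double loop described above might look roughly like the following. This is only a sketch of the idea, not the code that was actually run (that code was not posted), and the name `shared` is made up for illustration.

```r
# Example data from the post above
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")
id <- unique(a)

# Preallocate the result, then fill it one pair of ids at a time
n <- length(id)
shared <- matrix(0L, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # count the distinct items of b that id[i] and id[j] have in common
    shared[i, j] <- length(intersect(b[a == id[i]], b[a == id[j]]))
  }
}
shared
```

Each pass subsets `b` twice, so the work grows with the square of the number of ids, which is consistent with the slowness described.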

Thanks

cn

Sarah Goslee

2010-Nov-04 20:24 UTC

Here's one possibility:
> library(ecodist)
> a <- c(1,1,1,2,2,3,3,3,3)
> b <- c("a","b","c","a","d","a","b","e","f")
>
> x <- crosstab(a, b, rep(1, length(a)))
> x
  a b c d e f
1 1 1 1 0 0 0
2 1 0 0 1 0 0
3 1 1 0 0 1 1
> x %*% t(x)
  1 2 3
1 3 1 2
2 1 2 1
3 2 1 4

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org

Dennis Murphy

2010-Nov-04 20:40 UTC

Hi:

To mimic Sarah Goslee's reply within base R, either of these works:

crossprod(t(as.matrix(xtabs( ~ a + b))))
crossprod(t(as.matrix(table(a, b))))
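
As a hedged sketch of why this works, using the example data from the original post: the table is an id-by-item matrix of 0/1 entries, and multiplying it by its own transpose sums, for each pair of ids, the items both have. The `tcrossprod()` call below is a swapped-in base R shorthand for `x %*% t(x)`, not one of the calls above, and the names `x` and `res` are only for illustration.

```r
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")

# id-by-item matrix: x[i, k] is 1 if id i has item k, else 0
# (unclass() drops the "table" class and leaves a plain integer matrix)
x <- unclass(table(a, b))

# tcrossprod(x) computes x %*% t(x); entry [i, j] sums x[i, k] * x[j, k]
# over items k, i.e. it counts the items that ids i and j both have
res <- tcrossprod(x)
res
```

This relies on each (id, item) pair appearing at most once, as it does in the example data.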

HTH,
Dennis


Joshua Wiley

2010-Nov-04 20:57 UTC

And to wrap it up and help you choose, here are four functions based
on these emails (the first one is my own slight variant):

library(ecodist)
a <- sample(1:1000, 10^4, replace = TRUE)
b <- sample(letters[1:6], 10^4, replace = TRUE)

foo1 <- function() {
  x <- table(a, b)
  return(x %*% t(x))
}

foo2 <- function() {
  x <- crosstab(a, b, rep(1, length(a)))
  return(x %*% t(x))
}

foo3 <- function() {
  sapply(1:1000, function(y) {
    sapply(1:1000, function(x) {
      length(intersect(b[a==y], b[a==x]))
    })
  })
}

foo4 <- function() {crossprod(t(as.matrix(table(a, b))))}
> system.time(x1 <- foo1())
   user  system elapsed
  0.028   0.008   0.038
> system.time(x2 <- foo2())
   user  system elapsed
  0.076   0.008   0.087
## I got tired of waiting
> system.time(x3 <- foo3())
  <menu-bar> <signals> <break>
Timing stopped at: 104.951 1.336 110.909
> system.time(x4 <- foo4())
   user  system elapsed
  0.024   0.020   0.043
> all.equal(x1, x2, check.attributes = FALSE)
[1] TRUE
> all.equal(x1, x4, check.attributes = FALSE)
[1] TRUE

This suggests the run times, from fastest to slowest, are as follows
(foo1 and foo4 are too close to separate on a single run):

foo1 < foo4 < foo2 < foo3
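
One caveat, offered as a hedged aside: the matrix-product versions count shared items correctly only when each (id, item) pair occurs at most once. With randomly sampled data, an id can see the same letter several times, so the table holds counts above 1 and the product no longer matches `length(intersect(...))`. A minimal sketch of one way to handle that, assuming it is acceptable to reduce the table to 0/1 first (the smaller sample sizes, the seed, and all variable names here are made up for illustration):

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible
a <- sample(1:50, 500, replace = TRUE)
b <- sample(letters[1:6], 500, replace = TRUE)

counts <- unclass(table(a, b))   # entries can exceed 1 when pairs repeat
incidence <- (counts > 0) * 1    # 1 if the id has the item at all, else 0

by_counts <- tcrossprod(counts)        # inflated by repeated pairs
by_incidence <- tcrossprod(incidence)  # number of distinct shared items

# Slow reference computation, in the same id order as the table rows
ids <- sort(unique(a))
by_loop <- sapply(ids, function(y) {
  sapply(ids, function(x) length(intersect(b[a == y], b[a == x])))
})

# The 0/1 version should agree with the loop; the raw counts should not
all.equal(by_incidence, by_loop, check.attributes = FALSE)
isTRUE(all.equal(by_counts, by_incidence))
```

If repeated pairs cannot occur in the real data, the table is already 0/1 and this extra step changes nothing.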

Cheers,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
