Hi, All,

I have an n by m matrix with each entry between 1 and 15000. I want to know the frequency of each pair in 1:15000 that occur together in rows. So for example, if the matrix is

2 5 1 6
1 7 8 2
3 7 6 2
9 8 5 7

Pair (2,6) (un-ordered) occurs together in rows 1 and 3. I want to return the value 2 for this pair as well as that for all pairs. Is there a fast way to do this avoiding loops? Loops take too long.

Thank you,

Cindy

cindy Guo wrote:
> I have an n by m matrix with each entry between 1 and 15000. I want to
> know the frequency of each pair in 1:15000 that occur together in rows.
> [...]
> Is there a fast way to do this avoiding loops? Loops take too long.

Use %in% to check for the presence of the numbers in a row and apply() to run the test on each row:

  tstMatrix <- matrix( c(2,5,1,6,
                         1,7,8,2,
                         3,7,6,2,
                         9,8,5,7), nrow=4, byrow=TRUE )

  matches <- apply( tstMatrix, 1, function( row ){
    if( 2 %in% row && 6 %in% row ){
      return( 2 )
    } else {
      return( 0 )
    }
  })

  matches
  [1] 2 0 2 0

If you have more than one pair, it gets a little tricky. Say you are also looking for the pair (7,8). Store them as a list:

  pairList <- list( c(2,6), c(7,8) )

Then use sapply() to iterate over the pair list and execute the apply() test for each pair:

  matchMatrix <- sapply( pairList, function( pair ){
    matches <- apply( tstMatrix, 1, function( row ){
      if( pair[1] %in% row && pair[2] %in% row ){
        return( pair[1] )
      } else {
        return( 0 )
      }
    })
    return( matches )
  })

  matchMatrix
       [,1] [,2]
  [1,]    2    0
  [2,]    0    7
  [3,]    2    0
  [4,]    0    7

Each column belongs to one pair, and a nonzero entry (the first number of the pair) marks a row that contains it, so colSums( matchMatrix != 0 ) gives the number of rows in which each pair occurs.

If you're looking to apply the above method to every possible pair of numbers from the range 1:15000, that's 15000^2 = 225,000,000 ordered pairs (about half that, choose(15000, 2) = 112,492,500, if each unordered pair is listed only once). expand.grid() can generate the required pair list, but that step alone causes a memory allocation of ~6 GB on my machine.

If you don't have a pile of CPU cores and RAM at your disposal, you can probably:

1. Restrict the upper end of your range to the maximal entry present in your matrix, since all other combinations have zero occurrences.
2. Break the list of pairs up into several sublists, run the tests, and aggregate the results.

Either way, the analysis will take some time, despite the efficiencies of the apply family of functions, due to the sheer size of the problem. If you have more than one CPU, I would recommend taking a look at parallelized apply functions, perhaps using a package like snowfall, as the testing of the pairs is an "embarrassingly parallel" problem.

Hopefully I'm misunderstanding the scope of your problem.

Good luck!

-Charlie

-----
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
--
View this message in context: http://old.nabble.com/pairs-tp26364801p26365206.html
Sent from the R help mailing list archive at Nabble.com.
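
As a rough illustration of suggestions 1 and 2 in the message above, the sketch below restricts the candidate pairs to values that actually occur in the matrix and then counts them chunk by chunk. The helper countPair(), the chunk size of 10, and the result data frame are made up for this example and are not part of any package; combn() is used here instead of expand.grid() so that each unordered pair is generated only once.

```r
tstMatrix <- matrix(c(2,5,1,6,
                      1,7,8,2,
                      3,7,6,2,
                      9,8,5,7), nrow = 4, byrow = TRUE)

# Suggestion 1: only values that are present in the matrix can form a pair.
vals  <- sort(unique(as.vector(tstMatrix)))
pairs <- t(combn(vals, 2))            # one row per unordered pair (a < b)

# Hypothetical helper: number of rows of m that contain both members of a pair.
countPair <- function(pair, m) {
  sum(rowSums(m == pair[1]) > 0 & rowSums(m == pair[2]) > 0)
}

# Suggestion 2: split the pair list into chunks, test each chunk, then aggregate.
chunkSize <- 10                       # arbitrary value chosen for the example
rowIndex  <- seq_len(nrow(pairs))
chunks    <- split(rowIndex, ceiling(rowIndex / chunkSize))

counts <- unlist(lapply(chunks, function(idx) {
  apply(pairs[idx, , drop = FALSE], 1, function(p) countPair(p, tstMatrix))
}), use.names = FALSE)

result <- data.frame(a = pairs[, 1], b = pairs[, 2], count = counts)
result[result$count > 1, ]
```

On the 4 by 4 example this gives 28 candidate pairs, of which (1,2), (2,6), (2,7) and (7,8) have a count of 2. The lapply() over chunks is the part that a parallel apply function could take over.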

Hope this helps:

> m <- matrix(c(2,1,3,9,5,7,7,8,1,8,6,5,6,2,2,7),4,4)
> p <- c(2, 6)
> apply(m == p[1], 1, any) & apply(m == p[2], 1, any)
[1]  TRUE FALSE  TRUE FALSE

If you want the number of rows which contain the pair, sum() could be used:

> sum(apply(m == p[1], 1, any) & apply(m == p[2], 1, any))
[1] 2

On Mon, Nov 16, 2009 at 6:26 AM, cindy Guo <cindy.guo3 at gmail.com> wrote:
> I have an n by m matrix with each entry between 1 and 15000. I want to know
> the frequency of each pair in 1:15000 that occur together in rows.
> [...]
> Is there a fast way to do this avoiding loops? Loops take too long.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
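
The test above handles one pair at a time. As a sketch of how all pairs could be counted in one step, the code below uses a different technique from the message above: it builds a 0/1 incidence matrix (rows of m by distinct values) and multiplies it with itself via crossprod(). The names inc, co, idx and found are illustrative only. Note the assumption about scale: with 15000 distinct values the dense result would have 15000^2 entries, so a sparse matrix class (for example from the Matrix package) would probably be needed; that variant is not shown or tested here.

```r
m <- matrix(c(2,1,3,9,5,7,7,8,1,8,6,5,6,2,2,7), 4, 4)

vals <- sort(unique(as.vector(m)))

# inc[i, k] is 1 when row i of m contains vals[k], and 0 otherwise.
inc <- t(apply(m, 1, function(row) as.numeric(vals %in% row)))
colnames(inc) <- as.character(vals)

# co[a, b] = number of rows of m containing both a and b;
# the diagonal holds the number of rows containing each single value.
co <- crossprod(inc)

co["2", "6"]

# List each unordered pair once (upper triangle) where the count exceeds 1.
idx   <- which(upper.tri(co) & co > 1, arr.ind = TRUE)
found <- data.frame(a = vals[idx[, "row"]],
                    b = vals[idx[, "col"]],
                    count = co[idx])
found
```

Because the incidence matrix records presence rather than the number of occurrences, a value repeated within one row is still counted once for that row.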

I could of course be wrong, but have you yet specified the number of columns for this pairing exercise?

On Nov 15, 2009, at 5:26 PM, cindy Guo wrote:
> I have an n by m matrix with each entry between 1 and 15000. I want
> to know the frequency of each pair in 1:15000 that occur together in rows.
> [...]
> Is there a fast way to do this avoiding loops? Loops take too long.
>
> and provide commented, minimal, self-contained, reproducible code.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Assuming that the number of columns is 4, then consider this approach:

> prs <- scan()
1: 2 5 1 6
5: 1 7 8 2
9: 3 7 6 2
13: 9 8 5 7
17:
Read 16 items
> prmtx <- matrix(prs, 4, 4, byrow=TRUE)

# Sort each row first, so that every unordered pair gets a single label x.y with x < y
> pair.str <- sapply(1:nrow(prmtx), function(z)
+     apply(combn(sort(prmtx[z,]), 2), 2,
+           function(x) paste(x[1], x[2], sep=".")))
> tpair <- table(pair.str)

# Pairs that occur together in more than one row:
> tpair[tpair>1]
pair.str
1.2 2.6 2.7 7.8
  2   2   2   2

--
David.

On Nov 15, 2009, at 8:06 PM, David Winsemius wrote:
> I could of course be wrong but have you yet specified the number of
> columns for this pairing exercise?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
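
The combn() and table() approach above works with text labels. As a hedged variant, the sketch below keeps the two members of each pair as numeric columns and also covers a case the thread does not discuss, namely a value appearing more than once in the same row (assumed here to count once for that row). The helper rowPairs() and the objects allPairs and freq are names invented for this example.

```r
prmtx <- matrix(c(2,5,1,6,
                  1,7,8,2,
                  3,7,6,2,
                  9,8,5,7), 4, 4, byrow = TRUE)

# Hypothetical helper: all unordered pairs (a < b) of distinct values in one row.
rowPairs <- function(row) {
  v <- sort(unique(row))              # one canonical order, repeats dropped
  if (length(v) < 2) {
    return(matrix(numeric(0), nrow = 0, ncol = 2))
  }
  t(combn(v, 2))
}

# Stack the pairs from every row, then count identical (a, b) combinations.
allPairs <- do.call(rbind, lapply(seq_len(nrow(prmtx)),
                                  function(i) rowPairs(prmtx[i, ])))
freq <- aggregate(count ~ a + b,
                  data = data.frame(a = allPairs[, 1],
                                    b = allPairs[, 2],
                                    count = 1),
                  FUN = sum)

freq[freq$count > 1, ]
```

Since each row is sorted before combn() is called, every unordered pair has exactly one representation, and freq can be filtered or merged on the numeric columns a and b directly.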