thr3ads.net - R help - [R] Randomising matrices [Apr 2007]

If this information is useful, please help other people find it:
Share via:

Nick Cutler

2007-Apr-27 08:45 UTC

[R] Randomising matrices

I would like to be able to randomise presence-absence (i.e. binary) 
matrices whilst keeping both the row and column totals constant. Is 
there a function in R that would allow me to do this?

I'm working with vegetation presence-absence matrices based on field 
observations. The matrices are formatted to have sites as rows and 
species as columns. The presence of a species on a site is indicated 
with a 1 (absence is obviously indicated with a 0).

I would like to randomise the matrices many times in order to construct 
null models. However, I cannot identify a function in R to do this, and 
the programming looks tricky for someone of my limited skills.

Can anybody help me out?

Many thanks,

Nick Cutler

Institute of Geography
School of Geosciences
University of Edinburgh
Drummond Street
Edinburgh EH8 9XP
United Kingdom

Tel: 0131 650 2532
Web: http://www.geos.ed.ac.uk/homes/s0455078

Finny Kuruvilla

2007-Apr-27 12:00 UTC

head link

[R] Randomising matrices

Hi Nick,

This way isn't the most elegant but works well, especially if the
matrices aren't too large:

# This function works on 2x2 arrays, randomizing them, but
# preserving row and column totals
shuffle_matrix <- function(x) {
   nrow = dim(x)[1]
   ncol = dim(x)[2]
   rmargins <- apply(x,1,sum)
   cmargins <- apply(x,2,sum)

   while(1) {
     shuffled <- array(sample(x,length(x),replace=TRUE),dim=c(nrow,ncol))
     if(all(apply(shuffled,1,sum) == rmargins)) {
       if(all(apply(shuffled,2,sum) == cmargins)) {
         break
       }
     }
   }
   shuffled
}

Example:
> a=array(sample(c(0,1),10,replace=TRUE),dim=c(5,2))
> a      [,1] [,2]
[1,]    0    1
[2,]    1    1
[3,]    0    1
[4,]    0    1
[5,]    1    0> shuffle_matrix(a)      [,1] [,2]
[1,]    0    1
[2,]    1    1
[3,]    1    0
[4,]    0    1
[5,]    0    1

Best,
Finny Kuruvilla

*****************************************************************
Finny Kuruvilla, MD, PhD
Harvard Medical School Fellowship Program in Transfusion Medicine
Broad Institute of MIT and Harvard
Homepage: http://www.people.fas.harvard.edu/~kuruvill/home/


On Fri, 27 Apr 2007, Nick Cutler wrote:
> I would like to be able to randomise presence-absence (i.e. binary)
> matrices whilst keeping both the row and column totals constant. Is
> there a function in R that would allow me to do this?
>
> I'm working with vegetation presence-absence matrices based on field
> observations. The matrices are formatted to have sites as rows and
> species as columns. The presence of a species on a site is indicated
> with a 1 (absence is obviously indicated with a 0).
>
> I would like to randomise the matrices many times in order to construct
> null models. However, I cannot identify a function in R to do this, and
> the programming looks tricky for someone of my limited skills.
>
> Can anybody help me out?
>
> Many thanks,
>
> Nick Cutler
>
> Institute of Geography
> School of Geosciences
> University of Edinburgh
> Drummond Street
> Edinburgh EH8 9XP
> United Kingdom
>
> Tel: 0131 650 2532
> Web: http://www.geos.ed.ac.uk/homes/s0455078
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Gabor Grothendieck

2007-Apr-27 13:12 UTC

head link

[R] Randomising matrices

Try

?r2dtable

On 4/27/07, Nick Cutler <s0455078 at sms.ed.ac.uk>
wrote:> I would like to be able to randomise presence-absence (i.e. binary)
> matrices whilst keeping both the row and column totals constant. Is
> there a function in R that would allow me to do this?
>
> I'm working with vegetation presence-absence matrices based on field
> observations. The matrices are formatted to have sites as rows and
> species as columns. The presence of a species on a site is indicated
> with a 1 (absence is obviously indicated with a 0).
>
> I would like to randomise the matrices many times in order to construct
> null models. However, I cannot identify a function in R to do this, and
> the programming looks tricky for someone of my limited skills.
>
> Can anybody help me out?
>
> Many thanks,
>
> Nick Cutler
>
> Institute of Geography
> School of Geosciences
> University of Edinburgh
> Drummond Street
> Edinburgh EH8 9XP
> United Kingdom
>
> Tel: 0131 650 2532
> Web: http://www.geos.ed.ac.uk/homes/s0455078
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Gavin Simpson

2007-Apr-27 13:19 UTC

head link

[R] Randomising matrices

On Fri, 2007-04-27 at 09:45 +0100, Nick Cutler wrote:

Hi Nick

(Been meaning to reply to your private email to me but I've been away on
vacation and at meeting for several weeks)
> I would like to be able to randomise presence-absence (i.e. binary) 
> matrices whilst keeping both the row and column totals constant. Is 
> there a function in R that would allow me to do this?
I thought r2dtable() would suffice, but this doesn't return binary
matrices.

There appears to be a lot of literature on this - Zaman and Simberloff
(2002, Environmental and Ecological Statistics 9, 405--421) discuss many
previous attempts to do this and present another approach.

I've been interested in this for a little while so cooked up one of
their reviewed methods this morning. It works by choosing at random 2
rows and 2 columns of your matrix, leading to a 2x2 sub matrix of your
original matrix.

If this matrix is:

1 0
0 1

or 

0 1
1 0

then you can swap the 0s and 1s and you haven't altered the row or
column sums any. You do this swap many times as a "burn in" period,
and
then you can sample a random matrix by making one further swap.

The problem with this method is that it might not faithfully represent
the full universe of possible matrices (with row column constraints), in
that you might end up sampling only a string of matrices from a small
region of all possible matrices. One way to get round this is that
following the burn in, you then take a matrix only after the k+1th swap
- i.e. you make k swaps and then draw the matrix, rather than draw after
each swap. Whether you need the skip is debatable (Manly 1995, Ecology
76, 1109-1115).

The appended rBinMat() function implements this method (it is after
Roberts and Stone, 1990, Oecologia 83, 560--567), using a burn in period
of 1000 and a skip step of 100 as defaults that you can change. I have
no idea if these default are sufficient as I've not read 

An example usage is:> set.seed(1234)
> dat <- matrix(sample(c(0,1), 100, replace = TRUE), ncol = 10)
> system.time(ran.mat <- rBinMat(dat))   user  system elapsed 
  0.923   0.002   0.953 > ran.mat      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    1    0    1    0    0    1     0
 [2,]    1    0    0    1    1    1    0    0    1     0
 [3,]    1    0    0    0    0    0    1    0    0     0
 [4,]    1    1    1    0    1    0    1    1    0     1
 [5,]    0    0    0    0    1    0    0    0    0     0
 [6,]    1    1    1    1    1    1    1    1    1     1
 [7,]    0    1    0    0    1    0    0    0    0     0
 [8,]    1    0    1    0    0    1    0    0    0     0
 [9,]    0    1    1    0    0    0    0    1    0     0
[10,]    1    0    0    1    1    1    1    1    1     1> identical(rowSums(dat), rowSums(ran.mat))
[1] TRUE> identical(colSums(dat), colSums(ran.mat))[1] TRUE

The size of the matrix is not an issue, but rather the burn in required.
With larger matrices you need to do longer burn in, and make larger
skips perhaps. The 56 species by 28 islands example in Roberts and Stone
(1990), using a burn in of 100 000 (as per their paper), 56 seconds is
needed to get the first matrix and then 0.5 seconds to do the
recommended 1000 skips to get the next matrix. Subsequent matrices take
0.5 seconds to generate on my machine (see below).

rBinMat() returns a single matrix, so if you want to draw n random
matrices you need to run the function n times. But this is wasteful if
you do the burn in each time. So, generate the first matrix as such:

dat <- matrix(sample(c(0,1), 100, replace = TRUE), ncol = 10)
mat <- rBinMat(dat, burn.in = 1000, skip = 100)

then in your loop to generate stats on the NULL models, start from mat
and set burn.in to 0, e.g.

nit <- 1000
for(i in 1:nit) {
   mat2 <- rBinMat(mat, burn.in = 0, skip = 100)
   ### other stats here on the null model
}

Note that there are other ways to do this and the paper St?phane Dray
pointed you to plus the Zaman & Simberloff one I cite above look at
these in more detail.

HTH

G

Here is the function - some of it is a bit clunky, and can surely be
improved.

rBinMat <- function(x, burn.in = 10000, skip = 1000) {
  ## number rows/cols
  n.col <- ncol(x)
  n.row <- nrow(x)
  ## function to draw at random 2 rows and colums
  ## just returns the indices required
  randDraw <- function(x, nr, nc) {
    ran.row <- sample(nr, 2)
    ran.col <- sample(nc, 2)
    return(c(ran.row, ran.col))
  }
  ## is the 2x2 matrix diagonal or anti-diagonal
  isDiag <- function(x) {
    X <- as.vector(x)
    Diag <- aDiag <- FALSE
    if(identical(X, c(1,0,0,1)))
      return(TRUE)
    else if(identical(X, c(0,1,1,0)))
      return(TRUE)
    else
      return(FALSE)
  }
  changed <- 0
  ## do the burn in changes, then skip, then an extra change,
  ## this is then the first random matrix we want to draw
  while(changed <= (burn.in + skip + 1)) {
    want <- randDraw(x, n.row, n.col)
    X <- x[want[1:2], want[3:4]]
    if(isDiag(X)) {
      x[want[1:2], want[3:4]] <- c(1,0)[X + 1]
      changed <- changed + 1}
  }
  return(x)
}
> 
> I'm working with vegetation presence-absence matrices based on field 
> observations. The matrices are formatted to have sites as rows and 
> species as columns. The presence of a species on a site is indicated 
> with a 1 (absence is obviously indicated with a 0).
> 
> I would like to randomise the matrices many times in order to construct 
> null models. However, I cannot identify a function in R to do this, and 
> the programming looks tricky for someone of my limited skills.
> 
> Can anybody help me out?
> 
> Many thanks,
> 
> Nick Cutler
> 
> Institute of Geography
> School of Geosciences
> University of Edinburgh
> Drummond Street
> Edinburgh EH8 9XP
> United Kingdom
> 
> Tel: 0131 650 2532
> Web: http://www.geos.ed.ac.uk/homes/s0455078
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Charles C. Berry

2007-Apr-28 18:23 UTC

head link

[R] Randomising matrices

Nick Cutler <s0455078 <at> sms.ed.ac.uk> writes:
> 
> I would like to be able to randomise presence-absence (i.e. binary) 
> matrices whilst keeping both the row and column totals constant. Is 
> there a function in R that would allow me to do this?
> 
> I'm working with vegetation presence-absence matrices based on field 
> observations. The matrices are formatted to have sites as rows and 
> species as columns. The presence of a species on a site is indicated 
> with a 1 (absence is obviously indicated with a 0).
> 
> I would like to randomise the matrices many times in order to construct 
> null models. However, I cannot identify a function in R to do this, and 
> the programming looks tricky for someone of my limited skills.
> 
> Can anybody help me out?
Nick,

For a 1001 x 1001 matrix, this method takes less than 2 seconds on my 2 year old
Windows PC.

ronetab( marg1, marg2 ) returns a table of 0's and 1's according to the
marginal
contraints. 

ck.ronetab( marg1, marg2 ) checks that all the constraints were honored.


msample <- function(x,marg)
{
  ## Purpose: sample at most one each from each cell of a margin
  ## ----------------------------------------------------------------------
  ## Arguments: x - number to sample, marg - a vector of integers
  ## ----------------------------------------------------------------------
  ## Author: Charles C. Berry, Date: 28 Apr 2007, 08:17
  ## GPL 2.0 or better

  if (!(x<=sum(marg) && all(marg>=0))) browser()
  wm <- which(marg!=0)
  if (length(wm)==1) {
    wm
  } else {
    sample( seq(along=marg), x, prob=marg )
  }
}

ronetab <- function(m1,m2,debug=F)
{
  ## Purpose: sample from a table with fixed margins and {0,1} cell values
  ## ----------------------------------------------------------------------
  ## Arguments: m1, m2 - two margins
  ## ----------------------------------------------------------------------
  ## Author: Charles C. Berry, Date: 28 Apr 2007, 08:21
  ## GPL 2.0 or better

  stopifnot( sum(m1)==sum(m2)|| max(m1)>length(m2) || max(m2)>length(m1) )
  
  i.list <- j.list <- list()
  k <- 0
  while( sum(m1)>0 ){
    k <- k+1
    if ( sum(m1!=0) > sum(m2!=0) ){
      i <- which.max( m1)
      j <- msample( m1[i], m2 )
      i.list[[ k ]] <- rep( i, m1[i] )
      j.list[[ k ]] <- j
      m1[i] <- 0
      m2[ j ] <- m2[ j ] - 1
    } else {
      j <- which.max( m2 )
      i <- msample( m2[j], m1 )
      i.list[[ k ]] <- i
      j.list[[ k ]] <- rep( j, m2[j] )
      m2[j] <- 0
      m1[ i ] <- m1[ i ] - 1
    }
  }
  res <- array(0, c(length(m1), length(m2) ) )
  res[ cbind( unlist(i.list), unlist(j.list) ) ] <- 1
  res
}

ck.ronetab <- function(m1,m2){
  tab <- ronetab(m1,m2)
  m1.ck <- all(m1==rowSums(tab))
  m2.ck <- all(m2==colSums(tab))
  cell.ck <- all( tab %in% 0:1 )
  res <- m1.ck & m2.ck & cell.ck
  if (!res) attr(res,"tab") <- tab
  res
}

I'll warn you that I have not worked through what looks to be a tedious
proof
that this randomly samples matrices under the constraints. The heuristics seem
right, and a few simulation spot checks look reasonable. If you do not want to
trust it, you can still use it to generate a starting value for an MCMC run.

HTH,

Chuck

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Apr 2007 - Randomising matrices

[R] Randomising matrices

[R] Randomising matrices

[R] Randomising matrices

[R] Randomising matrices

[R] Randomising matrices

Possibly Parallel Threads