I would like to be able to randomise presence-absence (i.e. binary) matrices whilst keeping both the row and column totals constant. Is there a function in R that would allow me to do this? I'm working with vegetation presence-absence matrices based on field observations. The matrices are formatted to have sites as rows and species as columns. The presence of a species on a site is indicated with a 1 (absence is obviously indicated with a 0). I would like to randomise the matrices many times in order to construct null models. However, I cannot identify a function in R to do this, and the programming looks tricky for someone of my limited skills. Can anybody help me out? Many thanks, Nick Cutler Institute of Geography School of Geosciences University of Edinburgh Drummond Street Edinburgh EH8 9XP United Kingdom Tel: 0131 650 2532 Web: http://www.geos.ed.ac.uk/homes/s0455078
Hi Nick, This way isn't the most elegant but works well, especially if the matrices aren't too large: # This function works on 2x2 arrays, randomizing them, but # preserving row and column totals shuffle_matrix <- function(x) { nrow = dim(x)[1] ncol = dim(x)[2] rmargins <- apply(x,1,sum) cmargins <- apply(x,2,sum) while(1) { shuffled <- array(sample(x,length(x),replace=TRUE),dim=c(nrow,ncol)) if(all(apply(shuffled,1,sum) == rmargins)) { if(all(apply(shuffled,2,sum) == cmargins)) { break } } } shuffled } Example:> a=array(sample(c(0,1),10,replace=TRUE),dim=c(5,2)) > a[,1] [,2] [1,] 0 1 [2,] 1 1 [3,] 0 1 [4,] 0 1 [5,] 1 0> shuffle_matrix(a)[,1] [,2] [1,] 0 1 [2,] 1 1 [3,] 1 0 [4,] 0 1 [5,] 0 1 Best, Finny Kuruvilla ***************************************************************** Finny Kuruvilla, MD, PhD Harvard Medical School Fellowship Program in Transfusion Medicine Broad Institute of MIT and Harvard Homepage: http://www.people.fas.harvard.edu/~kuruvill/home/ On Fri, 27 Apr 2007, Nick Cutler wrote:> I would like to be able to randomise presence-absence (i.e. binary) > matrices whilst keeping both the row and column totals constant. Is > there a function in R that would allow me to do this? > > I'm working with vegetation presence-absence matrices based on field > observations. The matrices are formatted to have sites as rows and > species as columns. The presence of a species on a site is indicated > with a 1 (absence is obviously indicated with a 0). > > I would like to randomise the matrices many times in order to construct > null models. However, I cannot identify a function in R to do this, and > the programming looks tricky for someone of my limited skills. > > Can anybody help me out? > > Many thanks, > > Nick Cutler > > Institute of Geography > School of Geosciences > University of Edinburgh > Drummond Street > Edinburgh EH8 9XP > United Kingdom > > Tel: 0131 650 2532 > Web: http://www.geos.ed.ac.uk/homes/s0455078 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Try ?r2dtable On 4/27/07, Nick Cutler <s0455078 at sms.ed.ac.uk> wrote:> I would like to be able to randomise presence-absence (i.e. binary) > matrices whilst keeping both the row and column totals constant. Is > there a function in R that would allow me to do this? > > I'm working with vegetation presence-absence matrices based on field > observations. The matrices are formatted to have sites as rows and > species as columns. The presence of a species on a site is indicated > with a 1 (absence is obviously indicated with a 0). > > I would like to randomise the matrices many times in order to construct > null models. However, I cannot identify a function in R to do this, and > the programming looks tricky for someone of my limited skills. > > Can anybody help me out? > > Many thanks, > > Nick Cutler > > Institute of Geography > School of Geosciences > University of Edinburgh > Drummond Street > Edinburgh EH8 9XP > United Kingdom > > Tel: 0131 650 2532 > Web: http://www.geos.ed.ac.uk/homes/s0455078 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
On Fri, 2007-04-27 at 09:45 +0100, Nick Cutler wrote: Hi Nick (Been meaning to reply to your private email to me but I've been away on vacation and at meeting for several weeks)> I would like to be able to randomise presence-absence (i.e. binary) > matrices whilst keeping both the row and column totals constant. Is > there a function in R that would allow me to do this?I thought r2dtable() would suffice, but this doesn't return binary matrices. There appears to be a lot of literature on this - Zaman and Simberloff (2002, Environmental and Ecological Statistics 9, 405--421) discuss many previous attempts to do this and present another approach. I've been interested in this for a little while so cooked up one of their reviewed methods this morning. It works by choosing at random 2 rows and 2 columns of your matrix, leading to a 2x2 sub matrix of your original matrix. If this matrix is: 1 0 0 1 or 0 1 1 0 then you can swap the 0s and 1s and you haven't altered the row or column sums any. You do this swap many times as a "burn in" period, and then you can sample a random matrix by making one further swap. The problem with this method is that it might not faithfully represent the full universe of possible matrices (with row column constraints), in that you might end up sampling only a string of matrices from a small region of all possible matrices. One way to get round this is that following the burn in, you then take a matrix only after the k+1th swap - i.e. you make k swaps and then draw the matrix, rather than draw after each swap. Whether you need the skip is debatable (Manly 1995, Ecology 76, 1109-1115). The appended rBinMat() function implements this method (it is after Roberts and Stone, 1990, Oecologia 83, 560--567), using a burn in period of 1000 and a skip step of 100 as defaults that you can change. I have no idea if these default are sufficient as I've not read An example usage is:> set.seed(1234) > dat <- matrix(sample(c(0,1), 100, replace = TRUE), ncol = 10) > system.time(ran.mat <- rBinMat(dat))user system elapsed 0.923 0.002 0.953> ran.mat[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 0 0 1 0 1 0 0 1 0 [2,] 1 0 0 1 1 1 0 0 1 0 [3,] 1 0 0 0 0 0 1 0 0 0 [4,] 1 1 1 0 1 0 1 1 0 1 [5,] 0 0 0 0 1 0 0 0 0 0 [6,] 1 1 1 1 1 1 1 1 1 1 [7,] 0 1 0 0 1 0 0 0 0 0 [8,] 1 0 1 0 0 1 0 0 0 0 [9,] 0 1 1 0 0 0 0 1 0 0 [10,] 1 0 0 1 1 1 1 1 1 1> identical(rowSums(dat), rowSums(ran.mat))[1] TRUE> identical(colSums(dat), colSums(ran.mat))[1] TRUE The size of the matrix is not an issue, but rather the burn in required. With larger matrices you need to do longer burn in, and make larger skips perhaps. The 56 species by 28 islands example in Roberts and Stone (1990), using a burn in of 100 000 (as per their paper), 56 seconds is needed to get the first matrix and then 0.5 seconds to do the recommended 1000 skips to get the next matrix. Subsequent matrices take 0.5 seconds to generate on my machine (see below). rBinMat() returns a single matrix, so if you want to draw n random matrices you need to run the function n times. But this is wasteful if you do the burn in each time. So, generate the first matrix as such: dat <- matrix(sample(c(0,1), 100, replace = TRUE), ncol = 10) mat <- rBinMat(dat, burn.in = 1000, skip = 100) then in your loop to generate stats on the NULL models, start from mat and set burn.in to 0, e.g. nit <- 1000 for(i in 1:nit) { mat2 <- rBinMat(mat, burn.in = 0, skip = 100) ### other stats here on the null model } Note that there are other ways to do this and the paper St?phane Dray pointed you to plus the Zaman & Simberloff one I cite above look at these in more detail. HTH G Here is the function - some of it is a bit clunky, and can surely be improved. rBinMat <- function(x, burn.in = 10000, skip = 1000) { ## number rows/cols n.col <- ncol(x) n.row <- nrow(x) ## function to draw at random 2 rows and colums ## just returns the indices required randDraw <- function(x, nr, nc) { ran.row <- sample(nr, 2) ran.col <- sample(nc, 2) return(c(ran.row, ran.col)) } ## is the 2x2 matrix diagonal or anti-diagonal isDiag <- function(x) { X <- as.vector(x) Diag <- aDiag <- FALSE if(identical(X, c(1,0,0,1))) return(TRUE) else if(identical(X, c(0,1,1,0))) return(TRUE) else return(FALSE) } changed <- 0 ## do the burn in changes, then skip, then an extra change, ## this is then the first random matrix we want to draw while(changed <= (burn.in + skip + 1)) { want <- randDraw(x, n.row, n.col) X <- x[want[1:2], want[3:4]] if(isDiag(X)) { x[want[1:2], want[3:4]] <- c(1,0)[X + 1] changed <- changed + 1} } return(x) }> > I'm working with vegetation presence-absence matrices based on field > observations. The matrices are formatted to have sites as rows and > species as columns. The presence of a species on a site is indicated > with a 1 (absence is obviously indicated with a 0). > > I would like to randomise the matrices many times in order to construct > null models. However, I cannot identify a function in R to do this, and > the programming looks tricky for someone of my limited skills. > > Can anybody help me out? > > Many thanks, > > Nick Cutler > > Institute of Geography > School of Geosciences > University of Edinburgh > Drummond Street > Edinburgh EH8 9XP > United Kingdom > > Tel: 0131 650 2532 > Web: http://www.geos.ed.ac.uk/homes/s0455078 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Nick Cutler <s0455078 <at> sms.ed.ac.uk> writes:> > I would like to be able to randomise presence-absence (i.e. binary) > matrices whilst keeping both the row and column totals constant. Is > there a function in R that would allow me to do this? > > I'm working with vegetation presence-absence matrices based on field > observations. The matrices are formatted to have sites as rows and > species as columns. The presence of a species on a site is indicated > with a 1 (absence is obviously indicated with a 0). > > I would like to randomise the matrices many times in order to construct > null models. However, I cannot identify a function in R to do this, and > the programming looks tricky for someone of my limited skills. > > Can anybody help me out?Nick, For a 1001 x 1001 matrix, this method takes less than 2 seconds on my 2 year old Windows PC. ronetab( marg1, marg2 ) returns a table of 0's and 1's according to the marginal contraints. ck.ronetab( marg1, marg2 ) checks that all the constraints were honored. msample <- function(x,marg) { ## Purpose: sample at most one each from each cell of a margin ## ---------------------------------------------------------------------- ## Arguments: x - number to sample, marg - a vector of integers ## ---------------------------------------------------------------------- ## Author: Charles C. Berry, Date: 28 Apr 2007, 08:17 ## GPL 2.0 or better if (!(x<=sum(marg) && all(marg>=0))) browser() wm <- which(marg!=0) if (length(wm)==1) { wm } else { sample( seq(along=marg), x, prob=marg ) } } ronetab <- function(m1,m2,debug=F) { ## Purpose: sample from a table with fixed margins and {0,1} cell values ## ---------------------------------------------------------------------- ## Arguments: m1, m2 - two margins ## ---------------------------------------------------------------------- ## Author: Charles C. Berry, Date: 28 Apr 2007, 08:21 ## GPL 2.0 or better stopifnot( sum(m1)==sum(m2)|| max(m1)>length(m2) || max(m2)>length(m1) ) i.list <- j.list <- list() k <- 0 while( sum(m1)>0 ){ k <- k+1 if ( sum(m1!=0) > sum(m2!=0) ){ i <- which.max( m1) j <- msample( m1[i], m2 ) i.list[[ k ]] <- rep( i, m1[i] ) j.list[[ k ]] <- j m1[i] <- 0 m2[ j ] <- m2[ j ] - 1 } else { j <- which.max( m2 ) i <- msample( m2[j], m1 ) i.list[[ k ]] <- i j.list[[ k ]] <- rep( j, m2[j] ) m2[j] <- 0 m1[ i ] <- m1[ i ] - 1 } } res <- array(0, c(length(m1), length(m2) ) ) res[ cbind( unlist(i.list), unlist(j.list) ) ] <- 1 res } ck.ronetab <- function(m1,m2){ tab <- ronetab(m1,m2) m1.ck <- all(m1==rowSums(tab)) m2.ck <- all(m2==colSums(tab)) cell.ck <- all( tab %in% 0:1 ) res <- m1.ck & m2.ck & cell.ck if (!res) attr(res,"tab") <- tab res } I'll warn you that I have not worked through what looks to be a tedious proof that this randomly samples matrices under the constraints. The heuristics seem right, and a few simulation spot checks look reasonable. If you do not want to trust it, you can still use it to generate a starting value for an MCMC run. HTH, Chuck