JiHO
2009-May-20 15:34 UTC
[R] Comparing spatial distributions - permutation test implementation
Hello everyone, I am looking at the joint spatial distribution of 2 kinds of organisms (estimated on a grid of points) and want to test for significant association or dissociation. My first question is: do you know a nice technique to do that, considering that I have a limited number of points (36) but that they are repeated (4 times)? I did GLMs to test for correlations between the two (hence forgetting about the spatial aspect of it) and was previously pointed to the SADIE software. Would there be anything explicitly spatial and available in R please? Then, Syrjala's test[1] seems appropriate and tests for differences in distribution. It computes a Cram?r-von Mises-type statistic and tests its significance with a permutation test. I implemented the test in R and posted the code on these mailing lists[2]. Some people checked it and confirmed that the statistic gives correct results but my estimation of the p-value does not match the one predicted with the orignal software from Syrjala. I don't know what I am doing wrong. The permutation test is described by Syrjala as: (...) Under the null hypothesis, at a given sampling location (x_k, y_k), either density ob- servation y_i(x_k, y_k), i = 1, 2, is equally likely for each population. Thus, for a given data set, the distribution of the test statistic can be constructed by calculating the value of the test statistic for all 2^k pairwise per- mutations of the data set. (...) The level of signif- icance of a specific realization of the test statistic T is determined from its position in the ordered set of test statistic values from all 2^k permutations. (...) My understanding is that, for each permutation I should choose a random number of points (between 1 and k), swap the values for species 1 and species 2 at those points, and recompute the test on the new data. But this does not work :/ . Here is my code and associated data from Syrjala (for which I have reference values). Any advice would be very welcome (in particular if there is a way to leverage boot() for this). NB: computing the 1000 permutations can be a bit lengthy, but fortunately, by using plyr, you get a nice progress bar to look at! syrjala.stat <- function(x, y=NULL, var1=NULL, var2=NULL) # # Compute Syrjala statistic # x, y coordinates # var1, var2 value of 2 parameters both measured at (x,y) points # NB: x can also be a data.frame/matrix containing x,y,var1,var2 as columns # { # Input checks if (!is.null(ncol(x))) { if (ncol(x) == 4) { names(x) = c("x","y","var1","var2") dat = x } else { stop("Wrong number of columns in argument x") } } else { dat = data.frame(x, y, var1, var2) } # Normalize abundances dat$var1 = dat$var1/sum(dat$var1) dat$var2 = dat$var2/sum(dat$var2) # For each point (each line of dat) # compute the squared difference in gammas from each origin meanSqDiff = apply(dat, 1, function(d, coord, variab) { north = (coord$x>=d[1]) east = (coord$y>=d[2]) south = (coord$x<=d[1]) west = (coord$y<=d[2]) return( mean( c( (diff(sapply(variab[(north & east),], sum)))^2, (diff(sapply(variab[(south & east),], sum)))^2, (diff(sapply(variab[(south & west),], sum)))^2, (diff(sapply(variab[(north & west),], sum)))^2 ) ) ) }, dat[,c("x","y")], dat[,c("var1","var2")]) # Compute the statistic (i.e. sum of mean squared differences) return(sum(meanSqDiff)) } # Get data online : http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv system("curl http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv > syrjala_data_cod.csv") dataCod = read.csv(file = "syrjala_data_cod.csv", header = TRUE) # Normalize abundances dataCod$var1 = dataCod$var1/sum(dataCod$var1) dataCod$var2 = dataCod$var2/sum(dataCod$var2) # Number of permutations nperm = 1000 # Create nperm-1 replicates of the data (one is the original observation) d = rep(list(dataCod), nperm-1) # Compute number of observations before to avoid doing that for every replicate n = nrow(dataCod) require(plyr) # Permute some observations and compute the syrjala stat for each permutation psis = ldply(d, .fun=function(x, n){ # choose indices of observations to swap idx = sample(1:n, runif(1, min=1, max=n)) # swap observations x[idx, 3:4] = x[idx, 4:3] # compute syrjala stat return(syrjala.stat(x)) }, n, .progress="text") } # Compute the syrjala stat for the observations psi = syrjala.stat(dataCod) # Estimate the pvalue pvalue = (sum(psis>=psi)+1)/nperm psi pvalue # Should be: # statistic = 0.224 # p-value = 0.1900 Thank you very much in advance. Sincerely, JiHO --- http://jo.irisson.free.fr/ [1] A statistical test for a difference between the spatial distributions of two populations. Syrjala SE. Ecology. 1996;77(1):75?80. http://dl.getdropbox.com/u/1047321/Syrjala1996.pdf [2] https://stat.ethz.ch/pipermail/r-sig-geo/2008-February/ thread.html#3137
Marcelino de la Cruz
2009-May-21 09:40 UTC
[R] [R-sig-Geo] Comparing spatial distributions - permutation test implementation
Hi, Jose M. Blanco-Moreno an myself have implemented Syrjala's test in the ecespa package. As a matter of fact, Jose M. has implemented a Frotran version of Syrjala's original QBasic function. Using your data the results are very close to your reported # statistic= 0.224 and # p-value = 0.1900. Cheers, Marcelino # with Fortran code > syrjala0(dataCod[,1:2], dataCod$var1, dataCod$var2,R=F,nsim=1000) Cramer-von Misses test for the difference between the spatial distributions of dataCod$var1 and dataCod$var2 based on 1000 permutations. psi: 0.223504 p-value: 0.2167832 Kolmogorov-Smirnov test for the difference between the spatial distributions of dataCod$var1 and dataCod$var2 based on 1000 permutations. psi: 0.061354 p-value: 0.2867133 ### With R-code (a bit lengthy, so I use only 100 permutations) > syrjala0(dataCod[,1:2], dataCod$var1, dataCod$var2,R=T,nsim=100) Syrjala test for the difference between the spatial distributions of dataCod$var1 and dataCod$var2 , based on 100 simulations psi: 0.223504 p-value: 0.2079208 At 17:34 20/05/2009, JiHO wrote:>Hello everyone, > >I am looking at the joint spatial distribution of 2 kinds of organisms >(estimated on a grid of points) and want to test for significant >association or dissociation. > >My first question is: do you know a nice technique to do that, >considering that I have a limited number of points (36) but that they >are repeated (4 times)? I did GLMs to test for correlations between >the two (hence forgetting about the spatial aspect of it) and was >previously pointed to the SADIE software. Would there be anything >explicitly spatial and available in R please? > >Then, Syrjala's test[1] seems appropriate and tests for differences in >distribution. It computes a Cram?r-von Mises-type statistic and tests >its significance with a permutation test. >I implemented the test in R and posted the code on these mailing >lists[2]. Some people checked it and confirmed that the statistic >gives correct results but my estimation of the p-value does not match >the one predicted with the orignal software from Syrjala. I don't know >what I am doing wrong. The permutation test is described by Syrjala as: > > (...) Under the null hypothesis, > at a given sampling location (x_k, y_k), either density ob- > servation y_i(x_k, y_k), i = 1, 2, is equally likely for each > population. Thus, for a given data set, the distribution > of the test statistic can be constructed by calculating > the value of the test statistic for all 2^k pairwise per- > mutations of the data set. (...) The level of signif- > icance of a specific realization of the test statistic T is > determined from its position in the ordered set of test > statistic values from all 2^k permutations. (...) > >My understanding is that, for each permutation I should choose a >random number of points (between 1 and k), swap the values for species >1 and species 2 at those points, and recompute the test on the new >data. But this does not work :/ . Here is my code and associated data >from Syrjala (for which I have reference values). Any advice would be >very welcome (in particular if there is a way to leverage boot() for >this). >NB: computing the 1000 permutations can be a bit lengthy, but >fortunately, by using plyr, you get a nice progress bar to look at! > >syrjala.stat <- function(x, y=NULL, var1=NULL, var2=NULL) ># ># Compute Syrjala statistic ># x, y coordinates ># var1, var2 value of 2 parameters both measured at (x,y) points ># NB: x can also be a data.frame/matrix containing x,y,var1,var2 as >columns ># >{ > # Input checks > if (!is.null(ncol(x))) { > if (ncol(x) == 4) { > names(x) = c("x","y","var1","var2") > dat = x > } else { > stop("Wrong number of columns in argument x") > } > } else { > dat = data.frame(x, y, var1, var2) > } > > # Normalize abundances > dat$var1 = dat$var1/sum(dat$var1) > dat$var2 = dat$var2/sum(dat$var2) > > # For each point (each line of dat) > # compute the squared difference in gammas from each origin > meanSqDiff = apply(dat, 1, function(d, coord, variab) { > north = (coord$x>=d[1]) > east = (coord$y>=d[2]) > south = (coord$x<=d[1]) > west = (coord$y<=d[2]) > return( mean( c( > (diff(sapply(variab[(north & east),], sum)))^2, > (diff(sapply(variab[(south & east),], sum)))^2, > (diff(sapply(variab[(south & west),], sum)))^2, > (diff(sapply(variab[(north & west),], sum)))^2 > ) ) > ) > }, dat[,c("x","y")], dat[,c("var1","var2")]) > > # Compute the statistic (i.e. sum of mean squared differences) > return(sum(meanSqDiff)) >} > > ># Get data online : http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv >system("curl http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv > >syrjala_data_cod.csv") > >dataCod = read.csv(file = "syrjala_data_cod.csv", header = TRUE) > ># Normalize abundances >dataCod$var1 = dataCod$var1/sum(dataCod$var1) >dataCod$var2 = dataCod$var2/sum(dataCod$var2) > ># Number of permutations >nperm = 1000 > ># Create nperm-1 replicates of the data (one is the original >observation) >d = rep(list(dataCod), nperm-1) > ># Compute number of observations before to avoid doing that for every >replicate >n = nrow(dataCod) > >require(plyr) ># Permute some observations and compute the syrjala stat for each >permutation >psis = ldply(d, .fun=function(x, n){ > # choose indices of observations to swap > idx = sample(1:n, runif(1, min=1, max=n)) > # swap observations > x[idx, 3:4] = x[idx, 4:3] > # compute syrjala stat > return(syrjala.stat(x)) >}, n, .progress="text") >} > ># Compute the syrjala stat for the observations >psi = syrjala.stat(dataCod) > ># Estimate the pvalue >pvalue = (sum(psis>=psi)+1)/nperm > >psi >pvalue ># Should be: ># statistic = 0.224 ># p-value = 0.1900 > >Thank you very much in advance. Sincerely, > >JiHO >--- >http://jo.irisson.free.fr/ > >[1] A statistical test for a difference between the spatial >distributions of two populations. Syrjala SE. Ecology. 1996;77(1):75?80. >http://dl.getdropbox.com/u/1047321/Syrjala1996.pdf > >[2] https://stat.ethz.ch/pipermail/r-sig-geo/2008-February/ thread.html#3137 > >_______________________________________________ >R-sig-Geo mailing list >R-sig-Geo at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/r-sig-geo________________________________ Marcelino de la Cruz Rot Departamento de Biolog?a Vegetal E.U.T.I. Agr?cola Universidad Polit?cnica de Madrid 28040-Madrid Tel.: 91 336 54 35 Fax: 91 336 56 56 marcelino.delacruz at upm.es
Reasonably Related Threads
- Reordering levels of a factor when the factor is part of a data frame
- getOptions("max.print") in R
- Using "rep", but don't know what to put after each =
- Comparing spatial point patterns - Syrjala test
- [PATCH] drm/nouveau: Replace the iturbt_709 prop with the standarad COLOR_ENCODNIG prop