thr3ads.net - R help - [R] how to subsample all possible combinations of n species taken 1:n at a time? [Apr 2009]

jasper slingsby

2009-Apr-06 15:39 UTC

[R] how to subsample all possible combinations of n species taken 1:n at a time?

Hello

I apologise for the length of this entry but please bear with me.

In short:
I need a way of subsampling communities from all possible communities of n
taxa taken 1:n at a time without having to calculate all possible
combinations (because this gives me a memory error - using combn() or
expand.grid() at least). Does anyone know of a function? Or can you help me
edit the combn or expand.grid functions to generate subsamples?

In long:
I have been creating all possible communities of n taxa taken 1:n at a time
to get a presence/absence matrix of species occurrence in communities as
below...

Rows are samples, columns are species:

    A    B    C    D    .    .    .    .
    1    0    1    1    1    0    0    0    1    1    1    1    0    0    0    0
    0    1    1    1    1    0    0    0    1    1    1    1    0    0    0    0
    1    1    1    1    1    0    0    0    1    1    1    1    0    0    0    0
    0    0    0    0    0    1    0    0    1    1    1    1    0    0    0    0
    1    0    0    0    0    1    0    0    1    1    1    1    0    0    0    0
    0    1    0    0    0    1    0    0    1    1    1    1    0    0    0    0
    1    1    0    0    0    1    0    0    1    1    1    1    0    0    0    0
    0    0    1    0    0    1    0    0    1    1    1    1    0    0    0    0

...but the number of possible communities increases exponentially with each
added taxon. 

n<-11     #number of taxa
sum(choose(n, 0:n)) #number of combos (= 2^n, counting the empty community)

So the number of all possible combinations of 11 taxa is 2048 (counting the
empty community), for 12 taxa it is 4096, for 13 it is 8192...etc etc such
that when I reach about 25 taxa the number of combos is 33554432 and I get
a memory error.

I have found that the number of combos of x taxa taken from a pool of n
creates a very kurtotic unimodal distribution,... 

x<-vector("integer",20)
for (i in 1:20) {x[i]<-choose(20,i)}
plot(x)

...but have found that limiting the number of samples for any community size
to 1000 is good enough for the further analyses I wish to do.
My problem lies in sampling all possible combos without having to calculate
all possible combos. I have tried two methods but both give memory errors at
about 25 taxa.

The expand.grid() method:

n <- 11 
toto <- vector("list",n)
titi <- lapply(toto,function(x) c(0,1))
tutu <- expand.grid(titi)
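
A back-of-envelope estimate suggests why this approach runs out of memory at about 25 taxa. The sketch below assumes that expand.grid() on c(0,1) vectors stores each value as an 8-byte double and ignores any data frame overhead, so the real footprint is likely somewhat higher.

```r
n <- 25
rows  <- 2^n              # 33554432 possible presence/absence patterns
bytes <- rows * n * 8     # n numeric columns, 8 bytes per value (assumed)
gib   <- bytes / 2^30     # size in GiB
gib                       # 6.25
```

Anything that only ever holds the sampled rows stays far below this.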

The combn() method (a slightly lengthier function):

samplecommunityD <- function(n, numsamples)
{
  # start with a single all-zero row; it is dropped by the filter at the end
  super <- matrix(0, nrow = 1, ncol = n)
  for (numspploop in 1:n)
  {
    # all combinations of numspploop taxa, one combination per row
    minor <- t(combn(n, numspploop))
    if (nrow(minor) < numsamples)
    {
      # fewer combinations than requested: keep them all
      keep <- 1:nrow(minor)
    }
    else
    {
      # draw numsamples distinct combinations once, outside the fill loop
      keep <- sample(nrow(minor), numsamples)
    }
    minot <- matrix(0, nrow = length(keep), ncol = n)
    for (loopi in seq_along(keep))
    {
      # mark the taxa present in this combination
      minot[loopi, minor[keep[loopi], ]] <- 1
    }
    super <- rbind(super, minot)
    rm(minot)
  }
  # drop communities with fewer than 2 taxa or with all n taxa
  super <- super[!rowSums(super) > n - 1 & !rowSums(super) < 2, ]
  return(super)
}

samplecommunityD(11,1000)


So unless anyone knows of another function I could try, my next step would be
to modify the combn or expand.grid functions to generate subsamples, but
their code is beyond me at this stage (I'm a 3.5 month newbie). Can anyone
identify where in the code I would need to introduce a sampling term or
skipping sequence?

Thanks for your time
Jasper

-- 
View this message in context:
http://www.nabble.com/how-to-subsample-all-possible-combinations-of-n-species-taken-1%3An-at-a-time--tp22911399p22911399.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2009-Apr-06 19:57 UTC

Are you just trying to obtain a combination from 25 possible terms?
If so, then just sample the number you want and convert the number to
binary:
> sample(33554432,100)
  [1]  6911360  5924262 23052661 12888381 25831589 16700013 24079278 33282839 12751862 26086726 31363494  7118320 21866536  4212929
 [15]  8966435 12955834   449305 12830805 29181967 11420211 16175915 20118079 16560488  6248422 27762022 22430005 26650247  3621985
 [29] 24283690 13800068 27546362 21711718 26270840 18556802 17774422 26486373   782865 16013167 24572344 23244187 16026237 28897360
 [43] 14700082  8214024  2371593  3337527 10612303 17402454 22213173 13650936 30630988  9851680 15403666 11153297 21839554  8657593
 [57] 16057288 25713076  2826853 29370859 11377380 28166893 11632747 11199608 15983665 29937151 29002363 13085852 26082502 32232925
 [71] 14584722 23907975 13421556 10916983 25403574  6801209 23861215  4083294  8237209  4808486  8040610  1977505 21551566 29402643
 [85] 26135975 26753178 15276437 13760103 27208220 20298140 21968831 11851302  9068401 33308858 21256448  7154058  4341004 16042933
 [99] 31006704 20091025

These are 100 samples. You can convert each of the numbers to binary, and
the bits will tell you which elements to combine.
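
The conversion to binary could be sketched roughly as follows, assuming n = 25 taxa and 100 draws. The helper name codes_to_matrix and the column labels are made up for illustration. Codes are drawn from 1:(2^n - 1) here so that every code fits in n bits and the empty community is excluded.

```r
# Hypothetical helper: turn integer codes into a 0/1 presence/absence matrix.
# Bit j of a code (counting from the lowest bit) marks taxon j as present.
codes_to_matrix <- function(codes, n) {
  bits <- outer(codes, 2^(0:(n - 1)), function(x, p) (x %/% p) %% 2)
  dimnames(bits) <- list(NULL, paste0("sp", seq_len(n)))
  bits
}

n <- 25
set.seed(1)                         # only to make the sketch repeatable
codes <- sample(2^n - 1, 100)       # 100 distinct codes, nothing is enumerated
comm  <- codes_to_matrix(codes, n)  # 100 rows (communities) x 25 columns (taxa)
```

Because sample() draws without replacement by default, no community appears twice.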

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Eik Vettorazzi

2009-Apr-06 20:02 UTC

Hi Jasper,
maybe it's not a very R'ish solution but the following could be a
starting point:
First, notice that every combination you are looking for can be
represented as an integer in binary notation where each bit stands for a
specific community.
So looping through all combinations is the same as looping through
0:(2^n-1), e.g. 5=2^0+2^2, so community 1 and 3 (natural numbering) are
"set" in the corresponding subset.

#some auxiliary functions are needed to extract the bits set in a given
#combination
#get.bit returns presence/absence vector of species for given subset x
get.bit<-function(x,n)  x%/%(2^(0:(n-1)))%%2

#index of data columns included in subset x
get.index<-function(x,n)  x%/%(2^(0:(n-1)))%%2==1

n<-15 # number of communities
dta<-matrix(1:(12*n),ncol=n) # some sample data

#looping thru *all* possible combinations of n communities.
#will work, but it is in fact *very*! time consuming for n=25
for (i in 0:(2^n-1))  {
 sub.index<-get.index(i,n)
 # subsetting your dataset using this index and do subset analysis
 # dta[,sub.index]
}

hth
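
Since only a subsample is needed, the same loop could run over randomly drawn codes instead of all 2^n of them. This is a rough sketch, not part of the original reply: numsamples and results are illustrative names, and counting the selected columns stands in for whatever subset analysis is actually wanted.

```r
# index of data columns included in subset x (same definition as above)
get.index <- function(x, n) x %/% (2^(0:(n - 1))) %% 2 == 1

n <- 25                                # number of taxa
dta <- matrix(1:(12 * n), ncol = n)    # some sample data, as above
numsamples <- 1000                     # assumed number of subsets wanted

set.seed(42)                           # only for repeatability
codes <- sample(2^n - 1, numsamples)   # distinct codes of non-empty subsets

# placeholder analysis: number of columns in each sampled subset
results <- numeric(numsamples)
for (j in seq_along(codes)) {
  sub.index <- get.index(codes[j], n)
  results[j] <- ncol(dta[, sub.index, drop = FALSE])
}
```

The loop now takes numsamples iterations regardless of n, so n=25 is no slower than n=15.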




David Katz

2009-Apr-06 23:17 UTC

If I understand your problem properly, you could just note that selecting 1:n
of n objects is the same as deciding separately whether each one is included
or not. (exclude the case where none are selected).

Take 1000 of these and you are there- except some are duplicates - so
generate extras and eliminate the duplicates, discard the extras.

Something like this (not tested):

# all combinations have equal probability - removing rows with all zeros
p <- 2^(n-1) / (2^n - 1)
result <- matrix(0,nrow=1200,ncol=n) # plenty of extras for duplicates
for(i in 1:1200) result[i,] <- rbinom(n,1,p)
# cases which have at least 1 species
result <- subset(result,apply(result,1,sum) > 0)
result <- unique(result)[1:1000,]

Might be interesting to see the effect of varying p on the rest of your
analysis.

Further memory might be saved by using sparse matrices - see the Matrix
package.
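
For the stratified design in the original question (up to 1000 communities for each community size), one possible approach is to draw subsets of a given size directly with sample(n, k). The sketch below is untested against the real analysis. sample_size_k is a made-up helper name, and the cutoff of 2 * numsamples for falling back to combn() is an arbitrary assumption chosen so that the random draws rarely need many retries.

```r
# Hypothetical helper: up to numsamples distinct communities of exactly k taxa,
# returned as a 0/1 matrix with n columns.
sample_size_k <- function(n, k, numsamples) {
  total <- choose(n, k)
  if (total <= 2 * numsamples) {
    # few combinations: enumerate them (cheap here) and subsample if needed
    idx <- combn(n, k)
    if (total > numsamples)
      idx <- idx[, sample(ncol(idx), numsamples), drop = FALSE]
    return(t(apply(idx, 2, tabulate, nbins = n)))
  }
  # many combinations: draw random k-subsets directly, never calling combn()
  out <- matrix(0L, nrow = 0, ncol = n)
  while (nrow(out) < numsamples) {
    need <- numsamples - nrow(out)
    # draw extras, since a few draws may repeat earlier ones
    new <- t(replicate(2 * need + 10, tabulate(sample(n, k), nbins = n)))
    out <- unique(rbind(out, new))    # drop repeated communities
  }
  out[seq_len(numsamples), , drop = FALSE]
}

n <- 25
set.seed(7)                           # only for repeatability
# sizes 2..(n-1), matching the filter at the end of samplecommunityD()
super <- do.call(rbind,
                 lapply(2:(n - 1), sample_size_k, n = n, numsamples = 1000))
```

Memory use is bounded by the number of sampled rows, not by the number of possible combinations, so larger n should not be a problem.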

David Katz
www.davidkatzconsulting.com

