thr3ads.net - R help - [R] help for a loop procedure [Jan 2011]


Serena Corezzola

2011-Jan-27 10:30 UTC

[R] help for a loop procedure

Hello everybody!



I’m trying to define the optimal number of surveys to detect the highest
number of species within a monitoring season/session.

To do this I want to run all the possible combinations of a set of
samples and to calculate the total number of species for each combination of
2, 3, 4 … n sample events, so that at the end I will be able to define which
is the lowest number of samples that I need to obtain the best result.



I’ve already done this operation manually, just to see if it works, but the
point is that some of my datasets have more than 30 samples and more than 35
species, so that the number of combinations will be HUGE!

So here is the question: I need to find a way for R to make all possible
combinations of samples automatically, and then to automatically return the
total number of species in every combination.

I’ve tried to search for a loop script, or something like that. However, I’m
relatively new to R and I don’t know what I need to do… Can anyone help me?



Here I’ve written a simple example of the operations I need to do, just to
make my problem clearer.



My dataset (matrix) has sample events by rows (U1,U2,U3) and detected
species by columns.



U<-read.table("C:\\Documents\\tre_usc.txt",header=T,row.names=1,sep="\t",dec = ",")



U  # global matrix with 3 samples

SPECIE  Aadi  Aagl  Apap  Aage  Bdia  Beup  Crub  Carc  Cpam
U1         0     0     0     0     7     0     5     0     1
U2         0     0     0     0     4     2     1     0     0
U3         0     0     0     0     0     0     0     0    14



First, I’ve created from this matrix all the subsets based on single
samples,



U1 <- U [c(1), ]

U2 <- U [c(2), ]

U3 <- U [c(3), ]



U1

SPECIE  Aadi  Aagl  Apap  Aage  Bdia  Beup  Crub  Carc  Cpam
U1         0     0     0     0     7     0     5     0     1



U2

SPECIE  Aadi  Aagl  Apap  Aage  Bdia  Beup  Crub  Carc  Cpam
U2         0     0     0     0     4     2     1     0     0

Etc…



Then I’ve combined them, each time summing the values of the chosen rows
(total number of combinations = 4).



U12<-U1+U2

U13<-U1+U3

U23<-U2+U3

U123<-U1+U2+U3



U12

SPECIE  Aadi  Aagl  Apap  Aage  Bdia  Beup  Crub  Carc  Cpam
U12        0     0     0     0    11     2     6     0     1

Etc….



Then I’ve applied the command “length” to find the number of species for
every new combination.



length(U12[U12>0])

[1]  4



length(U13[U13>0])

[1] 3

etc…



Now I need to do this with 10 and 32 sample events... :(



Thanks for your attention!


Serena Corezzola

Centro Nazionale per lo Studio e la Conservazione della Biodiversità
Forestale, “Bosco Fontana” di Verona
Strada Mantova 29
I-46045 MARMIROLO (MN)
Italy


Petr Savicky

2011-Jan-27 16:30 UTC


On Thu, Jan 27, 2011 at 11:30:37AM +0100, Serena Corezzola wrote:
> Hello everybody!
>
> I’m trying to define the optimal number of surveys to detect the highest
> number of species within a monitoring season/session.
>
> To do this I want to run all the possible combinations between a set of
> samples and to calculate the total number of species for each combination of
> 2, 3, 4 … n samples events, so that at the end I will be able to define which
> is the lowest number of samples that I need to obtain the best result.
>
> I’ve already done this operation manually, just to see if it works, but the
> point is that some of my datasets have more than 30 samples and more than 35
> species, so that the number of combinations will be HUGE!
>
> So here is the question: I need to find a way for R to make all possible
> combinations of samples automatically, and then to automatically return the
> total number of species in every combination.
>
> I’ve tried to search for a loop script, or something like that. However, I’m
> relatively new to R and I don’t know what I need to do… Can anyone help me?
>
> Here I’ve written a simple example of the operations I need to do, just to
> make my problem clearer.
>
> My dataset (matrix) has sample events by rows (U1,U2,U3) and detected
> species by columns.
>
> U<-read.table("C:\\Documents\\tre_usc.txt",header=T,row.names=1,sep="\t",dec = ",")

Hello:

For simplicity of preparing a reply, let me include your data
as an R command.

  U <- structure(list(Aadi = c(0L, 0L, 0L), Aagl = c(0L, 0L, 0L),
  Apap = c(0L, 0L, 0L), Aage = c(0L, 0L, 0L), Bdia = c(7L, 4L, 0L),
  Beup = c(0L, 2L, 0L), Crub = c(5L, 1L, 0L), Carc = c(0L, 0L, 0L),
  Cpam = c(1L, 0L, 14L)),
  .Names = c("Aadi", "Aagl", "Apap", "Aage", "Bdia", "Beup", "Crub", "Carc", "Cpam"),
  class = "data.frame",
  row.names = c("U1", "U2", "U3"))

     Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
  U1    0    0    0    0    7    0    5    0    1
  U2    0    0    0    0    4    2    1    0    0
  U3    0    0    0    0    0    0    0    0   14

> First, I’ve created from this matrix all the subsets based on single
> samples,
>
> U1 <- U [c(1), ]
>
> U2 <- U [c(2), ]
>
> U3 <- U [c(3), ]
> [...]
>
> then I’ve combined them summing each time the values of the chosen lines
> (total n° of combination = 4).
>
> U12<-U1+U2
>
> U13<-U1+U3
>
> U23<-U2+U3
>
> U123<-U1+U2+U3
>
> [...]
>
> Then I’ve applied the command “length” to find the number of species for
> every new combination.
>
> length(U12[U12>0])
>
> [1]  4
>
> length(U13[U13>0])
>
> [1] 3
>
This can be partially automated as follows

  UM <- as.matrix(U)
  A <- rbind(
  c(1, 0, 0),
  c(0, 1, 0),
  c(0, 0, 1),
  c(1, 1, 0),
  c(1, 0, 1),
  c(0, 1, 1),
  c(1, 1, 1))
  rownam <- rep("U", times=nrow(A))
  for (i in 1:3) {
  	rownam[A[, i] == 1] <- paste(rownam[A[, i] == 1], i, sep="")
  }
  dimnames(A) <- list(rownam, NULL)
  C <- A %*% UM
  C

       Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
  U1      0    0    0    0    7    0    5    0    1
  U2      0    0    0    0    4    2    1    0    0
  U3      0    0    0    0    0    0    0    0   14
  U12     0    0    0    0   11    2    6    0    1
  U13     0    0    0    0    7    0    5    0   15
  U23     0    0    0    0    4    2    1    0   14
  U123    0    0    0    0   11    2    6    0   15

  rowSums(C != 0)

    U1   U2   U3  U12  U13  U23 U123 
     3    3    1    4    3    4    4 
> Now I need to do this with 10 and 32 sample events... :(

If I understand you correctly, your real table U has 32 rows
and you want to consider all subsets of at most 10 rows. If this
is so, then the number of combinations is

  sum(choose(32, 1:10))
  # [1] 107594212

A matrix with this number of rows and 35 columns, stored as
double-precision numbers (8 bytes each), requires about 30 GB
of memory. How do you want to summarize the results? There may
be a more efficient way to compute the required parameters.
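
The memory estimate can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes 35 species columns and 8 bytes per cell (the result of %*% is a double matrix in R), and it ignores any overhead beyond the raw data.

```r
# Number of non-empty subsets of at most 10 rows out of 32
n_subsets <- sum(choose(32, 1:10))

# Assumed dimensions: one row per subset, one column per species
n_species <- 35
bytes_per_cell <- 8            # double precision

bytes <- n_subsets * n_species * bytes_per_cell
gigabytes <- bytes / 1e9       # decimal gigabytes
gigabytes
```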

For example, the average number of species contained in the sum
of a random selection of k rows may be computed easily, since we
can consider the columns (species) individually. The counts are
non-negative, so a column sum is nonzero exactly when at least
one selected row contains the species, and the probability of
this may be computed for each column without actually
constructing all the subsets.
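
A minimal sketch of this calculation, assuming all counts are non-negative and the k rows are drawn uniformly without replacement. If a species occurs in m of the n rows, it is missed by a random subset of k rows with probability choose(n - m, k) / choose(n, k). The function name expected_richness is made up for illustration; the matrix repeats the three-sample example from this thread.

```r
# Example data from the thread: 3 sample events, 9 species
U <- matrix(c(0, 0, 0, 0, 7, 0, 5, 0,  1,
              0, 0, 0, 0, 4, 2, 1, 0,  0,
              0, 0, 0, 0, 0, 0, 0, 0, 14),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("U1", "U2", "U3"),
                            c("Aadi", "Aagl", "Apap", "Aage", "Bdia",
                              "Beup", "Crub", "Carc", "Cpam")))

# Hypothetical helper: expected number of species detected in a
# uniformly random subset of k rows, without enumerating subsets
expected_richness <- function(U, k) {
  n <- nrow(U)
  m <- colSums(U > 0)                          # rows in which each species occurs
  p_missed <- choose(n - m, k) / choose(n, k)  # all k rows avoid the species
  sum(1 - p_missed)
}

avg <- sapply(1:3, function(k) expected_richness(U, k))
avg
```

For k = 2 this agrees with the mean of the counts 4, 3, 4 obtained for U12, U13 and U23.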

If you need a parameter that is harder to compute than the
average, it is possible to use simulation. In this case, not all
subsets would be generated, only a smaller number of randomly
chosen subsets of k rows for a given k.

Petr Savicky.

Petr Savicky

2011-Jan-28 10:19 UTC


On Thu, Jan 27, 2011 at 05:30:15PM +0100, Petr Savicky wrote:
> On Thu, Jan 27, 2011 at 11:30:37AM +0100, Serena Corezzola wrote:
> > Hello everybody!
> >
> > I’m trying to define the optimal number of surveys to detect the highest
> > number of species within a monitoring season/session.
> >
> [...]
> This can be partially automatized as follows
>
>   UM <- as.matrix(U)
>   A <- rbind(
>   c(1, 0, 0),
>   c(0, 1, 0),
>   c(0, 0, 1),
>   c(1, 1, 0),
>   c(1, 0, 1),
>   c(0, 1, 1),
>   c(1, 1, 1))
>   rownam <- rep("U", times=nrow(A))
>   for (i in 1:3) {
>       rownam[A[, i] == 1] <- paste(rownam[A[, i] == 1], i, sep="")
>   }
>   dimnames(A) <- list(rownam, NULL)
>   C <- A %*% UM
>   C
>
>        Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
>   U1      0    0    0    0    7    0    5    0    1
>   U2      0    0    0    0    4    2    1    0    0
>   U3      0    0    0    0    0    0    0    0   14
>   U12     0    0    0    0   11    2    6    0    1
>   U13     0    0    0    0    7    0    5    0   15
>   U23     0    0    0    0    4    2    1    0   14
>   U123    0    0    0    0   11    2    6    0   15
>
>   rowSums(C != 0)
>
>     U1   U2   U3  U12  U13  U23 U123
>      3    3    1    4    3    4    4
>
> > Now I need to do this with 10 and 32 sample events... :(

Hello.

In a previous email, I suggested the code above. However, it works
only for a matrix U with a fixed number of rows, because the matrix
A was written out by hand. To test the procedure on a larger matrix
U, A should be generated differently. For a fixed k, A should have
choose(nrow(U), k) rows and nrow(U) columns, and its rows should be
all 0,1-vectors with k ones. The following code may be used,
although better ways of computing A probably exist.

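  n <- nrow(U)
  k <- 2
  cmb <- combn(n, k)                      # each column is one subset of k row indices
  A <- matrix(0, nrow=ncol(cmb), ncol=n)  # one indicator row per subset
  ind <- cbind(1:nrow(A), 0L)             # (row, column) index pairs into A
  for (i in seq_len(k)) {
      ind[, 2] <- cmb[i, ]                # column of the i-th member of each subset
      A[ind] <- 1
  }
  A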

       [,1] [,2] [,3]
  [1,]    1    1    0
  [2,]    1    0    1
  [3,]    0    1    1

  C <- A %*% as.matrix(U)
  rowSums(C != 0)

  [1] 4 3 4

This output corresponds to U12, U13, U23.
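
As one possible alternative, the FUN argument of combn() can compute the species count for each subset directly, so that the indicator matrix A is never stored. This is only a sketch: the function name richness_by_k is made up for illustration, and the matrix repeats the three-sample example from this thread.

```r
# Example data from the thread: 3 sample events, 9 species
U <- matrix(c(0, 0, 0, 0, 7, 0, 5, 0,  1,
              0, 0, 0, 0, 4, 2, 1, 0,  0,
              0, 0, 0, 0, 0, 0, 0, 0, 14),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("U1", "U2", "U3"),
                            c("Aadi", "Aagl", "Apap", "Aage", "Bdia",
                              "Beup", "Crub", "Carc", "Cpam")))

# Hypothetical helper: species count for every subset of k rows,
# in the same order as the columns of combn(nrow(U), k)
richness_by_k <- function(U, k) {
  UM <- as.matrix(U)
  counts <- combn(nrow(UM), k, FUN = function(idx) {
    # drop = FALSE keeps a one-row selection as a matrix
    sum(colSums(UM[idx, , drop = FALSE]) > 0)
  })
  as.vector(counts)
}

r1 <- richness_by_k(U, 1)
r2 <- richness_by_k(U, 2)
r3 <- richness_by_k(U, 3)
```

The number of subsets still grows as choose(n, k), so this saves memory but not running time.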

If n = 32, then the above may be used for computing the required
counts exactly for a few small values of k. For k up to 10, an
approximation may be more suitable. For example, simulation may
be used, where random subsets are generated using sample(n, k).
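
A rough sketch of such a simulation follows. The function name simulate_richness, the number of replicates and the seed are arbitrary choices for illustration, and the matrix repeats the three-sample example from this thread.

```r
# Example data from the thread: 3 sample events, 9 species
U <- matrix(c(0, 0, 0, 0, 7, 0, 5, 0,  1,
              0, 0, 0, 0, 4, 2, 1, 0,  0,
              0, 0, 0, 0, 0, 0, 0, 0, 14),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("U1", "U2", "U3"),
                            c("Aadi", "Aagl", "Apap", "Aage", "Bdia",
                              "Beup", "Crub", "Carc", "Cpam")))

# Hypothetical helper: species counts for n_sim random subsets of k rows
simulate_richness <- function(U, k, n_sim = 1000) {
  UM <- as.matrix(U)
  n <- nrow(UM)
  replicate(n_sim, {
    idx <- sample(n, k)                        # k distinct row indices
    sum(colSums(UM[idx, , drop = FALSE]) > 0)  # species present in the subset
  })
}

set.seed(1)                                    # arbitrary seed, for reproducibility
sim <- simulate_richness(U, k = 2, n_sim = 2000)
mean(sim)      # estimate of the average number of species for k = 2
table(sim)     # estimated distribution of the counts
```

With the real data, the same call can be repeated for each k of interest, and any summary (mean, quantiles, minimum) can be taken from the simulated counts.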

Petr Savicky.
