Benjamin Ward (ENV)
2015-Jun-02 12:26 UTC
[R] Combining multiple probability weights for the sample() function.
Dear R-List,

I have a set of possibilities I want to sample from:

    bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
    possibilities <- as.matrix(expand.grid(bases))

    > possibilities
         Var1 Var2 Var3
    [1,] "A"  "A"  "C"
    [2,] "C"  "A"  "C"
    [3,] "A"  "G"  "C"
    [4,] "C"  "G"  "C"
    [5,] "A"  "A"  "T"
    [6,] "C"  "A"  "T"
    [7,] "A"  "G"  "T"
    [8,] "C"  "G"  "T"

Suppose I randomly sample one of these rows. If I do this, I find that it is 25% likely that my choice will have an identical first and second letter (e.g. [1,] "A" "A" "C"). It is also 25% likely that my choice will have an identical first and third letter (e.g. [4,] "C" "G" "C"). It is not possible for the second and third letters of my choice to be identical.

What I would like to do is sample one of the rows, subject to the constraint that the probability of drawing identical letters 1 and 2 should be 50% (0.5), and at the same time the probability of drawing identical letters 1 and 3 should be 50%. I am unsure how to do this, but I know it involves coming up with a modified set of weights for the sample() function. My progress is below; any advice is much appreciated.

Best Wishes,

Ben Ward, UEA.
So I have used the following code to come up with a matrix that contains weights for each criterion:

    possibilities <- as.matrix(expand.grid(bases))
    # One column per choice, one row per criterion (1==2, 1==3, 2==3)
    identities <- apply(possibilities, 1, function(x) c(x[1] == x[2], x[1] == x[3], x[2] == x[3]))
    prob <- matrix(rep(0, length(identities)), ncol = ncol(identities))
    # Share 0.5 among the choices that meet a criterion, 0.5 among those that do not
    consProb <- apply(identities, 1, function(x){0.5 / length(which(x))})
    polProb <- apply(identities, 1, function(x){0.5 / length(which(!x))})
    for(i in 1:nrow(identities)){
      prob[i, which(identities[i,])] <- consProb[i]
      prob[i, which(!identities[i,])] <- polProb[i]
    }
    rownames(prob) <- c("1==2", "1==3", "2==3")
    colnames(prob) <- apply(possibilities, 1, function(x) paste(x, collapse = ", "))

This code gives the following matrix:

            A, A, C    C, A, C    A, G, C    C, G, C    A, A, T    C, A, T    A, G, T    C, G, T
    1==2 0.25000000 0.08333333 0.08333333 0.08333333 0.25000000 0.08333333 0.08333333 0.08333333
    1==3 0.08333333 0.25000000 0.08333333 0.25000000 0.08333333 0.08333333 0.08333333 0.08333333
    2==3 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000

Each column is one of the choices from 'possibilities', and each row gives a series of weights based on one of three criteria:

Row 1: if letter 1 == letter 2 is possible among the choices, the combined chance of it should be 50%.
Row 2: if letter 1 == letter 3 is possible among the choices, the combined chance of it should be 50%.
Row 3: if letter 2 == letter 3 is possible among the choices, the combined chance of it should be 50%.

So:

If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[1,]) repeatedly, I expect about half the choices to contain identical letters 1 and 2.

If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[2,]) repeatedly, I expect about half the choices to contain identical letters 1 and 3.

If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[3,]) repeatedly, I would expect about half the choices to contain identical letters 2 and 3, except that in this case it is not possible.

Note that rows 1 and 2 each sum to 1 (row 3 sums to 0.5, because no choice satisfies its criterion).

What I would like to do, if it is possible, is combine these three sets of weights into one set that, when used with sample(x = 1:nrow(possibilities), size = 1, prob = MAGICPROB), will give me a list of choices where ~50% of them contain identical letters 1 and 2, AND ~50% of them contain identical letters 1 and 3, AND ~50% again contain identical letters 2 and 3 (except in this example, as it is not possible from the choices).

Can multiple probability weightings be combined in such a manner?
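The claim that each row of weights satisfies only its own criterion can be checked exactly, without simulation. The following is a minimal sketch, not part of the original post: `criterion_prob` is a hypothetical helper name, and the weights for row 1 are typed in from the matrix above. It relies on the documented behaviour that sample() treats `prob` as weights that need not sum to one.

```r
bases <- list(c("A", "C"), c("A", "G"), c("C", "T"))
possibilities <- as.matrix(expand.grid(bases, stringsAsFactors = FALSE))

# Logical indicators: which of the eight choices satisfy each criterion
eq12 <- possibilities[, 1] == possibilities[, 2]
eq13 <- possibilities[, 1] == possibilities[, 3]
eq23 <- possibilities[, 2] == possibilities[, 3]

# Hypothetical helper: exact probability of each criterion under weights w
criterion_prob <- function(w) {
  w <- w / sum(w)  # rescale so the weights sum to one
  c("1==2" = sum(w[eq12]), "1==3" = sum(w[eq13]), "2==3" = sum(w[eq23]))
}

# Equal weights on all eight choices
uniform <- criterion_prob(rep(1, 8))

# Weights from row "1==2" of the matrix above
row1 <- criterion_prob(c(0.25, rep(1/12, 3), 0.25, rep(1/12, 3)))

print(uniform)
print(row1)
```

Under the row 1 weights the first criterion reaches 0.5, but the second stays well below it, which is why a single combined weight vector is being asked for.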
Adams, Jean
2015-Jun-02 13:57 UTC
[R] Combining multiple probability weights for the sample() function.
Ben,

Perhaps I am missing something, but couldn't you simply reduce your possibilities to:

    possibilities[c(1, 5, 2, 4), ]
         Var1 Var2 Var3
    [1,] "A"  "A"  "C"
    [2,] "A"  "A"  "T"
    [3,] "C"  "A"  "C"
    [4,] "C"  "G"  "C"

If you sample from these four rows you will have a 50% chance that Var1 and Var2 are equal and a 50% chance that Var1 and Var3 are equal.

Jean
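Jean's suggestion can be sketched as follows, assuming each of the four retained rows is drawn with equal probability. The names `reduced`, `p12`, `p13` and `draw` are illustrative only, and the seed is arbitrary.

```r
bases <- list(c("A", "C"), c("A", "G"), c("C", "T"))
possibilities <- as.matrix(expand.grid(bases, stringsAsFactors = FALSE))

# Keep only the rows in which letter 1 matches letter 2 or letter 3
reduced <- possibilities[c(1, 5, 2, 4), ]

# Exact probabilities under equal-probability sampling of the four rows
p12 <- mean(reduced[, 1] == reduced[, 2])
p13 <- mean(reduced[, 1] == reduced[, 3])

# Draw one row at random
set.seed(1)
draw <- reduced[sample(nrow(reduced), size = 1), ]
print(draw)
```

Both `p12` and `p13` come out at one half because two of the four rows satisfy each criterion.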
Jim Lemon
2015-Jun-03 01:25 UTC
[R] Combining multiple probability weights for the sample() function.
Hi Ben,

While Jean's answer looks correct, I think that there is something amiss with your specification of the problem. You have eight combinations in your "possibilities". So if you draw samples "x" where:

    p(x = possibilities[1,] | possibilities[5,]) = 0.5
    AND
    p(x = possibilities[2,] | possibilities[4,]) = 0.5

then:

    p(x = possibilities[3,] | possibilities[6,] | possibilities[7,] | possibilities[8,]) = 0

You state that you want the probability of drawing a triplet with identical second and third bases to be zero, which is ensured by having no such "possibilities" in the set to be sampled. This is the constraint that forces the above: as the two conditions (identical first and second, identical first and third) are disjoint, and as the sum of all probabilities cannot exceed one, four "possibilities" must have probabilities of zero.

Jim
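Jim's argument rests on the two conditions never holding for the same row. A short sketch, using illustrative variable names that do not appear in the thread, checks this for the eight possibilities and lists the rows left with no probability mass:

```r
bases <- list(c("A", "C"), c("A", "G"), c("C", "T"))
possibilities <- as.matrix(expand.grid(bases, stringsAsFactors = FALSE))

eq12 <- possibilities[, 1] == possibilities[, 2]
eq13 <- possibilities[, 1] == possibilities[, 3]

# Do the two conditions ever hold for the same row?
overlap <- any(eq12 & eq13)

# If they never overlap, 0.5 + 0.5 already uses all the probability mass
remaining <- 1 - (0.5 + 0.5)

# Rows that satisfy neither condition and so must get probability zero
rows_forced_to_zero <- which(!(eq12 | eq13))

print(overlap)
print(rows_forced_to_zero)
```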
Boris Steipe
2015-Jun-03 18:26 UTC
[R] Combining multiple probability weights for the sample() function.
If letters 1 and 2 must be equal with p=0.5, and 1 and 3 must be equal with p=0.5, then letter 1 must be the same as either 2 or 3. Therefore:

  Choose a letter.
  Make a pair of (letter, (not letter)).
  Reverse the pair with p = 0.5.
  Concatenate your letter and the pair.

Is that what you need?

B.
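The four steps above could be sketched in R roughly as follows. This is an illustration under stated assumptions rather than Boris's own code: it assumes a single shared alphabet of four bases and ignores the per-position restrictions in `bases` from the original post. The function name `draw_triplet`, the seed, and the number of draws are all hypothetical choices.

```r
# Assumption: one shared alphabet; the per-position sets in 'bases' are ignored
alphabet <- c("A", "C", "G", "T")

draw_triplet <- function() {
  first <- sample(alphabet, 1)                  # choose a letter
  other <- sample(setdiff(alphabet, first), 1)  # a different letter
  pair <- c(first, other)                       # (letter, not letter)
  if (runif(1) < 0.5) pair <- rev(pair)         # reverse with p = 0.5
  c(first, pair)                                # concatenate letter and pair
}

set.seed(42)
draws <- t(replicate(10000, draw_triplet()))    # one triplet per row

freq12 <- mean(draws[, 1] == draws[, 2])
freq13 <- mean(draws[, 1] == draws[, 3])
print(c(freq12 = freq12, freq13 = freq13))
```

Every triplet produced this way has letter 1 equal to exactly one of letters 2 and 3, so the two observed frequencies add up to one and each should sit near 0.5.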
Daniel Nordlund
2015-Jun-03 19:28 UTC
[R] Combining multiple probability weights for the sample() function.
Ben,

If I correctly understand your requirements, you can't do what you are asking. If you only have the eight possibilities that you list, then to get letters 1 and 2 to match 50% of the time you must select row 1 with probability=.25 and row 5 with probability=.25. To have the first and third letters match 50% of the time you must select rows 2 and 4 each with probability=.25. Those probabilities sum to 1, so you can never select any of the other rows.

Am I missing something?

Dan

-- 
Daniel Nordlund
Bothell, WA USA
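The weights Dan describes can be passed straight to sample() as the MAGICPROB vector from the original question. The sketch below is illustrative: `magic_prob`, the seed, and the number of draws are assumptions. The equal 0.25 split is one choice; any weights with rows 1 and 5 summing to 0.5 and rows 2 and 4 summing to 0.5 would meet the two 50% targets as well.

```r
bases <- list(c("A", "C"), c("A", "G"), c("C", "T"))
possibilities <- as.matrix(expand.grid(bases, stringsAsFactors = FALSE))

# Rows 1 and 5 have letters 1 == 2; rows 2 and 4 have letters 1 == 3
magic_prob <- c(0.25, 0.25, 0, 0.25, 0.25, 0, 0, 0)

set.seed(123)
idx <- sample(x = seq_len(nrow(possibilities)), size = 10000,
              replace = TRUE, prob = magic_prob)
drawn <- possibilities[idx, ]

freq12 <- mean(drawn[, 1] == drawn[, 2])
freq13 <- mean(drawn[, 1] == drawn[, 3])
freq23 <- mean(drawn[, 2] == drawn[, 3])
print(c(freq12 = freq12, freq13 = freq13, freq23 = freq23))
```

Rows 3, 6, 7 and 8 are never drawn, and the third criterion (letters 2 and 3 identical) stays at zero because no row in the set satisfies it.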