Dear R-help,

Could someone please try to explain this paradox to me? What is more likely to show up first in a string of coin tosses, "Heads then Tails", or "Heads then Heads"?

## generate 2500 strings of random coin flips
ht <- replicate(2500,
                paste(sample(c("H", "T"), 100, replace = TRUE),
                      collapse = ""))

## find first occurrence of HT
mean(regexpr("HT", ht))+1  # mean of HT position, 4

## find first occurrence of HH
mean(regexpr("HH", ht))+1  # mean of HH position, 6

FYI, this is not homework, I have not been in school in years. I saw a similar problem posed in a blog post on the Revolutions R blog, and although I believe the answer, I'm having a hard time figuring out why this should be.

Thanks,
Erik Iverson
Well,

If the first flip is H, then the HT pattern is completed by the first flip of the second run (however long the 1st run of heads is). If the first flip is T, then the second run will be H's and the HT pattern will be completed by the first flip of the 3rd run. So the HT pattern will occur after 1 or 2 runs (at the beginning of the 2nd or 3rd).

On the other hand, the HH pattern can occur after any number of runs (as long as the H runs so far are each only 1 long). This means that the HH pattern has a higher probability of being in the right tail of the distribution, which will increase the mean.

The probability of HH or HT as the 1st pair is the same. Just looking at the first 3 flips, there is only 1 way for HH to first occur at flips 2 and 3 (THH; with HHH the first HH was already at flips 1 and 2), so that has probability 1/8. HT first occurring at flips 2 and 3 has 2 options, THT or HHT, and is therefore twice as likely.

Does this help?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Erik Iverson
> Sent: Monday, August 31, 2009 1:17 PM
> To: r-help at r-project.org
> Subject: [R] Offtopic, HT vs. HH in coin flips
>
> Could someone please try to explain this paradox to me? What is more
> likely to show up first in a string of coin tosses, "Heads then Tails",
> or "Heads then Heads"?
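The three-flip counts in the reply above can be checked by brute force. The following is a minimal sketch, not part of the original reply; the helper name first_end and the variable names are illustrative only. It enumerates all eight equally likely sequences of three flips and records the flip on which each pattern is first completed.

```r
# All 2^3 equally likely sequences of three flips, as strings such as "THH".
flips <- expand.grid(rep(list(c("H", "T")), 3), stringsAsFactors = FALSE)
seqs <- apply(flips, 1, paste, collapse = "")

# Flip number on which `pattern` is first completed (NA if it never occurs).
# first_end is a hypothetical helper introduced for this illustration.
first_end <- function(pattern, x) {
  pos <- as.integer(regexpr(pattern, x, fixed = TRUE))
  ifelse(pos > 0, pos + 1L, NA_integer_)
}

# Probability that the pattern is first completed on flip 3 (i.e. at flips 2 and 3).
p_hh_at_3 <- sum(first_end("HH", seqs) == 3, na.rm = TRUE) / length(seqs)
p_ht_at_3 <- sum(first_end("HT", seqs) == 3, na.rm = TRUE) / length(seqs)

p_hh_at_3  # 0.125 (only THH)
p_ht_at_3  # 0.25  (THT and HHT)
```

Both patterns are equally likely to be completed on flip 2 (probability 1/4 each); the difference only appears from flip 3 onward.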
Case starting with H: Pr = 0.5
    H first, H second
        Subcase 1a: Pr = 0.5 * 0.5 = 0.25
    ----
    H first, T second ... leads to TH eventually
        Subcase 1b: Pr = 0.5 * 0.5 = 0.25
==================
Case T first: Pr = 0.5
    all subcases lead to TH first

--
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
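Note that the case analysis above is written in terms of TH rather than HT. Taken at face value, it says HH comes before TH only when the first two flips are both heads, i.e. with probability 0.25, whereas for the pair asked about in the thread (HH versus HT) the chances are even. A minimal simulation sketch to compare the two follows; the seed, the sample size and the variable names are illustrative choices, not from the thread.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 10000
flips <- replicate(n, paste(sample(c("H", "T"), 100, replace = TRUE),
                            collapse = ""))

# Start position of the first match; regexpr() returns -1 when there is none,
# which is recoded as Inf so that "never occurs" sorts last.
first_pos <- function(pattern) {
  pos <- as.numeric(regexpr(pattern, flips, fixed = TRUE))
  pos[pos < 0] <- Inf
  pos
}

hh <- first_pos("HH")
th <- first_pos("TH")
ht <- first_pos("HT")

p_hh_before_th <- mean(hh < th)  # expected to be close to 0.25
p_hh_before_ht <- mean(hh < ht)  # expected to be close to 0.5
c(p_hh_before_th, p_hh_before_ht)
```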
Part of my issue was that I was not answering my original question. "What is more likely to show up first, HT or HH?" The answer to that turns out to be "neither", or "identical chances".

ht <- replicate(2500,
                paste(sample(c("H", "T"), 100, replace = TRUE),
                      collapse = ""))

hts <- regexpr("HT", ht) + 1
hhs <- regexpr("HH", ht) + 1

## which is first?
table(hts < hhs)  # about 50/50
summary(hts)      # mean of 4
summary(hhs)      # mean of 6

So, "What is more likely to show up first, HH or HT?" is of course a different question than "Are the expected values of the positions for the first HT or HH the same?" I suppose that's where confusion set in. It seems that if HH appears later in the string on average (i.e., after 6 tosses instead of 4), then the probability of it being first would be lower than that of HT, which is obviously wrong!

A quick graphic that helps show this (you must run the above code first):

library(lattice)
## one label per observation: all of hts first, then all of hhs
ht.df <- data.frame(count = c(hts, hhs),
                    type = gl(2, length(hts), labels = c("HT", "HH")))
barchart(prop.table(xtabs(~ count + type, data = ht.df)),
         stack = FALSE, horizontal = FALSE,
         box.ratio = .8, auto.key = TRUE)

Thanks to all those who replied. Someone also sent me the following link off list, which clears up the confusion as well:

http://www.mit.edu/~emin/writings/coinGame.html

Best,
Erik
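The simulated means of about 4 and 6 can also be obtained exactly. The sketch below is one standard way to do it (first-step analysis on a two-state Markov chain) and is not taken from the thread; the state labels and the helper name expected_wait are assumptions made for illustration. State "0" means no useful progress yet and state "H" means the last flip was a head; Q holds the transition probabilities among these non-absorbing states, and the expected number of flips solves (I - Q) E = 1.

```r
# Transition probabilities among the transient states c("0", "H").
# Waiting for HT: from "0", T stays in "0" and H moves to "H";
#                 from "H", H stays in "H" and T completes the pattern.
Q_ht <- matrix(c(0.5, 0.5,
                 0.0, 0.5), nrow = 2, byrow = TRUE)

# Waiting for HH: from "0", T stays in "0" and H moves to "H";
#                 from "H", T falls back to "0" and H completes the pattern.
Q_hh <- matrix(c(0.5, 0.5,
                 0.5, 0.0), nrow = 2, byrow = TRUE)

# Expected number of flips until absorption, starting from state "0".
# expected_wait is a hypothetical helper name.
expected_wait <- function(Q) solve(diag(nrow(Q)) - Q, rep(1, nrow(Q)))[1]

e_ht <- expected_wait(Q_ht)
e_hh <- expected_wait(Q_hh)
c(HT = e_ht, HH = e_hh)  # 4 and 6
```

The only difference between the two chains is what happens after a "wrong" flip in state "H": when waiting for HT a further head keeps the progress already made, while when waiting for HH a tail throws it away.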
On 31-Aug-09 19:16:33, Erik Iverson wrote:
> Could someone please try to explain this paradox to me? What is
> more likely to show up first in a string of coin tosses, "Heads
> then Tails", or "Heads then Heads"?

Be very careful about the statement of the problem!

[1] The probability that "HH" will occur first (i.e. before "HT") is the same as the probability that "HT" will occur first (i.e. before "HH").

[2] However, the probability that the first occurrence of "HT" has its "H" at a given position is generally not the same as the probability that the first occurrence of "HH" has its first "H" at that same position.

[1]: At the first occurrence of (either "HH" or "HT"), there is an initial string S, ending in an "H", followed by either an "H" (for "HH") or a "T" (for "HT"). Both are equally likely. So the probability that the first occurrence of (either "HH" or "HT") is an "HH" is the same as the probability that it is an "HT".

[2]: (A) The first occurrence of an "HH" comes at the end of a sequence consisting of any collection of "H" and "T", provided there is no "HH" in the sequence and the last is "H", followed by "H". However, "HT" is allowed to occur in the sequence.

But (B) the first occurrence of an "HT" comes at the end of a sequence of (zero or more "T") followed by (1 or more "H") followed by "T". This is the only pattern in which "HT" does not occur prior to the final "HT".
Similarly, "HH" is allowed to occur in the sequence.

The reason that, in general, the probability of "HH" first occurring at a given position is different from the probability of "HT" first occurring at that position lies in the difference between the number of possible sequences satisfying (A) and the number of possible sequences satisfying (B).

The first few cases ("HH" or "HT" first occurring at (k+1), so that the position of the first "H" in "HH" or "HT" is at k) are, with their probabilities:

k=1:   HH          HT
       1/4         1/4

k=2:   THH         HHT
                   THT
       1/8         2/8

k=3:   TTHH        HHHT
       HTHH        THHT
                   TTHT
       2/16        3/16

k=4:   TTTHH       HHHHT
       THTHH       THHHT
       HTTHH       TTHHT
                   TTTHT
       3/32        4/32

The "HT" case is simple:

  P.HT[k] = Prob(1st "HT" at (k+1)) = k/(2^(k+1))

Exercise for the reader: Sum(P.HT) = 1

The "HH" case is more interesting. Experimental scribblings on paper threw up a hypothesis, which I decided to explore in R. Thanks to Gerrit Eichner for suggesting the use of expand.grid()!

## Function to count sequences giving 1st HH on throw k+1
countHH <- function(k){
  M <- as.matrix(expand.grid(rep(list(0:1),k)))
  ix <- (M[,k]==1) ## k must be an H (then k+1 will be H)
  for(i in (1:(k-1))){
    ix <- ix & ( !((M[,i]==1)&(M[,i+1]==1)) )
  }
  sum(ix)
  ## list(Count=sum(ix),Which=M[ix,])
}

Now, ignoring the case k=1:

HHcounts <- NULL
for(i in (2:12)){
  HHcounts <- c(HHcounts,countHH(i))
}
rbind((3:13),HHcounts)
#          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#             3    4    5    6    7    8    9   10   11    12    13
# HHcounts    1    2    3    5    8   13   21   34   55    89   144

Lo and Behold, we have a Fibonacci sequence! Another exercise for the reader ...

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09                                       Time: 10:38:58
------------------------------ XFMail ------------------------------
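The two exercises above can be checked numerically. The sketch below is illustrative and not part of the original message. It assumes that the Fibonacci pattern in the "HH" counts continues for every k (the message only verifies it up to k = 12), and it truncates the infinite sums at k = 200, where the remaining terms are negligible. The variable names are hypothetical.

```r
k <- 1:200  # truncation point chosen for illustration

# "HT" first completed on throw k+1: formula from the message.
p_ht <- k / 2^(k + 1)

# "HH" first completed on throw k+1: counts 1, 1, 2, 3, 5, ... out of 2^(k+1),
# ASSUMING the Fibonacci pattern holds for all k.
fib <- numeric(length(k))
fib[1:2] <- 1
for (i in 3:length(k)) fib[i] <- fib[i - 1] + fib[i - 2]
p_hh <- fib / 2^(k + 1)

# Each set of probabilities should sum to 1.
total_ht <- sum(p_ht)
total_hh <- sum(p_hh)

# Mean throw number on which each pattern is first completed.
mean_ht <- sum((k + 1) * p_ht)
mean_hh <- sum((k + 1) * p_hh)

c(total_ht, total_hh)  # both 1, up to rounding
c(mean_ht, mean_hh)    # 4 and 6, matching the simulated means in the thread
```

The "HH" probabilities decay more slowly than the "HT" ones, which is the heavier right tail that pushes the mean from 4 up to 6.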