Wednesday, January 11, 2012

Probability of Consecutive Coin Flips

Given (n) coin flips, what's the probability of getting at least one pair of consecutive heads?

If n = 2, the probability is 1/4. If n = 3, the probability is 3/8 (HHH, HHT, THH). If n = 4, the probability turns out to be 8/16. The denominator of the probability fraction, in its unsimplified form, will be 2^n. The challenge is to find the numerator. For simplify in wording, here are the phrase abbreviations:
  • "at least a pair of consecutive heads" --> "HH" 
  • "no pair of consecutive heads" --> "!HH"
  • "heads" --> "H" and "tails" --> "T"
Let p(n) be the probability that first (n) flips feature HH. This is what we're ultimately looking for. As with many cases in probability, it may be easier to define and use the complimentary probability q(n) as 1-p(n). Moreover, let's define q'(n) as simply the denominator of the unsimplified fraction q(n). So q'(2) = 3, q'(3) = 5, q'(4) = 8, and so on. To get started, take the first values of (n) and list out the possibilities of combinations featuring !HH.

n = 2 n = 3 n = 4
HT HTT HTTT
TH THT THTT
TT TTT TTTT

TTH TTHT

HTH HTHT


TTTH


THTH


HTTH

While the ordering wasn't particularly important at this stage, notice that the first three rows of n=3 was conveniently laid out so that they are simply addition of one more T from the same row under n=2; same for going from n=3 to n=4. Now take a looks specifically at n=3. The first toss can be H or T. For 3 flips featuring !HH:
  • If the 1st flip is T, there are 3 possible combinations (THT, TTT, TTH)
  • If the 1st flip is H, there are 2 possible combinations (HTT, HTH)
The combinations of those 5 combinations happen to be the 5 combinations listed in the column above for n=3. Now start from n=2:
  • For 2 of those combinations ending with T, each will produce 3 further possible combinations in 3rd and 4th flip to keep the !HH property
  • For the remaining combination ending with H, it will produce 2 further possible combinations instead
Add 2*3 + 1*2 = 8, and it is no surprise that q'(4) = 8 as established. How the question remains: how did we just know that for n=2, two combinations ended in T?

The answer actually indirectly lies in q'(3). Observe the column of combinations above for n=2 and n=3. In particular, look at the last two rows. Similarly, compare n=3 and n=4 rows and observe the last three rows of n=4. What are those "extra" rows without something to the left of it in the table? They all end in H, and the rest of the term (without the final H) are exactly the combinations that end in T for the previous column.

Let's formulate some ideas for q'(5). Take q'(3) = 5 to start.
  • Each of those 5 combinations will produce at least two possible combinations (of 4th and 5th flip) to keep the !HH property ongoing at n=5
  • Furthermore, some portion of those 5 combinations will produce a third combination, if it ends in T
How did we how many ends in T? It's q'(4) - q'(3), or the number of those "extra" rows.
  • Summing them together, q'(5) = 2*q'(3) + (q'(4) - q'(3))
  • That simplifies to q'(5) = q'(3) + q'(4)
Suddenly, it looks very familiar. That is exactly the Fibonacci sequence that q'(n) is following, with q'(1) = 2, q'(2) = 3, etc. Precisely, given the initial offset, the Fibonnaci sequence F(n) = q'(n-2) or F(n+2) = q'(n). But recall, we're trying to find p(n), and in the process, defined q(n) and q'(n). To convert q'(n) = q'(n-1) + q'(n-2), recall that q'(n) is simply numerator of q(n), whereby the denominator is 2^n.
  • Multiple each term by either 2^n, 2^(n-1), or 2^(n-2)
  • That ends up being 4*q(n) = 2*q(n-1) + q(n-2)
  • Simplify to q(n) = (1/2)*q(n-1) + (1/4)*q(n-2) with conditions that q(1) = 1, q(2) = 3/4
  • Finally, p(n) = 1 - (1/2)*(1-p(n-1)) - (1/4)*(1-p(n-2))
Alternatively, it may just be more convenient to leave the result in Fibonacci sequence, since those numbers are more easily accessible. In that case, it goes from:
  • Given F(n+2) = q'(n)
  • q(n) = F(n+2) / (2^n)
  • p(n) = 1 - F(n+2) / (2^n)
In either case, q(n) gives the probability of featuring no consecutive heads in the first (n) flips; p(n) gives the probability of having consecutive heads. To give some numerical sense, Java coding was written for the modified Fibonacci method. Beware that given the recursive method, the program will run quite slowly as (n) gets around 40.
  int until = 0;
  System.out.print("n = ? ");
  until = reader.nextInt();
  for(int i=1; i<=until; i++)
  {
      System.out.println(i + ": " + (1-fib(i)));
  }

  public static double fib(int run)
  {
      if(run == 1)
          return 1;
      else if(run == 2)
          return 0.75;
      else
          return (0.5)*fib(run-1) + (0.25)*fib(run-2);
  }

The condensed results are as follows:

n p(n)
1 0.00%
2 25.00%
3 37.50%
4 50.00%
5 59.38%
6 67.19%
7 73.44%
8 78.52%
9 82.62%
10 85.94%
15 95.13%
20 98.31%
25 99.41%
30 99.80%
35 99.93%
40 99.98%
45 99.99%

As expected, the probability quickly climbs up as the number of flips increase, and asymptotically approaches certainty.