Hassett, chapter 2: Counting for probability
2.1 What is probability?
Counting approach for equally likely outcomes, versus relative frequency estimates for unequally likely outcomes.
2.2 Sets, sample spaces, and events
Finite sets: $S = \{1,2,3,4,5,6\}$
Infinite sets are described in set-builder notation: $$ \mathbb R^{+} = \left\{ x \mid x \text{ is a real number and } x>0 \right\} $$
- Sample space
- The set of all possible outcomes of the experiment.
- Event
- A subset of the sample space for an experiment.
Examples:
- Coin toss comes up heads. Sample space $S=\{H,T\}$. Event $E=\{H\}$.
- At most five of one hundred life policies have death benefit claims in the next year. Sample space: $S = \{0,1,2,\cdots,100\}$. Event: $E = \{0,1,2,3,4,5\}$.
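The two examples above translate directly into Python sets, which gives a quick way to confirm that an event really is a subset of its sample space. This is only a sketch; the names `coin_space`, `heads`, `claims_space`, and `at_most_five` are made up here for illustration.

```python
# Sample spaces and events as Python sets (all names are illustrative).
coin_space = {"H", "T"}
heads = {"H"}

# Number of claims among one hundred policies: 0 through 100.
claims_space = set(range(101))
at_most_five = {k for k in claims_space if k <= 5}

# An event is a subset of the sample space.
print(heads <= coin_space)           # True
print(at_most_five <= claims_space)  # True
print(sorted(at_most_five))          # [0, 1, 2, 3, 4, 5]
```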
2.3 Compound events: set notation
Negation: $\not E$ (in LaTeX, `{\sim}`, to pretend it's not a binary operator)
Or (union): `\cup`
$A \or B$
And (intersection): `\cap`
$A \and B$
2.4 Set identities
Unions and intersections are distributive:
$$ \begin{aligned} A \and (B\or C) &= (A\and B) \or (A \and C) \\ A \or (B\and C) &= (A\or B) \and (A \or C) \end{aligned} $$
De Morgan's Laws:
$$ \begin{aligned} \not(A\or B) &= (\not A) \and (\not B) \\ \not(A\and B) &= (\not A) \or (\not B) \end{aligned} $$
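These identities are easy to spot-check by brute force. The sketch below tries every triple of subsets of a four-element universe, which is a sanity check rather than a proof. The universe `S` and the helpers `subsets` and `identities_hold` are names invented for this example.

```python
from itertools import combinations

S = set(range(4))  # a small universe, chosen arbitrarily


def subsets(universe):
    """Yield every subset of ``universe`` as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def identities_hold(A, B, C):
    """Check both distributive laws and both De Morgan laws."""
    return (
        A & (B | C) == (A & B) | (A & C)
        and A | (B & C) == (A | B) & (A | C)
        and S - (A | B) == (S - A) & (S - B)   # complement taken within S
        and S - (A & B) == (S - A) | (S - B)
    )


all_subsets = list(subsets(S))  # 2**4 = 16 subsets
print(all(identities_hold(A, B, C)
          for A in all_subsets
          for B in all_subsets
          for C in all_subsets))  # True
```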
2.5 Counting
Partition a set $S$ into $A$ and $\not A$. Let $n(\cdot)$ be the number of elements in some $(\cdot)$. Then
$$ n(\not A) = n(S) - n(A) $$
To find the number of elements in a union, we have to subtract the double-counted elements in the intersection:
$$ n(A\or B) = n(A) + n(B) - n(A\and B) $$
Empty sets: $\emptyset$, $\varnothing$; "mutually exclusive events" means $A\and B = \emptyset$.
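A small numerical check of the counting rules, using an arbitrary universe of the integers 1 to 20. The sets `A`, `B`, and `odds` are illustrative choices, not anything from the text.

```python
S = set(range(1, 21))              # illustrative universe: 1..20
A = {x for x in S if x % 2 == 0}   # multiples of 2: 10 elements
B = {x for x in S if x % 3 == 0}   # multiples of 3: 6 elements

# Complement: n(not A) = n(S) - n(A)
print(len(S - A), len(S) - len(A))               # 10 10

# Union: n(A or B) = n(A) + n(B) - n(A and B)
print(len(A | B), len(A) + len(B) - len(A & B))  # 13 13

# Mutually exclusive events have an empty intersection.
odds = S - A
print(A & odds == set())                         # True
```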
Multiplication principle; permutations.
Note that if I have $n$ objects and select only $r$ of them, the number of permutations is $P(n,r) = n!/(n-r)!$. I recover the familiar $P(n) = n!$ when I take all of the objects, $r=n$.
Combinations and binomial coefficients. The binomial coefficient ("number of combinations") is sometimes written $C(n,r)$.
Note: the permutation button on a calculator is usually labeled "nPr"; the combination or "n choose r" button is labeled "nCr".
For partitions, multinomial coefficients. Suppose twenty people are split into groups of ten, six, and four. The number of ways to do this is
$$ {20\choose 4}{16\choose 6}{10\choose 10} = \frac{20!}{4!\ 16!}\cdot \frac{16!}{6!\ 10!}\cdot \frac{10!}{10!\ 0!} = \frac{20!}{4!\ 6!\ 10!} $$
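Python's standard library has the calculator buttons built in: `math.perm` and `math.comb` (available from Python 3.8) play the roles of nPr and nCr. The `multinomial` helper below is a hypothetical convenience function written for this note, not a library call. The sketch confirms that the product of binomial coefficients agrees with the single multinomial expression.

```python
from math import comb, factorial, perm


def multinomial(*group_sizes):
    """Ways to split sum(group_sizes) distinct objects into labeled
    groups of the given sizes (hypothetical helper)."""
    total = factorial(sum(group_sizes))
    for k in group_sizes:
        total //= factorial(k)  # each division is exact
    return total


print(perm(5, 2))                   # P(5,2) = 5!/3! = 20
print(perm(5, 5) == factorial(5))   # taking all objects: True

# Twenty people into groups of four, six, and ten.
print(comb(20, 4) * comb(16, 6) * comb(10, 10))  # 38798760
print(multinomial(10, 6, 4))                     # 38798760
```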
Exercises
- question 2-27:
- Given the six letters ABCDEF, how many four-letter words can be made with and without repetitions?
With repetitions, $6^4 = 1296$; without, $P(6,4) = 6\cdot5\cdot4\cdot3 = 360$.
- question 2-39:
- How many ways are there to arrange the letters in the word MISSISSIPPI?
Eleven total letters: one M, four I, four S, two P.
There are eleven places the M can go. That leaves ten places for the four I, ${10 \choose 4}$ options, and so on. So it's the multinomial: $$ {11\choose 1,4,4,2} = \frac{11!}{1!\ 4!\ 4!\ 2!} = 34\,650. $$
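Both exercises are small enough to check by machine. The sketch below enumerates the four-letter words directly, then counts the MISSISSIPPI arrangements two ways: place-by-place with binomial coefficients, and with a multinomial. The function `arrangements` is a name made up for this note. Enumerating all $11!$ orderings would be slow, so the brute-force comparison uses the shorter word BANANA instead.

```python
from collections import Counter
from itertools import permutations, product
from math import comb, factorial

letters = "ABCDEF"
with_repetition = sum(1 for _ in product(letters, repeat=4))
without_repetition = sum(1 for _ in permutations(letters, 4))
print(with_repetition, without_repetition)  # 1296 360


def arrangements(word):
    """Distinct orderings of the letters of ``word`` (multinomial)."""
    n = factorial(len(word))
    for repeats in Counter(word).values():
        n //= factorial(repeats)
    return n


# Place the M, then the four I, then the four S, then the two P.
print(comb(11, 1) * comb(10, 4) * comb(6, 4) * comb(2, 2))  # 34650
print(arrangements("MISSISSIPPI"))                          # 34650

# Brute-force comparison on a shorter word.
print(len(set(permutations("BANANA"))), arrangements("BANANA"))  # 60 60
```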
Sample actuarial exam problem
- question 2-47:
-
An auto insurance company has 10,000 policyholders. Each is classified as
- young or old
- male or female
- married or single
Of these, 3000 are young, 4600 are male, 7000 are married. There are 1320 young males, 3010 married males, and 1400 young married persons. Finally, 600 policyholders are young married males.
How many are young, female, and single?
set | symbol | count | complement | count |
---|---|---|---|---|
all | $S$ | 10 000 | ||
young | $Y$ | 3 000 | $\not Y$ | 7 000 |
male | $M$ | 4 600 | $\not M$ | 5 400 |
wed | $W$ | 7 000 | $\not W$ | 3 000 |
young and male | $Y\and M$ | 1320 | ||
married and male | $W\and M$ | 3010 | ||
young and married | $Y\and W$ | 1400 | ||
young, married, male | $Y\and W\and M$ | 600 | ||
Boy, the Venn diagram really is the way to approach this. Start from the center and work my way out.
set | arithmetic | count |
---|---|---|
$Y\and W \and M$ | 600 | 600 |
$Y\and M\and(\not W)$ | 1320 - 600 | 720 |
$W\and M\and(\not Y)$ | 3010 - 600 | 2410 |
$Y\and W\and(\not M)$ | 1400 - 600 | 800 |
$Y\and \not(M\or W)$ | 3000 - (600 + 720 + 800) | 880 |
$M\and \not(Y\or W)$ | 4600 - (600 + 720 + 2410) | 870 |
$W\and \not(Y\or M)$ | 7000 - (600 + 800 + 2410) | 3190 |
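The center-out arithmetic in the table can be scripted, which guards against slips in the subtraction. This is a sketch with made-up variable names and region labels; it also adds up the seven regions and compares the total with the inclusion-exclusion formula for three sets.

```python
# Given counts from the problem statement.
Y, M, W = 3000, 4600, 7000
YM, MW, YW = 1320, 3010, 1400
YMW = 600

# Work outward from the center of the Venn diagram.
regions = {
    "Y and M and W": YMW,
    "Y and M, not W": YM - YMW,
    "M and W, not Y": MW - YMW,
    "Y and W, not M": YW - YMW,
}
regions["Y only"] = Y - (YMW + regions["Y and M, not W"] + regions["Y and W, not M"])
regions["M only"] = M - (YMW + regions["Y and M, not W"] + regions["M and W, not Y"])
regions["W only"] = W - (YMW + regions["Y and W, not M"] + regions["M and W, not Y"])

for name, size in regions.items():
    print(f"{name:16s} {size:5d}")

# Total inside at least one of the three sets, computed two ways.
print(sum(regions.values()))              # 9470
print(Y + M + W - YM - MW - YW + YMW)     # 9470
```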
That's annoying: these seven regions add up to 9,470 people, not 10,000. What have I missed?
Some hours later ...
That problem failed my sanity check because the seven regions in the table only cover the union $Y\or M\or W$. There is an eighth region, the old, single females who are in none of the three sets, and the problem never gives its size directly. Convincing myself that this was the case (rather than me being confused, or repeatedly making the same arithmetic mistake) just ate up a great big chunk of time. So let's write down what I did.
I wanted to generate sets that had these properties, so I wrote a little Python class:

    import itertools

    class FillableSet(set):
        """Extend the builtin ``set`` to populate itself up to a fixed size.

        Without an explicit ``source``, it will be populated with positive
        integers. With ``exclude``, ignore some elements that may or may
        not appear in ``source``.
        """

        def __init__(self, size, *, source=None, exclude=()):
            self.size = size
            self.fill(source=source, exclude=exclude)

        def fill(self, *, source=None, exclude=()):
            # Default source: the positive integers 1, 2, 3, ...
            if source is None:
                source = itertools.count(1)
            for s in source:
                # Stop as soon as the set has reached its target size.
                if len(self) >= self.size:
                    break
                if s in exclude:
                    continue
                self.add(s)
            # Report progress after every fill.
            print("Length:", len(self))
            if self:
                print("Largest element:", max(self))

The `FillableSet` will populate itself with members of the `source` (or with consecutive integers). So now we generate the sets. Without loss of generality, we'll make the young people first:

    >>> y = FillableSet(3000)
    Length: 3000
    Largest element: 3000

Now let's generate the set of male people. That'll be the union of young male people and non-young male people, $$M = (M\and Y) \or (M\and\not Y).$$ So first we'll generate some young male people.

    >>> y_m = FillableSet(1320, source=y)
    Length: 1320
    Largest element: 1320
    >>> m = FillableSet(4600, source=y_m); m.fill(exclude=y)
    Length: 1320
    Largest element: 1320
    Length: 4600
    Largest element: 6280

Now we need to generate married people. Some of them are young and male:

    >>> y_m_w = FillableSet(600, source=y_m)
    Length: 600
    Largest element: 600

Some of them are not young, and some of them are not male, so let's generate those. The expressions are $$ M\and W = M\and W \and (Y\or\not Y),$$

    >>> m_w = FillableSet(3010, source=y_m_w) ; m_w.fill(exclude=y)
    Length: 600
    Largest element: 600
    Length: 3010
    Largest element: 5410

and $$ W\and Y = (M \or \not M)\and W \and Y.$$

    >>> w_y = FillableSet(1400, source=y_m_w) ; w_y.fill(source=y, exclude=m)
    Length: 600
    Largest element: 600
    Length: 1400
    Largest element: 2120

Now we just have to fill out $W$, the married people. We have three subsets of $W$, and we add new people until we have the size we want.

    >>> w = FillableSet(7000, source=[])
    Length: 0
    >>> for source in y_m_w, w_y, m_w:
    ...     w.fill(source=source)
    ...
    Length: 600
    Largest element: 600
    Length: 1400
    Largest element: 2120
    Length: 3810
    Largest element: 5410
    >>> w.fill(exclude=set.union(y,m))
    Length: 7000
    Largest element: 9470

That reproduces our list of intersections:
set | size |
---|---|
y | 3000 |
m | 4600 |
w | 7000 |
y $\cap$ m | 1320 |
m $\cap$ w | 3010 |
y $\cap$ w | 1400 |
y $\cap$ m $\cap$ w | 600 |
y $\cup$ m $\cup$ w | 9470 |
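But only 9,470 of the 10,000 policyholders are there. The other 530 are in none of the three sets: they are old, single females, the eighth region of the Venn diagram. So the counts are consistent after all, and the answer to the question is the 880 people in $Y\and\not(M\or W)$.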
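A cross-check on the bookkeeping, sketched with plain sets and made-up names (`sizes`, `members`, `everyone`): if the eighth Venn region, the people in none of $Y$, $M$, or $W$, is given whatever is left over, then every count stated in the problem fits inside 10,000 policyholders at once.

```python
from itertools import count

TOTAL = 10_000

# Region sizes from the center-out table, keyed by the sets each region is in.
sizes = {
    ("Y", "M", "W"): 600,
    ("Y", "M"): 720,
    ("M", "W"): 2410,
    ("Y", "W"): 800,
    ("Y",): 880,
    ("M",): 870,
    ("W",): 3190,
}
# Eighth region: everyone who is in none of Y, M, W.
sizes[()] = TOTAL - sum(sizes.values())

ids = count(1)  # hand out consecutive policyholder numbers
members = {"Y": set(), "M": set(), "W": set()}
everyone = set()
for labels, size in sizes.items():
    people = {next(ids) for _ in range(size)}
    everyone |= people
    for label in labels:
        members[label] |= people

y, m, w = members["Y"], members["M"], members["W"]
print(len(everyone), len(y), len(m), len(w))  # 10000 3000 4600 7000
print(len(y & m), len(m & w), len(y & w))     # 1320 3010 1400
print(len(y & m & w), len(y | m | w))         # 600 9470
print(sizes[()], len(y - m - w))              # 530 880
```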