Hassett, chapter 2: Counting for probability

2.1 What is probability?
2.2 Sets, sample spaces, and events
2.3 Compound events: set notation
2.4 Set identities
2.5 Counting
Exercises
- Sample actuarial exam problem
  - Some hours later ...

2.1 What is probability?

Counting approach for equally likely outcomes, versus relative frequency estimates for unequally likely outcomes.

2.2 Sets, sample spaces, and events

Finite sets: $S = \{1,2,3,4,5,6\}$

Infinite sets are built with set-builder notation: $$ \mathbb{R}^{+} = \left\{ x \mid x \text{ is a real number and } x>0 \right\} $$

Sample space: The set of all possible outcomes of the experiment.

If you die, your insurance company might face financial strain.

Event: A subset of the sample space for an experiment.

Examples:

Coin toss comes up heads. Sample space $S=\{H,T\}$. Event $E=\{H\}$.
At most five of one hundred life policies have death benefit claims in the next year.

Sample space: $S = \{0,1,2,\cdots,100\}$.
Event: $E = \{0,1,2,3,4,5\}$.
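The two examples translate directly into builtin Python sets. This is only a sketch; the variable names are my own, chosen for illustration.

```python
# Sample spaces and events as plain Python sets (illustrative names).
coin_space = {"H", "T"}
heads = {"H"}

# Number of death benefit claims among one hundred policies: 0 through 100.
claims_space = set(range(101))
at_most_five = {k for k in claims_space if k <= 5}

# An event is any subset of the sample space; "<=" is the subset test.
heads_is_event = heads <= coin_space
at_most_five_is_event = at_most_five <= claims_space
print(heads_is_event, at_most_five_is_event, sorted(at_most_five))
```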

2.3 Compound events: set notation

Negation: $\not E$ (in LaTeX, {\sim}, to pretend it's not a binary operator)
Or (union): $A \or B$ (in LaTeX, \cup)
And (intersection): $A \and B$ (in LaTeX, \cap)

2.4 Set identities

Unions and intersections are distributive:

$$ \begin{aligned} A \and (B\or C) &= (A\and B) \or (A \and C) \\ A \or (B\and C) &= (A\or B) \and (A \or C) \end{aligned} $$

De Morgan's Laws:

$$ \begin{aligned} \not(A\or B) &= (\not A) \and (\not B) \\ \not(A\and B) &= (\not A) \or (\not B) \end{aligned} $$
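A brute-force check of the four identities above, as a sketch: it tries every triple of subsets of a three-element universe. The universe and the helper names `subsets` and `comp` are my own choices. Passing the check is evidence for this small universe, not a general proof.

```python
from itertools import combinations, product

universe = frozenset({1, 2, 3})

def subsets(s):
    """Return every subset of s as a frozenset."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(x):
    """Complement relative to the universe."""
    return universe - x

# "&" is intersection and "|" is union; both bind tighter than "==".
all_hold = all(
    a & (b | c) == (a & b) | (a & c)        # distributive law 1
    and a | (b & c) == (a | b) & (a | c)    # distributive law 2
    and comp(a | b) == comp(a) & comp(b)    # De Morgan 1
    and comp(a & b) == comp(a) | comp(b)    # De Morgan 2
    for a, b, c in product(subsets(universe), repeat=3)
)
print(all_hold)
```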

2.5 Counting

Partition a set $S$ into $A$ and $\not A$. Let $n(\cdot)$ be the number of elements in some $(\cdot)$. Then

$$ n(\not A) = n(S) - n(A) $$

To find the number of elements in a union, we have to subtract the double-counted elements in the intersection:

$$ n(A\or B) = n(A) + n(B) - n(A\and B) $$
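Both counting rules can be checked on a small example. The sketch below uses one roll of a die, with events I picked for illustration (even, and greater than three).

```python
S = {1, 2, 3, 4, 5, 6}   # one roll of a die
A = {2, 4, 6}            # even
B = {4, 5, 6}            # greater than three

# Complement rule: n(not A) = n(S) - n(A)
complement_direct = len(S - A)
complement_rule = len(S) - len(A)

# Union rule: n(A or B) = n(A) + n(B) - n(A and B)
union_direct = len(A | B)
union_rule = len(A) + len(B) - len(A & B)

print(complement_direct, complement_rule, union_direct, union_rule)
```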

Empty sets: $\emptyset$, $\varnothing$; "mutually exclusive events" means $A\and B = \emptyset$.

Multiplication principle; permutations.

Note that if I have $n$ objects and select only $r$ of them, the number of permutations is $P(n,r) = n!/(n-r)!$. I recover the familiar $P(n) = n!$ when I take all of the objects, $r=n$.

Combinations and binomial coefficients. The binomial coefficient ("number of combinations") is sometimes written $C(n,r)$.

Note: the permutation button on a calculator is usually labeled "nPr"; the combination or "n choose r" button is labeled "nCr".
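Python's standard library has the same two buttons. A short sketch, assuming Python 3.8 or later (where `math.perm` and `math.comb` were added); the values of `n` and `r` are arbitrary.

```python
import math

n, r = 6, 4
n_p_r = math.perm(n, r)    # ordered selections, n!/(n-r)!
n_c_r = math.comb(n, r)    # unordered selections, n!/(r!(n-r)!)
n_p_n = math.perm(n)       # r defaults to n, giving n!

print(n_p_r, n_c_r, n_p_n)
```

Each combination of $r$ objects can be ordered in $r!$ ways, so `n_p_r == n_c_r * math.factorial(r)`.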

For partitions, multinomial coefficients. Suppose twenty people are split into groups of ten, six, and four. The number of ways to do this is

$$ {20\choose 4}{16\choose 6}{10\choose 10} = \frac{20!}{4!\ 16!}\cdot \frac{16!}{6!\ 10!}\cdot \frac{10!}{10!\ 0!} = \frac{20!}{4!\ 6!\ 10!} $$
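The telescoping product can be checked numerically. In this sketch the helper `multinomial` is my own name, not a library function; it multiplies successive binomial coefficients exactly as in the equation above.

```python
import math

def multinomial(*group_sizes):
    """Ways to split sum(group_sizes) distinct objects into labelled groups."""
    ways = 1
    remaining = sum(group_sizes)
    for size in group_sizes:
        ways *= math.comb(remaining, size)  # choose this group from what is left
        remaining -= size
    return ways

ways = multinomial(4, 6, 10)
closed_form = math.factorial(20) // (
    math.factorial(4) * math.factorial(6) * math.factorial(10)
)
print(ways, closed_form)
```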

Exercises

question 2-27:: Given the six letters ABCDEF, how many four-letter words can be made with and without repetitions?

With repetitions, $6^4 = 1296$; without, $P(6,4) = 6\cdot5\cdot4\cdot3 = 360$.
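The sample space here is small enough to enumerate, which makes a useful sanity check on the formulas. A sketch using `itertools`:

```python
from itertools import permutations, product

letters = "ABCDEF"

# Every four-letter word, letters may repeat: 6 choices for each of 4 slots.
with_repetition = sum(1 for _ in product(letters, repeat=4))

# Every four-letter word with no repeated letter: ordered selections of 4 from 6.
without_repetition = sum(1 for _ in permutations(letters, 4))

print(with_repetition, without_repetition)
```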

question 2-39:: How many ways are there to arrange the letters in the word MISSISSIPPI?

Eleven total letters: one M, four I, four S, two P.

There are eleven places the M can go. That leaves ten places for the four I, $10 \choose 4$ options, and so on. So it's the multinomial: $$ {11\choose 1,4,4,2} = \frac{11!}{1!\ 4!\ 4!\ 2!} = 34\,650. $$
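A sketch of the same count in Python. The function name `arrangements` is my own. Enumerating all $11!$ orderings of MISSISSIPPI would be slow, so the brute-force comparison uses a shorter word, SEES, that I chose for illustration.

```python
import math
from collections import Counter
from itertools import permutations

def arrangements(word):
    """Distinct orderings of the letters: n! divided by each letter count's factorial."""
    count = math.factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= math.factorial(multiplicity)
    return count

mississippi = arrangements("MISSISSIPPI")

# Brute force on a short word: collapse duplicate orderings with a set.
sees_formula = arrangements("SEES")
sees_brute = len(set(permutations("SEES")))

print(mississippi, sees_formula, sees_brute)
```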

Sample actuarial exam problem

question 2-47:

An auto insurance company has 10,000 policyholders. Each is classified as

young or old
male or female
married or single

Of these, 3000 are young, 4600 are male, 7000 are married. There are 1320 young males, 3010 married males, and 1400 young married persons. Finally, 600 policyholders are young married males.

How many are young, female, and single?

set	symbol	count	complement	count
all	$S$	10 000
young	$Y$	3 000	$\not Y$	7 000
male	$M$	4 600	$\not M$	5 400
wed	$W$	7 000	$\not W$	3 000
young and male	$Y\and M$	1320
married and male	$W\and M$	3010
young and married	$Y\and W$	1400
young, married, male	$Y\and W\and M$	600

Boy, the Venn diagram really is the way to approach this. Start from the center and work my way out.

set	arithmetic	count
$Y\and W \and M$	600	600
$Y\and M\and(\not W)$	1320 - 600	720
$W\and M\and(\not Y)$	3010 - 600	2410
$Y\and W\and(\not M)$	1400 - 600	800
$Y\and \not(M\or W)$	3000 - (600 + 720 + 800)	880
$M\and \not(Y\or W)$	4600 - (600 + 720 + 2410)	870
$W\and \not(Y\or M)$	7000 - (600 + 800 + 2410)	3190
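The same center-outward arithmetic as a Python sketch, with variable names of my own. It also cross-checks the size of $Y\or M\or W$ against the three-set version of the union formula from section 2.5.

```python
# Given counts from the problem statement.
n_y, n_m, n_w = 3000, 4600, 7000
n_ym, n_mw, n_yw = 1320, 3010, 1400
n_ymw = 600

# Work outward from the center of the Venn diagram.
ym_only = n_ym - n_ymw                        # young, male, single
mw_only = n_mw - n_ymw                        # old, male, married
yw_only = n_yw - n_ymw                        # young, female, married
y_only = n_y - (n_ymw + ym_only + yw_only)    # young, female, single
m_only = n_m - (n_ymw + ym_only + mw_only)    # old, male, single
w_only = n_w - (n_ymw + yw_only + mw_only)    # old, female, married

regions = [n_ymw, ym_only, mw_only, yw_only, y_only, m_only, w_only]
union_from_regions = sum(regions)

# Three-set inclusion-exclusion for the same union.
union_from_formula = n_y + n_m + n_w - n_ym - n_mw - n_yw + n_ymw

print(regions, union_from_regions, union_from_formula)
```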

That's annoying: I don't have 10,000 total people. What have I missed?

Some hours later ...

That problem failed my sanity check, and my first diagnosis was that it's overdetermined: if you satisfy all of the conditions on the subsets, the three sets can't account for all 10,000 policies. Checking whether that was really the case (rather than me being confused, or repeatedly making the same arithmetic mistake) ate up a great big chunk of time. So let's write down what I did.

I wanted to generate sets that had these properties, so I wrote a little Python class:

import itertools

class FillableSet(set):
    """Extend the builtin ``set`` to populate itself up to a fixed size.
    Without an explicit source, it will be populated with positive integers.
    With ``exclude``, skip elements whether or not they appear in ``source``.
    """
    def __init__(self, size, *, source=None, exclude=()):
        super().__init__()
        self.size = size
        self.fill(source=source, exclude=exclude)

    def fill(self, *, source=None, exclude=()):
        """Add elements from ``source`` until the set holds ``size`` elements."""
        if source is None:
            source = itertools.count(1)  # default source: 1, 2, 3, ...
        for s in source:
            if len(self) >= self.size:
                break
            if s in exclude:
                continue
            self.add(s)
        # Report progress so each step can be sanity-checked.
        print("Length:", len(self))
        if self:
            print("Largest element:", max(self))

The FillableSet will populate itself with members of the source (or with consecutive integers). So now we generate the sets. Without loss of generality, we'll make the young people first:

>>> y = FillableSet(3000)
Length: 3000
Largest element: 3000

Now let's generate the set of male people. That'll be the union of young male people and non-young male people, $$M = (M\and Y) \or (M\and\not Y).$$ So first we'll generate some young male people.

>>> y_m = FillableSet(1320, source=y)
Length: 1320
Largest element: 1320
>>> m = FillableSet(4600, source=y_m); m.fill(exclude=y)
Length: 1320
Largest element: 1320
Length: 4600
Largest element: 6280

Now we need to generate married people. Some of them are young and male:

>>> y_m_w = FillableSet( 600, source=y_m)
Length: 600
Largest element: 600

Some of them are not young, and some of them are not male, so let's generate those. The expressions are $$ M\and W = M\and W \and (Y\or\not Y),$$

>>> m_w   = FillableSet(3010, source=y_m_w) ; m_w.fill(exclude=y)
Length: 600
Largest element: 600
Length: 3010
Largest element: 5410

and $$ W\and Y = (M \or \not M)\and W \and Y.$$

>>> w_y   = FillableSet(1400, source=y_m_w) ; w_y.fill(exclude=m)
Length: 600
Largest element: 600
Length: 1400
Largest element: 2120

Now we just have to fill out $W$, the married people. We have three subsets of $W$, and we add new people until we have the size we want.

>>> w = FillableSet(7000, source=[])
Length: 0
>>> for source in y_m_w, w_y, m_w:
...     w.fill(source=source)
...
Length: 600
Largest element: 600
Length: 1400
Largest element: 2120
Length: 3810
Largest element: 5410
>>> w.fill(exclude=set.union(y,m))
Length: 7000
Largest element: 9470

That reproduces our list of intersections:

set	size
y	3000
m	4600
w	7000
y $\cap$ m	1320
m $\cap$ w	3010
y $\cap$ w	1400
y $\cap$ m $\cap$ w	600
y $\cup$ m $\cup$ w	9470

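But the 10,000 policyholders are not all there: the union holds only 9470. The 530 left over are in none of $Y$, $M$, or $W$, which makes them old, single females. With that eighth region counted the total does reach 10,000, so the problem is consistent after all rather than overdetermined, and the answer to the question is the 880 young, single females in $Y\and\not(M\or W)$.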
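The size table can be cross-checked without the helper class. The sketch below rebuilds the three sets from plain integer ranges; the range boundaries are my reconstruction from the "Largest element" lines in the session above. It also counts the labels from 1 to 10,000 that land in none of the sets.

```python
# Integer labels for policyholders, reconstructed from the session output.
y = set(range(1, 3001))
m = set(range(1, 1321)) | set(range(3001, 6281))
w = (set(range(1, 601)) | set(range(1321, 2121))
     | set(range(3001, 5411)) | set(range(6281, 9471)))

sizes = {
    "y": len(y),
    "m": len(m),
    "w": len(w),
    "y & m": len(y & m),
    "m & w": len(m & w),
    "y & w": len(y & w),
    "y & m & w": len(y & m & w),
    "y | m | w": len(y | m | w),
}

# Labels up to 10,000 that belong to none of the three sets.
everyone = set(range(1, 10_001))
in_no_set = len(everyone - (y | m | w))

for name, size in sizes.items():
    print(name, size)
print("in none of the sets:", in_no_set)
```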