Hassett, chapter 10: Multivariate distributions

$ %%%%%%%%%%%%%%%% Helpful KaTeX macros. \gdef\bar#1{\overline{ #1} } \gdef\and{\cap} \gdef\or{\cup} \gdef\set#1{\left\{ #1\right\} } \gdef\mean#1{\left\langle#1\right\rangle} $

10.1 Joint distributions for discrete random variables

Consider an investor who owns two assets. In a year their values $X$ and $Y$ have these probabilities:

| $y$ | $x=90$ | $x=100$ | $x=110$ |
|---|---|---|---|
| 0 | 0.05 | 0.27 | 0.18 |
| 10 | 0.15 | 0.33 | 0.02 |

Joint probability is $p(x,y) = P(X=x,Y=y)$.

An example where the probability distribution is a formula:

Suppose $X$ is accidents per day in town $A$, and $Y$ is accidents per day in town $B$. Perhaps

$$\begin{aligned} p(x,y) &= \frac{e^{-2}}{x!y!} & \text{for } x,y \in \mathbb N \end{aligned}$$

The probability that there's one accident in $A$ and two in $B$ is

$$\begin{aligned} p(1,2) &= \frac{e^{-2}}{1!2!} \approx 0.068 \end{aligned}$$

Note that this is the product of two independent Poisson distributions with mean $\lambda=1$.
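
A quick numerical check of both the value and the factorization (a Python sketch; the helper names are mine):

```python
import math

# Joint pmf from the example: p(x, y) = e^{-2} / (x! y!)
def p_joint(x, y):
    return math.exp(-2) / (math.factorial(x) * math.factorial(y))

# Poisson(1) pmf, to check the factorization claim
def poisson1(k):
    return math.exp(-1) / math.factorial(k)

print(p_joint(1, 2))              # ~0.0677
print(poisson1(1) * poisson1(2))  # same number: the joint pmf factors
```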

Marginal distributions

From the joint distributions we can find the individual ones. Starting from the table above,

$$\begin{aligned} P(X=90) &= P(X=90 \and Y=0) + P(X=90 \and Y=10) \\ &= 0.05 + 0.15 = 0.20 \end{aligned}$$

We can add row and column sums to the table to find the "marginal distributions":

| $y$ | $x=90$ | $x=100$ | $x=110$ | $p(y)$ |
|---|---|---|---|---|
| 0 | 0.05 | 0.27 | 0.18 | 0.50 |
| 10 | 0.15 | 0.33 | 0.02 | 0.50 |
| $p(x)$ | 0.20 | 0.60 | 0.20 | |

$$\begin{aligned} p_X(x) &= \sum_y p(x,y) \\ p_Y(y) &= \sum_x p(x,y) \\ \end{aligned}$$
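
In code, the marginals are just row and column sums of the joint table (a quick numpy check):

```python
import numpy as np

# Joint probability table: rows are y in {0, 10}, columns are x in {90, 100, 110}
joint = np.array([[0.05, 0.27, 0.18],
                  [0.15, 0.33, 0.02]])

p_x = joint.sum(axis=0)   # marginal of X: sum over y (down the columns)
p_y = joint.sum(axis=1)   # marginal of Y: sum over x (across the rows)

print(p_x)  # [0.2 0.6 0.2]
print(p_y)  # [0.5 0.5]
```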

10.2 (exclude) Joint distributions for continuous random variables

For single variables, continuous distributions obey

  • Non-negative, $f(x) \geq 0$ for all $x$
  • Normalized, $\int_{-\infty}^\infty f(x) \mathrm dx = 1 $
  • $f=\frac{\mathrm dP}{\mathrm dx}$, or $P(a<X<b) = \int_a^b f(x)\mathrm dx$

Similarly,

  • Non-negative, $f(x,y) \geq 0$ for all $x,y$
  • Normalized, $\int_{-\infty}^\infty \int_{-\infty}^\infty f(x,y) \ \mathrm dx\ \mathrm dy = 1 $
  • $f=\frac{\mathrm d^2P}{\mathrm dx \ \mathrm dy}$, or $$\begin{aligned} P(a<X<b, c<Y<d) &= \int_a^b\mathrm dx \int_c^d\mathrm dy\ f(x,y) \end{aligned}$$

Marginal distributions are

$$\begin{aligned} f_X(x) &= \int_{-\infty}^\infty \mathrm dy\ f(x,y) \\ f_Y(y) &= \int_{-\infty}^\infty \mathrm dx\ f(x,y) \\ \end{aligned}$$
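
A numerical sanity check with a made-up density $f(x,y) = x + y$ on the unit square (my example, not from the text), using scipy for the integrals:

```python
from scipy import integrate

# Made-up joint density for illustration: f(x, y) = x + y on [0,1]^2
f = lambda y, x: x + y   # dblquad integrates the first argument innermost

# Normalization: should be 1
total, _ = integrate.dblquad(f, 0, 1, 0, 1)
print(total)  # 1.0

# Marginal f_X(x) = ∫ f(x, y) dy = x + 1/2, checked at x = 0.3
fx, _ = integrate.quad(lambda y: f(y, 0.3), 0, 1)
print(fx)  # 0.8
```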

10.3 Conditional distributions

Discrete conditionals.

Let's look at our table again:

| $y$ | $x=90$ | $x=100$ | $x=110$ | $p(y)$ |
|---|---|---|---|---|
| 0 | 0.05 | 0.27 | 0.18 | 0.50 |
| 10 | 0.15 | 0.33 | 0.02 | 0.50 |
| $p(x)$ | 0.20 | 0.60 | 0.20 | |

If $Y=0$, we can find $$\begin{aligned} P(X=x | Y=0) &= \frac{P((X=x) \and (Y=0))}{P(Y=0)} \\ &= \frac{P(\text{top row})}{0.50} \end{aligned}$$

which gives

| $y$ | $x=90$ | $x=100$ | $x=110$ | sum |
|---|---|---|---|---|
| 0 | 0.10 | 0.54 | 0.36 | 1.00 |
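
The same conditioning in code: take the $Y=0$ row of the joint table and rescale it by $P(Y=0)$:

```python
import numpy as np

joint = np.array([[0.05, 0.27, 0.18],    # y = 0
                  [0.15, 0.33, 0.02]])   # y = 10

# Condition on Y = 0: take that row and divide by its total, P(Y=0) = 0.50
p_x_given_y0 = joint[0] / joint[0].sum()
print(p_x_given_y0)  # [0.1  0.54 0.36]
```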

Likewise for continuous probability distributions:

Let $X$ be sick leave hours last year and $Y$ be sick leave hours this year, with $$\begin{aligned} f(x,y) &= 2 - 1.2x - 0.8y \\ f_X(x) &= 1.6 - 1.2x \\ f_Y(y) &= 1.4 - 0.8y \\ \end{aligned}$$ for $x,y$ in $[0,1]$.

So $$\begin{aligned} f(x|y) &= \frac{f(x,y)}{f_Y(y)} &&= \frac{2 - 1.2x - 0.8y}{1.4 - 0.8y} \\ f(y|x) &= \frac{f(x,y)}{f_X(x)} &&= \frac{2 - 1.2x - 0.8y}{1.6-1.2x} \end{aligned}$$

Now, to find probabilities, do I have to do integrals with polynomials in the denominator? Not necessarily:

$$\begin{aligned} P(Y<0.40 | X=0.10) &= \int_0^{0.40} \mathrm dy \left( \frac{2-1.2\cdot0.10 - 0.8y}{1.6 - 1.2\cdot0.10} \right) \\&= \int_0^{0.40} \mathrm dy \left( \frac{1.88 - 0.8y}{1.48} \right) \end{aligned}$$
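
Carrying out the integral, by hand or numerically, gives about 0.465:

```python
from scipy import integrate

# Conditional density f(y | x=0.10) for the sick-leave example
f_cond = lambda y: (2 - 1.2 * 0.10 - 0.8 * y) / (1.6 - 1.2 * 0.10)

prob, _ = integrate.quad(f_cond, 0, 0.40)
print(prob)  # ~0.465

# Same thing with the antiderivative: (1.88*y - 0.4*y^2)/1.48 evaluated at y = 0.40
print((1.88 * 0.40 - 0.4 * 0.40 ** 2) / 1.48)
```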

I'm pretty sure that you don't have to put the integral outside of the fraction: Bayes' rule is about probabilities, not about probability densities. Suppose that $x$ and $y$ have different dimensions so that

  • $f=\frac{\mathrm dP}{\mathrm dx\ \mathrm dy}$ has dimension $[xy]^{-1}$
  • $f_X = \int\mathrm dy\ \frac{\mathrm dP}{\mathrm dx\ \mathrm dy}$ has dimension $[x]^{-1}$, as you'd expect, and similarly for $f_Y$.

What is the dimension of the conditional probability?

$$\begin{aligned} f(x|y) &= \frac{f(x,y)}{f_Y(y)} \\ [f(x|y)] &= \frac{ [xy]^{-1}}{ [y]^{-1}} = [x]^{-1} \end{aligned}$$

It needs to be $[x]^{-1}$, because we want to integrate only over $x$ again.

Let's look at the probability-based definition.

$$\begin{aligned} P(a<X<b | c<Y<d ) &= \frac{ P(a<X<b , c<Y<d ) } {P(c<Y<d)} \\&= \frac{ \left(\int_a^b \mathrm dx \int_c^d \mathrm dy\right) \ f(x,y) }{ \left(\int_{-\infty}^{\infty} \mathrm dx \int_c^d \mathrm dy\right) \ f(x,y) } \\&= \frac{ \left(\int_a^b \mathrm dx' \int_c^d \mathrm dy'\right) \ f(x',y') }{ \left(\int_{-\infty}^{\infty} \mathrm dx' \int_c^d \mathrm dy'\right) \ f(x',y') } \end{aligned}$$

I guess I'm leaning towards the cumulative distribution,

$$\begin{aligned} f(x|c<Y<d) &= \frac{\mathrm d}{\mathrm dx} P(X<x|c<Y<d) \\&= \frac{\mathrm d}{\mathrm dx} \frac{ \left(\int_{-\infty}^x \mathrm dx' \int_c^d \mathrm dy'\right) \ f(x',y') }{ \left(\int_{-\infty}^{\infty} \mathrm dx' \int_c^d \mathrm dy'\right) \ f(x',y') } \\&= \frac{ \int_c^d \mathrm dy'\ f(x,y') }{ \left(\int_{-\infty}^{\infty} \mathrm dx' \int_c^d \mathrm dy'\right) \ f(x',y') } \end{aligned}$$

So there are integrals in both the numerator and the denominator.

In the limit $d-c\to0$, a well-behaved probability function is going to do the same thing in both the numerator and the denominator. There's probably some fancy way to write this in terms of delta functions.
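
A quick numerical check of that limit, using the sick-leave density from above (the helper below is mine): the ratio-of-integrals density agrees with the point-conditional $f(x|y)$ as the interval shrinks.

```python
from scipy import integrate

# Sick-leave joint density from above, on [0,1]^2 (dblquad wants f(y, x))
f = lambda y, x: 2 - 1.2 * x - 0.8 * y

def f_x_given_interval(x, c, d):
    """Density of X given c < Y < d, from the ratio of integrals above."""
    num, _ = integrate.quad(lambda y: f(y, x), c, d)   # ∫_c^d f(x,y) dy
    den, _ = integrate.dblquad(f, 0, 1, c, d)          # ∫ dx ∫_c^d dy f(x,y)
    return num / den

x = 0.2
print(f_x_given_interval(x, 0.4, 0.6))    # 1.36
print(f_x_given_interval(x, 0.49, 0.51))  # 1.36
# Point-conditional f(x | y=0.5) = f(x, 0.5)/f_Y(0.5); agrees (exactly here, since f is linear in y)
print((2 - 1.2 * x - 0.8 * 0.5) / (1.4 - 0.8 * 0.5))  # 1.36
```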

Anyway. I was supposed to skip this part.

Conditional expected value

We have

$$\begin{aligned} E(Y|X=x) = \sum_y y\cdot p(y|x) \text{, etc.} \\ \end{aligned}$$
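
For the asset table, for example, conditioning on $X=90$ gives $E(Y|X=90) = 0\cdot0.25 + 10\cdot0.75 = 7.5$:

```python
import numpy as np

joint = np.array([[0.05, 0.27, 0.18],    # y = 0
                  [0.15, 0.33, 0.02]])   # y = 10
y_vals = np.array([0, 10])

# E(Y | X=90): condition on the first column, then take the weighted average
p_y_given_x90 = joint[:, 0] / joint[:, 0].sum()
print(y_vals @ p_y_given_x90)   # 7.5
```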

10.4 Independence for random variables

Random variables $X$ and $Y$ are independent if

$$\begin{aligned} p(x,y) &= p_X(x)\cdot p_Y(y) \end{aligned}$$

in which case

$$\begin{aligned} p(x|y) &= p_X(x) & p(y|x) &= p_Y(y) \end{aligned}$$

which comes trivially from Bayes' Theorem.

Likewise independence means

$$\begin{aligned} f(x|y) &= f_X(x) \text{, and vice-versa.} \end{aligned}$$
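
The asset table above fails this test, so its $X$ and $Y$ are not independent (a quick check):

```python
import numpy as np

joint = np.array([[0.05, 0.27, 0.18],
                  [0.15, 0.33, 0.02]])

p_x = joint.sum(axis=0)
p_y = joint.sum(axis=1)

# If X and Y were independent, the joint table would equal the outer product of the marginals
print(np.outer(p_y, p_x))                      # e.g. entry (0, 0) is 0.10, not 0.05
print(np.allclose(joint, np.outer(p_y, p_x)))  # False: not independent
```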

10.5 The multinomial distribution

The number of partitions of $n$ objects into $k$ groups of size $n_1,n_2,\cdots,n_k$ is given by

$$\begin{aligned} {n \choose n_1, n_2, \cdots n_k} &= \frac{ n! }{n_1!\, n_2! \cdots n_k!} \end{aligned}$$

Then the probability of a particular set of group sizes is

$$\begin{aligned} P(X_1 = n_1 \and X_2 = n_2 \and \cdots \and X_k = n_k ) &= {n \choose n_1, n_2, \cdots n_k} p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \end{aligned}$$

We need $\sum_i p_i = 1$, where $p_i$ is the probability that an item lands in the $i$-th group (and $\sum_i n_i = n$). The binomial distribution is a special case: there are just two groups, the one we care about and everything else.
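
A small sketch (the helper is mine, not from the text) that computes multinomial probabilities and checks the binomial special case:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X_1=n_1, ..., X_k=n_k) for the multinomial distribution."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)          # multinomial coefficient n!/(n_1!...n_k!)
    p = 1.0
    for c, q in zip(counts, probs):
        p *= q ** c                     # p_1^{n_1} ... p_k^{n_k}
    return coeff * p

# Example: roll a fair die 6 times; probability each face shows up exactly once
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1/6] * 6))   # ~0.0154

# Binomial as a special case: n = 5, success prob 0.3, 2 successes
print(multinomial_pmf([2, 3], [0.3, 0.7]))   # C(5,2) 0.3^2 0.7^3 ≈ 0.3087
```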

Problems