Hassett, chapter 11: Applying multivariate distributions

$ %%%%%%%%%%%%%%%% Helpful KaTeX macros. \gdef\bar#1{\overline{ #1} } \gdef\and{\cap} \gdef\or{\cup} \gdef\set#1{\left\{ #1\right\} } \gdef\mean#1{\left\langle#1\right\rangle} $

11.1 Distributions of functions of two random variables

Consider the random variable functions

$$ X+Y \quad X-Y \quad \min(X,Y) \quad \max(X,Y) $$

For sums, we're basically adding up all of the probabilities

$$\begin{aligned} p_S(s) &= \sum_x p(x, s-x) \\ &= \sum_x p_X(x) \cdot p(s-x | x) \\ &= \sum_x p_X(x) \cdot p_Y(s-x) \quad \text{(if independent)} \end{aligned}$$
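As a sanity check of the discrete convolution formula, here's a minimal sketch (my own toy example with two fair dice, not from the text):

```python
# Convolve the pmfs of two independent fair dice to get the pmf of their sum.
from collections import defaultdict

p_X = {x: 1/6 for x in range(1, 7)}  # pmf of one die
p_Y = {y: 1/6 for y in range(1, 7)}  # pmf of the other die

p_S = defaultdict(float)
for x, px in p_X.items():
    for y, py in p_Y.items():
        p_S[x + y] += px * py   # p_S(s) = sum_x p_X(x) * p_Y(s - x)

print(p_S[7])             # 6/36 ≈ 0.1667, the most likely total
print(sum(p_S.values()))  # ≈ 1.0, so p_S is a valid pmf
```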

For independent continuous variables, likewise

$$\begin{aligned} f_S(s) &= \int_{-\infty}^{\infty} f_X(x)\cdot f_Y(s-x) \mathrm dx \end{aligned}$$

Sums of exponential variables are gamma-distributed

Let's look at the waiting times between accidents in two towns. The joint p.d.f. and the marginal density functions are

$$\begin{aligned} f(x,y) &= e^{-x-y} \\ f_X(x) &= e^{-x} \\ f_Y(y) &= e^{-y} \\ \end{aligned}$$

Let's show (as was done in the previous chapter, but I skipped it) that $X$ and $Y$ are independent. We have the marginal probability

$$\begin{aligned} f_Y(y) &= \int_0^\infty \mathrm dx\ e^{-x-y} \\ &= e^{-y} \cdot \left(1-0\right) \text{, as assumed above} \end{aligned}$$

and the conditional probability

$$\begin{aligned} f(x|y) &= \frac{f(x,y)}{f_Y(y)} \\ &= \frac{e^{-x-y}}{e^{-y}} = f_X(x) \end{aligned}$$

and vice-versa. So they're independent. Now we want to find the density function for $X+Y$.

$$\begin{aligned} f_S(s) &= \int_0^\infty \mathrm dx\ f_X(x) \cdot f_Y(s-x) \\&= \int_0^\infty \mathrm dx\ e^{-x} \cdot e^{-(s-x)} \\&= \int_0^\infty \mathrm dx\ e^{-s} \qquad ??? \end{aligned}$$

Oh, this is subtle: because the probability density for a negative waiting time is zero, I should have changed the limits of the integral. Let's write that again, more clearly.

$$\begin{aligned} f(x,y) &= \begin{cases} e^{-x-y} & x \geq 0 \text{ and } y \geq 0 \\ 0 & x<0 \text{ or } y < 0 \\ \end{cases} \\ f_X(x) &= \begin{cases} e^{-x} & x \geq 0 \\ 0 & x < 0 \end{cases} \\ f_Y(y) &= \begin{cases} e^{-y} & y \geq 0 \\ 0 & y < 0 \end{cases} \end{aligned}$$

So now the density for the sum $S = X+Y$ is

$$\begin{aligned} f_S(s) &= \int_0^\infty \mathrm dx\ f_X(x) \cdot f_Y(s-x) \\&= \int_0^s \mathrm dx\ e^{-x} \cdot e^{-(s-x)} + \int_s^\infty \mathrm dx\ e^{-x} \cdot 0 \\&= e^{-s} \int_0^s \mathrm dx\ 1 \\&= s e^{-s} \end{aligned}$$

Because $X$ and $Y$ were exponentially distributed with $\beta=1$, their sum $X+Y$ should be gamma-distributed with $(\alpha,\beta) = (2,1)$. Remember that the gamma distribution has density

$$\begin{aligned} f_\Gamma(x) &= \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}. \end{aligned}$$

I can sort of see how to get this relation in general from induction, but I shouldn't spend the time on it.
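A quick Monte Carlo sketch of the $s e^{-s}$ result (the sample size and use of numpy are my own choices, not from the text): histogram the sum of two independent $\mathrm{Exp}(1)$ samples and compare to the gamma $(\alpha,\beta)=(2,1)$ density.

```python
# Compare a histogram of X+Y (X, Y ~ Exp(1), independent) to the density s*exp(-s).
import numpy as np

rng = np.random.default_rng(0)
s = rng.exponential(scale=1.0, size=100_000) + rng.exponential(scale=1.0, size=100_000)

hist, edges = np.histogram(s, bins=50, range=(0, 10), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
gamma_pdf = centers * np.exp(-centers)     # gamma(alpha=2, beta=1) density

print(np.max(np.abs(hist - gamma_pdf)))    # small, so histogram and density agree
```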

Minimum of two random variables

Now let's look at $\min(X,Y)$ for exponential distributions. Remember the cumulative and survival functions

$$\begin{aligned} F(t) &= P(X \leq t) = 1-e^{-\beta t} \\ S(t) &= P(X > t) = e^{-\beta t} \end{aligned}$$

Suppose $X$ and $Y$ are exponentially distributed with parameters $\beta,\lambda$. The probability that $\min(X,Y)$ is at least some time $t$ is the probability that both $X$ and $Y$ have survived through $t$. That is

$$\begin{aligned} S(t) &= P(\min(X,Y) > t) \\ &= P(X > t \and Y > t) \\ &= P(X > t) \cdot P(Y > t), \text{ by independence} \\ &= S_X(t) \cdot S_Y(t) \\ &= e^{-\beta t}e^{-\lambda t} = e^{-(\beta+\lambda)t} \end{aligned}$$

This survival function means that the minimum is exponentially distributed with parameter $\beta+\lambda$.
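A simulation sketch of that claim, with $\beta=2$ and $\lambda=3$ chosen arbitrarily (note that numpy's `exponential` takes the scale $1/\beta$, not the rate):

```python
# Empirical survival of min(X, Y) for X ~ Exp(beta=2), Y ~ Exp(lambda=3),
# compared to the claimed exp(-(beta + lambda) * t).
import numpy as np

rng = np.random.default_rng(1)
beta, lam = 2.0, 3.0
x = rng.exponential(scale=1/beta, size=200_000)
y = rng.exponential(scale=1/lam, size=200_000)
m = np.minimum(x, y)

for t in (0.1, 0.3, 0.5):
    empirical = np.mean(m > t)
    predicted = np.exp(-(beta + lam) * t)
    print(t, empirical, predicted)   # the two columns agree to about three decimals
```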

This procedure works for the minimum of any two independent random variables. The survival function obeys

$$\begin{aligned} S_\text{min}(t) & = S_X(t) \cdot S_Y(t) \end{aligned}$$

because both independent variables must survive. The survival function for the minimum might correspond to a known distribution. Likewise, the distribution for the maximum of two variables can be found from the product of the c.d.f.s, since both variables must fall at or below $t$:

$$\begin{aligned} F_\text{max}(t) &= F_X(t) \cdot F_Y(t) \end{aligned}$$
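And the same kind of check for the maximum, again with arbitrary rates (a sketch, not from the text):

```python
# Check F_max(t) = F_X(t) * F_Y(t) for two independent exponentials.
import numpy as np

rng = np.random.default_rng(2)
beta, lam = 2.0, 3.0
x = rng.exponential(scale=1/beta, size=200_000)
y = rng.exponential(scale=1/lam, size=200_000)
big = np.maximum(x, y)

for t in (0.2, 0.5, 1.0):
    empirical = np.mean(big <= t)
    predicted = (1 - np.exp(-beta * t)) * (1 - np.exp(-lam * t))
    print(t, empirical, predicted)   # agree to within sampling error
```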

11.2 Expected values of functions of random variables

We don't need the distribution of $g(X,Y)$ itself to find its expectation value, because we can compute it directly:

$$\begin{aligned} E[ g(X,Y) ] &= \sum_{x,y} g(x,y) \cdot p(x,y) \end{aligned}$$

or its continuous equivalent.
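For instance, with a small made-up joint p.m.f. (my own numbers), the double sum is just a loop over the table:

```python
# E[g(X, Y)] computed directly from a toy joint pmf; no distribution of g(X, Y) needed.
p = {                         # p[(x, y)] = P(X = x, Y = y), a made-up table
    (0, 0): 0.2, (0, 1): 0.3,
    (1, 0): 0.1, (1, 1): 0.4,
}

def g(x, y):                  # g(X, Y) = XY as the example function
    return x * y

expectation = sum(g(x, y) * pxy for (x, y), pxy in p.items())
print(expectation)            # 0.4
```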

Because expectation values are linear,

$$\begin{aligned} E(X+Y) &= E(X) + E(Y) \end{aligned}$$

Products don't factor like that unless $X$ and $Y$ are independent, in which case $E(XY) = E(X)\cdot E(Y)$.

Covariance

The covariance is

$$\begin{aligned} \text{Cov} (X,Y) &= E[ (X-\mu_X) \cdot (Y-\mu_Y) ] \end{aligned}$$

with positive and negative associations/correlations having the usual meaning.

An alternative definition is

$$\begin{aligned} \text{Cov}(X,Y) &= E(XY) - E(X)\cdot E(Y) \end{aligned}$$

This formulation makes it clear (based on the statements above) that, if $X$ and $Y$ are independent, their covariance will be zero.

The variance of the sum depends on the covariance:

$$\begin{aligned} V(X+Y) &= V(X) + V(Y) + 2\cdot\text{Cov}(X,Y) \end{aligned}$$
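A numerical check of this identity on artificially correlated samples (the construction of $Y$ from $X$ is my own choice); it holds exactly for sample variances and covariances too, since both are bilinear:

```python
# Numerical check of V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y) on correlated samples.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # correlated with x by construction

c = np.cov(x, y)                         # 2x2 sample covariance matrix
lhs = np.var(x + y, ddof=1)
rhs = c[0, 0] + c[1, 1] + 2 * c[0, 1]
print(lhs, rhs)                          # identical up to floating-point rounding
```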

Some properties:

  • The covariance is symmetric: $\text{Cov}(X,Y) = \text{Cov}(Y,X)$.
  • The covariance of a random variable with itself is its variance.
  • The covariance of a random variable with a constant is zero.
  • Scaling either random variable scales the covariance: $$ \text{Cov}(aX,bY) = ab\cdot\text{Cov}(X,Y).$$
  • Covariances are distributive: $$\text{Cov}(X,Y+Z) = \text{Cov}(X,Y) + \text{Cov}(X,Z).$$

Correlation coefficients

The correlation coefficient is

$$\begin{aligned} \rho_{XY} &= \frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y} \\&= \frac{\text{Cov}(X,Y)}{\sqrt{V(X)\cdot V(Y)}} \end{aligned}$$

Variables that are linearly related have a correlation coefficient of $\pm 1$:

$$\begin{aligned} \rho_{XY} &= \frac{\text{Cov}(X, aX+b)}{\sigma_X\sigma_{aX+b}} \\&= \frac{\text{Cov}(X,aX) + \text{Cov}(X,b)}{\sigma_X\left(|a|\,\sigma_X\right)} \\&= \frac{a\cdot V(X) + 0}{|a| \sigma_X^2} \\&= \pm 1, \text{ depending on the sign of } a. \end{aligned}$$
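A quick numerical illustration, with arbitrary values of $a$ and $b$ (my own choices):

```python
# Correlation coefficient of X with a*X + b is +1 or -1, matching the sign of a.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)

for a in (2.0, -3.0):
    y = a * x + 5.0
    rho = np.corrcoef(x, y)[0, 1]
    print(a, rho)   # 1.0 for a > 0 and -1.0 for a < 0, up to float rounding
```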

(exclude) Bivariate normal distribution

A pair of correlated normal variables can have the joint (bivariate normal) density

$$\begin{aligned} f(x,y) &= \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} e^{ \frac{-1}{2(1-\rho^2)} \left[ \left( \frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho \left( \frac{x-\mu_x}{\sigma_x}\right) \left( \frac{y-\mu_y}{\sigma_y}\right) + \left( \frac{y-\mu_y}{\sigma_y}\right)^2 \right] } \end{aligned}$$

11.3 (exclude) Moment generating functions for sums of independent random variables

I don't care about moment-generating functions.

11.4 The sum of more than two random variables

Sums of different distributions

Poisson $\to$ Poisson

If $X_1,X_2,\cdots,X_n$ are independent Poisson random variables with parameters $\lambda_1,\lambda_2,\cdots,\lambda_n$, then their sum $\sum_i X_i$ is Poisson distributed with parameter $\sum_i\lambda_i$.
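A Monte Carlo sketch of the Poisson case, with $\lambda_1=1.5$ and $\lambda_2=2.5$ chosen arbitrarily (not from the text):

```python
# Monte Carlo check: Poisson(1.5) + Poisson(2.5) should look like Poisson(4.0).
import math
import numpy as np

rng = np.random.default_rng(5)
s = rng.poisson(1.5, size=200_000) + rng.poisson(2.5, size=200_000)

for k in (2, 4, 6):
    empirical = np.mean(s == k)
    predicted = math.exp(-4.0) * 4.0**k / math.factorial(k)
    print(k, empirical, predicted)   # agree to within sampling error
```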

Geometric $\to$ negative binomial

The sum of $n$ i.i.d. geometric random variables with success probability $p$ is a negative binomial random variable with the same $p$ and $r=n$.

Normal $\to$ normal

If the $X_i$ are independent normal variables with means $\mu_i$ and variances $\sigma_i^2$, then the sum $\sum_i X_i$ has mean $\sum_i\mu_i$ and variance $\sum_i \sigma_i ^2$.

Exponential $\to$ gamma

If the $X_i$ are i.i.d. exponential random variables with parameter $\beta$, their sum $\sum_i X_i$ is a gamma random variable with $\alpha=n$ and the same $\beta$.

Mean and variance of multiple sums

In a triple sum, the pairwise covariances all enter the variance twice.

$$\begin{aligned} E(X+Y+Z) &= E(X) + E(Y) + E(Z) \\ V(X+Y+Z) &= V(X) + V(Y) + V(Z) \\& \quad + 2\times\left( \text{Cov}(X,Y) + \text{Cov}(X,Z) + \text{Cov}(Y,Z) \right) \end{aligned}$$

In fact, that's true no matter how many terms are in the sum.

$$\begin{aligned} E\left( \sum_i X_i \right) &= \sum_i E(X_i) \\ V\left( \sum_i X_i \right) &= \sum_i V(X_i) + 2\sum_{i<j} \text{Cov}(X_i,X_j) \end{aligned}$$
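A numerical check of the general variance formula on three artificially correlated variables (the construction is my own; numpy's sample covariance matrix does the bookkeeping):

```python
# Check V(X + Y + Z) = sum of variances + 2 * (sum of pairwise covariances).
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=300_000)
y = 0.3 * x + rng.normal(size=300_000)
z = -0.5 * y + rng.normal(size=300_000)

cov = np.cov(np.vstack([x, y, z]))        # 3x3 sample covariance matrix
lhs = np.var(x + y + z, ddof=1)
rhs = np.trace(cov) + 2 * (cov[0, 1] + cov[0, 2] + cov[1, 2])
print(lhs, rhs)   # equal up to floating-point rounding (sample covariance is bilinear)
```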

Central limit theorem

Sums of many independent random variables are approximately normally distributed.
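A small demonstration (my own setup): the sum of 50 uniform variables already has tail probabilities close to the normal approximation.

```python
# Sum of 50 Uniform(0, 1) variables: compare P(S > mean + std) to the normal tail P(Z > 1).
import math
import numpy as np

rng = np.random.default_rng(7)
n = 50
s = rng.uniform(size=(200_000, n)).sum(axis=1)

mu, sigma = n * 0.5, math.sqrt(n / 12)                      # exact mean and std of the sum
empirical = np.mean(s > mu + sigma)
normal_tail = 1 - 0.5 * (1 + math.erf(1 / math.sqrt(2)))    # P(Z > 1) ≈ 0.1587
print(empirical, normal_tail)                               # close, as the CLT promises
```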

11.5 Double expectation theorem

The expectation value of the conditional expectation is just the expectation value of the variable back again:

$$\begin{aligned} E[ E(X|Y) ] &= E(X) \end{aligned}$$
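Here's a direct check on a small made-up joint p.m.f. (my own numbers): compute $E(X|Y=y)$ for each $y$, average it against the marginal of $Y$, and compare to $E(X)$.

```python
# E[E(X | Y)] = E(X), checked on a small toy joint pmf.
p = {
    (0, 0): 0.1, (1, 0): 0.3,   # p[(x, y)] = P(X = x, Y = y)
    (0, 1): 0.4, (1, 1): 0.2,
}

E_X = sum(x * pxy for (x, y), pxy in p.items())

# Inner expectation E(X | Y = y), then average over the marginal pmf of Y.
E_E = 0.0
for y0 in (0, 1):
    p_y = sum(pxy for (x, y), pxy in p.items() if y == y0)
    e_x_given_y = sum(x * pxy for (x, y), pxy in p.items() if y == y0) / p_y
    E_E += e_x_given_y * p_y

print(E_X, E_E)   # both 0.5
```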

The conditional variance is

$$\begin{aligned} V(X|Y=y) &= E(X^2 | Y=y) - \left( E(X|Y=y) \right)^2 \end{aligned}$$

Eventually this cute thing happens (the law of total variance):

$$\begin{aligned} V(X) &= E[ V(X|Y) ] + V[ E(X|Y) ] \\ V(Y) &= E[ V(Y|X) ] + V[ E(Y|X) ] \\ \end{aligned}$$
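A simulation sketch of the decomposition, with a hierarchy I made up for illustration: $Y \sim \mathrm{Exp}(1)$ and $X \mid Y \sim \mathrm{Normal}(Y, \text{sd}=2)$.

```python
# Law of total variance, checked by simulation: Y ~ Exp(1), X | Y ~ Normal(Y, sd=2).
import numpy as np

rng = np.random.default_rng(8)
y = rng.exponential(scale=1.0, size=500_000)
x = rng.normal(loc=y, scale=2.0)

# For this construction E(X | Y) = Y and V(X | Y) = 4, so
# E[V(X|Y)] + V[E(X|Y)] = 4 + V(Y) = 4 + 1 = 5.
print(np.var(x))          # ≈ 5
print(4.0 + np.var(y))    # ≈ 5 as well
```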

11.6 Applying the double expectation theorem

Problems