Expected value is a linear operator? Under what conditions is median also a linear operator?

By Emma Valentine
$\begingroup$

I have always taken for granted that expected value is a linear operator. For any random variables $X$ and $Y$: $E(aX + bY) = aE(X) + bE(Y)$. Can anyone point me to a rigorous proof of this?

Also, I know that in general the median $Med()$ is not a linear operator, meaning $Med(aX + bY)$ might not be equal to $a Med(X) + b Med(Y)$. Are there clear criteria for when $Med$ is a linear operator, and when it is not?

$\endgroup$

10 Answers

$\begingroup$

First question: By definition, on a probability space ($\Omega,\mathcal{F},P$), the expected value of a random variable $X:\Omega\to \mathbb{R}$ is defined as $$E(X)=\int_\Omega X(\omega) dP(\omega).$$ Note that it is only well-defined if the integral converges absolutely, i.e. $$\int_\Omega |X(\omega)| dP(\omega)<\infty$$

(The integrals above are Lebesgue integrals. If $X$ is discrete, its distribution is concentrated on countably many points and the integral reduces to the sum $\sum_x x\,P(X=x)$.)

Therefore, if the expected values of both $X$ and $Y$ exist, then for any $a,b\in \mathbb{R}$ the expected value of $aX+bY$ also exists, since $|aX+bY|\le |a||X|+|b||Y|$ (triangle inequality). We can then use the linearity of the Lebesgue integral to conclude $$E(aX+bY)=\int_\Omega (aX +bY)\,dP=a\int_\Omega X\,dP+b\int_\Omega Y\,dP=aE(X)+bE(Y).$$

$\endgroup$ $\begingroup$

EXTENDED/REVISED ANSWER

Some general points concerning the second question. By definition, $m$ is a median of $X$ if ${\rm P}(X \ge m) \geq 1/2$ and ${\rm P}(X \le m) \geq 1/2$. While the median is uniquely determined for all the common examples of continuous random variables, it is not uniquely determined in general. For example, any number $m \in [-1,1]$ is a median of the random variable $X$ with ${\rm P}(X=1) = {\rm P}(X=-1) = 1/2$. Hence my previous answer to this question (see below), where I assumed that $m(X)=0$ because $X$ is symmetric, needs to be revised. This is done simply as follows. We define $X$ and $Y$ exactly as before, and introduce another random variable $\tilde X$, equal to $X$ with probability $1-1/n$ and to $0$ with probability $1/n$, where this choice is made independently of $(X,Y)$. It is immediately checked that the symmetric random variables $\tilde X$ and $Y$ each have a unique median, equal to $0$. Thus $m(\tilde X) + m(Y) = 0$. On the other hand, ${\rm P}(\tilde X + Y = 1) = \frac{3}{4}(1-\frac{1}{n}) \to 3/4$ as $n \to \infty$ (cf. my previous answer), which implies that, for all sufficiently large $n$ (as soon as this probability exceeds $1/2$), $\tilde X + Y$ has a unique median, equal to $1$. So $m(\tilde X + Y) \neq m(\tilde X) + m(Y)$, as required.
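
The numbers in this construction can be checked exactly. The following Python sketch is our own illustration, not part of the original answer: it enumerates the joint distribution of $(X, Y)$ together with an independent coin $c$ that decides whether $\tilde X = X$ or $\tilde X = 0$, and computes the set of all medians of a finite distribution, which is a closed interval $[lo, hi]$ with endpoints in the support. The helper names and the choice $n = 10$ are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def median_interval(pmf):
    """Return (lo, hi) such that the medians of a finite pmf are exactly [lo, hi]."""
    support = sorted(pmf)
    half = Fraction(1, 2)
    cdf = Fraction(0)
    for v in support:                 # lo: smallest v with P(Z <= v) >= 1/2
        cdf += pmf[v]
        if cdf >= half:
            lo = v
            break
    tail = Fraction(0)
    for v in reversed(support):       # hi: largest v with P(Z >= v) >= 1/2
        tail += pmf[v]
        if tail >= half:
            hi = v
            break
    return lo, hi


def marginal(joint, f):
    """Distribution of f(outcome) when outcomes follow the pmf `joint`."""
    out = defaultdict(Fraction)
    for outcome, p in joint.items():
        out[f(*outcome)] += p
    return dict(out)


def joint_xtilde_y(n):
    """Joint pmf of (X, Y, c); c = 1 means X-tilde = X, c = 0 means X-tilde = 0.

    Assumption: the coin c is independent of (X, Y).
    """
    base = {(1, 0): Fraction(1, 2), (-1, 2): Fraction(1, 4), (-1, -2): Fraction(1, 4)}
    keep = 1 - Fraction(1, n)
    joint = {}
    for (x, y), p in base.items():
        joint[(x, y, 1)] = p * keep
        joint[(x, y, 0)] = p * (1 - keep)
    return joint


n = 10
joint = joint_xtilde_y(n)
X = marginal(joint, lambda x, y, c: x)
Xt = marginal(joint, lambda x, y, c: x if c else 0)
Y = marginal(joint, lambda x, y, c: y)
S = marginal(joint, lambda x, y, c: (x if c else 0) + y)

print(median_interval(X))                          # (-1, 1): X alone has no unique median
print(median_interval(Xt), median_interval(Y))     # (0, 0) (0, 0)
print(S[1], median_interval(S))                    # 27/40 (1, 1)
```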

In view of this example, we now give a counterexample for the case where $X$ and $Y$ are independent. Let $X$ and $Y$ be i.i.d. random variables with common probability mass function given by $p(2)= p(-1) = \frac{1}{2}(1 - \frac{1}{n})$, $p(0) = \frac{1}{n}$. Then $X$ and $Y$ have a unique median, equal to $0$. On the other hand, one verifies that both ${\rm P}(X+Y \geq 1)$ and ${\rm P}(X+Y \leq 1)$ tend to $3/4$ as $n \to \infty$, while ${\rm P}(X+Y > 1)$ and ${\rm P}(X+Y < 1)$ tend to $1/4$; hence, for all sufficiently large $n$, $X + Y$ has a unique median, equal to $1$. So $m(X + Y) \neq m(X) + m(Y)$, as required.
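
The i.i.d. counterexample can also be verified by exact enumeration. This Python sketch is ours, not the answerer's: it computes the full interval of medians of a finite distribution and obtains the pmf of $X+Y$ by a discrete convolution. The helper names and the values of $n$ tried are arbitrary choices.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def median_interval(pmf):
    """Return (lo, hi) such that the medians of a finite pmf are exactly [lo, hi]."""
    support = sorted(pmf)
    half = Fraction(1, 2)
    cdf = Fraction(0)
    for v in support:                 # lo: smallest v with P(Z <= v) >= 1/2
        cdf += pmf[v]
        if cdf >= half:
            lo = v
            break
    tail = Fraction(0)
    for v in reversed(support):       # hi: largest v with P(Z >= v) >= 1/2
        tail += pmf[v]
        if tail >= half:
            hi = v
            break
    return lo, hi


def pmf_xy(n):
    """Common pmf of X and Y: p(2) = p(-1) = (1 - 1/n)/2, p(0) = 1/n."""
    q = (1 - Fraction(1, n)) / 2
    return {-1: q, 0: Fraction(1, n), 2: q}


def pmf_of_sum(p1, p2):
    """pmf of X + Y for independent X ~ p1, Y ~ p2 (a discrete convolution)."""
    out = defaultdict(Fraction)
    for (x, px), (y, py) in product(p1.items(), p2.items()):
        out[x + y] += px * py
    return dict(out)


for n in (4, 10, 100):
    p = pmf_xy(n)
    s = pmf_of_sum(p, p)
    print(n, median_interval(p), median_interval(s))   # n (0, 0) (1, 1)
```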

In view of the preceding examples, we finally consider the case where $X$ and $Y$ are both symmetric and independent. Assuming both $X$ and $Y$ have a unique median, it must obviously be equal to $0$. For any fixed numbers $a$ and $b$, $aX + bY$ is also symmetric (by independence). Moreover, $aX + bY$ has a unique median, equal to $0$. This is straightforward to verify, upon observing that ${\rm P}(X \in (-\varepsilon,\varepsilon), Y \in (-\varepsilon,\varepsilon)) > 0$ for any $\varepsilon > 0$ (a symmetric random variable with unique median $0$ must put positive mass on every interval $(-\varepsilon,\varepsilon)$, and independence makes the joint probability the product of the two). Hence, $m(aX+bY)=am(X)+bm(Y)=0$. From this, it is easy to establish the following generalization. Suppose that $X$ and $Y$ are independent and symmetric around arbitrary points, say $m_1$ and $m_2$, respectively. Assume that both $X$ and $Y$ have a unique median (these medians are necessarily $m(X)=m_1$ and $m(Y)=m_2$). Then, for any fixed numbers $a$ and $b$, $aX+bY$ has a unique median, equal to $am(X)+bm(Y)=am_1+bm_2$.
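
A quick exact check of the symmetric, independent case, as a hypothetical Python sketch: the two pmfs below are our own choices of symmetric distributions with a unique median $0$, and the coefficients $a, b$ and the centers $m_1 = 3$, $m_2 = -1$ are arbitrary. It illustrates the claim on a few instances; it does not prove it.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def median_interval(pmf):
    """Return (lo, hi) such that the medians of a finite pmf are exactly [lo, hi]."""
    support = sorted(pmf)
    half = Fraction(1, 2)
    cdf = Fraction(0)
    for v in support:                 # lo: smallest v with P(Z <= v) >= 1/2
        cdf += pmf[v]
        if cdf >= half:
            lo = v
            break
    tail = Fraction(0)
    for v in reversed(support):       # hi: largest v with P(Z >= v) >= 1/2
        tail += pmf[v]
        if tail >= half:
            hi = v
            break
    return lo, hi


def pmf_of_combination(a, px, b, py):
    """pmf of a*X + b*Y for independent X ~ px and Y ~ py."""
    out = defaultdict(Fraction)
    for (x, p), (y, q) in product(px.items(), py.items()):
        out[a * x + b * y] += p * q
    return dict(out)


def shift(pmf, c):
    """pmf of Z + c."""
    return {v + c: p for v, p in pmf.items()}


# Hypothetical symmetric pmfs, each with unique median 0.
px = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
py = {-2: Fraction(1, 4), 0: Fraction(1, 2), 2: Fraction(1, 4)}

for a, b in [(1, 1), (2, -3), (-4, 7)]:
    print(a, b, median_interval(pmf_of_combination(a, px, b, py)))   # a b (0, 0)

# Symmetric about m1 = 3 and m2 = -1: the median of 2X + 5Y is 2*3 + 5*(-1) = 1.
print(median_interval(pmf_of_combination(2, shift(px, 3), 5, shift(py, -1))))  # (1, 1)
```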

PREVIOUS ANSWER

For the second question, let us show that even if $X$ and $Y$ are symmetric random variables, $m(X+Y)$ might be different from $m(X)+m(Y)$ (where $m$ denotes the median). Suppose that ${\rm P}(X=1) = {\rm P}(X=-1) = 1/2$; hence $X$ is symmetric. Define $Y$ as follows: if $X=1$ then $Y=0$, whereas if $X=-1$ then $Y=2$ or $-2$ with probability $1/2$ each. Then ${\rm P}(Y=0)=1/2$, ${\rm P}(Y=2)=1/4$, and ${\rm P}(Y=-2)=1/4$. Hence $Y$ is symmetric and, in turn, $m(X)+m(Y)=0+0=0$. However, ${\rm P}(X+Y=1) = 3/4$ (and ${\rm P}(X+Y=-3) = 1/4$). In particular, $m(X+Y)=1$ (since ${\rm P}(X+Y=1) > 1/2$).

$\endgroup$ $\begingroup$

Here is an intuitive and highly informal (do not look for mathematical rigor here) argument for the linearity of the expectation operator, heavily inspired by the formulation of Meyer and Rubinfeld and essentially restating Troy's argument in discrete terms.

  • Let $S$ be a (finite or countable) set of possible states of the world.
  • Let $Pr : S \rightarrow [0,1]$ be a mapping that associates every state of the world $s\in S$ with a probability $Pr(s)$ (with the usual conditions on probability mappings applying to $Pr(\cdot)$, e.g. $\sum_{s\in S} Pr(s) = 1$).
  • Let $R_1$ and $R_2$ be random variables, which take real values in each and every state of the world $s\in S$.
  • Let $T$ be another random variable, defined state by state as $T(s) = R_1(s) + R_2(s)$, i.e. $T = R_1 + R_2$.

Then we have:

$$\begin{aligned} E[T] &= \sum_{s\in S} T(s) Pr(s) \\ &= \sum_{s\in S} [R_1(s) + R_2(s)] Pr(s) \\ &= \sum_{s\in S} R_1(s) Pr(s) + \sum_{s\in S} R_2(s) Pr(s) \\ &= E[R_1] + E[R_2] \end{aligned}$$
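
The state-by-state computation can be mirrored directly in a few lines. This Python sketch uses a hypothetical three-state world with made-up probabilities and values (none of them come from the original answer); exact fractions avoid rounding, so both sides agree exactly, and the same check with coefficients $a, b$ covers the full linearity statement.

```python
from fractions import Fraction

# Hypothetical finite set of states S with probabilities Pr(s).
Pr = {"s1": Fraction(1, 2), "s2": Fraction(1, 3), "s3": Fraction(1, 6)}
assert sum(Pr.values()) == 1

# Two random variables: one real value per state of the world.
R1 = {"s1": 4, "s2": -3, "s3": 10}
R2 = {"s1": 0, "s2": 7, "s3": -2}


def E(R):
    """E[R] = sum over s in S of R(s) Pr(s)."""
    return sum(R[s] * Pr[s] for s in Pr)


T = {s: R1[s] + R2[s] for s in Pr}          # T(s) = R1(s) + R2(s), state by state
print(E(T), E(R1) + E(R2))                  # 14/3 14/3

a, b = 3, -2
U = {s: a * R1[s] + b * R2[s] for s in Pr}  # U = a R1 + b R2
print(E(U), a * E(R1) + b * E(R2))          # 4 4
```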

$\endgroup$ $\begingroup$

Intuitive (and somewhat informal) justification for the linearity of expectation. Suppose that random variables $X$ and $Y$ have finite expectations ${\rm E}(X)$ and ${\rm E}(Y)$, respectively. Suppose also that $(X_n,Y_n)$ is a sequence of i.i.d. random vectors having the (joint) distribution of $(X,Y)$. Then, in particular, $X_n$, $Y_n$, and $aX_n + bY_n$ are sequences of i.i.d. random variables having the distribution of $X$, $Y$, and $aX + bY$, respectively. Thus, from $$ a{\rm E}(X) + b{\rm E}(Y) \stackrel{{\rm a.s.}}{=} a\mathop {\lim }\limits_{n \to \infty } \frac{{\sum\nolimits_{i = 1}^n {X_i } }}{n} + b\mathop {\lim }\limits_{n \to \infty } \frac{{\sum\nolimits_{i = 1}^n {Y_i } }}{n}, $$ we conclude that $$ {\rm E}(aX + bY) \stackrel{{\rm a.s.}}{=} \mathop {\lim }\limits_{n \to \infty } \frac{{\sum\nolimits_{i = 1}^n {(aX_i + bY_i )} }}{n} \stackrel{{\rm a.s.}}{=} a{\rm E}(X) + b{\rm E}(Y), $$
where $\stackrel{{\rm a.s.}}{=}$ stands for "almost surely equal", and where we have used the strong law of large numbers, namely, if $Z_n$ is a sequence of i.i.d. random variables having the distribution of a random variable $Z$ with finite expectation, then ${\rm E}(Z) \stackrel{{\rm a.s.}}{=} \mathop {\lim }\limits_{n \to \infty } \frac{{\sum\nolimits_{i = 1}^n {Z_i } }}{n}$. (The law applies to $aX+bY$ because $|aX+bY| \le |a||X|+|b||Y|$, so $aX+bY$ also has finite expectation.)
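
The law-of-large-numbers argument can be watched numerically. The Python sketch below is an illustration under stated assumptions: the pair $X \sim$ Uniform$(0,1)$, $Y = X^2$, the coefficients $a=2$, $b=3$, the seed and the sample size are all our choices. It shows that the empirical identity holds for every $n$ and that the sample means settle near $a{\rm E}(X)+b{\rm E}(Y)=2$. $X$ and $Y$ are deliberately dependent, since independence is not needed for the argument.

```python
import random


def sample_pair(rng):
    """Hypothetical dependent pair: X ~ Uniform(0, 1) and Y = X**2."""
    x = rng.random()
    return x, x * x


a, b = 2.0, 3.0
rng = random.Random(12345)   # fixed seed so the run is reproducible
n = 200_000
pairs = [sample_pair(rng) for _ in range(n)]

mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n
mean_z = sum(a * x + b * y for x, y in pairs) / n

# For every n, the sample mean of aX_i + bY_i equals a*mean_x + b*mean_y
# (exactly in real arithmetic, up to floating-point rounding here) ...
print(abs(mean_z - (a * mean_x + b * mean_y)))
# ... and by the strong law both sides approach a*E(X) + b*E(Y) = 2*(1/2) + 3*(1/3) = 2.
print(mean_z)
```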

$\endgroup$ $\begingroup$

HINT: Use the definition $$ E(X) = \sum\limits_{x} x\, P(X = x).$$

$\endgroup$ $\begingroup$

If you follow the definition in Troy's answer, linearity follows directly from the way the expectation is defined (as a Lebesgue integral, which is linear). I think the following explains why such a definition makes sense. Let's just consider the discrete case. A random variable $X$ is "modeled" by a probability mass function $p_{X} : \mathbb{N} \rightarrow \mathbb{R}$ such that $p_{X} \geq 0$ and $\sum_{k \in \mathbb{N}} p_{X}(k) = 1$.

That is, we write $P(X = k) := p_{X}(k)$ and interpret the left-hand side in any probabilistic situation. Suppose that we only consider $X$ such that $\sum_{k=0}^{\infty}kp_{X}(k)$ is convergent. We define $E(X) := \sum_{k=0}^{\infty}kp_{X}(k)$.

Say $X$ and $Y$ (i.e., $p_{X}, p_{Y}$) are given. We naturally want to "define" $p_{X+Y}(k) := P(X = 0, Y = k) + P(X = 1, Y = k-1) + \cdots + P(X = k, Y = 0)$. However, unless $X$ and $Y$ are independent (in which case $P(X = m, Y = n) = p_{X}(m)p_{Y}(n)$), the terms on the right-hand side are not defined yet. Suppose that we consider any function $p : \mathbb{N} \times \mathbb{N} \rightarrow \mathbb{R}$ such that

  • $p \geq 0$;
  • $\sum_{m, n \in \mathbb{N}}p(m, n) = 1$;
  • $\sum_{n=0}^{\infty}p(m, n) = p_{X}(m)$ for every $m$, and $\sum_{m=0}^{\infty}p(m, n) = p_{Y}(n)$ for every $n$.

Now, define $P(X = m, Y = n) := p(m, n)$.

Assuming all the convergences, we follow our definitions to compute:

$$\begin{aligned} E(X + Y) &= \sum_{k=0}^{\infty}k\,p_{X+Y}(k) = \sum_{k=0}^{\infty}\sum_{m + n = k}(m + n)\,p(m, n) \\ &= \sum_{k=0}^{\infty}\sum_{m + n = k}m\,p(m, n) + \sum_{k=0}^{\infty}\sum_{m + n = k}n\,p(m, n) \\ &= \sum_{m, n \in \mathbb{N}}m\,p(m, n) + \sum_{m, n \in \mathbb{N}}n\,p(m, n) \\ &= \sum_{m=0}^{\infty}m\sum_{n=0}^{\infty}p(m, n) + \sum_{n=0}^{\infty}n\sum_{m=0}^{\infty}p(m, n) \\ &= \sum_{m=0}^{\infty}m\,p_{X}(m) + \sum_{n=0}^{\infty}n\,p_{Y}(n) = E(X) + E(Y). \end{aligned}$$

That is, no matter how we define $P(X = m, Y = n)$, as long as it satisfies the three natural axioms above, we get the additivity of expectation.
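
The point that the choice of joint pmf does not matter can be checked on a small example. In this hypothetical Python sketch, $X$ and $Y$ are both uniform on $\{0,1,2\}$, and three different joint pmfs $p(m,n)$ with these marginals are tried (independent, $X=Y$, and $X+Y=2$). Each is checked against the three axioms, and $E(X+Y)$ is computed from $p_{X+Y}$; the example distributions are our own choice.

```python
from collections import defaultdict
from fractions import Fraction

third = Fraction(1, 3)
p_X = {0: third, 1: third, 2: third}
p_Y = {0: third, 1: third, 2: third}

# Three hypothetical joint pmfs p(m, n), all with marginals p_X and p_Y.
couplings = {
    "independent": {(m, n): p_X[m] * p_Y[n] for m in p_X for n in p_Y},
    "X = Y": {(m, m): third for m in p_X},
    "X + Y = 2": {(m, 2 - m): third for m in p_X},
}


def expectation(pmf):
    """Sum of k * pmf(k) over the (finite) support."""
    return sum(k * p for k, p in pmf.items())


for name, p in couplings.items():
    # The three axioms: p >= 0, total mass 1, correct marginals.
    assert all(v >= 0 for v in p.values()) and sum(p.values()) == 1
    assert all(sum(v for (i, _), v in p.items() if i == m) == p_X[m] for m in p_X)
    assert all(sum(v for (_, j), v in p.items() if j == n) == p_Y[n] for n in p_Y)

    # p_{X+Y}(k) = sum of p(m, n) over all m + n = k
    p_sum = defaultdict(Fraction)
    for (m, n), v in p.items():
        p_sum[m + n] += v
    print(name, expectation(p_sum), expectation(p_X) + expectation(p_Y))  # <name> 2 2
```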

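Now, instead of showing $E(aX) = aE(X)$ directly, we show more generally that $E(g(X)) = \sum_{k=0}^{\infty}g(k)p_{X}(k)$ for any $g : \mathbb{N} \rightarrow \mathbb{N}$; taking $g(k) = ak$ (with $a \in \mathbb{N}$, so that $g(X)$ stays within this framework) gives $E(aX) = aE(X)$. It is natural to define $P(g(X) = j) := \sum_{k \in g^{-1}(j)}p_{X}(k)$. We now compute: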

$$\begin{aligned} \sum_{k=0}^{\infty}g(k)p_{X}(k) &= \sum_{j=0}^{\infty}\sum_{k \in g^{-1}(j)}g(k)p_{X}(k) = \sum_{j=0}^{\infty}\sum_{k \in g^{-1}(j)}j\,p_{X}(k) \\ &= \sum_{j=0}^{\infty}j\sum_{k \in g^{-1}(j)}p_{X}(k) = \sum_{j=0}^{\infty}j\,P(g(X) = j) = E(g(X)). \end{aligned}$$
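
Both sides of this identity can be compared on a concrete pmf. The Python sketch below uses a hypothetical pmf $p_X(k) = (k+1)/21$ on $\{0,\dots,5\}$ and two illustrative maps: $g(k) = k \bmod 3$, which is not injective, and $g(k) = 4k$, which gives $E(4X) = 4E(X)$. It builds the pmf of $g(X)$ by grouping $k$ according to $j = g(k)$, exactly as in the definition above.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical pmf on {0, ..., 5}: p_X(k) = (k + 1)/21, which sums to 1.
p_X = {k: Fraction(k + 1, 21) for k in range(6)}


def both_sides(g):
    """Return (sum_k g(k) p_X(k), E(g(X)) computed from the pmf of g(X))."""
    p_gX = defaultdict(Fraction)
    for k, p in p_X.items():
        p_gX[g(k)] += p          # P(g(X) = j) = sum over k in g^{-1}(j) of p_X(k)
    lhs = sum(g(k) * p for k, p in p_X.items())
    rhs = sum(j * p for j, p in p_gX.items())
    return lhs, rhs


print(both_sides(lambda k: k % 3))   # (Fraction(25, 21), Fraction(25, 21))
print(both_sides(lambda k: 4 * k))   # (Fraction(40, 3), Fraction(40, 3)), i.e. 4 * E(X)
```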

$\endgroup$ $\begingroup$

In the discrete case, page 14 of the MIT OpenCourseWare lecture notes on "Fundamentals of Probability", lecture 6, provides a proof:

$$ \begin{aligned} \mathbb{E}[aX + bY] &= \sum_{x,y}(ax + by)p_{X,Y}(x,y) \\ &= \sum_x \sum_y ax\, p_{X,Y}(x, y) + \sum_x \sum_y by\, p_{X,Y}(x,y) \\ &= \sum_x \left( ax \sum_y p_{X,Y}(x, y) \right) + \sum_y \left( by \sum_x p_{X,Y}(x,y) \right) \\ &= a\sum_x x\, p_X(x) + b\sum_y y\, p_Y(y) \\ &= a\mathbb{E}[X] + b\mathbb{E}[Y] \end{aligned} $$

$\endgroup$ $\begingroup$

For the second question, we want to show that $\text{Med}(aX+bY)$ might not equal $a \text{Med}(X)+ b \text{Med}(Y)$. Suppose $f(x)$ is the density function of $X$ and $g(y)$ is the density function of $Y$. Let $m_1$ be the median of $X$ and $m_2$ be the median of $Y$ (assuming they are unique). Then $m_1$ is the value for which $\int_{-\infty}^{m_1} f(x) \ dx = 0.5$. Likewise, $m_2$ is the value for which $\int_{-\infty}^{m_2} g(y) \ dy = 0.5$. The median of $aX+bY$ is characterized in the same way by the density of $aX+bY$, which depends on the joint distribution of $X$ and $Y$ and not only on $f$ and $g$; one can use these characterizations to construct examples where $\text{Med}(aX+bY) \neq a \text{Med}(X)+ b \text{Med}(Y)$.
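
To make this concrete, here is one explicit example, which is our own choice and not part of the original answer: take $X$ and $Y$ independent, each exponential with rate $1$, so that $X+Y$ has the Gamma$(2,1)$ distribution with CDF $1 - e^{-x}(1+x)$. The Python sketch finds each median by bisection on the CDF (the bracket $[0, 10]$ and the tolerance are arbitrary) and shows $\text{Med}(X+Y) \approx 1.678$, whereas $\text{Med}(X)+\text{Med}(Y) = 2\log 2 \approx 1.386$.

```python
import math


def bisect_median(cdf, lo, hi, tol=1e-12):
    """Solve cdf(m) = 1/2 by bisection; assumes cdf is continuous and increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cdf_exp(x):
    """CDF of Exp(1)."""
    return 1 - math.exp(-x)


def cdf_gamma2(x):
    """CDF of Gamma(2, 1), the distribution of X + Y for independent X, Y ~ Exp(1)."""
    return 1 - math.exp(-x) * (1 + x)


m_x = bisect_median(cdf_exp, 0.0, 10.0)       # equals log 2
m_sum = bisect_median(cdf_gamma2, 0.0, 10.0)
print(2 * m_x, m_sum)   # approximately 1.386 and 1.678
```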

$\endgroup$ $\begingroup$

For part (a), assume $X$ and $Y$ are both continuous with joint pdf $f_{X,Y}$. Write $E(X+Y)$ as the integral of $(x+y)$ times the joint pdf, and split it into two integrals: $$E(X+Y) = \iint (x+y) f_{X,Y}(x,y)\,dx\,dy = \int x \left( \int f_{X,Y}(x,y)\,dy \right) dx + \int y \left( \int f_{X,Y}(x,y)\,dx \right) dy.$$ In the first integral, integrate over $y$ first and then over $x$, so that $x$ can be pulled outside the inner integral. The inner integral is then, by definition, the marginal pdf $f_X(x)$, and $\int x f_X(x)\,dx = E(X)$. The second integral gives $E(Y)$ in the same way.
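
The split into marginals can be mirrored numerically. In this Python sketch, the joint density $f_{X,Y}(x,y) = x + y$ on $[0,1]^2$ is an assumed example (for it, $E(X) = E(Y) = 7/12$), and the integrals are approximated with a simple midpoint rule whose grid size is arbitrary. $E(X+Y)$ computed from the joint density agrees with $E(X) + E(Y)$ computed from the marginals.

```python
# Assumed joint density on the unit square: f(x, y) = x + y (it integrates to 1).
def f(x, y):
    return x + y


N = 400                                    # midpoint-rule grid size (arbitrary)
h = 1.0 / N
pts = [(i + 0.5) * h for i in range(N)]

# E(X + Y) computed directly from the joint density.
e_sum = sum((x + y) * f(x, y) for x in pts for y in pts) * h * h

# Marginals: integrate out the other variable first (the inner integral).
f_X = [sum(f(x, y) for y in pts) * h for x in pts]
f_Y = [sum(f(x, y) for x in pts) * h for y in pts]
e_x = sum(x * fx for x, fx in zip(pts, f_X)) * h
e_y = sum(y * fy for y, fy in zip(pts, f_Y)) * h

print(e_sum, e_x + e_y)   # both close to 7/6
```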

$\endgroup$ $\begingroup$

I would venture that the linearity of expectation is somewhat in the nature of an axiom: any generalization to more complex sorts of distributions would have to satisfy it to be considered sensible.

$\endgroup$
