How to understand $E(XY)$ intuitively

By Emily Wilson
$\begingroup$

I have no trouble understanding $\displaystyle E(X)=\int xf(x)\,dx $ and $\displaystyle E(Y)=\int y f(y)\,dy$

Each $x$ is multiplied by the corresponding $f(x)$, and we integrate to compute the sum (and likewise for $y$).

However when it comes to $E(XY)$, the formula becomes $\displaystyle E(XY) = \iint xy f_{X,Y}(x,y)\,dxdy$.

I can't seem to wrap my head around the case where there is more than one random variable. Is there any way to understand it intuitively?

$\endgroup$ 2

3 Answers

$\begingroup$

I have no trouble understanding $E(X)=\int xf(x)dx $ and $E(Y)=\int y f(y)dy$

Actually, your formulas should read \begin{align} E[X] &= \int_{-\infty}^\infty x f_X(x)\,\mathrm dx = \int_{-\infty}^\infty t f_X(t)\,\mathrm dt \tag{1}\\ E[Y] &= \int_{-\infty}^\infty y f_Y(y)\,\mathrm dy = \int_{-\infty}^\infty u f_Y(u)\,\mathrm du\tag{2} \end{align} where you need to understand that the $f_X(\cdot)$ in $(1)$ is not the same as the $f_Y(\cdot)$ in $(2)$, and the second integral in each case is written to emphasize a fact that is often forgotten by beginners: what letter we use to denote the variable of integration is unimportant: an $X$ does not need to be associated with an $x$ or a $Y$ with a $y$.

Turning to the question asked, you are generalizing the wrong definition/formula when you jump from $(1)$ and $(2)$ to $E(XY) = \int \int xy f_{X,Y}(x,y)dxdy$. Since you are willing to accept $(1)$ as the definition of $E[X]$, consider a random variable $Z = g(X)$ where $g(\cdot)$ is some real function of a real variable. What is $E[Z]$? Well, the density of $Z$ is $f_Z(z)$ and so applying the definition, we can immediately say that $$E[Z] = \int_{-\infty}^\infty z f_Z(z)\,\mathrm dz = \int_{-\infty}^\infty v f_Z(v)\,\mathrm dv \tag{3}$$ In order to compute $E[Z]$ via $(3)$, we need to know $f_Z(z)$ or to first determine $f_Z(z)$ from the knowledge that $Z = g(X)$ and the known $f_X(x)$. Fortunately, the two-step process of first finding $f_Z(z)$ and then carrying out the integration in $(3)$ can be combined into a single operation via a result known as the Law of the Unconscious Statistician (LOTUS for short) which asserts that the value of $E[Z]$ which is, by definition, the value of the integral on the right side of $(3)$ happens to equal the value of a different integral:

$$\text{LOTUS:}\qquad E[Z] = E[g(X)] = \int_{-\infty}^\infty g(x)f_X(x)\,\mathrm dx.\tag{4}$$

This is a theorem that can be proved (and the proof is not an easy exercise that can be left to a beginner at this level of explanation to write up for him/herself). The somewhat pejorative adjective unconscious in the name of the theorem is because some people take $(4)$ to be the definition of $E[g(X)]$ quite unconscious of the fact that there already exists a standard definition (viz. $(3)$) of $E[g(X)]=E[Z]$ and that the assertion that the right sides of $(3)$ and $(4)$ have the same value is something that must be proven.
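
A small numerical check of $(3)$ against $(4)$ may help here. This is only a sketch in R under assumptions that are not part of the question: $X$ is taken to be uniform on $(0,1)$ and $g(x) = x^2$, so that $Z = X^2$ has density $f_Z(z) = 1/(2\sqrt z)$ on $(0,1)$. Both integrals should come out close to $1/3$.

```r
# Assumed example (not from the question): X ~ Uniform(0, 1), g(x) = x^2.
f_X <- function(x) dunif(x, 0, 1)
g   <- function(x) x^2

# Density of Z = g(X) = X^2 on (0, 1), obtained from P(Z <= z) = sqrt(z).
f_Z <- function(z) 1 / (2 * sqrt(z))

# Definition (3): integrate z * f_Z(z) over the range of Z.
e_def <- integrate(function(z) z * f_Z(z), 0, 1)$value

# LOTUS (4): integrate g(x) * f_X(x) over the range of X; f_Z is not needed.
e_lotus <- integrate(function(x) g(x) * f_X(x), 0, 1)$value

c(definition = e_def, lotus = e_lotus)  # both close to 1/3
```

The point of the comparison is that the second integral never uses $f_Z$, which is exactly the shortcut that LOTUS provides.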


With that as prologue, the formula that you should generalize is $(4)$ and not $(1)$. If $X$ and $Y$ are jointly continuous random variables with density function $f_{X,Y}(x,y)$, then a generalization of LOTUS (call it gLOTUS) says that for any real-valued function $h(x,y)$ of two variables, the expected value of $Z = h(X,Y)$ happens to equal the value of a double integral, viz.,$$\text{gLOTUS:}~ E[h(X,Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty h(x,y)f_{X,Y}(x,y) \,\mathrm dx\, \mathrm dy = \int_{-\infty}^\infty \int_{-\infty}^\infty h(x,y)f_{X,Y}(x,y) \,\mathrm dy\, \mathrm dx.\tag{5}$$ As special cases of $(5)$, note that if $h(X,Y) = X$ is a projection map, we have that \begin{align} E[h(X,Y)] &= E[X]\\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty xf_{X,Y}(x,y) \,\mathrm dy\, \mathrm dx\\ &= \int_{-\infty}^\infty x\cdot \left[\int_{-\infty}^\infty f_{X,Y}(x,y) \,\mathrm dy\right]\, \mathrm dx\tag{6}\\ &= \int_{-\infty}^\infty x f_X(x)\,\mathrm dx \tag{7} \end{align} where the last step follows upon recognizing the inner integral in $(6)$ as the one that is used to find the marginal density $f_X(x)$ of $X$ from the joint density $f_{X,Y}(x,y)$, and similarly for $h(X,Y) = Y$.

Applying gLOTUS to $h(X,Y) = XY$, we have the result that puzzles you:

$$E[XY] = \int_{-\infty}^\infty \int_{-\infty}^\infty xyf_{X,Y}(x,y) \,\mathrm dy\, \mathrm dx = \int_{-\infty}^\infty \int_{-\infty}^\infty xyf_{X,Y}(x,y) \,\mathrm dx\, \mathrm dy\tag{8}$$

Yes, $XY=Z$ is a random variable $Z$ in its own right and the definition of $E[Z] = E[XY]$ is just $(3)$, but gLOTUS allows us to bypass the step of pre-computing $f_Z(z)$ and use instead the double integral in $(8)$.
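
To see $(8)$ at work, here is a sketch in R with a joint density chosen purely for illustration (it is my assumption, not something from the question): $f_{X,Y}(x,y) = x + y$ on the unit square. The double integral is evaluated as two nested calls to `integrate`, in both orders of integration, and compared with $E[X]E[Y]$ obtained from $(5)$ with the projections $h(x,y)=x$ and $h(x,y)=y$. The helper names are hypothetical.

```r
# Assumed joint density for illustration: f(x, y) = x + y on [0, 1]^2.
f_XY <- function(x, y) x + y

# gLOTUS (5) as an iterated integral: inner over y, outer over x.
expect_dy_dx <- function(h) {
  inner <- function(x) sapply(x, function(xi)
    integrate(function(y) h(xi, y) * f_XY(xi, y), 0, 1)$value)
  integrate(inner, 0, 1)$value
}

# Same integral with the order swapped: inner over x, outer over y.
expect_dx_dy <- function(h) {
  inner <- function(y) sapply(y, function(yi)
    integrate(function(x) h(x, yi) * f_XY(x, yi), 0, 1)$value)
  integrate(inner, 0, 1)$value
}

e_xy_a <- expect_dy_dx(function(x, y) x * y)   # equation (8), dy dx
e_xy_b <- expect_dx_dy(function(x, y) x * y)   # equation (8), dx dy
e_x    <- expect_dy_dx(function(x, y) x)       # projection h(x, y) = x
e_y    <- expect_dy_dx(function(x, y) y)       # projection h(x, y) = y

c(E_XY = e_xy_a, E_XY_swapped = e_xy_b, E_X_times_E_Y = e_x * e_y)
```

For this particular density the exact values are $E[XY] = 1/3$ and $E[X]E[Y] = 49/144$, so the two orders of integration agree with each other but $E[XY] \ne E[X]E[Y]$: the joint density really is needed.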


In writing the above, I have avoided many of the fine details in the other answers in favor of a broad-brush approach that might provide more intuition as to why $(8)$ is correct.

$\endgroup$ 4 $\begingroup$

I guess "intuitively" means without using the term measure, etc?

Firstly, $X$ is a function on a set of outcomes $\Omega$, and $E(X)$ means $\int_{\Omega}X(\omega)\,{\rm d}P(\omega)$ by definition. Here $P$ is the probability measure (sorry, just here) we consider. The important thing is that you integrate over $\Omega$.

Now, you wrote $E(X)=\int xf(x)\,{\rm d}x$. In general, for a nice function $h$ you have $E(h(X))=\int h(x)f(x)\,{\rm d}x$. Note that now you are integrating over $\mathbb{R}$, the range of $X$. This is a formula one must prove, typically as an exercise. (For your future interest: one actually shows $E(h(X))=\int h(x)\,{\rm d}\mu_X(x)$, where $\mu_X$ is the distribution of $X$, and then discusses the case where $\mu_X$ has a density $f(x)$.)

Consider $E(XY)$. Again we have the formula of the same type, $E(XY)=\int_{\Omega}X(\omega)Y(\omega)\,{\rm d}P(\omega)$.

Finally, we want to obtain the formula for $E(XY)$. It is important to note that $XY$ is actually a random variable constructed as $g(X,Y)$ with $g(x,y):=xy$. That is, the pair $(X,Y)$ is an $\mathbb{R}^2$-valued random variable plugged into a deterministic $\mathbb{R}$-valued function. Accordingly, when we want to go to "the world of the range of the random variable", we consider the range of the pair $(X,Y)$. What we have is $$ E(g(X,Y))=\int_{\Omega}g(X(\omega),Y(\omega))\,{\rm d}P(\omega)=\int_{\mathbb{R}^2}g(x,y)f_{XY}(x,y)dxdy, $$ (if $(X,Y)$ has a density). In particular, $$ E(XY)=\int_{\mathbb{R}^2}xyf_{XY}(x,y)dxdy. $$ The $f_{XY}$ part incorporates the "dependence" of $X,Y$. Again, we are treating $(X,Y)$ as a single $\mathbb{R}^2$-valued random variable, not $X$ and $Y$ separately. It might be helpful to consider the case where $X$ and $Y$ are independent. In this case you have $f_{XY}(x,y)=f_X(x)f_Y(y)$, so the double integral simply splits into an integration in each variable separately.
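
The passage from $\Omega$ to the range can be checked on a finite example, where the integrals become sums. The sketch below is an illustration with assumed ingredients (none of them from the question): $\Omega$ is the set of 36 equally likely outcomes of two dice, $X$ is the first die and $Y$ is the larger of the two, so $X$ and $Y$ are dependent.

```r
# Assumed setup: Omega = 36 equally likely outcomes of two fair dice.
omega <- expand.grid(d1 = 1:6, d2 = 1:6)
p     <- rep(1 / 36, nrow(omega))        # P({omega}) for each outcome

# Random variables are functions on Omega.
X <- omega$d1                            # first die
Y <- pmax(omega$d1, omega$d2)            # larger of the two dice

# "Integral over Omega": sum of X(omega) * Y(omega) * P(omega).
e_omega <- sum(X * Y * p)

# Joint pmf of (X, Y) on the range: add up P over each (x, y) cell.
joint <- xtabs(p ~ X + Y)
xs <- as.numeric(rownames(joint))
ys <- as.numeric(colnames(joint))

# "Integral over the range": sum of x * y * f_XY(x, y).
e_range <- sum(outer(xs, ys) * joint)

c(over_omega = e_omega, over_range = e_range)  # the two sums agree
```

The first sum never mentions a joint distribution and the second never mentions $\Omega$; the table `joint` is what carries the dependence between $X$ and $Y$ across.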

$\endgroup$ 4 $\begingroup$

Please do look at the discrete example in Wikipedia; here is a little more in direct answer to your question.

Consider tossing two fair dice to get realizations of independent random variables $X$ and $Y$. Most games use the sum of the numbers $X + Y$. It is easy to see that $E(X) = E(Y) = 3.5$ and $$E(X+Y) = E(X) + E(Y) = 3.5 + 3.5 = 7.$$

However, it would be possible to invent a game based on $XY$. Then, using independence, we have $$E(XY) = \sum_x \sum_y xyf_{X,Y}(x,y) = \sum_x \sum_y xyf_X(x)f_Y(y) = \sum_x xf_X(x)\sum_y yf_Y(y) = E(X)E(Y) = 3.5^2 = 12.25.$$

Here is a simple demo in R based on a million rolls of 2 dice:

    m = 10^6;  xy = numeric(m)
    for (i in 1:m) {
      dice = sample(1:6, 2, replace = TRUE)
      xy[i] = prod(dice)
    }
    mean(xy)
    ## 12.25026
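
The simulated mean can be cross-checked without any randomness. As a minimal sketch (the variable names are my own), the 36 equally likely outcomes are enumerated with `outer` and averaged:

```r
faces    <- 1:6
products <- outer(faces, faces)   # 6 x 6 table of x * y
e_xy     <- sum(products) / 36    # each outcome has probability 1/36
e_xy                              # 12.25
mean(faces)^2                     # E(X) * E(Y), also 12.25
```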

For a very similar example where $X$ and $Y$ are not independent, consider drawing two balls WITHOUT replacement from an urn with balls numbered from 1 through 6 to get realizations of random variables $X$ and $Y$. It still makes sense to consider the random variable $XY$ and its expectation $E(XY).$ There are 30 points in the sample space, each with probability 1/30, and each with its value of $XY$. Summing all 30 values, each multiplied by 1/30 (that is, dividing the total by 30), you get $E(XY).$

    Table of values of XY  (- marks the impossible diagonal outcomes)

         X:   1   2   3   4   5   6
    Y
    1:        -   2   3   4   5   6
    2:        2   -   6   8  10  12
    3:        3   6   -  12  15  18
    4:        4   8  12   -  20  24
    5:        5  10  15  20   -  30
    6:        6  12  18  24  30   -

Here $E(X) = E(Y) = 3.5$ again, but $E(XY) = 11.67 \ne E(X)E(Y).$ The covariance of $X$ and $Y$ is $Cov(X,Y) = E(XY) - E(X)E(Y) < 0.$ The covariance is negative, roughly speaking because diagonal outcomes, including $(1,1)$ and $(6,6),$ are not possible in sampling without replacement.

An analogous simulation in R, using (default) sampling without replacement.

    m = 10^6;  xy = x = y = numeric(m)
    for (i in 1:m) {
      dice = sample(1:6, 2)      # two draws without replacement
      xy[i] = prod(dice)
      x[i] = dice[1];  y[i] = dice[2]
    }
    mean(x);  mean(y);  mean(xy);  cov(x,y)
    ## 3.499053    # approx E(X)
    ## 3.500338    # approx E(Y)
    ## 11.66516    # approx E(XY)
    ## -0.5827058  # approx Cov(X, Y)
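
For comparison, the exact values for the urn example can be obtained by enumerating the 30 ordered pairs directly. This is a sketch with my own variable names; it uses the fact that $E(X) = E(Y) = 3.5$ by symmetry.

```r
balls    <- 1:6
products <- outer(balls, balls)          # x * y for every ordered pair
possible <- outer(balls, balls, "!=")    # TRUE off the diagonal (x != y)
e_xy     <- sum(products[possible]) / 30 # 30 equally likely ordered pairs
e_x      <- mean(balls)                  # E(X) = E(Y) = 3.5
cov_xy   <- e_xy - e_x^2                 # Cov(X, Y) = E(XY) - E(X)E(Y)
c(E_XY = e_xy, Cov = cov_xy)
```

The exact values are $E(XY) = 35/3 \approx 11.67$ and $Cov(X,Y) = -7/12 \approx -0.583$.
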
$\endgroup$ 6
