Finding the regression line given the mean, correlation and standard deviation of $x$ and $y$.
We have $100$ observations of $(x, y)$. The mean of $x$ is $1.06$ and the mean of $y$ is $3$. The standard deviation is $0.52$ for $x$ and $1.13$ for $y$. The correlation between $x$ and $y$ is $0.89$.
In the question we are told to:
• Estimate the linear regression line of the regression of $Y$ on $X$ and the standard deviation of the errors.
• Estimate the regression line when we regress $X$ as the dependent variable on $Y$, and obtain an estimate of the standard deviation of the errors.
• Are the two regression lines the same? If not, then explain why not.
• For the regression of $Y$ on $X$, suppose that we wish to predict the dependent variable $y$ at $x = x^* = 0.7$. Obtain the prediction, as well as the standard error of the prediction.
• Obtain the standard deviation of the prediction error and hence obtain a $95\%$ prediction interval for $y$ for the given $x = x^*$.
I thought we were supposed to generate $100$ data points, assuming $x$ and $y$ are normally distributed with the given means and standard deviations, and then use Stata to run the regression and find the prediction interval, etc.
But the lecturer told me this was not the case, so I am wondering whether there is another way to solve this. I'm thinking of some kind of derivation or calculation using the information above, but I have no idea where to start.
$\endgroup$
2 Answers
$\begingroup$I've found estimates for $B_1$ and $B_0$ by rewriting the usual estimation formulas: the numerator of the slope estimate can be written as $n \times cov(x,y)$ (or $(n-1) \times cov(x,y)$ if the sample covariance is used), and $cov(x,y)$ follows from the given information as $corr(x,y)$ times the standard deviations of $x$ and $y$.
The problem now is how to find the standard deviation of the errors and of the prediction errors.
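As a rough sketch of the calculation described above, the snippet below computes both regression lines and a residual standard deviation from the summary statistics alone. It assumes the quoted standard deviations are sample values (divisor $n-1$), and it uses the standard simple-regression identity that the residual sum of squares equals $(n-1)s_y^2(1-r^2)$, divided by $n-2$ degrees of freedom. The variable names are illustrative only.

```python
import math

# Summary statistics from the question.
n = 100
x_bar, y_bar = 1.06, 3.0
s_x, s_y = 0.52, 1.13  # assumption: sample SDs with divisor n - 1
r = 0.89

# Regression of Y on X: slope = r * s_y / s_x, line passes through the means.
b1_yx = r * s_y / s_x
b0_yx = y_bar - b1_yx * x_bar
# Residual SD: sqrt of residual sum of squares over n - 2 degrees of freedom.
sigma_hat_yx = math.sqrt((n - 1) * s_y**2 * (1 - r**2) / (n - 2))

# Regression of X on Y: the roles of x and y are swapped.
b1_xy = r * s_x / s_y
b0_xy = x_bar - b1_xy * y_bar
sigma_hat_xy = math.sqrt((n - 1) * s_x**2 * (1 - r**2) / (n - 2))

print(f"Y on X: y = {b0_yx:.4f} + {b1_yx:.4f} x, residual SD = {sigma_hat_yx:.4f}")
print(f"X on Y: x = {b0_xy:.4f} + {b1_xy:.4f} y, residual SD = {sigma_hat_xy:.4f}")
# The product of the two slopes equals r^2, so the two lines coincide only if |r| = 1.
print(f"slope product = {b1_yx * b1_xy:.4f}, r^2 = {r**2:.4f}")
```

The two fitted lines differ because each minimizes squared errors in a different direction (vertical for $Y$ on $X$, horizontal for $X$ on $Y$).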
$\endgroup$ $\begingroup$Using matrix notation, you get
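\begin{align} \hat{\beta} &= (X'X)^{-1}X'Y \\ Var(\hat{\beta}) &= (X'X)^{-1}X'\sigma_{y}^{2}X(X'X)^{-1} = (X'X)^{-1}\sigma_{y}^{2} \end{align}
Here $\sigma_{y}^{2}$ is the variance of the errors (the variance of $y$ given $x$), not the marginal variance of $y$.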
So for the simple linear regression, this will be
\begin{align} X &= (j, x) \quad X'=(j,x)'\\
X'X &= \begin{bmatrix} n & \sum_{k=1}^{n} x_{k} \\ \sum_{k=1}^{n} x_{k} & \sum_{k=1}^{n} x_{k}^{2} \\ \end{bmatrix} \\
(X'X)^{-1}\sigma_{y}^{2} &= \begin{bmatrix} \sum_{k=1}^{n} x_{k}^{2} & -\sum_{k=1}^{n} x_{k} \\ -\sum_{k=1}^{n} x_{k} & n \\ \end{bmatrix}\frac{\sigma_{y}^{2}}{n\sum_{k=1}^{n} x_{k}^{2} -(\sum_{k=1}^{n} x_{k})^{2}} \\
&= \begin{bmatrix} \sum_{k=1}^{n} x_{k}^{2} & -\sum_{k=1}^{n} x_{k} \\ -\sum_{k=1}^{n} x_{k} & n \\ \end{bmatrix}\frac{\sigma_{y}^{2}}{n\sum_{k=1}^{n} x_{k}^{2} -(n\bar{X})^{2}} \\
&= \begin{bmatrix} \frac{\sigma_{y}^{2}\sum_{k=1}^{n} x_{k}^{2}}{nS^{2}_{X}} & -\frac{\sigma_{y}^{2}\sum_{k=1}^{n} x_{k}}{nS^{2}_{X}} \\ -\frac{\sigma_{y}^{2}\sum_{k=1}^{n} x_{k}}{nS^{2}_{X}} & \frac{\sigma_{y}^{2}}{S^{2}_{X}} \\ \end{bmatrix}\\
&=\begin{bmatrix} \frac{\sigma_{y}^{2}(S^{2}_{X}+n\bar{X}^{2})}{nS^{2}_{X}} & -\frac{\sigma_{y}^{2}\bar{X}}{S^{2}_{X}} \\ -\frac{\sigma_{y}^{2}\bar{X}}{S^{2}_{X}} & \frac{\sigma_{y}^{2}}{S^{2}_{X}} \\ \end{bmatrix} \end{align}
where $j$ is the column of ones and $S^{2}_{X} = \sum_{k=1}^{n} (x_{k}-\bar{X})^{2}$, so that $n\sum_{k=1}^{n} x_{k}^{2} -(n\bar{X})^{2} = nS^{2}_{X}$.
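So if you have the s.d. of $X$, you only need $\sum_{k=1}^{n} x_{k}^{2}$ to have everything required to calculate the variance of the estimates.
Rearranging the variance formula gives it: $\sum_{k=1}^{n} x_{k}^{2} = S_{X}^{2} +n\bar{X}^{2}$, where $S_{X}^{2} = \sum_{k=1}^{n} (x_{k}-\bar{X})^{2} = (n-1)s_{X}^{2}$ and $s_{X}$ is the sample standard deviation of $x$ (use $ns_{X}^{2}$ instead if the s.d. was computed with divisor $n$).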
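To make the matrix algebra concrete, here is a minimal sketch that rebuilds $X'X$ from the summary statistics and checks its inverse against the closed form $\begin{bmatrix} (S_X^2+n\bar{X}^2)/(nS_X^2) & -\bar{X}/S_X^2 \\ -\bar{X}/S_X^2 & 1/S_X^2 \end{bmatrix}$. It assumes $S_X^2 = (n-1)s_X^2$, i.e. that the quoted s.d. uses divisor $n-1$; the names `S_xx`, `inv` and `closed` are illustrative.

```python
# Summary statistics for x from the question.
n = 100
x_bar = 1.06
s_x = 0.52  # assumption: sample SD with divisor n - 1

# Centered sum of squares, and the raw sums that appear in X'X.
S_xx = (n - 1) * s_x**2          # sum of (x_k - x_bar)^2
sum_x = n * x_bar                # sum of x_k
sum_x2 = S_xx + n * x_bar**2     # sum of x_k^2

# Invert the 2x2 matrix X'X = [[n, sum_x], [sum_x, sum_x2]] directly.
det = n * sum_x2 - sum_x**2      # equals n * S_xx
inv = [[sum_x2 / det, -sum_x / det],
       [-sum_x / det, n / det]]

# Closed form of the same inverse in terms of x_bar and S_xx.
closed = [[(S_xx + n * x_bar**2) / (n * S_xx), -x_bar / S_xx],
          [-x_bar / S_xx, 1 / S_xx]]

print("sum of x^2:", round(sum_x2, 4))
print("inverse of X'X:", inv)
print("closed form:   ", closed)
```

Multiplying either matrix by an estimate of the error variance gives the estimated covariance matrix of the intercept and slope.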
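If you are predicting the mean response at $x^*$ (this gives the standard error of the prediction), the variance is $\sigma^{2}_{y}\left(\frac{1}{n} + \frac{(x^*-\bar{X})^{2}}{S^{2}_{X}}\right)$, where $S^{2}_{X} = \sum_{k=1}^{n} (x_{k}-\bar{X})^{2}$.
If you are predicting a new observation at $x^*$ (this gives the prediction error), the error variance is added on top: $\sigma^{2}_{y}\left(\frac{1}{n} + \frac{(x^*-\bar{X})^{2}}{S^{2}_{X}}\right) + \sigma^{2}_{y}$.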
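A hedged worked example for $x^* = 0.7$ follows. It uses the standard simple-regression results $Var(\hat{y}^*) = \sigma^2\left(\frac{1}{n} + \frac{(x^*-\bar{X})^2}{S_X^2}\right)$ for the fitted mean and $\sigma^2\left(1 + \frac{1}{n} + \frac{(x^*-\bar{X})^2}{S_X^2}\right)$ for a new observation. Several things here are assumptions rather than part of the original answer: the standard deviations are treated as sample values (divisor $n-1$), the error variance is estimated as $(n-1)s_y^2(1-r^2)/(n-2)$, and the interval uses a normal quantile as a large-sample stand-in for the $t$ quantile with $n-2 = 98$ degrees of freedom (which is slightly larger, about $1.98$).

```python
import math
from statistics import NormalDist

# Summary statistics from the question.
n = 100
x_bar, y_bar = 1.06, 3.0
s_x, s_y = 0.52, 1.13  # assumption: sample SDs with divisor n - 1
r = 0.89
x_star = 0.7

# Fitted line for Y on X.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar
y_hat = b0 + b1 * x_star

# Estimated error variance and centered sum of squares of x.
sigma2_hat = (n - 1) * s_y**2 * (1 - r**2) / (n - 2)
S_xx = (n - 1) * s_x**2

# Standard error of the fitted mean at x_star.
leverage = 1 / n + (x_star - x_bar) ** 2 / S_xx
se_mean = math.sqrt(sigma2_hat * leverage)

# Standard deviation of the prediction error for a new observation.
sd_pred = math.sqrt(sigma2_hat * (1 + leverage))

# Approximate 95% prediction interval (normal quantile used in place of t with 98 df).
z = NormalDist().inv_cdf(0.975)
lower, upper = y_hat - z * sd_pred, y_hat + z * sd_pred

print(f"prediction at x* = {x_star}: {y_hat:.4f}")
print(f"standard error of the fitted mean: {se_mean:.4f}")
print(f"SD of the prediction error: {sd_pred:.4f}")
print(f"approximate 95% prediction interval: ({lower:.4f}, {upper:.4f})")
```

With a statistics package available, replacing `z` by the exact $t_{0.975,\,98}$ quantile would widen the interval slightly.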
$\endgroup$