Calculating uncertainty in standard deviation

By Sarah Scott

I have a distribution with literally an infinite number of potential data points. I need the standard deviation. I generate about a hundred points and take the standard deviation of the points. This gives a hopefully good approximation of the true standard deviation, but it won't, of course, be exact. How do I estimate the uncertainty in the standard deviation? This seems like a very basic question, but web searching hasn't provided any solution. If I missed it somehow, my apologies.


4 Answers


If you want to find the uncertainty, or standard error (SE), of the standard deviation of your sample, you can use the approximation $SE(s) \approx \frac{s}{\sqrt{2N - 2}}$, where $s$ is the sample standard deviation and $N$ is the number of data points in your sample. This approximation assumes the data are roughly normally distributed.
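A minimal Python sketch of this approximation, assuming roughly normal data. The function name `sd_with_standard_error` is hypothetical, chosen for illustration:

```python
import math
import statistics


def sd_with_standard_error(data: list[float]) -> tuple[float, float]:
    """Return the sample standard deviation s and its approximate
    standard error s / sqrt(2N - 2), which assumes roughly normal data."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    s = statistics.stdev(data)  # Bessel-corrected (n - 1) sample SD
    return s, s / math.sqrt(2 * n - 2)


s, se = sd_with_standard_error([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(f"s = {s:.4f} +/- {se:.4f}")
```

For this small made-up data set, $s = \sqrt{32/7} \approx 2.138$ and the standard error is exactly $4/7 \approx 0.571$.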

Hope that helps!


The answer to the OP's question depends on whether or not the mean of the distribution is known. If the mean is known (for example, if you know that the mean of your sampled population should eventually average out to zero), then the problem is a little different. It is not different by much, but I did not research to what extent; [4] might help. I am assuming the mean is not known.

So you have a sample of 100 values for which you don't know the mean or variance. You can calculate the unbiased variance estimator [1]: $$S^2 = \frac{1}{n-1}\sum_i\left(x_i- \frac{1}{n}\sum_j x_j\right)^2 = \frac{1}{n(n-1)}\sum_{i,j}\frac{(x_i-x_j)^2}{2}$$ where the double sum runs over all ordered pairs $(i, j)$.
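As a quick numerical check of the identity above, here is a Python sketch that computes $S^2$ both from deviations about the sample mean and from the pairwise form. The function names are hypothetical and the data are made-up illustration values:

```python
import itertools


def variance_deviation_form(xs: list[float]) -> float:
    """S^2 = 1/(n-1) * sum_i (x_i - mean)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_pairwise_form(xs: list[float]) -> float:
    """S^2 = 1/(n(n-1)) * sum over all ordered pairs (i, j) of (x_i - x_j)^2 / 2."""
    n = len(xs)
    total = sum((xi - xj) ** 2 / 2 for xi, xj in itertools.product(xs, repeat=2))
    return total / (n * (n - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_deviation_form(data), variance_pairwise_form(data))
```

Both forms give $32/7 \approx 4.5714$ here. The pairwise form needs $O(n^2)$ work, so it is only useful as a check.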

But you also want to know how accurate this estimate of the population variance is. In other words, you want the variance of the variance estimator, $Var\left(S^2\right)$. This is shown in [2] to be: $$Var\left(S^2\right)=\frac{1}{n}\left(\mu_4-\frac{n-3}{n-1}\mu_2^2\right)$$

where $\mu_k := E\left[(X-E[X])^k\right]$ are the central moments, and so you get: $$\sigma^2:=\mu_2 = S^2 \pm \sqrt{\frac{1}{n}\left(\mu_4-\frac{n-3}{n-1}\mu_2^2\right)}$$

Regrettably, this is not given as a function of your data points. It is a function of $\mu_4$ and $\mu_2$, both of which are unknown. What you really want is an unbiased estimator for $Var\left(S^2\right)$, and I couldn't find a complete way to achieve this. Unbiased estimators of nonlinear functions are in general not easy to find (in this case I think it's probably impossible), so as far as I know you will have to accept some bias.

To try to minimise this bias, you could find good estimators for $\mu_4$ and $\mu_2$, plug them into $\sqrt{\frac{1}{n}\left(\mu_4-\frac{n-3}{n-1}\mu_2^2\right)}$, and ignore the bias that arises from the nonlinearity. The unbiased estimators for the central moments ($\mu_4$, $\mu_2$) are called h-statistics. They are easy to find online or in books and are not too complex to calculate. For my purposes, though, the h-statistic for $\mu_4$ is a pretty unwieldy expression [3], and as I already said, using it is not free of bias. So what I decided to do was assume the $X_i$ are close enough to Gaussian that $\mu_4=3\mu_2^2$, and thus I got: $$Var\left(S^2\right)= \frac{1}{n}\left(\mu_4-\frac{n-3}{n-1}\mu_2^2\right)= \frac{1}{n}\left(3\mu_2^2-\frac{n-3}{n-1}\mu_2^2\right)= \frac{1}{n}\left(3-\frac{n-3}{n-1}\right)\mu_2^2= \frac{1}{n}\left(\frac{2n}{n-1}\right)\mu_2^2= \frac{2\mu_2^2}{n-1}$$

And so now (assuming $\mu_4=3\mu_2^2$): $$\sigma^2:=\mu_2 = S^2 \pm \sqrt{\frac{2}{n-1}}\, \sigma^2\approx S^2 \pm \sqrt{\frac{2}{n-1}}\, S^2$$
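To sanity-check the formula for $Var\left(S^2\right)$ from [2] and its Gaussian special case, one can compare them against simulation. This Python sketch assumes two example distributions that are not part of the original answer: an exponential distribution with rate 1 (for which $\mu_2 = 1$ and $\mu_4 = 9$) as a non-Gaussian case, and a standard normal ($\mu_2 = 1$, $\mu_4 = 3$) as the Gaussian case. The function names are hypothetical:

```python
import random


def var_s2_formula(mu2: float, mu4: float, n: int) -> float:
    """Var(S^2) = (mu_4 - (n-3)/(n-1) * mu_2^2) / n."""
    return (mu4 - (n - 3) / (n - 1) * mu2 ** 2) / n


def simulated_var_s2(draw, n: int, repeats: int, seed: int = 1) -> float:
    """Empirical variance of S^2 over many independent samples of size n."""
    rng = random.Random(seed)
    s2_values = []
    for _ in range(repeats):
        xs = [draw(rng) for _ in range(n)]
        mean = sum(xs) / n
        s2_values.append(sum((x - mean) ** 2 for x in xs) / (n - 1))
    m = sum(s2_values) / repeats
    return sum((v - m) ** 2 for v in s2_values) / (repeats - 1)


n = 10
exp_formula = var_s2_formula(1.0, 9.0, n)     # exponential, rate 1
exp_sim = simulated_var_s2(lambda r: r.expovariate(1.0), n, 20000)
normal_formula = var_s2_formula(1.0, 3.0, n)  # equals 2 / (n - 1) when sigma = 1
normal_sim = simulated_var_s2(lambda r: r.gauss(0.0, 1.0), n, 20000)
print(exp_formula, exp_sim, normal_formula, normal_sim)
```

For the normal case the general formula reduces to $2/(n-1)$, matching the Gaussian result above. For the heavier-tailed exponential case it is much larger, which is why the $\mu_4 = 3\mu_2^2$ assumption matters.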

To finish up, the OP asked for the uncertainty in $S$, not in $S^2$. So we use propagation of uncertainty [5] to evaluate how the uncertainty is affected by taking the square root ($SE$ stands for standard error):

$$SE[\sqrt{Y}]\approx\frac{1}{2\sqrt{E[Y]}}SE[Y]$$ Taking $Y = S^2$ and approximating $E[Y]$ by $S^2$: $$\sigma = S \pm \frac{1}{2\sqrt{S^2}}\sqrt{\frac{2}{n-1}}\,S^2= S \pm \frac{S}{\sqrt{2n-2}}$$

which matches the other answers.
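If the data may not be close to Gaussian, the plug-in route described earlier can be sketched in Python. The steps are: estimate $\mu_2$ and $\mu_4$ from the data, substitute them into the formula for $Var\left(S^2\right)$, then propagate through the square root. As a simplifying assumption, this sketch uses $S^2$ for $\mu_2$ and the plain (biased) sample fourth central moment for $\mu_4$ rather than the h-statistic [3], so it carries some bias. The function name is hypothetical:

```python
import math


def plug_in_se_of_sd(xs: list[float]) -> float:
    """Approximate SE of the sample SD without assuming mu_4 = 3 mu_2^2.

    mu_2 is estimated by S^2 and mu_4 by the sample fourth central moment.
    Both are plugged into Var(S^2) = (mu_4 - (n-3)/(n-1) mu_2^2) / n, and
    SE[S] ~ SE[S^2] / (2 S) converts the result to the SD scale.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = math.fsum(xs) / n
    s2 = math.fsum((x - mean) ** 2 for x in xs) / (n - 1)
    m4 = math.fsum((x - mean) ** 4 for x in xs) / n
    var_s2 = (m4 - (n - 3) / (n - 1) * s2 ** 2) / n
    return math.sqrt(var_s2) / (2.0 * math.sqrt(s2))


print(plug_in_se_of_sd([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

For data with $\mu_4 \approx 3\mu_2^2$ this should come out close to $S/\sqrt{2n-2}$. For heavy-tailed data ($\mu_4 > 3\mu_2^2$) it will be larger.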

References:

[1] Eric Benhamou, "A few properties of sample variance".

[2] Eungchun Cho and Moon Jung Cho, "Variance of Sample Variance".

[3] Wolfram MathWorld, "h-Statistic".

[4] StatLect, "Point estimation of the variance".

[5] Wikipedia, "Propagation of uncertainty" (26/09/2020).


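If you're allowed to take that sample repeatedly, you can estimate the uncertainty directly by simulation. This is closely related to bootstrapping, which instead resamples with replacement from the single sample you already have.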

Procedure:

  1. Draw 100 points

  2. Calculate standard deviation

  3. Repeat steps 1 and 2 many times (empirically, I've found 5,000 to 10,000 repetitions to be enough), keeping track of the results of step 2.

  4. Examine the distribution of estimates from Step 2 with whatever tools you'd like -- histograms, sample moments, etc.
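The procedure above can be sketched in Python. The sampler is an assumption for illustration: `draw_point` is a hypothetical stand-in that draws from a normal distribution with standard deviation 2 via `random.gauss`. In practice you would replace it with whatever generates points from your own distribution:

```python
import math
import random


def sample_sd(xs: list[float]) -> float:
    """Sample standard deviation with the n - 1 denominator."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / (n - 1))


def simulate_sd_estimates(draw, n_points: int = 100,
                          n_repeats: int = 5000, seed: int = 0) -> list[float]:
    """Steps 1-3: draw n_points values, record their SD, repeat n_repeats times."""
    rng = random.Random(seed)
    return [sample_sd([draw(rng) for _ in range(n_points)])
            for _ in range(n_repeats)]


# Hypothetical example distribution: normal with mean 0 and sigma 2.
def draw_point(rng: random.Random) -> float:
    return rng.gauss(0.0, 2.0)


estimates = simulate_sd_estimates(draw_point)
# Step 4: summarise the distribution of the estimates.
mean_sd = math.fsum(estimates) / len(estimates)
spread = sample_sd(estimates)
print(f"mean of SD estimates: {mean_sd:.4f}, spread: {spread:.4f}")
```

For normal data the spread of the estimates should come out close to $\sigma/\sqrt{2n-2}$ from the first answer, here $2/\sqrt{198} \approx 0.142$.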


This is pretty standard and can be answered by searching "Confidence interval of a standard deviation." Here are the steps:

Step 1) Pick a confidence level. The confidence level is the probability of your interval estimate containing the actual population standard deviation. Common choices for confidence levels are 90%, 95%, 99%. I'll work through the steps for a 90% confidence interval.

Step 2) Use a chi-squared distribution to find the left and right critical values $\chi^2_L, \chi^2_R$ for your chosen confidence level. The degrees of freedom are the sample size minus one, in this case $99$. For your example, the critical values for 90% confidence (the 5th and 95th percentiles of the chi-squared distribution with 99 degrees of freedom) are approximately $\chi^2_L = 77.0$ and $\chi^2_R = 123.2$.

Step 3) Use your sample standard deviation $s$ and sample size $n$ to find the left and right endpoints of the confidence interval for the population standard deviation $\sigma$ via the formula: $$s\sqrt{ \frac{n-1}{\chi^2_R}} < \sigma < s\sqrt{ \frac{n-1}{\chi^2_L}}.$$ In your example, whatever your value for $s$ was, you can be 90% confident that the true value of $\sigma$ is between $s \sqrt{ \frac{99}{123.2}} \approx 0.896s$ on the low end and $s \sqrt{ \frac{99}{77.0}} \approx 1.134s$ on the high end.
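The three steps can be sketched in Python using only the standard library. Like the formula in Step 3, this interval assumes normally distributed data. Because `math` has no chi-squared quantile function, this sketch computes the chi-squared CDF from the series for the regularized lower incomplete gamma function and inverts it by bisection. If SciPy is available, `scipy.stats.chi2.ppf` gives the same critical values. The function names are hypothetical:

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-squared CDF = regularized lower incomplete gamma P(df/2, x/2), via its series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_ppf(p: float, df: int) -> float:
    """Chi-squared quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sd_confidence_interval(s: float, n: int, level: float = 0.90) -> tuple[float, float]:
    """Step 3: s*sqrt((n-1)/chi2_R) < sigma < s*sqrt((n-1)/chi2_L)."""
    df = n - 1
    alpha = 1.0 - level
    chi2_left = chi2_ppf(alpha / 2.0, df)         # Step 2: left critical value
    chi2_right = chi2_ppf(1.0 - alpha / 2.0, df)  # Step 2: right critical value
    return s * math.sqrt(df / chi2_right), s * math.sqrt(df / chi2_left)


lower, upper = sd_confidence_interval(1.0, 100, 0.90)
print(f"{lower:.3f} s < sigma < {upper:.3f} s")
```

With $s = 1$ and $n = 100$ this gives the multipliers of roughly $0.896$ and $1.134$ from Step 3.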
