Local quadratic approximation
I wanted to implement some penalized regression parameter estimation algorithm by Fan&Li (, section 3.3, [1]), but cannot catch the idea of some details. More specifically, I don't quite understand how local quadratic approximation is derived.
Assume that $p(x)$ is concave penalty function s.t. $p(0)=0$ and $p$ is not differentiable at origin. It is stated that given initial value $x_0$ we can approximate
$\left[p(|x|)\right]'=p'(|x|)\text{sgn}(x)\approx \lbrace{p'(|x_0|)/|x_0|\rbrace}x$, $(1)$
when $x\neq 0$ (actually it should be that both $x,x_0\neq 0$, shouldn't it?) and the approximation leads to
$p(|x|)\approx p(|x_0|)+\tfrac{1}{2}\lbrace{p'(|x_0|)/|x_0|)\rbrace}(x^2-x_0^2)$, for $x\approx x_0$. $(2)$
I think that $(1)$ follows from assumption that if $x\approx x_0$ then
$p'(|x|)\text{sgn}(x)/x\approx p'(|x_0|)/|x_0|$.
But how one gets to $(2)$? Why the second term in Taylor approximation is lost and how is second derivative approximated?
[1]: Jianqing Fan, Runze Li. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, December 2001, vol 96 no 456, pp.1348-1360.
$\endgroup$ 01 Answer
$\begingroup$(1) follows from a case analysis, according to whether $x$ is positive or negative: thus $$\text{sign}(x) = {x \over |x|}$$ if $x \ne 0$. Applying this, we find $$p'(|x|) \cdot \text{sign}(x) \approx p'(|x|) \times {x \over |x|} \approx p'(|x_0|) \times {x \over |x_0|},$$ which justifies (1).
(2) follows from a first-order Taylor series approximation for $x^2-x_0^2$. Notice that if $x \approx x_0$, we have $$x^2 - x_0^2 \approx 2 (x-x_0) x_0.$$ (Why? Because when $\epsilon \approx 0$, we have $(x_0+\epsilon)^2 \approx x_0^2 + 2 \epsilon x_0$, thus $(x_0 + \epsilon)^2 - x_0^2 \approx 2 \epsilon x_0$.) In other words, $$x-x_0 \approx {x^2 - x_0^2 \over 2x_0} \approx {x^2 - x_0^2 \over 2x}.$$ Now we can use this to derive (2). By a Taylor series expansion, we know $$p(|x|) \approx p(|x_0|) + (x-x_0) \cdot p'(|x_0|) \cdot \text{sign}(x_0).$$ Plug in (1), and we find $$p(|x|) \approx p(|x_0|) + (x-x_0) \times p'(|x_0|) \times {x \over |x_0|}.$$ Finally, plug in our approximation for $x-x_0$ and find $$p(|x|) \approx p(|x_0|) + {x^2 - x_0^2 \over 2x} \times p'(|x_0|) \times {x \over |x_0|},$$ which simplifies to $$p(|x|) \approx p(|x_0|) + {x^2 - x_0^2 \over 2 |x_0|} \times p'(|x_0|),$$ which matches what you listed in (2).
In other words, we're still approximating $p(|x|)$ using only a first-order Taylor series approximation, not a second-order Taylor series approximation.
Yes, it's ugly, and there's probably a cleaner derivation.
$\endgroup$