The Derivative as a Linear Operator
Why is the derivative (d/dx) thought of as a linear operator instead of a function of functions?
If we take the derivative of some function f(x) (d/dx(f(x))), then we get a new function f’(x).
This makes me think that d/dx is a mapping from one set of functions to another set of functions.
However, d/dx is considered to be a linear operator. If I understand this correctly, that means we have to convert the function we are taking the derivative of into a vector that represents it. The linear operator then maps that vector to another vector, which represents a new polynomial.
Why do we do this? It seems overly complicated, like we’re adding steps that don’t need to be there. Is there some reason we can’t just consider d/dx to be a function that maps one set of functions to another?
3 Answers
"Operator", "map", and "transformation" are all words we use to speak about functions in particular settings. Yes, differentiation is a function from a set of (nice enough) functions to a set of functions; for instance, it is a function from the polynomials with real coefficients to the polynomials with real coefficients.
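A minimal Python sketch of that idea, in which d/dx literally takes a function and returns a function. The names `derivative` and `RealFunction` are hypothetical, and a central-difference quotient with step `eps` stands in for the exact limit, so the returned f' is only an approximation:

```python
import math
from typing import Callable

RealFunction = Callable[[float], float]


def derivative(f: RealFunction, eps: float = 1e-6) -> RealFunction:
    """Map a function f to (an approximation of) the function f'.

    Uses a central difference, so f' is only approximated; eps is an
    illustrative step size, not a recommended value.
    """
    def f_prime(x: float) -> float:
        return (f(x + eps) - f(x - eps)) / (2 * eps)

    return f_prime


sin_prime = derivative(math.sin)  # input: a function; output: a function
print(sin_prime(0.0), math.cos(0.0))  # both approximately 1.0
```

Nothing here converts `math.sin` into a list of coordinates first; the input and the output are both just functions.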
"Convert into a vector that represents it" is a phrase that doesn't really make sense. Functions are (often) vectors. The set of functions from $\Bbb R$ to $\Bbb R$ with addition and scalar multiplication defined the usual way is a vector space, and all functions in that space are vectors. The subset of those functions which are differentiable is also a vector space (a subspace).
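The fact that we call it a linear operator carries implications about how it behaves with respect to addition and multiplication by constants: for differentiable $f, g$ and constants $a, b$, we have $\frac{d}{dx}(af + bg) = a\frac{d}{dx}f + b\frac{d}{dx}g$. It is still, at its core, a function, in much the same way that a square is still a rectangle.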
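A quick numerical check of that linearity property, assuming the same kind of central-difference approximation as a stand-in for the exact derivative; the functions sin and exp and the constants a = 2, b = -3 are arbitrary choices, and `lin_comb` is a hypothetical helper that builds a·f + b·g pointwise (the usual vector-space operations on functions):

```python
import math
from typing import Callable

RealFunction = Callable[[float], float]


def derivative(f: RealFunction, eps: float = 1e-6) -> RealFunction:
    """Central-difference approximation of f' (illustrative step size eps)."""
    return lambda x: (f(x + eps) - f(x - eps)) / (2 * eps)


def lin_comb(a: float, f: RealFunction, b: float, g: RealFunction) -> RealFunction:
    """The function a*f + b*g, with addition and scaling defined pointwise."""
    return lambda x: a * f(x) + b * g(x)


a, b = 2.0, -3.0
f, g = math.sin, math.exp

lhs = derivative(lin_comb(a, f, b, g))              # (a f + b g)'
rhs = lin_comb(a, derivative(f), b, derivative(g))  # a f' + b g'

for x in [-1.0, 0.0, 0.5, 2.0]:
    print(x, lhs(x), rhs(x))  # the two columns agree up to rounding
```

The difference quotient is itself linear in f, so the two sides should differ only by floating-point rounding.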
We mathematicians often give different names to the same things. Sometimes it is because it's valuable to have a conceptual distinction in the absence of a formal one, and sometimes it is just because of conventions dating back decades or centuries. Sometimes the fact that things with different names are the same (or very close to it) is an important theorem (like the fundamental theorem of calculus: integration is antidifferentiation). In this case, my guess is that it's a mix of the first two.
In a more general setting, when we are dealing with functions between normed spaces, $f: (V_1, \|\cdot\|_1) \to (V_2, \|\cdot\|_2)$, the (Fréchet) derivative of $f$ at a point $a \in V_1$ is the bounded linear operator $A: (V_1, \|\cdot\|_1) \to (V_2, \|\cdot\|_2)$ such that
$$\lim_{\|h\|_1 \to 0} \frac{\|f(a+h)-f(a)-A(h)\|_2}{\|h\|_1}=0.$$
Applying this definition to the "casual" functions (i.e. $\mathbb{R} \to \mathbb{R}$, with the absolute value as the norm), we get back the "casual" derivative: the linear map $A$ is multiplication by the number $f'(a)$. For example, take $f: x \mapsto x^2$ and let $a \in \mathbb{R}$. Then the Fréchet derivative of $f$ at $a$ is $A: h \mapsto 2ah$, because
$$(a+h)^2-a^2=2ah+h^2,$$
so
$$\frac{|(a+h)^2-a^2-2ah|}{|h|}=\frac{h^2}{|h|}=|h|\to 0.$$
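A small numerical sketch of the $x^2$ example, assuming the arbitrary base point a = 1.5. The quotient from the definition equals |h| for the candidate A(h) = 2ah, while a wrong linear map such as h ↦ ah (the hypothetical `B`, added only for contrast) leaves the quotient near |a|, so it is not the derivative:

```python
def frechet_quotient(f, A, a, h):
    """Return |f(a+h) - f(a) - A(h)| / |h|, which must tend to 0 as h -> 0."""
    return abs(f(a + h) - f(a) - A(h)) / abs(h)


def f(x):
    return x ** 2


a = 1.5  # arbitrary base point chosen for illustration


def A(h):
    # Candidate derivative at a: the linear map h -> 2ah
    return 2 * a * h


def B(h):
    # A deliberately wrong linear map, h -> ah, for comparison
    return a * h


for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    # With A the quotient is |h| (up to rounding); with B it stays near |a + h|
    print(h, frechet_quotient(f, A, a, h), frechet_quotient(f, B, a, h))
```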
Are you comfortable with the derivative when the target space is $\mathbb R$ rather than $\mathbb R^m$? That is, if $f: \mathbb R^n \rightarrow \mathbb R$ is a function, then the derivative of $f$ is obtained by combining the derivatives of the $n$ maps
$$x_i \mapsto f(x_1, ... , x_{i-1}, x_i, x_{i+1}, ... , x_n)$$
where the other coordinates $x_j$ ($j \neq i$) are held fixed, so each of these maps is from $\mathbb R$ to $\mathbb R$. In other words, the derivative of $f$ is represented by a row vector
$$D(f) = \begin{pmatrix} \frac{\partial f}{\partial x_1} & \cdots &\frac{\partial f}{\partial x_n} \end{pmatrix} \tag{1} $$
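To see (1) numerically, here is a sketch that approximates each entry by a central difference in one coordinate, holding the others fixed. The test function $f(x_1, x_2) = x_1^2 x_2 + \sin(x_2)$, the point (1, 2), the step `eps`, and the name `partials_row` are assumptions for illustration:

```python
import math


def partials_row(f, x, eps=1e-6):
    """Approximate the row vector (df/dx_1, ..., df/dx_n) at the point x.

    Each entry is a central difference in one coordinate x_i, with the
    other coordinates held fixed, i.e. the derivative of the map
    x_i -> f(x_1, ..., x_n) from R to R.
    """
    row = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        row.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return row


def f(x):
    # Example f: R^2 -> R, f(x_1, x_2) = x_1^2 x_2 + sin(x_2)
    return x[0] ** 2 * x[1] + math.sin(x[1])


point = [1.0, 2.0]
print(partials_row(f, point))
# Exact partials at (1, 2): 2 x_1 x_2 = 4 and x_1^2 + cos(x_2) = 1 + cos(2), about 0.584
```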
In the general case, when $f$ is a function from $\mathbb R^n$ to $\mathbb R^m$, we may write
$$f(x_1, ... , x_n) = (f_1(x_1, ... , x_n), ... , f_m(x_1, ... , x_n))$$
for $m$ functions $f_i: \mathbb R^n \rightarrow \mathbb R$. The derivative of $f$ is then obtained by stacking the $m$ row vectors $D(f_i)$, each computed as in (1):
$$D(f) := \begin{pmatrix} D(f_1) \\ D(f_2) \\ \vdots \\ D(f_m) \end{pmatrix}$$
This $m$ by $n$ matrix of course represents, for each point $x \in \mathbb R^n$, a linear transformation (as all matrices do), and when $f$ is nice enough, this linear transformation is a good linear approximation of $f$ at a given point.
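A sketch of the $m$ by $n$ case for a hypothetical $f: \mathbb R^2 \to \mathbb R^3$ (chosen only for illustration), with the matrix assembled by stacking the rows $D(f_i)$. The check computes $\|f(x+h) - f(x) - D(f)\,h\| / \|h\|$, the quantity from the Fréchet definition, and under these assumptions it should shrink roughly in proportion to $\|h\|$:

```python
import math


def f(x):
    # Example f: R^2 -> R^3 with components f_1, f_2, f_3
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2, math.sin(x1)]


def jacobian(x):
    # Row i is the row vector D(f_i); stacking the m = 3 rows gives a 3 x 2 matrix
    x1, x2 = x
    return [
        [x2, x1],             # D(f_1)
        [1.0, 2 * x2],        # D(f_2)
        [math.cos(x1), 0.0],  # D(f_3)
    ]


def mat_vec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def remainder_ratio(x, h):
    """||f(x+h) - f(x) - J h|| / ||h||, small when J is a good linear approximation."""
    fxh = f([xi + hi for xi, hi in zip(x, h)])
    fx = f(x)
    Jh = mat_vec(jacobian(x), h)
    return norm([p - q - r for p, q, r in zip(fxh, fx, Jh)]) / norm(h)


x = [0.5, -1.0]
for t in [1e-1, 1e-2, 1e-3]:
    print(t, remainder_ratio(x, [t, -2 * t]))  # shrinks roughly in proportion to t
```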
The usual definition of the derivative of $f$ at a point as a linear map from $\mathbb R^n$ to $\mathbb R^m$ is made to avoid the use of explicit coordinates.
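The same distinction applies to d/dx from the question. As an illustration (assuming the space of polynomials of degree at most n = 3 with the monomial basis $1, x, x^2, x^3$; the helper names `derivative_matrix` and `mat_vec` are hypothetical), choosing coordinates turns the restriction of d/dx to that space into a 4 by 4 matrix acting on coefficient vectors. The matrix represents the operator in one chosen basis; the operator itself does not depend on that choice:

```python
def derivative_matrix(n: int) -> list[list[float]]:
    """Matrix of d/dx on polynomials of degree <= n in the basis 1, x, ..., x^n.

    Column k holds the coordinates of d/dx(x^k) = k * x^(k-1),
    so the only nonzero entry in column k sits in row k - 1.
    """
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        D[k - 1][k] = float(k)
    return D


def mat_vec(M: list[list[float]], v: list[float]) -> list[float]:
    """Ordinary matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


D = derivative_matrix(3)
p = [1.0, 3.0, 2.0, 5.0]  # coordinates of 1 + 3x + 2x^2 + 5x^3
print(mat_vec(D, p))  # [3.0, 4.0, 15.0, 0.0], i.e. 3 + 4x + 15x^2
```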