# Total derivative

In mathematics, the total derivative of a function ${\displaystyle f}$ is the best linear approximation of the value of the function with respect to its arguments. Unlike partial derivatives, the total derivative approximates the function with respect to all of its arguments, not just a single one. In many situations, this is the same as considering all partial derivatives simultaneously. The term "total derivative" is primarily used when ${\displaystyle f}$ is a function of several variables, because when ${\displaystyle f}$ is a function of a single variable, the total derivative is the same as the derivative of the function.[1]:198–203

"Total derivative" is sometimes also used as a synonym for the material derivative in fluid mechanics.

## The total derivative as a linear map

Let ${\displaystyle U\subseteq \mathbf {R} ^{n}}$  be an open subset. Then a function ${\displaystyle f:U\rightarrow \mathbf {R} ^{m}}$  is said to be (totally) differentiable at a point ${\displaystyle a\in U}$  if there exists a linear transformation ${\displaystyle df_{a}:\mathbf {R} ^{n}\rightarrow \mathbf {R} ^{m}}$  such that

${\displaystyle \lim _{x\rightarrow a}{\frac {\|f(x)-f(a)-df_{a}(x-a)\|}{\|x-a\|}}=0.}$

The linear map ${\displaystyle df_{a}}$  is called the (total) derivative or (total) differential of ${\displaystyle f}$  at ${\displaystyle a}$ . Other notations for the total derivative include ${\displaystyle D_{a}f}$  and ${\displaystyle Df(a)}$ . A function is (totally) differentiable if its total derivative exists at every point in its domain.

Conceptually, the definition of the total derivative expresses the idea that ${\displaystyle df_{a}}$  is the best linear approximation to ${\displaystyle f}$  at the point ${\displaystyle a}$ . This can be made precise by quantifying the error in the linear approximation determined by ${\displaystyle df_{a}}$ . To do so, write

${\displaystyle f(a+h)=f(a)+df_{a}(h)+\varepsilon (h),}$

where ${\displaystyle \varepsilon (h)}$  equals the error in the approximation. To say that the derivative of ${\displaystyle f}$  at ${\displaystyle a}$  is ${\displaystyle df_{a}}$  is equivalent to the statement

${\displaystyle \varepsilon (h)=o(\lVert h\rVert ),}$

where ${\displaystyle o}$  is little-o notation and indicates that ${\displaystyle \varepsilon (h)}$  is much smaller than ${\displaystyle \lVert h\rVert }$  as ${\displaystyle h\to 0}$ . The total derivative ${\displaystyle df_{a}}$  is the unique linear transformation for which the error term is this small, and this is the sense in which it is the best linear approximation to ${\displaystyle f}$ .
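The little-o condition can be checked numerically. The sketch below is an illustration only: the map, the base point, and the helper names are assumptions chosen for the example and do not come from the text. It applies a hand-computed Jacobian to a shrinking increment ${\displaystyle h}$ and prints the ratio ${\displaystyle \lVert \varepsilon (h)\rVert /\lVert h\rVert }$, which should tend to zero.

```python
import math

def f(x, y):
    # Hypothetical example map R^2 -> R^2 (an assumption for illustration).
    return (x * x * y, math.sin(x) + y)

def df(a, h):
    # Total derivative of f at a applied to h, using the hand-computed
    # Jacobian [[2xy, x^2], [cos x, 1]] evaluated at a = (x, y).
    x, y = a
    return (2 * x * y * h[0] + x * x * h[1], math.cos(x) * h[0] + h[1])

def error_ratio(a, h):
    # ||f(a + h) - f(a) - df_a(h)|| / ||h||
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lin = df(a, h)
    eps = (fah[0] - fa[0] - lin[0], fah[1] - fa[1] - lin[1])
    return math.hypot(*eps) / math.hypot(*h)

a = (1.0, 2.0)
ratios = [error_ratio(a, (t, -0.5 * t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the ratios shrink roughly in proportion to ||h||
```

Replacing `df` with any other linear map would leave a ratio that does not tend to zero, which is the uniqueness statement above.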

The function ${\displaystyle f}$ is differentiable if and only if each of its components ${\displaystyle f_{i}\colon U\to \mathbf {R} }$ is differentiable, so when studying total derivatives, it is often possible to work one coordinate at a time in the codomain. However, the same is not true of the coordinates in the domain. It is true that if ${\displaystyle f}$ is differentiable at ${\displaystyle a}$, then each partial derivative ${\displaystyle \partial f/\partial x_{i}}$ exists at ${\displaystyle a}$. The converse is false: it can happen that all of the partial derivatives of ${\displaystyle f}$ at ${\displaystyle a}$ exist, but ${\displaystyle f}$ is not differentiable at ${\displaystyle a}$. This means that the function is very "rough" at ${\displaystyle a}$, to such an extreme that its behavior cannot be adequately described by its behavior in the coordinate directions. When ${\displaystyle f}$ is not so rough, this cannot happen. More precisely, if all the partial derivatives of ${\displaystyle f}$ exist in a neighborhood of ${\displaystyle a}$ and are continuous at ${\displaystyle a}$, then ${\displaystyle f}$ is differentiable at ${\displaystyle a}$. Whenever ${\displaystyle f}$ is differentiable at ${\displaystyle a}$, its total derivative is the linear transformation corresponding to the Jacobian matrix of partial derivatives at that point.[2]
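A standard counterexample, not taken from the text above, is the function ${\displaystyle g(x,y)=xy/(x^{2}+y^{2})}$ with ${\displaystyle g(0,0)=0}$. Both partial derivatives exist at the origin and equal zero, yet the linear-approximation error does not become small relative to ${\displaystyle \lVert h\rVert }$ along the diagonal. The following sketch, with illustrative helper names, checks this numerically.

```python
import math

def g(x, y):
    # Standard counterexample (an assumption for illustration):
    # both partials exist at the origin, but g is not differentiable there.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_at_origin(t):
    # Difference quotient along the x-axis; g vanishes on both axes.
    return (g(t, 0.0) - g(0.0, 0.0)) / t

def partial_y_at_origin(t):
    # Difference quotient along the y-axis.
    return (g(0.0, t) - g(0.0, 0.0)) / t

def diagonal_error_ratio(t):
    # Both partials are 0, so the only candidate derivative is the zero map
    # and the error along h = (t, t) is g(t, t) itself.
    return abs(g(t, t) - g(0.0, 0.0)) / math.hypot(t, t)

for t in (1e-1, 1e-2, 1e-3):
    print(partial_x_at_origin(t), partial_y_at_origin(t), diagonal_error_ratio(t))
# The difference quotients are 0 for every t, while the ratio grows like 1/t.
```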

## The total derivative as a differential form

When the function under consideration is real-valued, the total derivative can be recast using differential forms. For example, suppose that ${\displaystyle f\colon \mathbf {R} ^{n}\to \mathbf {R} }$ is a differentiable function of variables ${\displaystyle x_{1},\ldots ,x_{n}}$. The total derivative of ${\displaystyle f}$ at ${\displaystyle a}$ may be written in terms of its Jacobian matrix, which in this instance is a row matrix (the transpose of the gradient):

${\displaystyle df_{a}={\begin{pmatrix}{\frac {\partial f}{\partial x_{1}}}(a)&\cdots &{\frac {\partial f}{\partial x_{n}}}(a)\end{pmatrix}}.}$

The linear approximation property of the total derivative implies that if

${\displaystyle \Delta x={\begin{pmatrix}\Delta x_{1}&\cdots &\Delta x_{n}\end{pmatrix}}^{T}}$

is a small vector (where the ${\displaystyle T}$  denotes transpose, so that this vector is a column vector), then

${\displaystyle f(a+\Delta x)-f(a)\approx df_{a}(\Delta x)=\sum _{i=1}^{n}{\frac {\partial f}{\partial x_{i}}}\Delta x_{i}.}$

Heuristically, this suggests that if ${\displaystyle dx_{1},\ldots ,dx_{n}}$  are infinitesimal increments in the coordinate directions, then

${\displaystyle df_{a}=\sum _{i=1}^{n}{\frac {\partial f}{\partial x_{i}}}(a)\,dx_{i}.}$

The theory of differential forms is one way to give a precise meaning to infinitesimal increments such as ${\displaystyle dx_{i}}$ . In this theory, ${\displaystyle dx_{i}}$  is a linear functional on the vector space ${\displaystyle \mathbf {R} ^{n}}$ . Evaluating ${\displaystyle dx_{i}}$  at a vector ${\displaystyle h}$  in ${\displaystyle \mathbf {R} ^{n}}$  measures how much ${\displaystyle h}$  points in the ${\displaystyle i}$ th coordinate direction. The total derivative ${\displaystyle df_{a}}$  is a linear combination of linear functionals and hence is itself a linear functional. The evaluation ${\displaystyle df_{a}(h)}$  measures how much ${\displaystyle h}$  points in the direction determined by ${\displaystyle f}$  at ${\displaystyle a}$ , and this direction is the gradient. This point of view makes the total derivative an instance of the exterior derivative.
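As a worked illustration (the function, point, and vector are assumptions chosen for the example, not taken from the text), take ${\displaystyle f(x_{1},x_{2})=x_{1}^{2}x_{2}}$ and ${\displaystyle a=(1,2)}$. The partial derivatives at ${\displaystyle a}$ are ${\displaystyle 2x_{1}x_{2}=4}$ and ${\displaystyle x_{1}^{2}=1}$, so

${\displaystyle df_{a}=4\,dx_{1}+dx_{2}.}$

Evaluating at ${\displaystyle h=(3,-1)^{T}}$, for which ${\displaystyle dx_{1}(h)=3}$ and ${\displaystyle dx_{2}(h)=-1}$, gives

${\displaystyle df_{a}(h)=4\cdot 3+1\cdot (-1)=11.}$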

Suppose now that ${\displaystyle f}$  is a vector-valued function, that is, ${\displaystyle f\colon \mathbf {R} ^{n}\to \mathbf {R} ^{m}}$ . In this case, the components ${\displaystyle f_{i}}$  of ${\displaystyle f}$  are real-valued functions, so they have associated differential forms ${\displaystyle df_{i}}$ . The total derivative ${\displaystyle df}$  amalgamates these forms into a single object and is therefore an instance of a vector-valued differential form.

## The chain rule for total derivatives

The chain rule has a particularly elegant statement in terms of total derivatives. It says that, for two functions ${\displaystyle f}$  and ${\displaystyle g}$ , the total derivative of the composite ${\displaystyle g\circ f}$  at ${\displaystyle a}$  satisfies

${\displaystyle d(g\circ f)_{a}=dg_{f(a)}\circ df_{a}.}$

If the total derivatives of ${\displaystyle f}$  and ${\displaystyle g}$  are identified with their Jacobian matrices, then the composite on the right-hand side is simply matrix multiplication. This is enormously useful in applications, as it makes it possible to account for essentially arbitrary dependencies among the arguments of a composite function.
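The identification with matrix multiplication can be checked numerically. In the sketch below, the two maps, their hand-computed Jacobians, and the helper names are all assumptions made for illustration. It multiplies the Jacobian of ${\displaystyle g}$ at ${\displaystyle f(a)}$ by the Jacobian of ${\displaystyle f}$ at ${\displaystyle a}$ and compares the product with a central-difference Jacobian of ${\displaystyle g\circ f}$.

```python
import math

def f(x, y):
    # Hypothetical inner map R^2 -> R^2.
    return (x * y, x + y)

def g(u, v):
    # Hypothetical outer map R^2 -> R^2.
    return (u * u + v, math.sin(v) * u)

def jac_f(x, y):
    # Hand-computed Jacobian of f.
    return [[y, x], [1.0, 1.0]]

def jac_g(u, v):
    # Hand-computed Jacobian of g.
    return [[2 * u, 1.0], [math.sin(v), u * math.cos(v)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numerical_jacobian(func, point, step=1e-6):
    # Central differences, perturbing one input coordinate at a time.
    columns = []
    for j in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[j] += step
        minus[j] -= step
        fp, fm = func(*plus), func(*minus)
        columns.append([(p - m) / (2 * step) for p, m in zip(fp, fm)])
    # columns[j][i] holds d func_i / d x_j; transpose so rows index outputs.
    return [[columns[j][i] for j in range(len(point))]
            for i in range(len(columns[0]))]

a = (0.5, -1.5)
chain = matmul(jac_g(*f(*a)), jac_f(*a))            # dg_{f(a)} composed with df_a
direct = numerical_jacobian(lambda x, y: g(*f(x, y)), a)
max_gap = max(abs(chain[i][j] - direct[i][j])
              for i in range(2) for j in range(2))
print(max_gap)  # small, limited by the finite-difference step
```

Note the order of the factors: the Jacobian of the outer map is evaluated at ${\displaystyle f(a)}$, not at ${\displaystyle a}$, and it multiplies from the left.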

### Example: Differentiation with direct dependencies

Suppose that f is a function of two variables, x and y. If these two variables are independent, so that the domain of f is ${\displaystyle \mathbf {R} ^{2}}$ , then the behavior of f may be understood in terms of its partial derivatives in the x and y directions. However, in some situations, x and y may be dependent. For example, it might happen that f is constrained to a curve ${\displaystyle y=y(x)}$ . In this case, we are actually interested in the behavior of the composite function ${\displaystyle f(x,y(x))}$ . The partial derivative of f with respect to x does not give the true rate of change of f with respect to changing x because changing x necessarily changes y. However, the chain rule for the total derivative takes such dependencies into account. Write ${\displaystyle \gamma (x)=(x,y(x))}$ . Then, the chain rule says

${\displaystyle d(f\circ \gamma )_{x_{0}}=df_{(x_{0},y(x_{0}))}\circ d\gamma _{x_{0}}.}$

By expressing the total derivative using Jacobian matrices, this becomes:

${\displaystyle {\frac {df(x,y(x))}{dx}}(x_{0})={\frac {\partial f}{\partial x}}(x_{0},y(x_{0}))\cdot {\frac {\partial x}{\partial x}}(x_{0})+{\frac {\partial f}{\partial y}}(x_{0},y(x_{0}))\cdot {\frac {\partial y}{\partial x}}(x_{0}).}$

Suppressing the evaluation at ${\displaystyle x_{0}}$  for legibility, we may also write this as

${\displaystyle {\frac {df(x,y(x))}{dx}}={\frac {\partial f}{\partial x}}{\frac {\partial x}{\partial x}}+{\frac {\partial f}{\partial y}}{\frac {\partial y}{\partial x}}.}$

This gives a straightforward formula for the derivative of ${\displaystyle f(x,y(x))}$  in terms of the partial derivatives of ${\displaystyle f}$  and the derivative of ${\displaystyle y(x)}$ .

For example, suppose

${\displaystyle f(x,y)=xy.}$

The rate of change of f with respect to x is usually the partial derivative of f with respect to x; in this case,

${\displaystyle {\frac {\partial f}{\partial x}}=y.}$

However, if y depends on x, the partial derivative does not give the true rate of change of f as x changes because the partial derivative assumes that y is fixed. Suppose we are constrained to the line

${\displaystyle y=x.}$

Then

${\displaystyle f(x,y)=f(x,x)=x^{2},}$

and the total derivative of f with respect to x is

${\displaystyle {\frac {df}{dx}}=2x,}$

which we see is not equal to the partial derivative ${\displaystyle \partial f/\partial x}$ . Instead of immediately substituting for y in terms of x, however, we can also use the chain rule as above:

${\displaystyle {\frac {df}{dx}}={\frac {\partial f}{\partial x}}+{\frac {\partial f}{\partial y}}{\frac {dy}{dx}}=y+x\cdot 1=x+y=2x.}$
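The same computation works for any differentiable constraint. As a further illustration (the curve is an assumption chosen for this example), constrain ${\displaystyle f(x,y)=xy}$ to the parabola ${\displaystyle y=x^{2}}$. Substituting directly gives ${\displaystyle f(x,x^{2})=x^{3}}$, whose derivative is ${\displaystyle 3x^{2}}$. The chain rule gives the same result:

${\displaystyle {\frac {df}{dx}}={\frac {\partial f}{\partial x}}+{\frac {\partial f}{\partial y}}{\frac {dy}{dx}}=y+x\cdot 2x=x^{2}+2x^{2}=3x^{2}.}$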

### Example: Differentiation with indirect dependencies

While one can often perform substitutions to eliminate indirect dependencies, the chain rule provides a more efficient and general technique. Suppose ${\displaystyle L(t,x_{1},\dots ,x_{n})}$ is a function of time ${\displaystyle t}$ and ${\displaystyle n}$ variables ${\displaystyle x_{i}}$ which themselves depend on time. Then, the time derivative of ${\displaystyle L}$ is

${\displaystyle {\frac {dL}{dt}}={\frac {d}{dt}}L{\bigl (}t,x_{1}(t),\ldots ,x_{n}(t){\bigr )}.}$

The chain rule expresses this derivative in terms of the partial derivatives of ${\displaystyle L}$  and the time derivatives of the functions ${\displaystyle x_{i}}$ :

${\displaystyle {\frac {dL}{dt}}={\frac {\partial L}{\partial t}}+\sum _{i=1}^{n}{\frac {\partial L}{\partial x_{i}}}{\frac {dx_{i}}{dt}}={\biggl (}{\frac {\partial }{\partial t}}+\sum _{i=1}^{n}{\frac {dx_{i}}{dt}}{\frac {\partial }{\partial x_{i}}}{\biggr )}(L).}$

This expression is often used in physics for a gauge transformation of the Lagrangian, as two Lagrangians that differ only by the total time derivative of a function of time and the ${\displaystyle n}$ generalized coordinates lead to the same equations of motion. An interesting example concerns the resolution of causality in the Wheeler–Feynman time-symmetric theory. The operator in parentheses (in the final expression above) is also called the total derivative operator (with respect to ${\displaystyle t}$).

For example, the total derivative of ${\displaystyle f(x(t),y(t))}$  is

${\displaystyle {\frac {df}{dt}}={\partial f \over \partial x}{dx \over dt}+{\partial f \over \partial y}{dy \over dt}.}$

Here there is no ${\displaystyle \partial f/\partial t}$  term since ${\displaystyle f}$  itself does not depend on the independent variable ${\displaystyle t}$  directly.
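The formula for ${\displaystyle dL/dt}$ can be verified on a concrete case. The sketch below assumes, purely for illustration, the function ${\displaystyle L(t,x_{1},x_{2})=tx_{1}+x_{2}^{2}}$ and the path ${\displaystyle x_{1}(t)=\cos t}$, ${\displaystyle x_{2}(t)=t^{2}}$; none of these appear in the text. It compares the chain-rule expression with a central-difference derivative of the composite.

```python
import math

def L(t, x1, x2):
    # Hypothetical function with explicit and implicit time dependence.
    return t * x1 + x2 ** 2

def path(t):
    # Hypothetical trajectory (x1(t), x2(t)).
    return (math.cos(t), t * t)

def path_velocity(t):
    # Time derivatives (dx1/dt, dx2/dt) of the trajectory above.
    return (-math.sin(t), 2 * t)

def total_derivative(t):
    x1, x2 = path(t)
    v1, v2 = path_velocity(t)
    dL_dt_partial = x1   # partial L / partial t with x1, x2 held fixed
    dL_dx1 = t           # partial L / partial x1
    dL_dx2 = 2 * x2      # partial L / partial x2
    return dL_dt_partial + dL_dx1 * v1 + dL_dx2 * v2

def numerical_derivative(t, step=1e-6):
    # Central difference of the composite t -> L(t, x1(t), x2(t)).
    return (L(t + step, *path(t + step)) - L(t - step, *path(t - step))) / (2 * step)

t0 = 0.7
print(total_derivative(t0), numerical_derivative(t0))
```

The partial derivative with respect to ${\displaystyle t}$ alone, here ${\displaystyle x_{1}}$, differs from both values because it ignores the motion of ${\displaystyle x_{1}}$ and ${\displaystyle x_{2}}$.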

## Total differential equation

A total differential equation is a differential equation expressed in terms of total derivatives. Since the exterior derivative is coordinate-free, in a sense that can be given a technical meaning, such equations are intrinsic and geometric.

## Application to equation systems

In economics, it is common for the total derivative to arise in the context of a system of equations.[1]:217–220 For example, a simple supply-demand system might specify the quantity q of a product demanded as a function D of its price p and consumers' income I, the latter being an exogenous variable, and might specify the quantity supplied by producers as a function S of its price and two exogenous resource cost variables r and w. The resulting system of equations

${\displaystyle q=D(p,I),}$
${\displaystyle q=S(p,r,w),}$

determines the market equilibrium values of the variables p and q. The total derivative ${\displaystyle dp/dr}$ of p with respect to r, for example, gives the sign and magnitude of the reaction of the market price to the exogenous variable r. In the indicated system, there are a total of six possible total derivatives, also known in this context as comparative static derivatives: dp / dr, dp / dw, dp / dI, dq / dr, dq / dw, and dq / dI. To find the derivatives with respect to r, for instance, one totally differentiates the system of equations, sets dI = dw = 0, divides through by dr, treats dq / dr and dp / dr as the unknowns, and solves the two totally differentiated equations simultaneously, typically by using Cramer's rule.
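The procedure can be written out for the derivatives with respect to r. The subscript notation ${\displaystyle D_{p}=\partial D/\partial p}$, ${\displaystyle S_{p}=\partial S/\partial p}$, ${\displaystyle S_{r}=\partial S/\partial r}$ is an assumption introduced here for brevity. Totally differentiating both equations with ${\displaystyle dI=dw=0}$ and dividing by ${\displaystyle dr}$ gives the linear system

${\displaystyle {\begin{pmatrix}1&-D_{p}\\1&-S_{p}\end{pmatrix}}{\begin{pmatrix}dq/dr\\dp/dr\end{pmatrix}}={\begin{pmatrix}0\\S_{r}\end{pmatrix}}.}$

The determinant of the coefficient matrix is ${\displaystyle D_{p}-S_{p}}$, so, provided it is nonzero, Cramer's rule yields

${\displaystyle {\frac {dp}{dr}}={\frac {S_{r}}{D_{p}-S_{p}}},\qquad {\frac {dq}{dr}}={\frac {D_{p}S_{r}}{D_{p}-S_{p}}}.}$

Under the additional sign assumptions ${\displaystyle D_{p}<0}$, ${\displaystyle S_{p}>0}$ and ${\displaystyle S_{r}<0}$ (downward-sloping demand, upward-sloping supply, and supply that falls as the resource cost rises), these formulas give ${\displaystyle dp/dr>0}$ and ${\displaystyle dq/dr<0}$.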

## References

1. Chiang, Alpha C. (1984). Fundamental Methods of Mathematical Economics (Third ed.). McGraw-Hill. ISBN 0-07-010813-7.
2. Abraham, Ralph; Marsden, J. E.; Ratiu, Tudor (2012). Manifolds, Tensor Analysis, and Applications. Springer Science & Business Media. p. 78.
• A. D. Polyanin and V. F. Zaitsev, Handbook of Exact Solutions for Ordinary Differential Equations (2nd edition), Chapman & Hall/CRC Press, Boca Raton, 2003. ISBN 1-58488-297-2
• From thesaurus.maths.org total derivative