# Smooth maximum

In mathematics, a smooth maximum of an indexed family ${\displaystyle x_{1},\ldots ,x_{n}}$ of numbers is a differentiable approximation to the maximum function

${\displaystyle x_{1},\ldots ,x_{n}\mapsto \max(x_{1},\ldots ,x_{n}),}$

and the concept of smooth minimum is similarly defined.

For large positive values of the parameter ${\displaystyle \alpha }$, the following formulation is a smooth, differentiable approximation of the maximum function; for negative values of ${\displaystyle \alpha }$ that are large in absolute value, it approximates the minimum.

${\displaystyle {\mathcal {S}}_{\alpha }(x_{1},\ldots ,x_{n})={\frac {\sum _{i=1}^{n}x_{i}e^{\alpha x_{i}}}{\sum _{i=1}^{n}e^{\alpha x_{i}}}}}$

${\displaystyle {\mathcal {S}}_{\alpha }}$ has the following properties:

1. ${\displaystyle {\mathcal {S}}_{\alpha }\to \max }$ as ${\displaystyle \alpha \to \infty }$
2. ${\displaystyle {\mathcal {S}}_{0}}$ is the average of its inputs
3. ${\displaystyle {\mathcal {S}}_{\alpha }\to \min }$ as ${\displaystyle \alpha \to -\infty }$
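
As an illustration, the following is a minimal NumPy sketch of ${\displaystyle {\mathcal {S}}_{\alpha }}$ (the function name `smooth_max` and the stability shift are illustrative choices, not part of the definition); it reproduces the three limiting behaviours listed above.

```python
import numpy as np

def smooth_max(x, alpha):
    """Boltzmann-weighted smooth maximum S_alpha of the values in x."""
    x = np.asarray(x, dtype=float)
    # Subtract the largest exponent for numerical stability; the normalized
    # weights e^{alpha x_i} / sum_j e^{alpha x_j} are unchanged by this shift.
    shifted = alpha * x - np.max(alpha * x)
    weights = np.exp(shifted)
    return np.sum(x * weights) / np.sum(weights)

x = [1.0, 2.0, 5.0]
print(smooth_max(x, alpha=100))   # ~5.0: approaches max as alpha -> +infinity
print(smooth_max(x, alpha=0))     # 2.666...: the arithmetic mean of the inputs
print(smooth_max(x, alpha=-100))  # ~1.0: approaches min as alpha -> -infinity
```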

The gradient of ${\displaystyle {\mathcal {S}}_{\alpha }}$ is closely related to softmax and is given by

${\displaystyle \nabla _{x_{i}}{\mathcal {S}}_{\alpha }(x_{1},\ldots ,x_{n})={\frac {e^{\alpha x_{i}}}{\sum _{j=1}^{n}e^{\alpha x_{j}}}}[1+\alpha (x_{i}-{\mathcal {S}}_{\alpha }(x_{1},\ldots ,x_{n}))].}$

This makes the smooth maximum useful for optimization techniques that use gradient descent.
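
The closed-form gradient above can be checked numerically. The sketch below (again assuming NumPy, with illustrative function names) computes the softmax weights and the bracketed correction factor, then compares the result against a central finite-difference approximation.

```python
import numpy as np

def smooth_max(x, alpha):
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * x - np.max(alpha * x))  # stabilized, unnormalized softmax weights
    return np.sum(x * w) / np.sum(w)

def smooth_max_grad(x, alpha):
    """Gradient of S_alpha: softmax weights times the factor 1 + alpha*(x_i - S_alpha)."""
    x = np.asarray(x, dtype=float)
    w = np.exp(alpha * x - np.max(alpha * x))
    softmax = w / np.sum(w)
    s = np.sum(x * softmax)
    return softmax * (1.0 + alpha * (x - s))

x = np.array([1.0, 2.0, 5.0])
alpha = 0.5
grad = smooth_max_grad(x, alpha)

# Central finite-difference approximation of the same gradient.
eps = 1e-6
numeric = np.array([
    (smooth_max(x + eps * e, alpha) - smooth_max(x - eps * e, alpha)) / (2 * eps)
    for e in np.eye(len(x))
])
print(np.allclose(grad, numeric, atol=1e-6))  # True
```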

Another formulation is:

${\displaystyle g(x_{1},\ldots ,x_{n})=\log(\exp(x_{1})+\ldots +\exp(x_{n})-(n-1))}$

The ${\displaystyle (n-1)}$ term corrects for the fact that ${\displaystyle \exp(0)=1}$: it cancels all but one of the zero exponentials, so that, for example, ${\displaystyle g(0,\ldots ,0)=0=\max(0,\ldots ,0)}$.
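
A minimal sketch of this variant, assuming NumPy (the name `g` mirrors the formula above and is purely illustrative):

```python
import numpy as np

def g(x):
    """log(exp(x1) + ... + exp(xn) - (n - 1)), a smooth approximation to max."""
    x = np.asarray(x, dtype=float)
    return np.log(np.sum(np.exp(x)) - (len(x) - 1))

print(g([0.0, 0.0, 0.0]))   # 0.0 exactly: the (n-1) term cancels the extra exp(0) terms
print(g([7.0, 0.0, 0.0]))   # 7.0 (up to floating point), since the zero exponentials cancel
print(g([1.0, 2.0, 3.0]))   # slightly above 3.0, an overestimate of the true maximum
```

Note that the argument of the logarithm becomes non-positive when all inputs are sufficiently negative, so this variant is only well defined where ${\displaystyle \textstyle \sum _{i}\exp(x_{i})>n-1}$.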