Hyperparameter (machine learning)

In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.

Different model training algorithms require different hyperparameters; some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the model parameters from the data. For instance, LASSO is an algorithm that adds a regularization hyperparameter to ordinary least squares regression; this hyperparameter must be set before the parameters are estimated by the training algorithm.
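The contrast between OLS (no hyperparameters) and LASSO (one regularization hyperparameter, set before fitting) can be sketched in the single-feature, no-intercept case, where the LASSO solution has a closed form via soft-thresholding. This is an illustrative sketch, not a production implementation:

```python
def ols_fit(xs, ys):
    # Ordinary least squares with one feature and no intercept:
    # beta = sum(x*y) / sum(x*x). No hyperparameter is needed.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def lasso_fit(xs, ys, lam):
    # LASSO adds a regularization hyperparameter `lam`, chosen BEFORE
    # training. For one feature, the minimizer of
    #     0.5 * sum((y - beta*x)**2) + lam * |beta|
    # is obtained by soft-thresholding the OLS numerator.
    z = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    soft = max(abs(z) - lam, 0.0)        # shrink |z| by lam, floor at 0
    return (1.0 if z >= 0 else -1.0) * soft / xx

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(ols_fit(xs, ys))           # 2.0: exact fit, no hyperparameter involved
print(lasso_fit(xs, ys, 7.0))    # 1.5: coefficient shrunk toward zero
print(lasso_fit(xs, ys, 50.0))   # 0.0: strong regularization zeroes it out
```

Different choices of `lam` yield different fitted parameters from the same data, which is exactly why the hyperparameter must be fixed before training begins.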

The time required to train and test a model can depend upon the choice of its hyperparameters.[1] An inherent stochasticity in learning directly implies that the empirical hyperparameter performance is not necessarily its true performance.[1] A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems.[1] The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers.[1]
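The conditional structure described above can be sketched as a small sampler over a hypothetical neural-network search space: the per-layer sizes only come into existence once the number of layers has been drawn, and the space mixes continuous and integer types. All names here (`n_layers`, `hidden_sizes`, the candidate values) are illustrative assumptions, not from any particular library:

```python
import random

def sample_config(rng):
    # Parent hyperparameter: drawn first.
    n_layers = rng.randint(1, 4)
    # Conditional hyperparameters: one hidden-layer size PER layer.
    # These exist only because n_layers already has a value.
    hidden_sizes = [rng.choice([32, 64, 128, 256]) for _ in range(n_layers)]
    # Continuous hyperparameter alongside the integer ones above,
    # giving a mixed-type optimization problem.
    learning_rate = 10 ** rng.uniform(-4, -1)
    return {"n_layers": n_layers,
            "hidden_sizes": hidden_sizes,
            "learning_rate": learning_rate}

cfg = sample_config(random.Random(0))
print(cfg)  # one randomly sampled configuration from the search space
```

A hyperparameter optimizer over such a space must respect the dependency: it cannot propose a size for a fourth hidden layer when `n_layers` is 2.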

Most performance variation can be attributed to just a few hyperparameters.[2][1][3] For an LSTM, the learning rate, followed by the network size, is the most crucial hyperparameter,[4] whereas batching and momentum have no significant effect on its performance.[5]


Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model, i.e. one that minimizes a predefined loss function on given test data.[1] The objective function takes a tuple of hyperparameters and returns the associated loss.[1]
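This objective-function view can be sketched with the simplest optimizer, exhaustive grid search, applied to the regularization strength of a one-feature ridge regression (chosen here purely for its closed-form fit). The model, data, and grid values are illustrative assumptions:

```python
def ridge_fit(xs, ys, lam):
    # Closed-form one-feature ridge: beta = sum(x*y) / (sum(x*x) + lam).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(beta, xs, ys):
    # Predefined loss: mean squared error of predictions y_hat = beta * x.
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Noisy training data (slope roughly 2) and held-out test data.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.5, 4.3, 6.6, 8.4]
test_x, test_y = [1.5, 2.5, 3.5], [3.0, 5.1, 6.9]

def objective(lam):
    # Objective function: hyperparameter in, associated test loss out.
    beta = ridge_fit(train_x, train_y, lam)
    return mse(beta, test_x, test_y)

# Hyperparameter optimization by grid search over candidate values.
grid = [0.0, 0.1, 1.0, 10.0]
best_lam = min(grid, key=objective)
print(best_lam)  # 1.0: on this data, moderate regularization wins
```

The optimizer never looks inside the model; it only queries the objective function at candidate hyperparameter tuples and keeps the one with the lowest loss.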
