Free energy principle

The free energy principle is a theory in cognitive science that attempts to explain how living and non-living systems remain in non-equilibrium steady states by restricting themselves to a limited number of states. It asserts that systems minimise a free energy function of their internal states (not to be confused with thermodynamic free energy), which entail beliefs about hidden states in their environment. The implicit minimisation of free energy is formally related to variational Bayesian methods and was originally introduced by Karl Friston as an explanation for embodied perception in neuroscience,[1] where it is also known as active inference.

The free energy principle describes the behaviour of a given system by modelling it through a Markov blanket that tries to minimise the difference between its model of the world and its senses and associated perception. This difference can be described as "surprise" and is minimised by continuous correction of the system's world model. As such, the principle is based on the Bayesian idea of the brain as an "inference engine". Friston added a second route to minimisation: action. By actively changing the world into the expected state, systems can also minimise their free energy. Friston assumes this to be the principle of all biological reaction.[2] He also believes his principle applies to mental disorders as well as to artificial intelligence. AI implementations based on the active inference principle have been reported to show advantages over other methods.[2]

The free energy principle has been criticised for being very difficult to understand, even for experts,[3] and the mathematical consistency of the theory has been questioned by recent studies.[4][5] Discussions of the principle have also been criticised as invoking metaphysical assumptions far removed from testable scientific predictions, making the principle unfalsifiable.[6] In a 2018 interview, Friston acknowledged that the free energy principle is not properly falsifiable: "the free energy principle is what it is – a principle. Like Hamilton's principle of stationary action, it cannot be falsified. It cannot be disproven. In fact, there’s not much you can do with it, unless you ask whether measurable systems conform to the principle."[7]


The notion that self-organising biological systems – like a cell or brain – can be understood as minimising variational free energy is based upon Helmholtz’s work on unconscious inference[8] and subsequent treatments in psychology[9] and machine learning.[10] Variational free energy is a function of observations and a probability density over their hidden causes. This variational density is defined in relation to a probabilistic model that generates predicted observations from hypothesized causes. In this setting, free energy provides an approximation to Bayesian model evidence.[11] Therefore, its minimisation can be seen as a Bayesian inference process. When a system actively makes observations to minimise free energy, it implicitly performs active inference and maximises the evidence for its model of the world.

Free energy is also an upper bound on the self-information (surprise) of outcomes, and the long-term average of surprise is entropy. This means that if a system acts to minimise free energy, it will implicitly place an upper bound on the entropy of the outcomes – or sensory states – it samples.[12][6][better source needed]

Relationship to other theories

Active inference is closely related to the good regulator theorem[13] and related accounts of self-organisation,[14][15] such as self-assembly, pattern formation, autopoiesis[16] and practopoiesis.[17] It addresses the themes considered in cybernetics, synergetics[18] and embodied cognition. Because free energy can be expressed as the expected energy of observations under the variational density minus its entropy, it is also related to the maximum entropy principle.[19] Finally, because the time average of energy is action, the principle of minimum variational free energy is a principle of least action.


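Definition (continuous formulation): Active inference rests on the tuple $(\Omega, \Psi, S, A, R, q, p)$:

  • A sample space $\Omega$ – from which random fluctuations $\omega \in \Omega$ are drawn
  • Hidden or external states $\Psi : \Psi \times A \times \Omega \to \mathbb{R}$ – that cause sensory states and depend on action
  • Sensory states $S : \Psi \times A \times \Omega \to \mathbb{R}$ – a probabilistic mapping from action and hidden states
  • Action $A : S \times R \to \mathbb{R}$ – that depends on sensory and internal states
  • Internal states $R : R \times S \to \mathbb{R}$ – that cause action and depend on sensory states
  • Generative density $p(s, \psi \mid m)$ – over sensory and hidden states under a generative model $m$
  • Variational density $q(\psi \mid \mu)$ – over hidden states $\psi \in \Psi$ that is parameterised by internal states $\mu \in R$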

Action and perception

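The objective is to maximise model evidence $p(s \mid m)$ or, equivalently, minimise surprise $-\log p(s \mid m)$, where $s$ denotes sensory states and $m$ the generative model. This generally involves an intractable marginalisation over hidden states $\psi$, so surprise is replaced with an upper variational free energy bound.[10] However, this means that internal states $\mu$ must also minimise free energy, because free energy is a function of sensory and internal states (with sensory states in turn depending on action $a$):

$$a(t) = \underset{a}{\operatorname{arg\,min}} \{ F(s(t), \mu(t)) \}$$
$$\mu(t) = \underset{\mu}{\operatorname{arg\,min}} \{ F(s(t), \mu) \}$$
$$F(s, \mu) = \underbrace{E_q[-\log p(s, \psi \mid m)]}_{\text{energy}} - \underbrace{H[q(\psi \mid \mu)]}_{\text{entropy}} = \underbrace{-\log p(s \mid m)}_{\text{surprise}} + \underbrace{D_{\mathrm{KL}}[q(\psi \mid \mu) \parallel p(\psi \mid s, m)]}_{\text{divergence}} \geq \underbrace{-\log p(s \mid m)}_{\text{surprise}}$$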
This induces a dual minimisation with respect to action and internal states that correspond to action and perception respectively.
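
As a minimal numerical sketch of this bound (not taken from the cited sources), the following Python snippet evaluates surprise and variational free energy for a hypothetical generative model with two hidden states and two outcomes. The probabilities and the function names `surprise`, `posterior` and `free_energy` are illustrative assumptions. Free energy is computed as expected energy minus entropy; it should exceed surprise for an arbitrary variational density and meet it when that density equals the exact posterior.

```python
import math

# Hypothetical generative model p(s, psi | m): two hidden states, two outcomes.
PRIOR = [0.5, 0.5]              # p(psi | m)
LIKELIHOOD = [[0.9, 0.1],       # p(s | psi = 0, m)
              [0.2, 0.8]]       # p(s | psi = 1, m)


def surprise(s):
    """Surprise -log p(s | m), marginalising over hidden states."""
    evidence = sum(PRIOR[k] * LIKELIHOOD[k][s] for k in range(len(PRIOR)))
    return -math.log(evidence)


def posterior(s):
    """Exact posterior p(psi | s, m) from Bayes' rule."""
    joint = [PRIOR[k] * LIKELIHOOD[k][s] for k in range(len(PRIOR))]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def free_energy(s, q):
    """F = E_q[-log p(s, psi | m)] - H[q] for a variational density q."""
    energy = -sum(
        q[k] * math.log(PRIOR[k] * LIKELIHOOD[k][s])
        for k in range(len(q)) if q[k] > 0
    )
    entropy = -sum(qk * math.log(qk) for qk in q if qk > 0)
    return energy - entropy


s_observed = 0
print("surprise:           ", surprise(s_observed))
print("F with flat q:      ", free_energy(s_observed, [0.5, 0.5]))
print("F with posterior q: ", free_energy(s_observed, posterior(s_observed)))
```

In this toy setting, "perception" would amount to moving the variational density towards the posterior, which tightens the bound without changing the surprise itself.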

Free energy minimisation

Free energy minimisation and self-organisation

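Free energy minimisation has been proposed as a hallmark of self-organising systems when cast as random dynamical systems.[20] This formulation rests on a Markov blanket (comprising action and sensory states) that separates internal and external states. If internal states and action minimise free energy, then they place an upper bound on the entropy of sensory states:

$$\lim_{T \to \infty} \frac{1}{T} \underbrace{\int_0^T F(s(t), \mu(t))\,dt}_{\text{free action}} \geq \lim_{T \to \infty} \frac{1}{T} \int_0^T \underbrace{-\log p(s(t) \mid m)}_{\text{surprise}}\,dt = H[p(s \mid m)]$$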
This is because – under ergodic assumptions – the long-term average of surprise is entropy. This bound resists a natural tendency to disorder – of the sort associated with the second law of thermodynamics and the fluctuation theorem. However, formulating a unifying principle for the life sciences in terms of concepts from statistical physics, such as random dynamical systems, non-equilibrium steady states and ergodicity, places substantial constraints on the theoretical and empirical study of biological systems, with the risk of obscuring all the features that make biological systems interesting kinds of self-organising systems.[21]
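
The claim that the long-term average of surprise is entropy can be checked numerically in the simplest possible case. The sketch below assumes sensory states are drawn independently from a fixed, hypothetical distribution `P_SENSORY` (a trivially ergodic process, chosen only for illustration) and compares the sample average of surprise with the entropy of that distribution.

```python
import math
import random

# Hypothetical stationary distribution over three sensory states, p(s | m).
P_SENSORY = [0.7, 0.2, 0.1]


def entropy(p):
    """Shannon entropy H[p] in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def average_surprise(p, n_samples, seed=0):
    """Sample average of -log p(s) over n_samples independent draws from p."""
    rng = random.Random(seed)
    samples = rng.choices(range(len(p)), weights=p, k=n_samples)
    return sum(-math.log(p[s]) for s in samples) / n_samples


print("entropy:         ", entropy(P_SENSORY))
print("average surprise:", average_surprise(P_SENSORY, 100_000))
```

Because free energy is never smaller than surprise, its time average in this setting could only sit at or above the same entropy.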

Free energy minimisation and Bayesian inference

All Bayesian inference can be cast in terms of free energy minimisation.[22][failed verification] When free energy is minimised with respect to internal states, the Kullback–Leibler divergence between the variational and posterior density over hidden states is minimised. This corresponds to approximate Bayesian inference when the form of the variational density is fixed, and to exact Bayesian inference otherwise. Free energy minimisation therefore provides a generic description of Bayesian inference and filtering (e.g., Kalman filtering). It is also used in Bayesian model selection, where free energy can be usefully decomposed into complexity and accuracy:

$$F(s, \mu) = \underbrace{D_{\mathrm{KL}}[q(\psi \mid \mu) \parallel p(\psi \mid m)]}_{\text{complexity}} - \underbrace{E_q[\log p(s \mid \psi, m)]}_{\text{accuracy}}$$
Models with minimum free energy provide an accurate explanation of data, under complexity costs (cf. Occam's razor and more formal treatments of computational costs[23]). Here, complexity is the divergence between the variational density and prior beliefs about hidden states (i.e., the effective degrees of freedom used to explain the data).
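
The trade-off between complexity and accuracy can be illustrated with a small discrete example. The sketch below uses a hypothetical prior and likelihood (the numbers are assumptions, not values from the literature) and computes free energy as complexity minus accuracy. Keeping the variational density at the prior costs no complexity but explains the data poorly; moving to the posterior incurs a complexity cost that is more than repaid in accuracy.

```python
import math

# Hypothetical model: prior over two hidden states and likelihood of two outcomes.
PRIOR = [0.7, 0.3]              # p(psi | m)
LIKELIHOOD = [[0.6, 0.4],       # p(s | psi = 0, m)
              [0.1, 0.9]]       # p(s | psi = 1, m)


def complexity(q):
    """KL divergence between the variational density q and the prior."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, PRIOR) if qk > 0)


def accuracy(s, q):
    """Expected log likelihood of outcome s under q."""
    return sum(q[k] * math.log(LIKELIHOOD[k][s]) for k in range(len(q)))


def free_energy(s, q):
    """Free energy written as complexity minus accuracy."""
    return complexity(q) - accuracy(s, q)


def posterior(s):
    """Exact posterior p(psi | s, m)."""
    joint = [PRIOR[k] * LIKELIHOOD[k][s] for k in range(len(PRIOR))]
    evidence = sum(joint)
    return [j / evidence for j in joint]


s_observed = 1
for label, q in (("prior", PRIOR), ("posterior", posterior(s_observed))):
    print(label, "complexity:", complexity(q),
          "accuracy:", accuracy(s_observed, q),
          "F:", free_energy(s_observed, q))
```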

Free energy minimisation and thermodynamics

Variational free energy is an information-theoretic functional and is distinct from thermodynamic (Helmholtz) free energy.[24] However, the complexity term of variational free energy shares the same fixed point as Helmholtz free energy (under the assumption that the system is thermodynamically closed but not isolated). This is because, if sensory perturbations are suspended for a suitably long period of time, complexity is minimised (because accuracy can be neglected). At this point, the system is at equilibrium and internal states minimise Helmholtz free energy, by the principle of minimum energy.[25]

Free energy minimisation and information theory

Free energy minimisation is equivalent to maximising the mutual information between sensory states and internal states that parameterise the variational density (for a fixed entropy variational density).[12][better source needed] This relates free energy minimisation to the principle of minimum redundancy[26] and related treatments using information theory to describe optimal behaviour.[27][28]

Free energy minimisation in neuroscience

Free energy minimisation provides a useful way to formulate normative (Bayes optimal) models of neuronal inference and learning under uncertainty[29] and therefore subscribes to the Bayesian brain hypothesis.[30] The neuronal processes described by free energy minimisation depend on the nature of the hidden states $\Psi = X \times \Theta \times \Pi$, which can comprise time-dependent variables, time-invariant parameters and the precision (inverse variance or temperature) of random fluctuations. Minimising free energy with respect to variables, parameters and precision corresponds to inference, learning and the encoding of uncertainty, respectively.

Perceptual inference and categorisation

Free energy minimisation formalises the notion of unconscious inference in perception[8][10] and provides a normative (Bayesian) theory of neuronal processing. The associated process theory of neuronal dynamics is based on minimising free energy through gradient descent. This corresponds to generalised Bayesian filtering (where ~ denotes a variable in generalised coordinates of motion and $D$ is a derivative matrix operator):[31]

$$\dot{\tilde{\mu}} = D\tilde{\mu} - \partial_{\mu} F(s, \mu)\Big|_{\mu = \tilde{\mu}}$$
Usually, the generative models that define free energy are non-linear and hierarchical (like cortical hierarchies in the brain). Special cases of generalised filtering include Kalman filtering, which is formally equivalent to predictive coding[32] – a popular metaphor for message passing in the brain. Under hierarchical models, predictive coding involves the recurrent exchange of ascending (bottom-up) prediction errors and descending (top-down) predictions[33] that is consistent with the anatomy and physiology of sensory[34] and motor systems.[35]
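
To make the idea of perception as gradient descent on free energy concrete, here is a deliberately simplified, static sketch with a single hidden cause and no generalised coordinates of motion. It assumes a Gaussian prior over the cause `v`, a hypothetical non-linear mapping `g(v) = v**2` to the observation `u`, and Gaussian noise, so that free energy (up to constants) is a sum of two precision-weighted squared prediction errors. All parameter values and names are illustrative assumptions.

```python
def g(v):
    """Hypothetical non-linear mapping from hidden cause to predicted observation."""
    return v * v


def dg(v):
    """Derivative of g with respect to v."""
    return 2.0 * v


def free_energy(v, u, v_prior, var_prior, var_obs):
    """Free energy up to constants: weighted squared prediction errors."""
    return (v - v_prior) ** 2 / (2 * var_prior) + (u - g(v)) ** 2 / (2 * var_obs)


def infer(u, v_prior=3.0, var_prior=1.0, var_obs=1.0, step=0.01, n_steps=2000):
    """Estimate the hidden cause by gradient descent on free energy."""
    v = v_prior
    trace = [free_energy(v, u, v_prior, var_prior, var_obs)]
    for _ in range(n_steps):
        eps_prior = (v - v_prior) / var_prior   # prediction error on the cause
        eps_obs = (u - g(v)) / var_obs          # sensory prediction error
        # Descend the free energy gradient: dv/dt = -dF/dv.
        v += step * (eps_obs * dg(v) - eps_prior)
        trace.append(free_energy(v, u, v_prior, var_prior, var_obs))
    return v, trace


estimate, trace = infer(u=2.0)
print("estimated cause:", estimate)
print("free energy before and after:", trace[0], trace[-1])
```

The update only uses the two prediction errors and the local slope of `g`, which is the sense in which schemes of this kind are described as exchanging top-down predictions and bottom-up prediction errors.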

Perceptual learning and memory

In predictive coding, optimising model parameters through a gradient descent on the time integral of free energy (free action) reduces to associative or Hebbian plasticity and is associated with synaptic plasticity in the brain.
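
The link to Hebbian plasticity can be sketched in a toy linear model. The example below assumes a single weight `theta` that maps a known (clamped) hidden cause `v` to a predicted observation `theta * v`, with noiseless data generated by a hypothetical true weight. Descending the free energy gradient with respect to `theta` yields an update proportional to the product of the prediction error and the activity of the cause unit, which is the Hebbian form. The function name and all values are assumptions made for illustration.

```python
def learn_weight(causes, theta_true=1.5, theta0=0.0, var_obs=1.0,
                 rate=0.1, n_epochs=100):
    """Learn a single connection weight by gradient descent on free energy."""
    theta = theta0
    for _ in range(n_epochs):
        for v in causes:
            u = theta_true * v                   # noiseless observation (assumption)
            eps_obs = (u - theta * v) / var_obs  # precision-weighted prediction error
            # -dF/dtheta = eps_obs * v: error activity times cause activity.
            theta += rate * eps_obs * v
    return theta


print("learned weight:", learn_weight([1.0, 2.0, -1.0]))
```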

Perceptual precision, attention and salience

Optimising the precision parameters corresponds to optimising the gain of prediction errors (cf. Kalman gain). In neuronally plausible implementations of predictive coding,[33] this corresponds to optimising the excitability of superficial pyramidal cells and has been interpreted in terms of attentional gain.[36]
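
The role of precision as a gain on prediction errors is easiest to see in the static, scalar Gaussian case, where the posterior mean is the prior mean plus a gain times the prediction error. The sketch below is an illustration under that simplifying assumption (the function names are hypothetical): the gain is the sensory precision divided by the total precision, so raising sensory precision makes the same prediction error move the belief further.

```python
def kalman_gain(prior_precision, sensory_precision):
    """Gain applied to the prediction error in the scalar Gaussian case."""
    return sensory_precision / (prior_precision + sensory_precision)


def update_belief(prior_mean, observation, prior_precision, sensory_precision):
    """Precision-weighted belief update for a scalar Gaussian model."""
    gain = kalman_gain(prior_precision, sensory_precision)
    prediction_error = observation - prior_mean
    posterior_mean = prior_mean + gain * prediction_error
    posterior_precision = prior_precision + sensory_precision
    return posterior_mean, posterior_precision


# Same prediction error, different sensory precision ("attention").
print(update_belief(0.0, 4.0, prior_precision=1.0, sensory_precision=0.25))
print(update_belief(0.0, 4.0, prior_precision=1.0, sensory_precision=3.0))
```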

Figure: simulation results for a selective attention task performed by PE-SAIM, the Bayesian reformulation of SAIM, in a multiple-object environment. The graphs show the time course of activation for the FOA and the two template units in the Knowledge Network.

Concerning the top-down versus bottom-up controversy, which has been addressed as a major open problem of attention, a computational model has succeeded in illustrating the circulatory nature of the reciprocation between top-down and bottom-up mechanisms. Using an established emergent model of attention, namely SAIM, the authors proposed a model called PE-SAIM that, in contrast to the standard version, approaches selective attention from a top-down stance. The model takes into account the forward prediction errors sent to the same level or a level above in order to minimise the energy function indicating the difference between the data and its cause or, in other words, between the generative model and the posterior. To enhance validity, they also incorporated the neural competition between the stimuli in their model. A notable feature of this model is the reformulation of the free energy function only in terms of prediction errors during task performance, so that the total energy function of the neural networks is expressed through the prediction error between the generative model (prior) and the posterior, which changes over time.[37]

Comparing the two models reveals a notable similarity between their respective results while also highlighting a remarkable discrepancy: in the standard version of SAIM, the model's focus is mainly upon the excitatory connections, whereas in PE-SAIM the inhibitory connections are leveraged to make an inference. The model has also been reported to predict EEG and fMRI data drawn from human experiments with high precision. In the same vein, Yahya et al. also applied the free energy principle to propose a computational model for template matching in covert selective visual attention that mostly relies on SAIM.[38] According to this study, the total free energy of the whole state-space is reached by inserting top-down signals into the original neural networks, yielding a dynamical system comprising both feed-forward and backward prediction errors.

Active inference

When gradient descent is applied to action, $\dot{a} = -\partial_a F(s, \tilde{\mu})$, motor control can be understood in terms of classical reflex arcs that are engaged by descending (corticospinal) predictions. This provides a formalism that generalises the equilibrium point solution to the degrees of freedom problem[39] to movement trajectories.
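
A toy reflex arc illustrates how action, rather than belief updating, can suppress a prediction error. The sketch below assumes a hypothetical one-dimensional plant in which the sensed limb position is the resting position plus the accumulated action, and a fixed descending prediction `s_pred` of what should be sensed. Action descends the gradient of the squared, precision-weighted sensory prediction error until the sensation matches the prediction. The plant, names and parameters are illustrative assumptions.

```python
def reflex_arc(s_pred, x0=0.0, precision=1.0, rate=0.1, n_steps=200):
    """Drive a sensed position towards a descending prediction by acting."""
    a = 0.0
    history = []
    for _ in range(n_steps):
        s = x0 + a                             # hypothetical plant: action displaces the limb
        dF_ds = precision * (s - s_pred)       # gradient of F = 0.5 * precision * (s - s_pred)**2
        ds_da = 1.0                            # sensitivity of sensation to action in this plant
        a -= rate * dF_ds * ds_da              # da/dt = -dF/da
        history.append(s)
    return x0 + a, history


final_position, history = reflex_arc(s_pred=2.0)
print("start:", history[0], "end:", final_position)
```

Note that the prediction itself is never revised here; the mismatch is resolved entirely by changing what is sensed.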

Active inference and optimal control

Active inference is related to optimal control by replacing value or cost-to-go functions with prior beliefs about state transitions or flow.[40] This exploits the close connection between Bayesian filtering and the solution to the Bellman equation. However, active inference starts with (priors over) flow $f = \Gamma \cdot \nabla V + \nabla \times W$ that are specified with scalar $V(x)$ and vector $W(x)$ value functions of state space (cf. the Helmholtz decomposition). Here, $\Gamma$ is the amplitude of random fluctuations and cost is $c(x) = f \cdot \nabla V + \nabla \cdot \Gamma \cdot V$. The priors over flow $p(\tilde{x} \mid m)$ induce a prior over states $p(x \mid m) = \exp(V(x))$ that is the solution to the appropriate forward Kolmogorov equations.[41] In contrast, optimal control optimises the flow, given a cost function, under the assumption that $W = 0$ (i.e., the flow is curl free or has detailed balance). Usually, this entails solving backward Kolmogorov equations.[42]

Active inference and optimal decision (game) theory

Optimal decision problems (usually formulated as partially observable Markov decision processes) are treated within active inference by absorbing utility functions into prior beliefs. In this setting, states that have a high utility (low cost) are states an agent expects to occupy. By equipping the generative model with hidden states that model control, policies (control sequences) that minimise variational free energy lead to high utility states.[43]
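
The idea of absorbing utility into prior beliefs can be sketched as follows. This is a heavily simplified illustration, not the full scheme of the cited work: it assumes prior preferences over outcomes proportional to the exponential of a hypothetical utility, and scores each policy only by the divergence between the outcomes it predicts and those preferences (sometimes called the risk term in discrete treatments), ignoring any uncertainty-resolving terms. Policy names and probabilities are made up for the example.

```python
import math


def softmax(values):
    """Normalised exponential, used to turn utilities into prior preferences."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(q, p):
    """KL divergence between predicted outcomes q and preferred outcomes p."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


UTILITY = [0.0, 1.0, 4.0]            # hypothetical utility of three outcomes
PREFERENCES = softmax(UTILITY)       # prior beliefs: p(o | m) proportional to exp(utility)

# Hypothetical outcome distributions predicted under two policies.
POLICIES = {
    "stay": [0.8, 0.15, 0.05],
    "go": [0.1, 0.2, 0.7],
}


def select_policy(policies, preferences):
    """Pick the policy whose predicted outcomes diverge least from preferences."""
    scores = {name: kl_divergence(pred, preferences)
              for name, pred in policies.items()}
    return min(scores, key=scores.get), scores


def expected_utility(predicted):
    """Conventional expected utility, for comparison."""
    return sum(p * u for p, u in zip(predicted, UTILITY))


best, scores = select_policy(POLICIES, PREFERENCES)
print("selected policy:", best, scores)
```

In this example the policy with the smallest divergence from the preferred outcomes is also the one with the higher expected utility, which is the sense in which expected states and high-utility states coincide.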

Neurobiologically, neuromodulators such as dopamine are considered to report the precision of prediction errors by modulating the gain of principal cells encoding prediction error.[44] This is closely related to – but formally distinct from – the role of dopamine in reporting prediction errors per se[45] and related computational accounts.[46]

Active inference and cognitive neuroscience

Active inference has been used to address a range of issues in cognitive neuroscience, brain function and neuropsychiatry, including action observation,[47] mirror neurons,[48] saccades and visual search,[49][50] eye movements,[51] sleep,[52] illusions,[53] attention,[36] action selection,[44] consciousness,[54][55] hysteria[56] and psychosis.[57] Explanations of action in active inference often depend on the idea that the brain has 'stubborn predictions' that it cannot update, leading to actions that cause these predictions to come true.[58]

References


  1. ^ Friston, Karl; Kilner, James; Harrison, Lee (2006). "A free energy principle for the brain" (PDF). Journal of Physiology-Paris. Elsevier BV. 100 (1–3): 70–87. doi:10.1016/j.jphysparis.2006.10.001. ISSN 0928-4257. PMID 17097864. S2CID 637885.
  2. ^ a b Raviv, Shaun (13 November 2018). "The Genius Neuroscientist Who Might Hold the Key to True AI". Wired.
  3. ^ Freed, Peter (2010). "Research Digest". Neuropsychoanalysis. Informa UK Limited. 12 (1): 103–106. doi:10.1080/15294145.2010.10773634. ISSN 1529-4145. S2CID 220306712.
  4. ^ Aguilera, Miguel; Millidge, Beren; Tschantz, Alexander; Buckley, Christopher L (2022). "How particular is the physics of the free energy principle?". Physics of Life Reviews. 40: 24–50. arXiv:2105.11203. Bibcode:2022PhLRv..40...24A. doi:10.1016/j.plrev.2021.11.001. PMC 8902446. PMID 34895862.
  5. ^ Biehl, Martin; Pollock, Felix; Kanai, Ryota (2021). "A Technical Critique of Some Parts of the Free Energy Principle". Entropy. 23 (3): 293. Bibcode:2021Entrp..23..293B. doi:10.3390/e23030293. PMC 7997279. PMID 33673663.
  6. ^ a b Colombo, Matteo; Wright, Cory (2018-09-10). "First principles in the life sciences: the free-energy principle, organicism, and mechanism". Synthese. Springer Science and Business Media LLC. 198: 3463–3488. doi:10.1007/s11229-018-01932-w. ISSN 0039-7857.
  7. ^ Friston, Karl (2018). "Of woodlice and men: A Bayesian account of cognition, life and consciousness. An interview with Karl Friston (by Martin Fortier & Daniel Friedman)". ALIUS Bulletin. 2: 17–43.
  8. ^ a b Helmholtz, H. (1866/1962). Concerning the perceptions in general. In Treatise on physiological optics (J. Southall, Trans., 3rd ed., Vol. III). New York: Dover.
  9. ^ Gregory, R. L. (1980-07-08). "Perceptions as hypotheses". Philosophical Transactions of the Royal Society of London. B, Biological Sciences. The Royal Society. 290 (1038): 181–197. Bibcode:1980RSPTB.290..181G. doi:10.1098/rstb.1980.0090. ISSN 0080-4622. JSTOR 2395424. PMID 6106237.
  10. ^ a b c Dayan, Peter; Hinton, Geoffrey E.; Neal, Radford M.; Zemel, Richard S. (1995). "The Helmholtz Machine" (PDF). Neural Computation. MIT Press - Journals. 7 (5): 889–904. doi:10.1162/neco.1995.7.5.889. hdl:21.11116/0000-0002-D6D3-E. ISSN 0899-7667. PMID 7584891. S2CID 1890561.
  11. ^ Beal, M. J. (2003). Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, University College London.
  12. ^ a b Friston, Karl (2012-10-31). "A Free Energy Principle for Biological Systems" (PDF). Entropy. MDPI AG. 14 (11): 2100–2121. Bibcode:2012Entrp..14.2100K. doi:10.3390/e14112100. ISSN 1099-4300. PMC 3510653. PMID 23204829.
  13. ^ Conant, R. C., & Ashby, W. R. (1970). Every Good Regulator of a system must be a model of that system. Int. J. Systems Sci., 1 (2), 89–97.
  14. ^ Kauffman, S. (1993). The Origins of Order: Self-Organization and Selection in Evolution. Oxford: Oxford University Press.
  15. ^ Nicolis, G., & Prigogine, I. (1977). Self-organization in non-equilibrium systems. New York: John Wiley.
  16. ^ Maturana, H. R., & Varela, F. (1980). Autopoiesis: the organization of the living. In V. F. Maturana HR (Ed.), Autopoiesis and Cognition. Dordrecht, Netherlands: Reidel.
  17. ^ Nikolić, D. (2015). Practopoiesis: Or how life fosters a mind. Journal of theoretical biology, 373, 40-61.
  18. ^ Haken, H. (1983). Synergetics: An introduction. Non-equilibrium phase transition and self-organisation in physics, chemistry and biology (3rd ed.). Berlin: Springer Verlag.
  19. ^ Jaynes, E. T. (1957). Information Theory and Statistical Mechanics. Physical Review Series II, 106 (4), 620–30.
  20. ^ Crauel, H., & Flandoli, F. (1994). Attractors for random dynamical systems. Probab Theory Relat Fields, 100, 365–393.
  21. ^ Colombo, M., Palacios, P. Non-equilibrium thermodynamics and the free energy principle in biology. Biol Philos 36, 41 (2021).
  22. ^ Roweis, S., & Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computat. , 11 (2), 305–45. doi:10.1162/089976699300016674
  23. ^ Ortega, P. A., & Braun, D. A. (2012). Thermodynamics as a theory of decision-making with information processing costs. Proceedings of the Royal Society A, vol. 469, no. 2153 (20120683) .
  24. ^ Evans, D. J. (2003). A non-equilibrium free energy theorem for deterministic systems. Molecular Physics, 101, 1551–4.
  25. ^ Jarzynski, C. (1997). Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78, 2690.
  26. ^ Barlow, H. (1961). Possible principles underlying the transformations of sensory messages. In W. Rosenblith (Ed.), Sensory Communication (pp. 217-34). Cambridge, MA: MIT Press.
  27. ^ Linsker, R. (1990). Perceptual neural organization: some approaches based on network models and information theory. Annu Rev Neurosci. , 13, 257–81.
  28. ^ Bialek, W., Nemenman, I., & Tishby, N. (2001). Predictability, complexity, and learning. Neural Computat., 13 (11), 2409–63.
  29. ^ Friston, K. (2010). The free-energy principle: a unified brain theory? Nat Rev Neurosci. , 11 (2), 127–38.
  30. ^ Knill, D. C., & Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci., 27 (12), 712–9.
  31. ^ Friston, K., Stephan, K., Li, B., & Daunizeau, J. (2010). Generalised Filtering. Mathematical Problems in Engineering, vol., 2010, 621670
  32. ^ Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. , 2 (1), 79–87.
  33. ^ a b Mumford, D. (1992). On the computational architecture of the neocortex. II. Biol. Cybern. , 66, 241–51.
  34. ^ Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). Canonical microcircuits for predictive coding. Neuron , 76 (4), 695–711.
  35. ^ Adams, R. A., Shipp, S., & Friston, K. J. (2013). Predictions not commands: active inference in the motor system. Brain Struct Funct. , 218 (3), 611–43
  36. ^ a b Feldman, H., & Friston, K. J. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4, 215.
  37. ^ Abadi K.A., Yahya K., Amini M., Heinke D. & Friston, K. J. (2019). Excitatory versus inhibitory feedback in Bayesian formulations of scene construction. J. R. Soc. Interface, 16.
  38. ^ Yahya K., Fard P.R., & Friston, K.J. (2014). A free energy approach to visual attention: a connectionist model. Cogn Process, 15, 107. doi:10.1007/s10339-013-0597-6
  39. ^ Feldman, A. G., & Levin, M. F. (1995). The origin and use of positional frames of reference in motor control. Behav Brain Sci. , 18, 723–806.
  40. ^ Friston, K., (2011). What is optimal about motor control?. Neuron, 72(3), 488–98.
  41. ^ Friston, K., & Ao, P. (2012). Free-energy, value and attractors. Computational and mathematical methods in medicine, 2012, 937860.
  42. ^ Kappen, H., (2005). Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 11, p. P11011.
  43. ^ Friston, K., Samothrakis, S. & Montague, R., (2012). Active inference and agency: optimal control without cost functions. Biol. Cybernetics, 106(8–9), 523–41.
  44. ^ a b Friston, K. J. Shiner T, FitzGerald T, Galea JM, Adams R, Brown H, Dolan RJ, Moran R, Stephan KE, Bestmann S. (2012). Dopamine, affordance and active inference. PLoS Comput. Biol., 8(1), p. e1002327.
  45. ^ Fiorillo, C. D., Tobler, P. N. & Schultz, W., (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614), 1898–902.
  46. ^ Frank, M. J., (2005). Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J Cogn Neurosci., Jan, 1, 51–72.
  47. ^ Friston, K., Mattout, J. & Kilner, J., (2011). Action understanding and active inference. Biol Cybern., 104, 137–160.
  48. ^ Kilner, J. M., Friston, K. J. & Frith, C. D., (2007). Predictive coding: an account of the mirror neuron system. Cogn Process., 8(3), pp. 159–66.
  49. ^ Friston, K., Adams, R. A., Perrinet, L. & Breakspear, M., (2012). Perceptions as hypotheses: saccades as experiments. Front Psychol., 3, 151.
  50. ^ Mirza, M., Adams, R., Mathys, C., Friston, K. (2018). Human visual exploration reduces uncertainty about the sensed world. PLoS One, 13(1): e0190429
  51. ^ Perrinet L, Adams R, Friston, K. Active inference, eye movements and oculomotor delays. Biological Cybernetics, 108(6):777-801, 2014.
  52. ^ Hobson, J. A. & Friston, K. J., (2012). Waking and dreaming consciousness: Neurobiological and functional considerations. Prog Neurobiol, 98(1), pp. 82–98.
  53. ^ Brown, H., & Friston, K. J. (2012). Free-energy and illusions: the cornsweet effect. Front Psychol , 3, 43.
  54. ^ Rudrauf, David; Bennequin, Daniel; Granic, Isabela; Landini, Gregory; Friston, Karl; Williford, Kenneth (2017-09-07). "A mathematical model of embodied consciousness" (PDF). Journal of Theoretical Biology. 428: 106–131. Bibcode:2017JThBi.428..106R. doi:10.1016/j.jtbi.2017.05.032. ISSN 0022-5193. PMID 28554611.
  55. ^ Williford, K.; Bennequin, D.; Friston, K.; Rudrauf, D. (2018-12-17). "The Projective Consciousness Model and Phenomenal Selfhood". Frontiers in Psychology. 9: 2571. doi:10.3389/fpsyg.2018.02571. PMC 6304424. PMID 30618988.
  56. ^ Edwards, M. J., Adams, R. A., Brown, H., Pareés, I., & Friston, K. J. (2012). A Bayesian account of 'hysteria'. Brain , 135(Pt 11):3495–512.
  57. ^ Adams RA, Perrinet LU, Friston K. (2012). Smooth pursuit and visual occlusion: active inference and oculomotor control in schizophrenia. PLoS One. , 12;7(10):e47502
  58. ^ Yon, Daniel; Lange, Floris P. de; Press, Clare (2019-01-01). "The Predictive Brain as a Stubborn Scientist". Trends in Cognitive Sciences. 23 (1): 6–8. doi:10.1016/j.tics.2018.10.003. ISSN 1364-6613. PMID 30429054. S2CID 53280000.

External links