Wikipedia Entry

Title: Surprisal Analysis

This article is about the information-theoretic technique. For other uses, see Shannon's theorem and Information theory.

Surprisal analysis is an information-theoretic analysis technique that integrates and applies principles of thermodynamics and maximal entropy. Surprisal analysis is capable of relating the underlying microscopic properties of a system to its macroscopic bulk properties. It has been applied across a spectrum of disciplines, including engineering, physics, chemistry, and biomedical engineering. Recently, surprisal analysis has been extended to characterize the state of living cells, specifically to monitor and characterize biological processes in real time using transcriptional data.

History: Surprisal analysis was formulated at the Hebrew University of Jerusalem in 1972 as a joint effort between Raphael David Levine, Richard Barry Bernstein, and Avinoam Ben-Shaul. Levine and colleagues had recognized a need to better understand the dynamics of non-equilibrium systems, particularly small systems, to which thermodynamic reasoning does not seemingly apply [1]. Alhassid and Levine first applied surprisal analysis in nuclear physics, to characterize the distribution of products in heavy-ion reactions. Since its formulation, surprisal analysis has become a critical tool for the analysis of reaction dynamics and is an official IUPAC term [2].

Application: Maximum entropy methods are at the core of a new view of scientific inference, allowing the analysis and interpretation of large and sometimes noisy data sets. Surprisal analysis extends principles of maximal entropy and of thermodynamics, in which both equilibrium thermodynamics and statistical mechanics are treated as inference processes. This makes surprisal analysis an effective method of quantifying and compacting information and of providing an unbiased characterization of systems. Surprisal analysis is particularly useful for characterizing and understanding dynamics in small systems, where energy fluxes that are negligible in large systems heavily influence system behavior.
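The maximal-entropy inference underlying this view can be illustrated with the classic dice problem: among all distributions over the faces of a die that reproduce a given observed mean, the maximal-entropy distribution has an exponential form governed by a single Lagrange multiplier. The following sketch (all numbers illustrative, not from the article) finds that multiplier numerically:

```python
import numpy as np

# Maximal-entropy distribution over die faces 1..6 subject to one
# constraint, the observed mean. The solution has the exponential form
# p(n) proportional to exp(-lam * n), with lam a Lagrange multiplier.
faces = np.arange(1, 7)
target_mean = 4.5  # hypothetical observed average (uniform prior gives 3.5)

def mean_for(lam):
    """Mean of the exponential-form distribution for a given multiplier."""
    w = np.exp(-lam * faces)
    p = w / w.sum()
    return p @ faces

# mean_for is decreasing in lam, so bisection brackets the multiplier
# that satisfies the constraint exactly.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_for(mid) > target_mean:
        lo = mid  # mean still too high: move toward larger lam
    else:
        hi = mid
lam = 0.5 * (lo + hi)
p = np.exp(-lam * faces)
p /= p.sum()
print("lambda =", lam, " mean =", p @ faces)
```

Because the target mean exceeds the uniform mean of 3.5, the multiplier comes out negative, tilting probability toward the high faces while keeping the distribution as unbiased as the constraint allows.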

Foremost, surprisal analysis identifies the state of a system when it reaches its maximal entropy, or thermodynamic equilibrium. This is known as the balance state of the system because, once a system reaches its maximal entropy, it can no longer initiate or participate in spontaneous processes. Following the determination of the balance state, surprisal analysis characterizes all the states in which the system deviates from the balance state. These deviations are caused by constraints, which prevent the system from reaching its maximal entropy. Surprisal analysis is applied both to identify and to characterize these constraints. In terms of the constraints, the probability P(n) of an event n is quantified by

                (1)   P(n) = P0(n) exp[−Σα λα Gα(n)]

Here P0(n) is the probability of the event n in the balance state. It is usually called the "prior probability" because it is the probability of the event n prior to any constraints. The surprisal itself is defined as

                (2)   Surprisal = −ln[P(n)/P0(n)] = Σα λα Gα(n)

The surprisal equals the sum over the constraints and is a measure of the deviation from the balance state. The deviations are ranked by their degree of deviation from the balance state and ordered from most to least influential to the system. This ranking is provided by the Lagrange multipliers: the most important constraint, usually the one sufficient to characterize the system, exhibits the largest Lagrange multiplier. The multiplier for constraint α is denoted above as λα; the larger the multiplier, the more influential the constraint. The event variable Gα(n) is the value of the constraint α for the event n. Using the method of Lagrange multipliers [3] requires that the prior probability P0(n) and the nature of the constraints be experimentally identified. A numerical algorithm for determining Lagrange multipliers was introduced by Agmon et al. [4]. Recently, singular value decomposition and principal component analysis of the surprisal have been used to identify constraints on biological systems, extending surprisal analysis to the understanding of biological dynamics, as shown in the figure.
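As a concrete sketch of equations (1)–(2) and of the SVD-based identification of constraints, the following example computes surprisals for a small hypothetical data set and factors them into constraint patterns; the event count, time points, and probabilities are all illustrative, not taken from the article:

```python
import numpy as np

# Hypothetical data: observed probabilities P(n, t) for 5 events (rows)
# at 4 time points (columns), against a uniform balance-state prior P0(n).
rng = np.random.default_rng(0)
prior = np.full(5, 0.2)                   # P0(n), maximal-entropy prior
p = rng.dirichlet(np.ones(5), size=4).T   # each column sums to 1

# Eq. (2): the surprisal of each event at each time point.
surprisal = -np.log(p / prior[:, None])

# SVD factors the surprisal matrix into constraint patterns G_alpha(n)
# (columns of U) weighted by time-dependent Lagrange multipliers
# lambda_alpha(t) (singular value times rows of Vt), mirroring the sum
# over constraints alpha in Eq. (2).
U, S, Vt = np.linalg.svd(surprisal, full_matrices=False)

# The largest singular value marks the dominant constraint; its share of
# the total weight indicates whether one constraint suffices.
share = S[0] ** 2 / np.sum(S ** 2)
print(f"dominant constraint carries {share:.1%} of the surprisal weight")

# Eq. (1) with only the dominant constraint retained: a rank-1
# approximation of the surprisal, exponentiated against the prior
# (columns are not renormalized here).
p_rank1 = prior[:, None] * np.exp(-S[0] * np.outer(U[:, 0], Vt[0]))
```

Keeping only the leading singular term is the numerical analogue of describing the system by its single most influential constraint, as in the three-molecule example discussed below.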

Surprisal Analysis in Physical Sciences: Surprisal analysis was first introduced to better understand the specificity of energy release and the selectivity of energy requirements of elementary chemical reactions [5]. This gave rise to a series of new experiments demonstrating that in elementary reactions the nascent products can be probed and that energy is preferentially released rather than statistically distributed [6]. Surprisal analysis was initially applied to characterize a small three-molecule system that did not seemingly conform to the principles of thermodynamics; a single dominant constraint was identified that was sufficient to describe its dynamic behavior. Similar results were then observed in nuclear reactions, where final states with varying energy partitioning are possible. Chemical reactions often require energy to overcome an activation barrier, and surprisal analysis is applicable to such reactions as well [7]. Later, surprisal analysis was extended to mesoscopic systems, to bulk systems [8], and to dynamical processes [9].

Surprisal Analysis in Biology and Biomedical Sciences: Recently, surprisal analysis was extended to better characterize and understand cellular processes [10] (see figure), biological phenomena, and human disease, with reference to personalized diagnostics. Surprisal analysis was first used to identify genes implicated in the balance state of cells in vitro; the genes present in the balance state were those directly responsible for the maintenance of cellular homeostasis [11].

See Also: Information Theory, Singular Value Decomposition, Principal Component Analysis, Entropy, Maximal Entropy

References

  1. Reference 1
  2. Reference 2
  3. Reference 5
  4. Reference 3
  5. Reference 1
  6. Reference 1
  7. Reference 4
  8. Reference 5
  9. Reference 6
  10. Reference 7
  11. Reference 8

Reference 1. Levine RD. Molecular Reaction Dynamics. Cambridge University Press (2005).
Reference 2. Agmon N, Alhassid Y, Levine RD. "An algorithm for finding the distribution of maximal entropy." Journal of Computational Physics 30, 250-258 (1979).
Reference 3. Levine RD, Bernstein RB. "Energy Disposal and Energy Consumption in Elementary Chemical Reactions: The Information Theoretic Approach." Acc. Chem. Res. 7, 393 (1974).
Reference 4. Levine RD. "Information Theory Approach to Molecular Reaction Dynamics." Ann. Rev. Phys. Chem. 29, 59 (1978).
Reference 5. Levine RD. "An Information Theoretic Approach to Inversion Problems." J. Phys. A 13, 91 (1980).
Reference 6. Remacle F, Levine RD. "Maximal Entropy Spectral Fluctuations and the Sampling of Phase Space." J. Chem. Phys. 99, 2383 (1993).
Reference 7. Remacle F, Kravchenko-Balasha N, Levitzki A, Levine RD. "Information-theoretic analysis of phenotype changes in early stages of carcinogenesis." PNAS 107, 10324-10329 (2010).
Reference 8. Kravchenko-Balasha N, et al. "On a fundamental structure of gene networks in living cells." PNAS 109(12), 4702-4707 (2012).