Open main menu

Wikipedia β

"Transition matrix" redirects here. For "transition matrix" in the sense of changing the basis of a coordinate vector, see Change of basis.

In mathematics, a stochastic matrix (also termed probability matrix, transition matrix,[1] substitution matrix, or Markov matrix) is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century. There are several different definitions and types of stochastic matrices:

A right stochastic matrix is a real square matrix, with each row summing to 1.
A left stochastic matrix is a real square matrix, with each column summing to 1.
A doubly stochastic matrix is a square matrix of nonnegative real numbers with each row and column summing to 1.

In the same vein, one may define stochastic vector (also called probability vector) as a vector whose elements are nonnegative real numbers which sum to 1. Thus, each row of a right stochastic matrix (or column of a left stochastic matrix) is a stochastic vector.

A common convention in English language mathematics literature is to use row vectors of probabilities and right stochastic matrices rather than column vectors of probabilities and left stochastic matrices; this article follows that convention.



Andrey Markov in 1886

The stochastic matrix was developed alongside the Markov chain by Andrey Markov, a Russian mathematician and professor at St. Petersburg University who first published on the topic in 1906.[2] His initial intended uses were for linguistic analysis and other mathematical subjects like card shuffling, but both Markov chains and matrices rapidly found use in other fields[2][3] Markov was persecuted by the Soviet Union and escaped political punishment at his death in 1922.[4]

Stochastic matrices were further developed by scholars like Andrey Kolmogorov, who expanded their possibilities by allowing for continuous-time Markov processes.[5] By the 1950s, stochastic matrices had begun to be used outside of their original mathematical fields, and articles appeared in the fields of econometrics,[6] and circuit theory.[7] In the 1960s, stochastic matrices appeared in an even wider variety of scientific works, from behavioral science[8] to geology[9][10] to residential planning.[11] In addition, much mathematical work was also done through these decades to improve the range of uses and functionality of the stochastic matrix and Markovian processes more generally.

From the 1970s to present, stochastic matrices have found use in almost every field that requires formal analysis, from structural science[12] to medical diagnosis[13] to personnel management.[14] In addition, stochastic matrices have found wide use in land change modeling, usually under the term Markov matrix.

Definition and propertiesEdit

A stochastic matrix describes a Markov chain   over a finite state space S with cardinality  .

If the probability of moving from   to   in one time step is  , the stochastic matrix   is given by using   as the   row and   column element, e.g.,


Since the total of transition probability from a state   to all other states must be 1, this matrix is a right stochastic matrix, so that


The product of two right stochastic matrices is also right stochastic. In particular, the  -th power   of a right stochastic matrix   is also right stochastic. The probability of transitioning from   to   in two steps is then given by the   element of the square of  :


In general the probability transition of going from any state to another state in a finite Markov chain given by the matrix   in k steps is given by  .

An initial distribution is given as a row vector.

A stationary probability vector   is defined as a distribution, written as a row vector, that does not change under application of the transition matrix; that is, it is defined as a probability distribution on the set   which is also a row eigenvector of the probability matrix, associated with eigenvalue 1:


The right spectral radius of every right stochastic matrix is clearly at most 1. Additionally, every right stochastic matrix has an obvious column eigenvector associated to the eigenvalue 1: The vector  , whose coordinates are all equal to 1. As left and right eigenvalues of a square matrix are the same, every stochastic matrix has, at least, a row eigenvector associated to the eigenvalue 1 and the largest absolute value of all its eigenvalues is also 1. Finally, the Brouwer Fixed Point Theorem (applied to the compact convex set of all probability distributions of the finite set  ) implies that there is some left eigenvector which is also a stationary probability vector.

On the other hand, the Perron–Frobenius theorem also ensures that every irreducible stochastic matrix has such a stationary vector, and that the largest absolute value of an eigenvalue is always 1. However, this theorem cannot be applied directly to such matrices because they need not be irreducible.

In general, there may be several such vectors. However, for a matrix with strictly positive entries (or, more generally, for an irreducible aperiodic stochastic matrix), this vector is unique and can be computed by observing that for any   we have the following limit,


where   is the   element of the row vector  . Among other things, this says that the long-term probability of being in a state   is independent of the initial state  . That both of these computations give the same stationary vector is a form of an ergodic theorem, which is generally true in a wide variety of dissipative dynamical systems: the system evolves, over time, to a stationary state.

Intuitively, a stochastic matrix represents a Markov chain; the application of the stochastic matrix to a probability distribution redistributes the probability mass of the original distribution while preserving its total mass. If this process is applied repeatedly, the distribution converges to a stationary distribution for the Markov chain.

Example: the cat and mouseEdit

Suppose you have a timer and a row of five adjacent boxes, with a cat in the first box and a mouse in the fifth box at time zero. The cat and the mouse both jump to a random adjacent box when the timer advances. E.g. if the cat is in the second box and the mouse in the fourth one, the probability is one fourth that the cat will be in the first box and the mouse in the fifth after the timer advances. If the cat is in the first box and the mouse in the fifth one, the probability is one that the cat will be in box two and the mouse will be in box four after the timer advances. The cat eats the mouse if both end up in the same box, at which time the game ends. The random variable K gives the number of time steps the mouse stays in the game.

The Markov chain that represents this game contains the following five states specified by the combination of positions (cat,mouse). Note that while a naive enumeration of states would list 25 states, many are impossible either because the mouse can never have a lower index than the cat (as that would mean the mouse occupied the cat's box and survived to move past it), or because the sum of the two indices will always have even parity. In addition, the 3 possible states that lead to the mouse's death are combined into one:

  • State 1: (1,3)
  • State 2: (1,5)
  • State 3: (2,4)
  • State 4: (3,5)
  • State 5: game over: (2,2), (3,3) & (4,4).

We use a stochastic matrix,   (below), to represent the transition probabilities of this system (rows and columns in this matrix are indexed by the possible states listed above, with the pre-transition state as the row and post-transition state as the column). For instance, starting from state 1 - 1st row - it is impossible for the system to stay in this state, so  ; the system also cannot transition to state 2 - because the cat would have stayed in the same box - so  , and by a similar argument for the mouse,  . Transitions to states 3 or 5 are allowed, and thus   .


Long-term averagesEdit

No matter what the initial state, the cat will eventually catch the mouse (with probability 1) and a stationary state π = (0,0,0,0,1) is approached as a limit. To compute the long-term average or expected value of a stochastic variable Y, for each state Sj and time tk there is a contribution of Yj,k·P(S=Sj,t=tk). Survival can be treated as a binary variable with Y=1 for a surviving state and Y=0 for the terminated state. The states with Y=0 do not contribute to the long-term average.

Phase-type representationEdit

The survival function of the mouse. The mouse will survive at least the first time step.

As State 5 is an absorbing state, the distribution of time to absorption is discrete phase-type distributed. Suppose the system starts in state 2, represented by the vector  . The states where the mouse has perished don't contribute to the survival average so state five can be ignored. The initial state and transition matrix can be reduced to,



 ; and  

where   is the identity matrix, and   represents a column matrix of all ones that acts as a sum over states.

Since each state is occupied for one step of time the expected time of the mouse's survival is just the sum of the probability of occupation over all surviving states and steps in time,


Higher order moments are given by


Use in land change modelingEdit

Transition matrices are commonly used in Geographic Information Systems and remote sensing applications to quantitatively estimate the rate of change in a land type category. In land change modeling, the stochastic matrix can be used to show the transition potential of one land category to another and create quantitative comparisons between land types, locations of study, and time intervals. The output shows the conditional probability that one land type will change into another land type within the specified time interval. The diagonal values in the transition probabilities file represent persistence of land type change. In land change modeling, one can multiply the transition potential values by the number of pixels at the first time point in a specified time interval to derive the absolute pixel count for that land cover type at the end of the specified time interval.[15] Another useful application of the stochastic matrix in land change modeling is predicting future land change, which helps to envision future land change, plan future land development, and notice warning signs in land change. It is helpful to have an understanding of the current background of land changes in the area of study in predicting future land change. Variations in transition probabilities of one land type to another are often caused by historical, political, economic, or biological changes in the research site of interest.[15]

Evaluation and Calibration In land change modeling, one can use Correct Rejections, False Alarms, Misses, and Hits evaluate model output. The Correct Rejection explains no reference land type change and no diagnostic change in land type. The False Alarm explains no reference land type change, but diagnostic change in land type. The Miss explains reference land type change, but no diagnostic change in land type. The Hit explains reference land type change, and diagnostic change in land type.[16]

See alsoEdit


  1. ^ Asmussen, S. R. (2003). "Markov Chains". Applied Probability and Queues. Stochastic Modelling and Applied Probability. 51. pp. 3–8. doi:10.1007/0-387-21525-5_1. ISBN 978-0-387-00211-8. 
  2. ^ a b Hayes, Brian (2013). "First links in the Markov chain". American Scientist101 (2): 92–96.
  3. ^ Charles Miller Grinstead; James Laurie Snell (1997). Introduction to Probability. American Mathematical Soc. pp. 464–466. ISBN 978-0-8218-0749-1.
  4. ^ Gely P. Basharin, Amy N. Langville, Valeriy A. Naumov, The Life and Work of A. A. Markov, page 6.
  5. ^ Kendall, D. G.; Batchelor, G. K.; Bingham, N. H.; Hayman, W. K.; Hyland, J. M. E.; Lorentz, G. G.; Moffatt, H. K.; Parry, W.; Razborov, A. A.; Robinson, C. A.; Whittle, P. (1990). "Andrei Nikolaevich Kolmogorov (1903–1987)". Bulletin of the London Mathematical Society22 (1): 33. doi:10.1112/blms/22.1.31. ISSN 0024-6093.
  6. ^ Solow, Robert (1952-01-01). "On the Structure of Linear Models". Econometrica. 20 (1): 29–46. doi:10.2307/1907805. 
  7. ^ Sittler, R. (1956-12-01). "Systems Analysis of Discrete Markov Processes". IRE Transactions on Circuit Theory. 3 (4): 257–266. doi:10.1109/TCT.1956.1086324. ISSN 0096-2007. 
  8. ^ Evans, Selby (1967-07-01). "Vargus 7: Computed patterns from markov processes". Behavioral Science. 12 (4): 323–328. doi:10.1002/bs.3830120407. ISSN 1099-1743. 
  9. ^ Gingerich, P. D. (1969-01-01). "Markov analysis of cyclic alluvial sediments". Journal of Sedimentary Research. 39 (1): 330–332. doi:10.1306/74d71c4e-2b21-11d7-8648000102c1865d. ISSN 1527-1404. 
  10. ^ Krumbein, W. C.; Dacey, Michael F. (1969-03-01). "Markov chains and embedded Markov chains in geology". Journal of the International Association for Mathematical Geology. 1 (1): 79–96. doi:10.1007/BF02047072. ISSN 0020-5958. 
  11. ^ Wolfe, Harry B. (1967-05-01). "Models for Conditioning Aging of Residential Structures". Journal of the American Institute of Planners. 33 (3): 192–196. doi:10.1080/01944366708977915. ISSN 0002-8991. 
  12. ^ "A Markov matrix for fatigue load simulation and rainflow range evaluation - ScienceDirect". Retrieved 2017-05-05. 
  13. ^ Beck, J.Robert; Pauker, Stephen G. (1983-12-01). "The Markov Process in Medical Prognosis". Medical Decision Making. 3 (4): 419–458. doi:10.1177/0272989X8300300403. ISSN 0272-989X. 
  14. ^ Gotz, Glenn A.; McCall, John J. (1983-03-01). "Sequential Analysis of the Stay/Leave Decision: U.S. Air Force Officers". Management Science. 29 (3): 335–351. doi:10.1287/mnsc.29.3.335. ISSN 0025-1909. 
  15. ^ a b Takada, Takenori; et al. (2010). [file:///C:/Users/Sophia/Downloads/TakadaEtAl2010LandscapeEcology_YearlyTransitionMatrix%20(7).pdf "Derivation of a yearly transition probability matrix for land-use dynamics and its applications"] Check |url= value (help) (PDF). Landscape and Urban Planning. 106.1: 562. 
  16. ^ Pontius, Robert Gilmore; et al. (2013). [file:///C:/Users/Sophia/Downloads/pontius_parmentier_2014_le%20(4).pdf "Recommendations for using the relative operating characteristic (ROC)"] Check |url= value (help) (PDF). ISPRS International Journal of Geo-Information. 2.3: 367–382. 
  • G. Latouche, V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modeling, 1st edition. Chapter 2: PH Distributions; ASA SIAM, 1999.