A de Bruijn diagram is a basic structure which can be applied in various ways. It describes all the possible ways that two sequences of symbols, or windows, can overlap if they are displaced with respect to each other; just the relationship between the neighborhoods of two successive cells in a linear cellular automaton. Selecting a subset of the diagram by retaining or discarding selected links discloses states of a given period, gliders, and other information. An intermediate approach would assign probabilities to the links rather than selecting them. As an extreme case, a probability of zero would exclude a link, a probability of one would surely include it, giving other values in the same interval intermediate interpretations.
The connectivity matrix of a graph is readily adapted to show the probabilities of the links; one has only to define
The rules of matrix multiplication coincide with the rules for compound probabilities if it is desired to calculate the probability of forming chains of several links. Even the de Bruijn matrix itself is readily converted into a probability matrix by replacing its non-zero elements by the value , making the choice of all paths equally likely.
Just as probabilities can be associated with the links of a de Bruijn diagram, they also can be associated with the nodes; to conserve the probabilistic interpretation of matrix multiplication the nodal probabilities might be used to form a vector, with matrix equations describing the linkages between nodes. A vector of probabilities would have non-negative real components---with a unit sum, if it happens that all the different components comprise a complete set of alternatives.
Any matrix intended to preserve each and every such column vector must necessarily have columns of unit sum, as can be seen by testing the matrix on unit vectors. Such a matrix is called a stochastic matrix. Likewise the preservation of unit row sums would require the matrix to have unit row sums. Sometimes the conditions for rows and columns must be met simultaneously, for which the adjective doubly stochastic would be used.
In terms of the de Bruijn diagram, if the incoming probabilities at a node were required to have a unit sum, the probabilistic de Bruijn matrix would have to be column-stochastic. This property would refer to the rows if the outgoing probabilities were constrained. Expressing either constraint in matrix form guarantees a unit eigenvalue, with a probability vector as the equilibrium eigenvector of opposite handedness. These and certain other conclusions characterize stochastic matrices, all of which information can be found in any one of several recent textbooks[41,13,103] which can be consulted for descriptions and proofs of the results.
If the links in an n-stage de Bruijn matrix are replaced by their corresponding n-block probabilities, the row or column probabilities do not sum to 1, but rather to the -block probabilities; this suggests dividing each column by its column sum to get a stochastic matrix; alternatively the rows could be divided by the row sums. Generally these are two distinct choices and result in two different de Bruijn matrices. For right extension, we define
for left extension,
Among other things, these denominators compensate for the overlapping between blocks when de Bruijn matrices are multiplied.
A stochastic de Bruijn matrix (of either handedness) can be used to generate a set of n-block probabilities which will satisfy the Kolmogorov consistency conditions. Suppose that the rows are indexed by the -block sequences, so that writing one of the non-zero matrix elements as identifies it as belonging to the axth row and the xbth column. Suppose further that the matrix is column stochastic. Then the row is an eigenvector belonging to unit eigenvalue, and we may call the components of the corresponding column eigenvector. By rewriting
in the form
we see that the elements are row sums of a new matrix whose elements are defined implicitly by the equation
Since we have only renamed the elements , the matrix M is still column stochastic; since has the same denominator throughout a given column we have
when rewritten in the form
we have the other half of the consistency condition.