The stochastic de Bruijn matrices suggest one way to obtain a set of
n-block probabilities satisfying the Kolmogorov consistency
conditions, at least for -block probabilities in terms of
n-block probabilities. All that is required is to fill in the matrix
elements of the de Bruijn matrix with positive numbers, taking care not
to exceed, say, the unit column sum. The choice is relatively arbitrary
until the selection of the final element of the column, which must be
chosen to complete the sum. For k symbols, this means that only
of the
non-zero elements lack any freedom of choice;
the remaining
elements are then parameters to be chosen.
Once this is done, the equilibrium eigenvector is to be determined and
each of the columns of the stochastic matrix is to be multiplied by the
corresponding component, to produce the n-block probabilities. The
components of the eigenvector are the -block probabilities; the
probabilities for smaller blocks follow from summing longer blocks.
It is interesting to observe that once consistency has been established
for -block probabilities relative to n-block probabilities,
the consistency of all smaller blocks follows. Starting with n-blocks,
the inductive step consists of verifying consistency for
-blocks:
and to arrive at successively shorter blocks. In similar fashion it follows that the probability of any block up to length n is simply the sum of the probabilities of all the possible extensions which can be derived by the same route; for example,
Given the interrelation between n-blocks and all the shorter blocks,
some consideration can be given to using the probabilities of short
blocks as parameters. It is not possible to use only the
probabilities of short blocks as parameters; were it otherwise there
would be no need to to build up a theory of block probabilities.
Nevertheless it is possible to attempt the maximum usage of the
probabilities of short blocks. For example,
can all be chosen before the value of
is fixed in order
to make the complete sum equal to 1---a total of k-1 parameters. Or,
of a total of k parameters, one--
--has been excluded.
Then, among the 2-blocks,
can
be chosen before
is needed to complete a sum of
Alternatively, we can consider that
is excluded as a parameter,
as is
when the left extensions are considered. Going on to
3-blocks, we see that all the probabilities
or
are
excluded. Continuing through n-blocks, the total number of parameters
comprising probabilities in which 0 is neither an initial nor a
terminal symbol is
which can be summed to give the value just the same as
the number of parameters required to make a stochastic de Bruijn matrix.
Even when full use is made of the short blocks to define parameters the fraction of parameters which are n-block probabilities is
which reduces to 1/2 for a binary sequence, larger and
approaching 100% for others.