Regular expressions represent sequences of symbols---letters taken from some alphabet. When the sequence is relatively short, the sequence is readily perceived and its properties may be examined. The notation is especially nice for summarizing cycles which can be repeated an arbitrary number of times, since a single star signifies this infinite collection of alternatives. When the regular expression is complicated, and particularly when it subsumes several distinct cycles at the same time, it rapidly becomes more complicated, both to write down completely and to comprehend once it has been written.
Although difficult of typography, the diagram representing a regular expression is preferable for expressions of moderate complexity, although eventually the many crossing lines and increased density of nodes take their toll, and the figure can no longer be comprehended.
Another representation has some advantages, although it is
typographically even more difficult to deal with. To begin with,
suppose that the symbols are chosen to be consecutive numbers, just as
we are accustomed to assuming when dealing with automata. If a decimal
point were to be placed in front of a finite sequence, it could be
interpreted as a number with a positional representation in base k if
there were k symbols running from 0 to Finite sequences would
be finite decimals, infinite sequences could be interpreted after the
fashion of real numbers. The principal difference is that in base 10
and
are ordinarily considered to be the
same real number; but have to be understood to be distinct regular
expressions---
in the first case and
in the second.
The advantage of this representation is that every regular expression,
no matter how complicated, is a point in the interval
and so is represented in a finite region of space. The counterbalancing
penalty is that this region can be filled more densely than is
convenient, often too densely for a clear presentation, since the
individual points are always fairly bulky and can never be presented
visually with the mathematical precision of occupying zero area.
We are interested in regular expressions describing configurations of cellular automata, but these are sequences infinite in both directions. To capture both directions in such a string, break it at some point, then write the left hand side backwards. Inserting a figurative decimal point as before now produces two intervals of unit length; the natural thing is to treat them as two coordinates generating a unit square. The vertical, or y-coordinate represents the left half of the sequence, the horizontal or x-coordinate represents the right half.
Shifting, or changing the decision as to where to break the sequence, will naturally change the point which represents the sequence; in any event it is a geometric operation whose general nature can be described. Indeed this is a fundamental operation on sequences: a right shift splits the unit square into k horizontal strips and reassembles them into k vertical strips after stretching them in one direction and compressing them in the other.
The numerical value associated with each sequence defines a distance, and consequently the distance between two sequences is just the sequence of differences. At this point the difference between this representation and that of the real numbers manifests itself; although different terms are included in the sum in the normal way, there are no carries from one term to another and so there is no confusion that a number may have two different representations as a repeating decimal.
In mathematical terms, suppose that there are two sequences
Their distance is defined as
The distance is small, or the sequences are close, when their central regions agree; the closer they are, the longer the matching core must be.
To have an idea of how this representation works, let us consider Rule
18 for a moment. We have seen that the sequence 111, amongst others, is
excluded from the second generation. If we divide the interval into eighths, any sequence
beginning with .111
will occupy some position in the last eighth of this interval. This
interval will include sequences of the form
, but
sequences of the form
will occupy the last eighth of the
interval
. Recursively, the last eighth of
any binary subdivision of the unit interval is excluded. Similar
comments apply to sequences extending to the left, so that as the unit
square represents various doubly infinite sequences the excluded
sequences will come to resemble a Scotch plaid.
Additionally it is necessary to consider sequences whose central point lies amongst the three excluded 1's. For example 1.11 will refer to points within the upper half of the square, but in the rightmost quarter; 11.1 to the upper quarter but right half. These regions have to be added to the plaid, as well as the further subdivisions implied by further decimals. Thus the same basic pattern is repeated recursively in each quadrant, 16-ant, and so on of the unit square.
Sequences of three 1's are not the only configuration excluded from the second generation, since it is also impossible to encounter a single 1 in a sequence bounded by pairs of 1's. Taking into account all of the excluded sequences simply gives an even finer structure to the plaid. In general the plaid is a two dimensional version of the classical Cantor set, which is the residue remaining after the middle thirds of the unit interval and surviving subintervals have been removed. In arithmetic terms, the Cantor set is the set of all decimals in the unit interval which contain no 1's when they have been displayed with respect to the base 3; which is one of the simplest possible examples of an excluded sequence.
The plaid can be used as the starting point for further discussions. For example the third generation must occupy a subset of the permitted portion of the second generation plaid, and so on as the automaton evolves through further generations; it is even meaningful to think of a limiting set which consists of just those points, if any, which remain after an infinite number of generations. The limiting set will include all the points representing cycles of whatever period, as well as gliders, fuses, and a variety of structures which may have no finite or cyclical representation.
Metrizing regular expressions, and in general, has the
advantage of creating a topological space, to which the whole machinery
of analysis may be applied. As a compact topological space, even
stronger conclusions, such as the necessary existence of certain
limits, can often be drawn. An extensive literature concerns the
application of these ideas to probability and symbolical dynamical
systems; but there is no reason that it cannot be broadened to include
regular expressions as well. In particular, the interaction between
probability measures and linear cellular automata can be studied
through this model, to the mutual enlightenment of both disciplines.