Date: Tue, 9 Apr 1996 23:24:52
To: ca@think.com
Cellular automata theory has a history beginning with von Neumann's universal constructor and Moore's Gardem of Eden theorem, continuing with attempts at classification, organization, and simplification, and eventually attracting widespread public notoriety with Martin Gardner's publication of Conway's "Game of Life" in Scientific American.
Some years later, Wolfram's simulation experiments with a minicomputer and the construction of the CAM series originated by Toffoli revived the subject by allowing people to visualize the workings of cellular automata on a large scale, and compare the results with models which had grown up in other fields, such as the Zhabotinsky reactions or lattice gasses.
Somewhere along the line, especially for one dimensional automata, realization grew that the subject matter was really the study of sequences in labelled, directed graphs, and that there already existed a venerable tradition amongst those mathematicians concerned with the shift dynamical system. Somehow, the dynamicists seem not to have known about graphs, graph theorists about automata, nor automatists about dynamics. Indeed, the not-knowing-graph strongly resembles K3.
The last two portions of this series described the contents of Hedlund's often cited article, which is sometimes taken as a theoretical basis for cellular automata theory, and Nasu's first article whose graph theoretical orientation contrasts with the topological approach of Hedlund. There remains to be discussed Williams articles on the classification of Shifts of Finite Type. Before then, it might be well to describe the graph theoretical basis of the subject.
It has been convenient to call the basic structure a de Bruijn diagram, even though a similar structure was introduced contemporaneously for similar purposes by I. J. Good, both dating from 1946. The ideas apparently had an earlier history, and one cannot help wondering whether the concepts had cryptographic overtones. Shannon's noiseless channel had a similar diagram. Our awareness of the diagrams was a resut of their inclusion in Golomb's book on shift register theory, where they serve to describe the progression of overlapping segments while stepping through a long symbol sequence.
In the realm of one dimensional cellular automata, considerable information can be read directly from properly labelled de Bruijn diagrams, especially when the nodes are sequences just one short of a full neighborhood of cells so that the links join those which overlap, thereby representing complete neighborhoods. Not only neighborhoods, but any of their whole panoply of boolean functions, can be used to label the links, themselves directed arcs because the direction of overlap matters. Labelling the links by the cell into which the neighborhood evolves is especially useful for the reversed attribute of singling out ancestors, which is the implicit relationship to be deduced from Hedlund's article and explicitly spelled out in Nasu's.
By now this relationship is reasonably well known, not the least for having been mentioned regularly in CA-Mail discussions. Two extensions of the diagram are important for developing the theory, namely the subset diagram and the pair diagram. The former is a standard construction in automata theory, used for giving a systematic answer to the question of whether the original diagram does or does not contain a given labelled path. The unit classes for the subsets are the individual nodes of the de Bruijn diagram, subset linkage being defined by linking the subset to the union of the linkages in the base diagram and assigning them the same labels.
The subset diagram serves at least two purposes. First, linkage in the base diagram is not a function because nodes may have multiple out-links, so the subset diagram is a covering space in which linkage is functional, and it might be surmised that it is the least extension having this property. In passing, let it be noted that there are really two subset diagrams, according to whether in-links or out-links are functionalized. Both Hedlund and Nasu enjoy formalizing this symmetry by stating "vector" theorems, but that is a question of style.
Second, the diagram systematizes the search for a path; since it is functional it is only necessary to check paths leading from the full set to the empty set (having exploited the polite mathematical fiction that no-link is a yes-link to the empty set). Its use also avoids having to use the mathematical circumlocution "non-deterministic automaton," but of course there are also drawbacks. One of them is the huge size of the subset diagram, since it grows exponentially with the size of the base diagram. As Voorhees points out and Nasu obtains by not formally mentioning the subset diagram, only a small part of it may be needed in practice.
For any graph, it is customary to discuss a series of connectivity attributes. Nodes laking in-links are leaves (those lacking outlets are rootlets?), which can be extended recursively to those successor nodes having incoming paths of bounded length. That leaves a residue of nodes whose paths can be arbitrarily prolonged in one direction, the other, both, or neither. For a finite graph, unlimited continuation implies loops.
Further connectivity properties concern whether or not paths exist between pairs of points, and in which directions. Naturally, when one constructs a derived graph, questions of extensibility and connectivity transfer to the derivative; for example finding a leaf in some subset makes that subset a leaf, but there are others - for example when the subset contains some but not all of the outwardly linked nodes of a woud-be counterimage.
The subset diagram has its own connectivity matrix, whose powers can be used to discover exactly which words generate the Garden of Eden, and what their lengths are. It is a more subtle operation to enumerate the ancestors of those paths which actually have ancestors, which is why the "vector subset diagram" was invented, and why Nasu sometimes ties paths to an origin.
Absent the Garden of Eden, the evolution rule is a candidate for endomprphism, at which point Welch's indices come into play, as well as Nasu's observation that [his portion of] the subset diagram is "well defined" (as an image of the de Bruijn diagram). Subset size (cardinality, to be more formal) stratifies the subset diagram, the indices state which level (yea, each index unto its own handedness); there is also an interesting intersection property relating subsets of the left extensions and those of the right extensions.
There is nothing in the subset diagram which could not have been taken directly from the de Bruijn diagram, but somehow its use seems to clarify the exposition markedly. The same could be said of the pair diagram, which is the third member of the trinity.
By definition, the pair diagram is constructed from the cartesian product of two copies of the vertex set of its base diagram, pairs being linked when both members are linked. If the graph is to be labelled, the label of a pair link is the common label of its coordinates (other labellings are possible, but that is another matter, and gives different graphs). Much the same graph arises when unordered pairs are used instead of ordered peirs, but in either event, the diagonal is (isomorphic to) a copy of the base diagram.
When evolution labels the pair diagram, the existence of paths outside the diagonal implies distinct ancestors. Transients and connectivity in the pair diagram are a consequence of the same qualities in the base diagram, but now a pair can lose linkage when the coordinates behave differently. Note also that the pair diagram is vaguely part of the subset diagram, except that it loses linkages whereas they would move to a different level in the subset diagram. All of Hedlund's discussions concerning (n-1)-separatedness can be read as descriptions of the pair diagram.
Usually the pair diagram is consulted after having used the subset diagram to characterize the Garden of Eden; its emptiness assures the existence of paths in the pair diagram, so the interesting questions concern whether or not a path includes the diagonal.
Paths which contain loops cannot intersect the diagonal without being wholly contained, otherwise counting all possibilities would show a violation of the uniform multiplicity theorem. This is Hedlund's case of total (n-1)-separation. This is also related to mergibility - eventually sequences forget how they began, but only if those which could extend indefinitely in either direction have been excluded [Nasu's theorem 5 (2)].
That leaves transients which have nowhere else to go (or come from) except the diagonal, which [theorem 5 (1)] recognizes as definitivity. Similar distinctions, with variant vocabularies, are to be found in other articles dealing with symbolic dynamics.
Besides the approach from Symbolic Dynamics, much of what is nevessary to understand one dimensional cellular automata can be found in John Conway's book, "Regular Algebra," particularly if it is supplemented by a paper of Backhouse and Carr'e dealing with procedures for solving systems of symbolic equations. In particular, Conway's factors describe how sequences can begin or end, this relates to dieals in the regular algebra, and to the ideas of definitiveness and mergibility. These associations need to be explored with greater care. The book is
J. H. Conway Regular Algebra and Finite Machines Chapman and Hall, Ltd. (1971). ISBN 412 10620 5
the reference to Backhouse and Carré is
R. C. Backhouse and B. A. Carr\'e 'Regular Algebra Applied to Path-finding Problems' Journal of the Institute for Mathematics and its Applications 15 161-186 (1975).
Finally, we need to comment on the relationship of Shifts of Finite Type to Sofic Systems. Finite Type makes the structure of a subshift depend on the characteristics of one single matrix, whereas either several matrices (one for each label), or a system of labelling, is necessary to discuss Flows in a Labelled Graph. With Finite Type two birds are killed with one stone by insisting that the vertices ARE the labels (or at least wounded, through the byzantine device of using an extended alphabet). Sofic Systems resolve the problem by passing to the dual graph (in effect, putting primes on repeated vertices), but that adds an inconvenient extra endomprphism to the discussion. Whatever the reason, the literature fails to clarify this point.