One of the goals of any theory of the structure of an algebraic system is to decompose the system into standard combinations of standard components; in the case of vector spaces the result is to express any vector as a linear combination of a certain number of basis vectors. However, a vector space represents the ultimate in uniformity; the typical decomposition in other systems may be neither as complete nor as uniform, but there are usually some vestiges both of a basis and of coefficients.
When the system has both a multiplication and an addition, these terms
bearing their usual connotations with respect to things like the
commutative and the distributive laws, the principal object of interest
is the behaviour of products. If there is a subset $I$, typically closed
with respect to addition, lying within an entire set $S$, we may classify
products according to which of these two sets contains
them, particularly with respect to the source of the factors. The most
notorious example of this relationship is the fact that the product of
any factor with zero results in zero; the idea is to generalize the
properties of zero.
The following table expresses the possible relationships:

  $I \cdot I \subseteq I$ --- a subsystem (products of the subset stay within the subset)
  $S \cdot I \subseteq I$ --- a left ideal
  $I \cdot S \subseteq I$ --- a right ideal

Any subset which is simultaneously a left ideal and a right ideal is
simply called an ideal; the set consisting entirely of zero is the
prototypical ideal, but so also is the entire set $S$.
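A familiar instance from the integers (an illustration of mine, not drawn from the text) may help fix the definitions:

```latex
% In the integers, the multiples of a fixed n are closed under addition
% and absorb multiplication by arbitrary integers from either side:
I = n\mathbb{Z}, \qquad I + I \subseteq I, \qquad
\mathbb{Z}\,I \subseteq I, \qquad I\,\mathbb{Z} \subseteq I ,
% so n\mathbb{Z} is an ideal; n = 0 gives the zero ideal, n = 1 the whole ring.
```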
In the case of regular expressions, Conway has explored these concepts directly, without making any reference either to ideals or to the general theory of semigroups from which the results might possibly be taken. In doing so he exploits a natural order which exists for regular expressions, or for any system having the regular expression operators; namely, $E \leq F$ whenever $E + F = F$.
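As a small worked instance of this order (my illustration): every summand lies below its sum.

```latex
% Using idempotence, commutativity, and associativity of +:
a + (a + b) \;=\; (a + a) + b \;=\; a + b
\quad\Longrightarrow\quad a \;\leq\; a + b ,
% and 0 + E = E shows 0 \leq E for every E, so zero is the minimum element.
```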
Zero is clearly a minimum element for this order. The goal is to factorize any regular expression $E$ as a product of other regular expressions, but this may not be possible; thus one must work up to the concept in stages. The first is to define a trial factorization through the inequality
$$F_1 \cdot F_2 \cdots F_k \;\leq\; E,$$
and then successively to enlarge its terms, a term being maximal when it can grow no further without violating the inequality.
Amongst others, the canonical form of a regular expression provides trial factorizations whose terms are letters and derivatives. By taking further derivatives, trial factorizations may be developed with arbitrary words as leading terms. The point of starting with trial factorizations is the expectation that they can be promoted to factorizations by systematically increasing each of their terms.
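As an illustration (mine) of such a trial factorization, take $E = (ab)^*$; the canonical form contributes one term per leading letter:

```latex
% The derivative of E = (ab)^* by the letter a is a^{-1}E = b(ab)^*,
% giving the trial factorization
a \cdot b(ab)^* \;\leq\; (ab)^* ;
% a further derivative by b extends the leading term to the word ab:
(ab) \cdot (ab)^* \;\leq\; (ab)^* .
```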
In doing so we depend upon the fact that sums provide the route to
upper bounds. Thus if $F \leq E$ and $G \leq E$, then $F + G \leq E$ follows
readily from the definition of inequality; as long as we do not insist
that the result necessarily be a regular expression, the same
conclusion also holds for arbitrary sums. Consequently the maximal
element less than $E$ is simply the sum of all the elements less
than $E$.
A product is linear in each of its terms individually, so in a trial factorization $F \cdot G \cdot H \leq E$, replacing $G$ by the sum of the set $\{\,X : F X H \leq E\,\}$
gives a new trial factorization dominating the first. Moreover the replacement does not spoil the maximality of any of the other terms; if a maximal term were increased, the distributive law shows that the expression which previously offended the inequality would still be present. Consequently trials can be refined one term at a time until they are finally factorizations.
Without further restriction, a factorization of $E$ need not be unique, and trial factorizations even less so. One such restriction is that all terms but one be maximal; then there is only one way the remaining term can be refined, and the refinement reproduces the term itself if it was already maximal. The concept of a factor, defined as a maximal term, depends upon the environment in which it is encountered; but the environment can always be reduced to a product of two terms, for a left or a right factor, or of three terms, for an interior factor, by using the associative law to consolidate the remaining terms.
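The refinement process can be made concrete for finite languages. The sketch below (representation and helper names are mine, not the text's) treats a language as a Python set of strings and promotes a trial factorization $F \cdot G \leq E$ to a factorization by maximizing one term at a time:

```python
# Finite languages as sets of strings; a trial factorization F.G <= E is
# refined by replacing each term with the maximal term the other allows.

def max_right(F, E):
    """Maximal G with F.G <= E: the intersection of left quotients f^-1 E."""
    result = None
    for f in F:
        quotient = {w[len(f):] for w in E if w.startswith(f)}
        result = quotient if result is None else result & quotient
    return result

def max_left(G, E):
    """Maximal F with F.G <= E, assuming G is nonempty and E is finite."""
    candidates = {w[:i] for w in E for i in range(len(w) + 1)}
    return {f for f in candidates if all(f + g in E for g in G)}

E = {"ab", "b"}
F, G = {"a"}, {"b"}        # a trial factorization: {a}.{b} <= E
G = max_right(F, E)        # enlarge the right term: still {"b"}
F = max_left(G, E)         # enlarge the left term: now {"", "a"}
assert {f + g for f in F for g in G} <= E   # the refined trial still holds
```

Note that enlarging $F$ after $G$ has been maximized does not disturb the maximality of $G$, in line with the argument above.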
Finally it is possible to conclude that, whatever the number of left
factors, they are paired with unique right factors; thus they can be
given a common index set reflecting this correspondence. Let us call
them $L_i$ and $R_i$; they all satisfy $L_i R_i \leq E$, each of
which is a factorization with both terms maximal. It is a temptation to combine them with each other.
Thus we further define $E_{ij}$ as the maximal solution $X$ of the trial
factorization $L_i X R_j \leq E$, dominating every other solution.
Several special cases come to mind. Since $1 \cdot E \leq E$, there must
be an index $l$ for which $1 \leq L_l$ and $E \leq R_l$, and
likewise an index $r$ for which $E \leq L_r$ and $1 \leq R_r$.
Finally, since $E R_r \leq L_r R_r \leq E$ and then $L_l E \leq L_l R_l \leq E$, we have $L_l E R_r \leq E$ and hence $E \leq E_{lr}$; conversely $E_{lr} \leq L_l E_{lr} R_r \leq E$, so we conclude $E = E_{lr}$.
As for the $L$'s and $R$'s, note that by the definition of inequality,
$L_i R_i \leq E \leq R_l$, so that $L_l L_i R_i \leq L_l R_l \leq E$, making
$L_l \cdot L_i \cdot R_i$ a trial factorization with maximal
terms---a factorization. Consequently $L_i = E_{li}$, and similarly
$R_i = E_{ir}$. Thus the $E_{ij}$'s can be arranged into a square
matrix, wherein the left factors form a certain row (the row $l$), the right factors
a certain column (the column $r$), and $E$ lies at their intersection. The $E_{ij}$'s
in general behave like elementary matrices, satisfying the following
rules:
$$E_{ij} E_{jk} \;\leq\; E_{ik}, \qquad\qquad 1 \;\leq\; E_{ii}.$$
One consequence of these rules is that, by extending inequality
elementwise to matrices, $\mathcal{E} \cdot \mathcal{E} \leq \mathcal{E}$ and $I \leq \mathcal{E}$, which implies that $\mathcal{E}^* = \mathcal{E}$, where $\mathcal{E}$ denotes the matrix of the $E_{ij}$'s.
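Continuing the finite-language sketch from before (representation and names mine), the maximal middle term of $L \cdot X \cdot R \leq E$ plays the role of $E_{ij}$, and the rules can be checked on a toy language:

```python
# Finite languages as sets of strings; the maximal X with L.X.R <= E.

def max_mid(L, R, E):
    """Maximal X with L.X.R <= E, assuming L and R nonempty and E finite."""
    candidates = {w[i:j] for w in E
                         for i in range(len(w) + 1)
                         for j in range(i, len(w) + 1)}
    return {x for x in candidates
              if all(l + x + r in E for l in L for r in R)}

E = {"ab"}
L = [{""}, {"a"}, {"ab"}]    # three left/right factor pairs of E,
R = [{"ab"}, {"b"}, {""}]    # matched index by index

E12 = max_mid(L[0], R[1], E)             # {"a"}
E23 = max_mid(L[1], R[2], E)             # {"b"}
E13 = max_mid(L[0], R[2], E)             # {"ab"}
assert {x + y for x in E12 for y in E23} <= E13   # E_ij . E_jk <= E_ik
assert "" in max_mid(L[0], R[0], E)               # the rule 1 <= E_ii
```

The left factors indeed appear as a row ($E_{1j}$) and the right factors as a column ($E_{i3}$), with $E = E_{13}$ at their intersection.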
Interesting as it may be, this discussion has so far been purely
theoretical, depending on maximization through the formation of arbitrary
sums. Turning to the canonical representation and symbolic derivatives,
it is possible to obtain more concrete results. Note that a right factor
of $E$ is a maximal $R$ satisfying $L R \leq E$; in terms of the words
belonging to $L$ this means
$$R \;=\; \bigcap_{w \in L} w^{-1}E .$$
The pairing between $L$ and $R$ is quite evident from the structure of
this formula. By building up the parallel theory of right derivatives,
a similar relationship can be created to express $L$ in terms of the words of $R$.
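For example (an illustration of mine), take $E = (ab)^*$ and the left factor $L = (ab)^*a$:

```latex
% Every word of L has the form (ab)^k a, and each has the same derivative:
\bigl((ab)^k a\bigr)^{-1}(ab)^* \;=\; b\,(ab)^* ,
% so the intersection collapses to the paired right factor
R \;=\; \bigcap_{w \in L} w^{-1}E \;=\; b\,(ab)^* .
```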
If it is found inconvenient to work with intersections, de Morgan's
rules for regular expressions can be invoked to obtain
$$R \;=\; \overline{\sum_{w \in L} \overline{w^{-1}E}} ,$$
using the notation that $\overline{X}$ is the complement of $X$, which may
itself require some effort to obtain.