Machines/Automata

 "It's not what you are, but what you can effectively pretend to be. [e.g., if computers couldn't pretend they were some type of machine other than a finite state machine, they wouldn't even have had the market penetration of castor oil. ]" (cfs)

 Recall that one of the initial motivations of the work on formal systems was to place the idea of a proof on the simplest possible foundation. If a proof could be done "mechanically," then we could understand exactly what knowledge and capabilities were required of the agent that carries out a proof. The goal, of course, was to find the simplest "agent" possible. This work on formal systems directly leads into thinking about computation. Instead of the notion of proof, we simply generalize to the notion of mechanically carrying out a set of instructions.

In considering thinking as computation, we also need to understand the capabilities and resources that a thinking agent must possess in order to realize thought. Again, we hope that the agent can be simple....one that fits our intuitive notion of mechanistic.

Automata theory or the mathematical theory of computation is an important place to start because it provides a way to precisely characterize what it means to compute. And, once this precise characterization is available, we can investigate the inherent limitations that follow from this characterization.

Of the many different kinds of automata, we will discuss only two in this section; namely, Finite State Machines and Turing Machines. These types of machines represent end points on an ordering of machines and therefore are an appropriate choice for this introductory discussion. Before we examine these types of machines in detail, we will note some of the characteristics that these automata share. In particular, they both:

  • have a fixed finite set of instructions, or rules,
  • there is a fixed finite set of input and output symbols,
  • at any point in time, the automaton is in some distinct state, and
  • the rules specify the conditions under which the automaton transitions from its current state to a new state.

These characteristics can be better appreciated if we draw an analogy to a soda machine. Such a machine has various mechanisms that serve to count the coins put into the machine, dispense soda when appropriate (or fooled), and provide change (hopefully). These mechanisms constitute the rules that the machine possesses. The state of the machine consists of the soda stored in it, the change stored in it, how much money has been put into the coin slot, whether a selection has been made, etc. Given some particular state, such as "75 cents has been put in the coin slot, a selection has been made, ..." the effect of a rule is to change the state of the machine, e.g. "soda is dispensed, ...". One critical component missing from the description of shared characteristics itemized above is what causes a rule to be applied. We turn now to a description of a Finite State Machine to show more exactly how one class of automata operate.


Finite State Machines

A Finite State Machine (FSM) is a quintuple <Q,I,O,M,G>. Q is a finite set of states. I and O are finite sets of input and output symbols, respectively. M is the next state, or state transition, function which maps Q x I into Q. G is the output function that maps Q x I into O. To make this clear consider some point in a computation. The FSM is in some state say qk where qk is an element of Q. Upon input i where i is an element of I, the machine outputs some o = G(qk,i) and transitions to the new state ql = M(qk,i).

 We can represent an FSM several ways. Table 1 below shows a state table representation of a simple Soda Machine. The first entry in each row represents the current state of the machine. The state set Q of this machine consists of the four states {$.00, $.25, $.50, $.75}. At the top of each column is listed the four types of inputs that this machine accepts. This input set is {quarter, not-quarter, select, return}, or, in other words, the inputs are the inserting of a quarter into the machine, the inserting of some coin other than a quarter, the pressing of the soda selection button, or the pressing of the coin return lever. The entries inside the table give the next state of the machine followed by it output. The output set is {accept (coin), reject (coin), give (soda), dump (coins)}. The interpretation of an entry in the table is, for example, if the present state of the machine is $.00 (first row) and the input is quarter (first column) then the machine's new state is $.25 and its output or action is to accept the coin.
 

 

 quarter
 not-quarter

 select

 return
 $.00

 $.25 / accept

 $.00 / reject

 $.00 / reject
 $.00 / reject
 $.25

 $.50 / accept

 $.25 / reject

  $.25 / reject
 $.00 / dump
 $.50

 $.75 / accept
 $.50 / reject

 $.50 / reject
 $.00 / dump
 $.75

 $.25 / reject

 $.75 / reject

 $.00 / give
 $.00 / dump

Table 1. State Table Representation of Soda FSM

 
    In Figure 1 below, the same machine is depicted as a directed graph. Here the nodes of the graph (depicted as circles) represent distinct states and the directed arcs (depicted as arrows) between nodes represent potential transitions. Each arc is labeled with an input/output pair which denotes the input which is associated with the transition and the corresponding output that is generated.
 
Figure 1. Graphical Representation of the Soda FSM
 
 Now that we have an idea of how a Finite State Machine works, we can consider the characteristics of a finite state computation. Although it may not be apparent from the soda machine, FSMs are capable of operating on infinitely long strings of input symbols and generating infinitely long strings of output symbols. For example, consider the FSMs depicted in Figure 2. These machines are "recognizers"; they do not have outputs, rather they solve the problem of recognizing input strings. In particular, the machines depicted at the top of the figure will recognize, i.e., move into its accept state at the end of input, when its input is a string of the form consisting of n 0's followed by m 1's. In contrast to that simply realized FSM, it is not possible to create an FSM that will recognize strings that consist of n 0's followed by n 1's!
 
Figure 2. Graphical Representation of String Recognizers
 
Of course, we can write an FSM that can recognizes specific instances of such strings such as 01, 0011, and 000111. In fact, such specific FSMs are depicted in Figure 2. And, as is also shown in this figure, we can combine these machines to create a single FSM that recognizes 01, 0011, 000111. However, doing this required distinct states that demoted how many 0's had been recognized. This is necessary because a state in an FSM is just a name, it carries no additional information about how the computation reached that state. Therefore, if there are two different paths by which the computation reaches some state q and subsequent computation depends on which path to q was taken then q must be broken into two separate states. This is what was done to create the combined FSM in Figure 2. There were separate nodes that in effect denoted "seen 1 zero," "seen 2 zeros," and "seen 3 zeros." Thus, how many 1's had to be recognized is encoded by the node from which the recognition started. This counting of 0's has a limit, for there are only a finite number of states in a Finite State Machine, and therefore, it could not count to an arbitrarily large n.

 Nevertheless, the lack of an infinite number of states is not the main problem. Rather, it is the fact that the regularity of n 0's followed by n 1's cannot be parsimoniously captured or expressed within the finite state formalism, because the states are just names that can't carry additional information concerning how they were reached in the computation. If we adopt FSMs as the basis for a cognitive model (i.e., our dumb mechanism) then we best be certain that whatever regularities exhibited in human cognition are expressible in our model. Otherwise our models will not be parsimonious and we will be forced to argue that regularities of the device's I/O (human cognition) are only reflected in its I/O, not in its operation, much like our recognizer for n 0's followed by n 1's where 0 < n < 4.

One persuasive reason to expect that regularities should be reflected in the operation of an automaton, not just its I/O, has to do with how changes to the function that an automaton calculates can be expressed and realized. Imagine that we want to recognize strings for n 0's followed by n 1's followed by b, where 0 < n , 4 and b is a 0 or 1 depending on whether n is even or odd. Thus, the strings the machine recognizes are 011, 00110, and 0001111.

 
Figure 3. Graphical Representation of the String-Parity Recognizer
 
Such a machine is depicted in Figure 3. As you may have suspected, this machine has been built by modifying the combined machine depicted at the bottom of Figure 2. Notice how the addition of a parity bit has changed much of this machine's structure. Parity recognition engendered breaking out 1's recognition into separate computational paths. Thus the change, although it entails the recognition of just one more bit, required global changes in the machine's structure. In other words, the description of the function, as captured in the finite state formalism, has changed considerably. The task of realizing the change becomes complicated, because apparently simple changes in the function that is computed can entail large scale changes in the automaton. This is particularly worrisome to the development of any theory of cognition based on the finite state machine formalism. In addition, it is also problematic if we are attempting to describe a device that changes over time. A presumably human cognition changes over time --- people learn!


Turing Machines

 Now consider another kind of machine, a Turing Machine (TM). Turing machines were first developed by Alan Turing to provide a formal characterization for the informal concepts of algorithms and functions computable by an algorithm. An algorithm is intuitively defined as a finitely describable procedure that gives readily interpreted step-by-step instructions for computing some function in finite time.

As depicted in Figure 4 below, a Turing Machine consists of three components:

  • An infinite tape that is divided into distinct cells that can hold one symbol;
  • A Read/Write (R/W) Head that can read or write the contents of the tape cell directly below the head;
  • and a Control Unit that controls the Read/Write Head based on the unit's current state and the content of the tape cell underneath the R/W Head.

Like an FSM, a Turing Machine computation proceeds as a sequence of discrete steps. Again, like an FSM, at any point in the computation, the control unit can be in only one of a finite number of distinct states. Based on the contents of the tape cell under the R/W Head and the current state of the control unit, the unit will transition to one of a fixed finite set of new states and take one of three possible actions. These actions are:

  • write any one of a fixed finite set of tape symbols on the tape cell under the R/W Head;
  • move the tape one tape cell to the right;
  • move the tape one tape cell to the left.
 
Figure 4. Pictorial Representation of a Turing Machine (TM).
 

 The behavior of a Turing Machine is described by a set of rules, or quadruples, of the form < q, t, a, f > . So, for instance, if q is the present state of the control unit and t is the symbol in the tape cell under the R/W Head, then the control unit takes action a and transition to state f. Any finite set of such rules describes a Turing Machine as long as no two rules have the same q and the same t. This is known as the consistency condition. It insures that given some state of the Control Unit and symbol under the R/W Head, the Control Unit will not have alternative applicable quadruples; there will be at most one quadruple whose q matches the state and whose t matches the tape symbol. Two consequences follow from this. The machine operates deterministically, and, further, at some point in the computation, there may be no applicable rules, in which case the machine is said to have halted.

To use a TM as a computational device and thus to associate a function with its operation, we have to fix where its input and output are on the tape. Accordingly, the input is presumed to be immediately to the right of the head at the beginning of the computation and the output can be found immediately to the right of the head when, and if, the machine halts.

Now that we have described FSMs and TMs, we can ask how they compare. Clearly, at the level of their formal description, they are quite similar. In particular, the TM's Control Unit is very similar to an FSM. It can be in only one of a finite number of distinct states and there is a finite number of distinct input symbols. And based on the current state and the input (what is under the R/W Head), the unit makes a transition to a unique new state and takes one of a finite number of distinct actions, the latter being in essences the Control Unit's output symbol.

This similarity in description is also reflected in the underlying capabilities. Any function computable by an FSM is computable by a Turing Machine. But there is a major difference in that a TM's input and output are to an infinite tape, and, in particular, the device's previous output can be its subsequent input. The significance of this is that the device can not manipulate symbols in special ways. In particular, the previous history of the computation can be brought to bear on the present steps of the computation. For instance, we can write a TM that recognizes n 0s followed by n 1s for any n > 0 written on its tape. Roughly, each time the machine sees a 0, it can go over to another section of the tape and place a mark to denote that it has seen a 0. It thereby maintains a running tally. Subsequently, when 1 recognition begins, it can move the head over to the tally marks and erase one tally mark (e.g., write a blank) for each 1 that is read. Thus the machine can keep track of the number of 0s seen and insure that there are the same number of 1s. Indeed, it can do this for infinitely long strings because the tape is infinite. Moreover, it can do this for any n > 0 using the same uniform machine description, or quadruples. It's ability to do this rests on the fact that the machine has procedures to store these tally symbols, this string of 1s, and it also has the procedures that appropriately interpret the significance of this string when it comes time to perform 1s recognition. That is, the TM can store and interpret symbolic structures.


An alternative way to code this task in a Turing Machine would be to sequentially intersperse the erasure of 0s and 1s. (Click here for an illustration of this alternative)This exploits the fact that the representation of 0s and 1s on the tape can be used to encode how many there are. However, the somewhat more subtle distinction realized in this approach between representation and what a representation can encode or mean obscures certain points that we will subsequently make

 So, there are functions computable on a Turing Machine which are not computable on an FSM. Unlike an FSM's computational history which must be captured in its current state symbol, a Turing Machine's computational history includes the infinite tape's contents. This allows arbitrary dependencies between the past computational steps and the current steps.

These arbitrary dependencies have certain consequences. In particular, the current step of a Turing Machine's computation has a more complicated description. It is described by the contents of the tape, the current state of the Control Unit, and the location of the R/W Head. This information is the machine's instantaneous description (see the bottom of Figure 4)). However, this instantaneous description is somewhat problematic, because it presumes that the starting state and history of a computation requires recourse to a description of an infinite tape, hardly a parsimonious basis for effective theory formation. And again, this is a problem for the scientist who may want to use the formalism to describe a cognitive theory. And, it is also a problem for the device itself if that device is going to change, in some regular fashion, either the function it computes or how it computes. That is, it is a problem if learning is presumed to be part of the device's capabilities.



Turing Machines, Universal Turing Machine, Computability and Complexity

In this section, we step away from our search for a mechanistic account in order to all too briefly discuss two important concepts, complexity and computability.

The Turing Machine represents one formal characterization of an algorithm. Other characterizations have been proposed (e.g., Post's Productions and Church's Lambda Calculus). Yet all these characterizations have been shown to be equivalent in terms of their ability to compute the same functions. Church's Thesis goes further to state that any characterization will be equivalent. In other words, the Turing Machine has appropriately captured the intuitive notion of algorithm and thus effectively defines the class of functions computable by algorithm. This thesis can never be proven because an intuitive notion of algorithm is inherently informal and thus make a formal proof impossible. Nonetheless, the thesis is widely accepted as being true.

Recall that Turing machines have finite descriptions; they are based on finite sets of tape and state symbols from which a finite set of quadruples is defined. It follows from this that all possible TMs can be effectively enumerated. In effect, we can place them in some order or list (e.g. T0, T1, ...) and name a machine by its place or index in the order. This can be done even though there are infinite number of such machines, much as we can place the Natural numbers in a fixed order, (1, 2, 3, 4,...). Although we will not demonstrate it, it also follows from this that a universal Turing Machine (Tuniversal) exists which can compute the same function of any other Turing Machine given the other machine's index. So if GTn is the functions associated with the nth Turing Machine then GTuniversal(n,x) = Gn(x). Given the index n, a universal machine could use it to determine Tn, and then it could use Tn to compute GTn(x).

To get a rough idea of how one Turing machine, Tuniversal, might actually simulate another Turing machine, Tk, consider what information would be useful for deriving the next instantaneous description given the present instantaneous description of Tk's computation. First, there is the present instantaneous description and, secondly, there are the quadruples that will determine the next instantaneous description. Given such information, how might we discern the next instantaneous description?

  • Get Tk's current state, qcurrent, and current tape symbol under the R/W Head, tcurrent, from the present instantaneous description.
  • Look among the quadruples for one that is applicable, in effect one of the form <qcurrent, tcurrent, a, f >.
  • If you find such a quadruple then use a and f to update the instantaneous description.
  • If you don't find such a quadruple then the machine halts and the instantaneous description remains the same.

Starting with any instantaneous description of Tk, this procedure could repeatedly be applied to successive instantaneous descriptions to simulate the operation of Tk. The machine's universality, or programmability, depends on its ability to store and interpret another machine's quadruples, initial tape configuration, and the instantaneous descriptions which ensue. That is to say, the ability to simulate hinges on the ability to manipulate symbols. And based on this symbol manipulation, the above sketch of a procedure gives a finite set of readily interpreted step-by-step instructions for performing the simulation. Further, note that in the sense that this procedure obeys our intuitive notion of algorithm then there must be a Turing Machine, a Universal Turing Machine, that realizes it if we accept Church's Thesis. Nevertheless, the reader should consider more carefully what such a Universal Turing Machine would look like and how it would operate as it performed these various steps. For instance, how would it uniformly represent any Turing machine's input, output, and quadruples? How would it keep track of qcurrent and tcurrent when looking for the applicable quadruple and how would it actually test for applicability? What seems like simple steps in our algorithm, would tend to be rather involved operations on a Turing Machine, requiring many secondary operations not laid out in the four steps above. And the sequence of successive instantaneous descriptions of our universal machine as it simulated another machine would reflect these involved operations, obscuring any appearance of correspondence to the instantaneous descriptions of the machine being simulated. As a practical matter, therefore, people don't use Turing machines as a basis for describing particular algorithms or models.

But the benefit of Turing Machines does not reside in any ability to express an algorithm and its operation in an easily understood form. Rather, it is the simplicity of the formal characterization itself that is desirable, e.g., there are finite sets of states and tape symbols, there are only three actions that the Control Unit can take, etc. It is this simplicity which assures us that it is a mechanistic account and simplifies the task of showing equivalence between Turing Machines and other formal characterizations.

We can also use the Turing Machine as a yardstick to measure just how complex it is to solve a particular kind of problem. We can ask how many steps a Turing Machine will take in order to solve the problem of computing a particular function. In the case of the problem of recognizing n 0s followed by n 1s, we could ask how many steps it would take for our TM to perform the recognition. Well, it turns out that the number of steps it takes is a polynomial function of the "size or length" of the input, n (in effect, each bit i has to be scanned, and the distance to the tally mark is a linear function of i, so the actual steps for all bits is the summation of i which roughly makes the total number of steps < or = to c times n squared for some constant c.). In such a case, we say the algorithm is of polynomial time complexity where "time" is marked by the number of steps.

Now, it turns out that we can talk about the complexity of a problem regardless of the algorithm employed to solve the problem (i.e., compute the function). Moreover, many problems require more than polynomial time (e.g., n to the k power for some k > 1). There are classes of problems (which go by names such as NP-complete) for which there is no known algorithm which can in general solve those problems in polynomial time. And, it seems unlikely that there is any polynomial time algorithm for these problems, because such classes are built on the requirement that problems in the class are "reducible" or translatable to each other in such a way that having a polynomial time algorithm for one means having a polynomial time algorithm for the others in the class. Years of research has lent credence to the belief that no such algorithm exists. Indeed such problems may require exponential time (e.g. k to the n power). The difficulty here is that as n becomes larger, the time time required for such algorithms "explodes". For this reason, such problems are considered intractable.

Intuitively, the importance of this notion of intractable problems is that it means that some problems cannot be easily solved. It does not mean that solving some instance of the problem need also in intractable. For example, imagine the problem of recognizing whether some English sentence is grammatically correct. Now, this is a trivial problem for particular sentences in English. We can just remember whether a particular sentence is grammatical. And it is also trivial for a finite subset of English sentences, we can do it in a way similar to our FSM recognizer of n 0s followed by n 1 for 0 < n < 4. But for the entire English language it is not a trivial problem and a general solution to the problem may not be tractable. But here we can get into a finer grain analysis of the complexity of a particular algorithm which may perform well on some sentences but on others, the worse cases, still be intractable.

Finally, we can ask whether there are problems which are simply not computable, regardless of the time. We saw that certain functions which are computable on a TM are not computable on an FSM. That is the problem of computing the function "n 0s followed by n 1s regardless of the size of n" is not solvable on an FSM. But are there problems that are not solvable on a TM? Well, consider the problem of determining whether a TM ever halts. Specifically, consider a TM defined whose finite control is a single quadruple (q0,#,L,q0). Thus, if the R/W Head is scanning the blank symbol, #, in state q0 then the control unit will move the tape left and take on stat q0 again. Now, if the entire tape to the right of the starting position is blank, the machine will never halt. And, since there are only a finite number of tape symbols, we could define a similar quadruple to the one above for each permissible tape symbol as well.

In such simple cases, it is not hard to determine whether the machine halts. But , for more complex Turing machines it may not be so simple to determine. In general, we would like to know if there is some Turing computable function that will tell us whether any particular TM halts. In other words, could one build a TM that when given the description of any TM on its input tape determines whether that other machine halts on some input.

It turns out that the answer is no. So this halting function is one function which is not computable by a Turing Machine.


FSMs, TMs, and Computers

TMs and FSMs are in sharp contrast along the dimension of characterizing the current state of the computation. The FSM has a finite set of unique simple state names that must nevertheless capture any necessary distinctions in the computation history of the device. By virtue of the state names being a finite set, this severely restricts the dependencies that can be realized between present and past steps of the computation. TMs are, in essence, at the other extreme whereby an infinite tape allows arbitrary dependencies between any two steps in a computation to be realized. To help elucidate the nature of this difference, consider another familiar finite state machine, a digital computer. Now it may be surprising that a digital computer, whether it is some super computer or some personal computer, is an FSM. But the architecture of a computer can be decomposed into logical devices that perform various boolean and arithmetic operations on data and storage circuits (e.g., registers, cache, main memory, disk storage, etc.) that store data. Data, in the vast majority of digital computers, is represented as binary (i.e., base 2) numbers where each digit is either a 1 or a 0. Thus , the number we know as zero would be represented as 0, and a one would be represented as 1, but two is represented as 10 and three as 11, by virtue of it being a base 2 representation of numbers. However, the actual physical devices in the computer represent the digits (or bits) of base 2 arithmetic (e.g., 1 or 0) by being in one of two states, lets call these states On and Off. So the operation of a 1 being added to 0 would be realized by some physical device in the computer making a state transition from being Off to being On.

There is a fixed number of all these various storage circuits and devices so even if we considered all possible ways to assign 1 or 0 to these devices, the entire computer at any point could be in only a fixed (but typically very large) number of distinct states. And of course, this is an upper bound on the number of states, since some of these states would violate the internal logic devices, which in effect determine which states are possible. Moreover, given any one of these states, there is a fixed number of distinct states to which it can transition, determined by the logical devices, and the conditions under which it transitions to any of these new states must be a constant. For it to be otherwise would mean that the computer was not reliable.

In other words, any function computable by a computer is computable by an FSM. However, the above low level description of computers does them a disservice. Although, the functions computable by any computer belongs to the class of Finite State Machines, HOW a computer computes is more similar to a Turing Machine. Much of the computer's circuitry and memory is used to insure that the computer can be described and used at a "higher" level, for instance, by describing in a computer language a computation to be carried out. This additional memory encodes the data, and computations or program to be carried out on that data. And other parts of the computer encode procedures and logic for translating the program into a particular sequence of state changes by which the low level physical device realizes the computation. As an example, consider the following little "program" to realize our n 0s followed by n 1s string recognizer. It is written in English, but would be trivial to realize in any number of high level computer languages. (in fact, far more eloquently than it is expressed here in English.)

"Read the string one bit at a time, from left to right. As long as you have just seen zeros, increment a running tally by one for each zero. But as soon as you read a one, proceed to decrement the running tally as long as you continue to read one, the tally is not negative, and part of the string remains to be read. If the string ends with the tally equal to zero then accept the string, else reject it."

Much like out Turing Machine program, this little program expresses an algorithm for recognizing n 0s followed by n 1s regardless of the size of n. It has steps which imply procedures for generating.storing the tally and procedures which interpret it. Together these procedures place an interpretation on what the symbols, e.g. 1s and 0s that comprise the tally mean. However, this increase in expressiveness has come at a cost. Although this program expresses an algorithm for arbitrary n, when realized on any particular computer n would be bounded because eventually there would be insufficient distinct states (i.e., memory) in which to represent the running tally of zeros seen. The tally would bet too large to store. So the function that can be computed in still limited by the number of states. Yet, now we are dedicating some of the computer's devices (in particular memory space) to be used for the program, procedures, and logic that translate it into something that controls the sequence of state transitions. Fewer devices means fewer distinct states. Furthermore, state transitions themselves cannot be arbitrary because that would mean the interpretation of the program was arbitrary. Or in other words, transition between any two arbitrary states may now have to be realized by a sequence of state transitions. But remember, the number of different states our computer could be in is an absolute upper bound on the function it can compute.

Our discussion of FSM began by showing them to be relatively unstructured devices - states were just names and we could define any transition desired, that is, given any input symbol we could define (or draw in the case of the graph) a transition between any two states. But with the computer there is considerable structure imposed on the states, for within those states are stored programs and procedures for interpreting programs. In addition, transitions are no longer arbitrary. This structure reduced the number of states available to a computation from what it would have been given some raw enumeration of possible states based on the memory bits and logical devices. As in the case of the FSM recognizer of n 0's followed by n 1 for 0 < n < 4, the number of states available constrains the function that can be computed.

The tradeoff is between what functions can be computed and how functions are computed. As a consequence, there is a considerable gain in expressibility, a gain that is not to be undervalued. In our little program we have expressed the general case for a recognizer of a string of n 0s followed by n 1s. And in an actual computer language, this could be expressed even more parsimoniously. For many reasons, it will turn out the parsimonious expression is critical. For one, parsimonious expression of such a regularity may well be a methodological requirement. As we shall see when we discuss natural language, Chomsky argued for the existence of similar kinds of regularities in human language usage. Moreover, it may well be that the device under study, human cognition, is capable of generating and employing such parsimonious expressions much like our computer and uses its program to control its state transitions and thereby compute the function. For clearly, we could be given such a program and proceed to follow it steps in order to compute the function.



Turing Machine Example

 © Charles F. Schmidt

Formal Models of Computation - Table of Contents