An "FSM" that "Learns"

     The figure below illustrates two machines. These machines are FSM-like machines but they have been modified to allow them to output an FSM as result of processing a set of inputs. They were created to illustrate some of the basic issues that are raised when learning is considered. The machine in the light gray box is referred to as the Basic "Learning FSM" (B_LFSM) and the machine that also includes the darker colored box is referred to as the Extended "Learning FSM" (E_LFSM).
   
     The machines that are learned are restricted to finite state machines. But, the simple FSM learning machines are of course not finite state machines. What has been done is to take the idea of a FSM and to minimally increase the machine's capabilities to allow it to create an FSM from sets of inputs. The extended capabilities involve the ability to systematically write a limited set of symbols to a memory area and to create unique names. The B_LFSM is not able to look at, remember, or use any of the rules that have already been learned. The E_LFSM is given a limited ability to use rules that have already been written.

     For those who wish to examine in detail the operation of the Basic "Learning FSM," (B_LFSM) we will trace the steps involved in learning an FSM that recognizes the string ab. This B_LFSM consists of five states, {start, p1, p2, p3, halt}; and six rules that are explicitly shown. The table below provides an annotation of the trace of (B_LFSM) as it processes this string.

     In this table the first column indicates the step in the process. The second column shows the name of the current state of the machine together with the input string. The current state of the machine is shown to the immediate left of the element of the input string that is currently under the read head of the (B_LFSM). The third column indicates the parts of the rules that have been written up to and through the current step in the process. The portion that was written as a result of the execution of the current rule is shown in red.

     The machine begins in the state, start. This step is shown in row 1 of the Table The arrow from this state to the state, p1, is labeled with the Greek letter epsilon and the "action," q0. The epsilon is the standard way in which to indicate a transition with no input read. In this case the read head of the machine does not advance. (Epsilon is also used to indicate when no action accompanies the transition. This is not standard notation but should cause no confusion in this context.)

     The q0 is used to denote the action of writing the symbol q0 to a memory location which will hold the FSM that is being learned. This is shown in column 3 in the table.

     At step 2 the machine is in state p1.There are many rules that are applicable from state p1. There is a rule for each element of the input alphabet. Since each of these rules simply reads the current tape symbol under the read head and then writes this symbol in the memory location together with the action instruction R, one can think of this as a single simple function. The m( ) notation is intended to convey this idea These rules each result in a transition from state p1 to state p2. As can be seen in the table this second step results in a R being written in the column labeled Current Learned Rules.

Trace of "Learning" an FSM from the Input ab#

 Step

 Current String

Current "Learned Rules"

 1
 ...{start}ab#...  q0

 2
 ...{p1}ab#...  q0 a R

 3
 ...a{p2}b#...  q0 a R q1
q1

 4
 ...a{p3}b#...  

 5
 ...a{p1}b#...  q0 a R q1
q1 b R

 6
 ...ab{p2}#...  q0 a R q1
q1 b R q2
q2

 7
 ..ab{p3}#...  

 8
 ..ab{p1}#...  q0 a R q1
q1 b R q2
q2 # R qhalt

 9
 ...ab{halt}#...  

    Step 3 involves the rule that transitions the B_LFSM from state p2 to state p3. This rule involves an epsilon move meaning that the read head is not advanced. The action taken is to generate a new and unique state name which is then used to complete the current rule as well as begin the next rule. In this particular case the state name is q1.

      Step 4 simply transitions the machine back to p1, thus completing the basic rule writing loop. No actions are taken on execution of this rule. Steps 5 through 7 mirror steps 2 through 4 and result in the second rule of the FSM being written as shown in row 6 and column 3 of the table.

      At step 8 the # symbol (indicating a blank and thus the end of the string) is under the read head. This results in a rule being written that halts the machine. The FSM that recognizes the string ab# is complete. It consists of the three rules shown in column 3 of row 8 of the table.

     A Markov diagram of this machine is shown in the picture on the right. The machine has essentially memorized the input string and would recognize that string were it to be presented again.

    Note that this learning procedure can be used for any input string as long as the string is finite in length. But what if after encountering and remembering the string ab, the string aabb was encountered? Should a new FSM be learned or should there be a connection between the machine above and the new machine that is created to learn the string aabb ?

     This is a difficult question. One for which I know of no adequate answer. Our intuition seems to suggest that since aabb is highly similar to ab that it makes sense to capture both strings with the same FSM. But what about abb or aab or even ba ? It isn't clear where to draw the line. The typical solution in research on human as well as machine learning is to require the experimenter or teacher to draw this line by explicitly informing the learner whether an example is or is not part of a concept that the learner is to acquire.
     We will follow that practice here and consider the machine that would result when the B_LFSM is used to acquire an FSM that recognizes the strings {ab, aabb, aaabbb}. The animated figure to the right illustrates the result. The first figure in the sequence illustrates only the start state, q0, and the final state, qhalt. These are illustrated separately to call attention to the fact that the use of the same start and final states across a set of examples forces these examples to be recognized by a single FSM.
     The animation then sequences through the addition to the states required to recognize the string ab, the string aabb, and the string aabb. Notice that the FSM that is learned is nondeterministic. When the first a in a string is encountered, there are three rules that are applicable and no basis for choosing between the three.
     A deterministic FSM that recognizes the same set of strings as a nondeterministic FSM can always be constructed. If we consider the transition function to return a state from the power set of states, then the deterministic machine shown on the right can be constructed. Note that in the final machine above <q0,a> allows the transition to q1, q3, and q7. In the deterministic machine shown on the right these states are represented by the single state q137. And, state q48 represents states q4 and q8 in the original machine. However, in order to accomplish the construction of this deterministic machine, a global view of the machine is required. Can a simple modification to the B_LFSM be defined that will address this issue?  
     The Extended "Learning FSM" (E_LFSM) that is illustrated in the figure at the beginning of the page is a simple modification of the B_LSFM that will learn an FSM that is isomorphic to the deterministic FSM shown above. The FSM that is learned is shown below.

     The E_LFSM machine differs from the B_LSFM in only one respect. At the start of the machine the E_LFSM determines whether a rule has already been learned that applies to the current state of the machine. If so this rule is used. This procedure is followed until if fails; that is, no rule has been learned that applies to the machine's current state. On failure control is passed back to the B_LFSM and the learning proceeds as before until the end of the string is encountered.

     Notice that this extension does not require the "Learning FSM" to have the capability of examining the current state of its learning. The learning decisions are still quite local in character.

 
     But, this does not result in the acquisition of the minimal FSM that recognizes these three strings. Such a minimal FSM is shown to the right. (Clicking on this figure will take you to a page which animates the actions of this FSM.) There is an algorithm that will always find the minimal FSM that corresponds to a deterministic FSM, but this algorithm involves examining the FSM. This algorithm is discussed and illustrated further on the next page.  
     The algorithm for finding the minimal FSM for this example will in fact identify the machine above which has eight control states as the FSM that has the minimal states. The next page illustrates the algorithm by tracing the construction of this FSM from the machine above that was learned by the E_LFSM


Finding the Minimal FSM

 © Charles F. Schmidt

Formal Models of Computation - Table of Contents