An Early Machine Learning Approach to Concept Learning

Consider the set of examples shown below. These are the kind of concept instances that were used in one of the first machine learning programs, developed by Patrick Winston (P. H. Winston, Learning Structural Descriptions from Examples. In P. H. Winston (Ed.), The Psychology of Computer Vision. New York: McGraw-Hill, 1975. Pp. 157-209). This early machine learning artifact focused research attention on the importance of the training sequence in concept learning. A particular training sequence is shown below. When you attempted to learn the earlier concept example, you may have noticed that at any point in time you entertained very few hypotheses, often only one. This is despite the fact that the number of hypotheses consistent with the training sequence is usually very large. Our learning strategy does not appear at all similar to an exhaustive breadth-first search.
   

    Winston's learner shared this characteristic with us. In fact, this learning program never entertains more than a single hypothesis at a time. But what are some of the possible consequences of following a strategy such as this? Consider the figures below which depict some of the possible relations between the Learner's hypothesis and the Teacher's hypothesis. The yellow circle depicts the space of possible concepts. The green circle depicts the space of concept instances that are consistent with the Teacher's hypothesis and the blue circle depicts the space of concept instances that are consistent with the Learner's hypothesis.

   In the first figure the Learner's hypothesis defines only a subset of the possible instances consistent with the 'true' concept definition. In this case the Learner's hypothesis must become more general in order to capture the concept that is being taught. In the middle figure the Teacher's concept defines a subset of the possible instances consistent with the Learner's hypothesis. In this case the Learner's hypothesis is too general and it must become more specific in order to capture the concept that is being taught. And in the third figure the Teacher's and Learner's concepts intersect. In this case both generalization and specialization are required in order to learn the concept that is being taught.

 
 

Learner's Concept defines a Subset of instances consistent with the Teacher's Concept

Learner's Concept defines a Superset of instances consistent with the Teacher's Concept

Learner's Concept defines instances that are inconsistent with the Teacher's Concept
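
The three relations depicted in the figures can also be stated in terms of sets of instances. The following is only a minimal sketch: it assumes that each hypothesis can be written out as a finite set of the instances it accepts, and the function name required_revision and the instance labels are hypothetical, chosen for illustration.

```python
def required_revision(learner: set, teacher: set) -> str:
    """Say which kind of revision the learner's hypothesis needs."""
    if learner == teacher:
        return "none"
    if learner < teacher:   # proper subset: hypothesis is too specific
        return "generalize"
    if learner > teacher:   # proper superset: hypothesis is too general
        return "specialize"
    # The two sets only partly overlap (or do not overlap at all).
    return "generalize and specialize"


# Hypothetical instance labels, used only to exercise the three cases.
teacher = {"a", "b", "c", "d"}
print(required_revision({"a", "b"}, teacher))                 # generalize
print(required_revision({"a", "b", "c", "d", "e"}, teacher))  # specialize
print(required_revision({"c", "d", "e"}, teacher))            # generalize and specialize
```

Of course the learner never has direct access to the teacher's set; it can only infer which case it is in from the examples it misclassifies.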
 

   Winston viewed the learning task as one that involved:

  • representing the current example;
  • predicting from the representation of the current hypothesis whether or not the current instance exemplifies the concept;
    • if correct then retain the current hypothesis;
    • if incorrect then
      • identify the differences between the current example and the current concept;
      • choose from this set of differences and use the chosen differences to
        • generalize the concept hypothesis if the instance was a positive instance or
        • specialize the concept hypothesis if the instance was a negative instance.
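
The steps listed above can be sketched as a loop that holds exactly one hypothesis at a time. This is not Winston's program. It is a minimal sketch in which every operation (predict, differences, choose, generalize, specialize) is passed in as a function, and the toy instantiation that follows, a hypothesis as a pair of required and optional feature sets, is an assumption made purely for illustration.

```python
def learn(hypothesis, examples, predict, differences, choose,
          generalize, specialize):
    """Process (description, is_positive) pairs, keeping a single hypothesis."""
    for description, is_positive in examples:
        if predict(hypothesis, description) == is_positive:
            continue  # correct prediction: retain the current hypothesis
        chosen = choose(differences(hypothesis, description))
        if is_positive:
            hypothesis = generalize(hypothesis, chosen)
        else:
            hypothesis = specialize(hypothesis, chosen)
    return hypothesis


# Toy instantiation (hypothetical): hypothesis = (required, optional) features,
# and an example description is simply a set of features.
def predict(h, e):
    required, _ = h
    return required <= e

def differences(h, e):
    required, optional = h
    return sorted((required | optional) - e)  # model features the example lacks

def choose(diffs):
    return diffs[:1]  # commit to a single difference

def generalize(h, chosen):
    required, optional = h
    return (required - set(chosen), optional - set(chosen))

def specialize(h, chosen):
    required, optional = h
    return (required | set(chosen), optional - set(chosen))


start = (frozenset(), frozenset({"support", "lying-top"}))
training = [
    ({"support", "lying-top"}, True),   # positive instance
    ({"lying-top"}, False),             # negative instance lacking "support"
]
final = learn(start, training, predict, differences, choose,
              generalize, specialize)
print(final[0])  # the features now treated as required
```

Passing the operations in as arguments keeps the control structure separate from the representation language, which is where most of the real work in such a learner lies.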

The figure below illustrates these actions for the first three examples of the training sequence shown above. The first example is a positive instance of the concept of an 'arch' and the next two are negative instances. A positive instance is always presented first in the sequence to enable the learner to establish a positive hypothesis. The yellow square below each of the examples contains a description of the example in the representation language of the learner. These are illustrated as a graph in a form that was referred to as a semantic net. The gray squares at the bottom provide the representation of the learner's hypothesis after the example had been processed. The semantic net representation can also be expressed in the syntax of a predicate logic. However, viewing these representations in this graphical form may help you to mentally imagine matching the network representing the hypothesis against the network representing the example.

 

   Consider the first example above. The semantic net describes this positive example as consisting of three objects (B1, B2, and B3) which are all blocks. B1 and B2 support B3. The orientation of B1 and B2 is standing and B3's orientation is lying. Note that this representation does not encode size, exact location, or color. If any of these features were a defining characteristic of the concept 'arch,' then this learner would never attain a completely accurate definition of the concept. Winston refers to the representations shown in the gray boxes as the learner's model of the concept. I have referred to this as the learner's hypothesis concerning the definition of the concept. I will use the terms interchangeably here. With only a slight change, the description of the example becomes the learner's initial model of the concept. The changes will always be shown in green so that you can easily identify them. Here the change is to replace the names of the specific blocks, B1, B2, and B3, with variables, X, Y, and Z. This is an example of a generalization. The learner has assumed that objects other than these three blocks can be used to construct an instance of an 'arch.' This seems quite innocuous, but notice that other possible generalizations were not made. For example, the constants 'LYING' and 'STANDING' could also have been replaced with variables. This would have resulted in an overgeneralization since the orientation does matter.
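
The description of the first example, and the step that turns it into the initial model, can be sketched as follows. The encoding of the semantic net as (relation, subject, object) triples and the relation names ISA and ORIENTATION are assumptions made for illustration; the block names, SUPPORT, STANDING, and LYING come from the description above. The function name variabilize is hypothetical.

```python
# First (positive) example written as (relation, subject, object) triples.
example_1 = {
    ("ISA", "B1", "BLOCK"),
    ("ISA", "B2", "BLOCK"),
    ("ISA", "B3", "BLOCK"),
    ("SUPPORT", "B1", "B3"),
    ("SUPPORT", "B2", "B3"),
    ("ORIENTATION", "B1", "STANDING"),
    ("ORIENTATION", "B2", "STANDING"),
    ("ORIENTATION", "B3", "LYING"),
}


def variabilize(description, mapping):
    """Replace the named objects with variables; leave other constants alone."""
    return {
        (relation, mapping.get(a, a), mapping.get(b, b))
        for relation, a, b in description
    }


# Only the block names are generalized. STANDING and LYING stay constant,
# since replacing them as well would overgeneralize.
initial_model = variabilize(example_1, {"B1": "X", "B2": "Y", "B3": "Z"})
for triple in sorted(initial_model):
    print(triple)
```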

   Consider the second example. This is a negative example. If we match the current model (the first network shown in the lower left) against the description of the current example (the second network shown in the middle), we find two mismatches. The model contains two 'SUPPORT' relations whereas the current example contains no 'SUPPORT' relation. However, the learner does not know whether the presence of these relations is crucial to the definition of the concept. Consequently, the learner would guess, incorrectly, that the example was a positive example of an arch. On learning that this guess was wrong, the learner needs to revise the model. The constraint on the revision is that the new model must predict that the first example is positive and that the second example is negative. The revision shown involves adding to the model that the 'SUPPORT' relation must be part of the concept definition: it is elevated from a characteristic of one exemplar to a defining characteristic of the concept. Consequently, this model will now correctly reject the second example as an example of an arch.

   The negative example has served to cause the learner to specialize the model. The third example, which is also negative, will lead to further specialization of the model. Again, the model predicts that the example is positive since nothing in the model precludes the supporting blocks from touching. The example is of course a negative example and the only aspect of its description that doesn't match the model is the 'TOUCHES' relation. Consequently, the revision involves adding to the model the requirement that the supporting blocks 'MUST NOT TOUCH.'
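
The two specializations just described can be traced in a small sketch. It rests on several assumptions of my own: links are written as tuples, the variables X, Y, and Z are taken to be already bound to the left support, right support, and top block (so no graph matching is done), and the model keeps three hypothetical sets named observed, must, and must_not. The revision adopts every mismatch at once, which is only safe when the examples differ from the model in a single aspect, as they do in this sequence.

```python
def predict(model, example):
    """Positive unless a required link is absent or a forbidden link is present."""
    return model["must"] <= example and not (model["must_not"] & example)


def specialize(model, example):
    """Revise the model after a negative example was wrongly accepted."""
    known = model["observed"] | model["must"]
    missing = known - example   # in the model, absent from the example
    extra = example - known     # in the example, absent from the model
    return {
        "observed": model["observed"] - missing,
        "must": model["must"] | missing,        # e.g. MUST SUPPORT
        "must_not": model["must_not"] | extra,  # e.g. MUST NOT TOUCH
    }


arch = {
    ("SUPPORT", "X", "Z"), ("SUPPORT", "Y", "Z"),
    ("ORIENTATION", "X", "STANDING"), ("ORIENTATION", "Y", "STANDING"),
    ("ORIENTATION", "Z", "LYING"),
}
no_support = arch - {("SUPPORT", "X", "Z"), ("SUPPORT", "Y", "Z")}
touching = arch | {("TOUCHES", "X", "Y")}

model = {"observed": set(arch), "must": set(), "must_not": set()}
for negative in (no_support, touching):
    if predict(model, negative):        # the learner wrongly guesses "arch"
        model = specialize(model, negative)

print(sorted(model["must"]))      # the two SUPPORT links
print(sorted(model["must_not"]))  # the TOUCHES link
```

After both revisions the model still accepts the first example and rejects the second and third, which is the constraint that each revision must satisfy.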

   Notice that whenever the learner mistakenly identifies an example, the revision is selected from the set of mismatches obtained when the model is compared with the description of the current example. Winston's program classified the types of mismatches that could occur and, based on this classification, proposed an appropriate revision. However, if there are many mismatches, then the learner can be led astray. This is because the learner has no way of knowing whether all, some, or only one of the mismatches led to the erroneous classification. In this case there may be several, perhaps many, possible models that could be adopted: with n mismatches there are 2^n - 1 non-empty subsets of the mismatches, and each subset is a candidate revision. But the learner has been constrained to have only one model at any point in time. If the wrong model is chosen, the concept may never be learned.
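
To see how quickly the choices multiply, the candidate revisions can simply be enumerated. This is a small sketch; the function name candidate_revisions and the mismatch labels are hypothetical, and each candidate is taken to be one non-empty subset of the mismatches.

```python
from itertools import combinations


def candidate_revisions(mismatches):
    """Every non-empty subset of the mismatches is one possible revision."""
    items = list(mismatches)
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


revisions = candidate_revisions(["m1", "m2", "m3"])
print(len(revisions))  # 7, that is 2**3 - 1
print(len(candidate_revisions(["m1"])))  # 1: a single mismatch leaves no choice
```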

   For this reason, the training sequence can be the factor that decides whether or not the learner acquires the concept. Specifically, Winston argues that the ideal training sequence is one where each example is a 'near miss.' A near miss is a training example that is only minimally different from the learner's current model. Ideally the training example should differ in only one aspect from the learner's current model. In this case, only one revision is possible and therefore the learner cannot make the wrong choice. If the entire training sequence consists of 'near misses' then the teacher can systematically guide the learner toward the correct model of the concept. Of course, this presupposes that the teacher has knowledge of the learner's current model at each stage in the learning. Reading a learner's mind is not a skill that every teacher masters. Nonetheless, Winston's learning artifact has clarified for us when and why the sequence of training examples can play such a crucial role in learning.
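
A teacher who did know the learner's current model could test a proposed example before presenting it. The sketch below assumes, as my own simplification, that descriptions are sets of (relation, subject, object) links and that all differing links sharing a relation name count as a single 'aspect'. What counts as one aspect is really a property of the representation language. The function names are hypothetical.

```python
def differing_aspects(model, example):
    """Relation names on which the two descriptions disagree."""
    return {link[0] for link in model ^ example}


def is_near_miss(model, example):
    """True when the example differs from the model in exactly one aspect."""
    return len(differing_aspects(model, example)) == 1


model = {
    ("SUPPORT", "X", "Z"), ("SUPPORT", "Y", "Z"),
    ("ORIENTATION", "X", "STANDING"), ("ORIENTATION", "Y", "STANDING"),
    ("ORIENTATION", "Z", "LYING"),
}
no_support = {link for link in model if link[0] != "SUPPORT"}
no_support_and_touching = no_support | {("TOUCHES", "X", "Y")}

print(is_near_miss(model, no_support))               # True
print(is_near_miss(model, no_support_and_touching))  # False
```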

   This type of learning model is an example of what is termed supervised learning. There is a teacher that bears the responsibility for supervising the learning that occurs. But what happens when learning is unsupervised? Is learning still possible? Are there ways to alter the learning strategy to yield a learning method that is less dependent on intelligent and benevolent supervision? These are some of the issues that we will explore as we examine other research on machine and human learning.



 © Charles F. Schmidt