An Early Machine Learning Approach to Concept Learning
Consider the set
of examples shown below. These are the kind of concept instances
that were used in one of the first machine learning programs,
developed by Patrick Winston (P. H. Winston, Learning Structural
Descriptions from Examples. In P. H. Winston (Ed.), The Psychology
of Computer Vision. New York: McGraw-Hill, 1975, pp. 157-209). This
early machine learning artifact focused research attention on
the importance of the training sequence in concept learning.
Below is shown a particular training sequence. When you attempted
to learn the earlier concept example, you may have noticed that
at any point in time you entertained very few hypotheses, often
only one. This is despite the fact that the number of hypotheses
consistent with the training sequence is usually very large.
Our learning strategy does not appear to resemble
an exhaustive breadth-first search.
Winston's
learner shared this characteristic with us. In fact, this learning
program never entertains more than a single hypothesis at a time.
But what are some of the possible consequences of following a
strategy such as this? Consider the figures below, which depict
some of the possible relations between the Learner's hypothesis
and the Teacher's hypothesis. The yellow circle depicts the space
of possible concept instances. The green circle depicts the space of concept
instances that are consistent with the Teacher's hypothesis and
the blue circle depicts the space of concept instances that are
consistent with the Learner's hypothesis.
In the first
figure the Learner's hypothesis defines only a subset of the
possible instances consistent with the 'true' concept definition.
In this case the Learner's hypothesis must become more general
in order to capture the concept that is being taught. In the
middle figure the Teacher's concept defines a subset of the possible
instances consistent with the Learner's hypothesis. In this case
the Learner's hypothesis is too general and it must become more
specific in order to capture the concept that is being taught.
In the third figure the Teacher's and Learner's concepts overlap
only partially. In this case both generalization and specialization are
required in order to learn the concept that is being taught.
First figure: Learner's Concept defines a Subset of the instances
consistent with the Teacher's Concept
Middle figure: Learner's Concept defines a Superset of the
instances consistent with the Teacher's Concept
Third figure: Learner's Concept defines some instances that are
inconsistent with the Teacher's Concept
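The three cases can be made concrete by treating each hypothesis as the set of instances it accepts. What follows is only a minimal sketch under that assumption: the function name `required_revision` and the encoding of hypotheses as Python sets are introduced here for illustration and are not part of Winston's program.

```python
def required_revision(learner: set, teacher: set) -> str:
    """Name the kind of revision the Learner's hypothesis needs.

    Each argument is the set of instances a hypothesis accepts
    (a hypothetical encoding chosen for this sketch).
    """
    if learner == teacher:
        return "none"
    if learner < teacher:       # proper subset: hypothesis is too specific
        return "generalize"
    if learner > teacher:       # proper superset: hypothesis is too general
        return "specialize"
    # Neither contains the other: some accepted instances are wrong
    # and some correct instances are not yet accepted.
    return "generalize and specialize"


print(required_revision({1, 2}, {1, 2, 3}))   # first figure
print(required_revision({1, 2, 3}, {1, 2}))   # middle figure
print(required_revision({1, 2}, {2, 3}))      # third figure
```

Of course, the learner never sees the Teacher's set directly; it can only infer which case applies from the examples it misclassifies.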
Winston viewed
the learning task as one that involved:
- representing the current example;
- predicting from the representation of the current
  hypothesis whether or not the current instance exemplifies the
  concept;
- if correct then retain
  the current hypothesis;
- if incorrect then
  - identify the differences between the current example and the
    current concept;
  - choose from this set of differences and use the chosen differences to
    - generalize the concept hypothesis if the instance
      was a positive instance or
    - specialize the concept hypothesis if the instance
      was a negative instance.
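The steps above can be sketched as a single-hypothesis control loop. This is a hedged illustration, not Winston's code: `winston_loop`, the toy feature-set domain, and the feature names ("supports", "standing", "painted", "touching") are all hypothetical, and the default `choose` simply keeps every difference, whereas the original program selected among them.

```python
def winston_loop(hypothesis, training, predict, differences,
                 generalize, specialize, choose=lambda diffs: diffs):
    """Process examples one at a time, keeping exactly one hypothesis."""
    for example, is_positive in training:
        if predict(hypothesis, example) == is_positive:
            continue                          # correct: retain the hypothesis
        chosen = choose(differences(hypothesis, example))
        if is_positive:
            hypothesis = generalize(hypothesis, chosen)
        else:
            hypothesis = specialize(hypothesis, chosen)
    return hypothesis


# Toy domain (an assumption of this sketch): an example is a set of
# feature names; a hypothesis is a pair (required, forbidden).
def predict(h, ex):
    required, forbidden = h
    return required <= ex and not (forbidden & ex)

def differences(h, ex):
    required, forbidden = h
    if predict(h, ex):                          # a negative was wrongly accepted
        return ex - required                    # features the hypothesis ignores
    return (required - ex) | (forbidden & ex)   # constraints a positive violates

def generalize(h, chosen):
    required, forbidden = h
    return required - chosen, forbidden - chosen

def specialize(h, chosen):
    required, forbidden = h
    return required, forbidden | chosen


first = {"supports", "standing", "painted"}     # first positive example
training = [
    (first | {"touching"}, False),              # negative example
    ({"supports", "standing"}, True),           # positive example
]
learned = winston_loop((set(first), set()), training,
                       predict, differences, generalize, specialize)
print(learned)
```

In this run the negative example adds "touching" to the forbidden features, and the second positive example drops "painted" from the required ones.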
The figure below illustrates
these actions for the first three examples of the training sequence
shown above. The first example is a positive instance of the concept
of an 'arch' and the next two are negative instances.
A positive instance is always presented first in the sequence
to enable the learner to establish a positive hypothesis. The
yellow square below each of the examples contains a description
of the example in the representation language of the learner.
These are illustrated as graphs in a form that was referred
to as a semantic net. The gray squares at the bottom provide
the representation of the learner's hypothesis after the example
has been processed. The semantic net representation can
also be expressed in the syntax of a predicate logic. However,
viewing these representations in graphical form may help
you to imagine matching the network representing the
hypothesis against the network representing the example.
Consider the
first example above. The semantic net describes this positive
example as consisting of three objects (B1, B2, and B3) which
are all blocks. B1 and B2 support B3. The orientation of B1 and
B2 is standing and B3's orientation is lying. Note that this representation
does not encode size, exact location, or color. If any of these
features were a defining characteristic of the concept 'arch,'
then this learner would never attain a completely accurate definition
of the concept. Winston refers to the representations shown in
the gray boxes as the learner's model of the concept.
I have referred to this as the learner's hypothesis concerning
the definition of the concept. I will use the terms interchangeably
here. With only a slight change, the description of the example
becomes the learner's initial model of the concept. The
changes will always be shown in green so that you can easily
identify them. Here the change is to replace the names of the
specific blocks, B1, B2, and B3, with the variables X, Y, and Z.
This is an example of a generalization. The learner has assumed
that objects other than these three blocks can be used to construct
an instance of an 'arch.' This seems quite innocuous, but notice
other possible generalizations were not made. For example, the
constants 'LYING' and 'STANDING' could also have been replaced
with variables. This would have resulted in an overgeneralization
since the orientation does matter.
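As a rough illustration of this first generalization, the semantic net can be written as a set of predicate-style triples and the block names swapped for variables. The triple encoding, the relation names 'ISA' and 'ORIENTATION', and the helper `variabilize` are assumptions made for this sketch; only 'SUPPORT', 'LYING', 'STANDING', the block names, and the variables X, Y, and Z come from the description above.

```python
# Each fact is a (relation, subject, value) triple: a hypothetical
# flat encoding of the semantic net for the first example.
first_example = {
    ("ISA", "B1", "BLOCK"), ("ISA", "B2", "BLOCK"), ("ISA", "B3", "BLOCK"),
    ("SUPPORT", "B1", "B3"), ("SUPPORT", "B2", "B3"),
    ("ORIENTATION", "B1", "STANDING"),
    ("ORIENTATION", "B2", "STANDING"),
    ("ORIENTATION", "B3", "LYING"),
}


def variabilize(facts, renaming):
    """Replace every term listed in `renaming`; leave all other terms alone."""
    return {tuple(renaming.get(term, term) for term in fact) for fact in facts}


# Only the block names become variables. The constants LYING and
# STANDING are kept, because replacing them would overgeneralize.
initial_model = variabilize(first_example, {"B1": "X", "B2": "Y", "B3": "Z"})

for fact in sorted(initial_model):
    print(fact)
```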
Consider the
second example. This is a negative example. If we match the current
model (the first network, shown in the lower left) against
the description of the current example (the second network,
shown in the middle), we find two mismatches. The model contains
two 'SUPPORT' relations whereas the current example contains
no 'SUPPORT' relation. However, the learner does not yet know whether
the presence of these relations is crucial to the definition
of the concept. Consequently, the learner would guess incorrectly
that the example was a positive example of an arch. On
learning that this guess was wrong, the learner needs to revise
the model. The constraint on the revision is that the new model
must predict that the first example is positive and that the
second example is negative. The revision shown involves adding
to the model the requirement that the 'SUPPORT' relation must be
part of the concept definition: it is elevated from a characteristic
of one exemplar to a defining characteristic of the concept.
Consequently, the model will now correctly reject the second
example as an example of an arch.
The negative
example has served to cause the learner to specialize the model.
The third example, which is also negative, will lead to further
specialization of the model. Again, the model predicts that the
example is positive since nothing in the model precludes the supporting
blocks from touching. The example is of course a negative example
and the only aspect of its description that doesn't match the
model is the 'TOUCHES' relation. Consequently, the revision involves
adding to the model the requirement that the supporting blocks
'MUST NOT TOUCH.'
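The two specializations can be traced in a small sketch. It rests on several assumptions that are not Winston's: the `Model` class and its field names are hypothetical, the examples are taken to be already aligned with the variables X, Y, and Z (so no graph matching is needed), and the exact facts in the second and third examples are guessed from the prose above.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    typical: set                                  # relations seen in the first positive example
    required: set = field(default_factory=set)    # relations that MUST be present
    forbidden: set = field(default_factory=set)   # relations that MUST NOT be present

    def predicts_positive(self, example: set) -> bool:
        return self.required <= example and self.forbidden.isdisjoint(example)

    def specialize(self, negative: set) -> None:
        """Revise the model after it wrongly accepted a negative example."""
        self.required |= self.typical - negative    # in the model, missing from the example
        self.forbidden |= negative - self.typical   # in the example, missing from the model


orientation = {("ORIENTATION", "X", "STANDING"),
               ("ORIENTATION", "Y", "STANDING"),
               ("ORIENTATION", "Z", "LYING")}
supports = {("SUPPORT", "X", "Z"), ("SUPPORT", "Y", "Z")}

arch = orientation | supports                  # first (positive) example
no_support = set(orientation)                  # second example: no SUPPORT relation
touching = arch | {("TOUCHES", "X", "Y")}      # third example: the supports touch

model = Model(typical=set(arch))
guesses_before = []
for negative in (no_support, touching):
    guesses_before.append(model.predicts_positive(negative))  # the mistaken guess
    model.specialize(negative)

print(sorted(model.required))
print(sorted(model.forbidden))
```

After both revisions the model still accepts the first example and rejects the two negative ones, which is the constraint each revision has to satisfy.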
Notice that
whenever the learner mistakenly identifies an example, the revision
is selected from the set of mismatches obtained when the model
is compared with the description of the current example. Winston's
program classified the types of mismatches that could occur and
based on this classification proposed an appropriate revision.
However, if there are many mismatches, then the learner can be
led astray. This is because the learner has no way of knowing
whether all, some or only one of the mismatches led to
the erroneous classification. In this case there may be several,
perhaps many (up to 2^n - 1 for n mismatches, one for each
non-empty subset of the mismatches), possible models that could
be adopted. But the learner has been
constrained to have only one model at any point in time. If the
wrong model is chosen, the concept may never be learned.
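To see how quickly the choice grows, the candidate revisions can be enumerated as the non-empty subsets of the mismatches. This is a small sketch; `candidate_revisions` and the mismatch labels are hypothetical names used only for illustration.

```python
from itertools import chain, combinations


def candidate_revisions(mismatches):
    """Return every non-empty subset of the mismatches as a tuple."""
    items = list(mismatches)
    return list(chain.from_iterable(
        combinations(items, size) for size in range(1, len(items) + 1)))


revisions = candidate_revisions(["support-left", "support-right", "touches"])
for revision in revisions:
    print(revision)
print(len(revisions))   # 7, that is 2**3 - 1
```

With a single mismatch there is exactly one candidate, so a learner limited to one model cannot choose wrongly.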
For this reason,
the training sequence can be the factor that decides whether
or not the learner acquires the concept. Specifically, Winston
argues that the ideal training sequence is one where each example
is a 'near miss.' A near miss is a training
example that is only minimally different from the learner's current
model. Ideally the training example should differ in only one
aspect from the learner's current model. In this case, only one
revision is possible and therefore the learner cannot make the
wrong choice. If the entire training sequence consists of 'near
misses' then the teacher can systematically guide the learner
toward the correct model of the concept. Of course, this presupposes
that the teacher has knowledge of the learner's current model
at each stage in the learning. Reading a learner's mind is not
a skill that every teacher masters. Nonetheless, Winston's learning
artifact has clarified for us when and why the sequence of training
examples can play such a crucial role in learning.
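A teacher who did know the learner's model could check a candidate example before presenting it. The sketch below tests only the ideal case described above, exactly one difference. The function names and the triple encoding of relations are assumptions for illustration, and the facts are taken to be already expressed in the model's variables.

```python
def mismatches(model: set, example: set) -> set:
    """Facts that appear in exactly one of the two descriptions."""
    return model ^ example


def is_near_miss(model: set, example: set) -> bool:
    """True when the example differs from the model in a single aspect."""
    return len(mismatches(model, example)) == 1


model = {("SUPPORT", "X", "Z"), ("SUPPORT", "Y", "Z")}
touching = model | {("TOUCHES", "X", "Y")}     # differs by one relation
far_off = {("TOUCHES", "X", "Y")}              # differs by three relations

print(is_near_miss(model, touching))   # True
print(is_near_miss(model, far_off))    # False
```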
This type of
learning model is an example of what is termed supervised
learning. There is a teacher that bears the responsibility for
supervising the learning that occurs. But what happens when learning
is unsupervised? Is learning still possible? Are there ways to
alter the learning strategy to yield a learning method that is
less dependent on intelligent and benevolent supervision? These
are some of the issues that we will explore as we examine other
research on machine and human learning.