|
We have argued
that if the hypothesis
space can be structured then this structure can be exploited
in hypothesis revision. But you may have noted that the space
of hypotheses, even for trivial examples, is very large. To explicitly
represent the structure of this space, even for rather simple
concepts, can involve representing millions of possible hypotheses!
(For example, if we use 5 dimensions where each dimension can
take on one of two values than there are 2 to the 5 or 32 possible
instances. The size of this power set is 2 to the 32 or 4,294,967,296.
This is a large space to explicitly represent and hold in memory.)
This raises some serious doubts about the usefulness of these
ideas.
However, recall
that when we looked at graph search similar concerns were raised.
There we noted that if the search space has some minimal algebraic
structure, then the space could be implicitly represented and
searched using the general method of generate
and test. A similar story can be told in this case. Whenever
mathematics provides a constructive method for solving some problem
in a domain, then we know that there exists some computational
algorithm that can realize this constructive method.
Tom Mitchell,
a computer scientist, demonstrated in his dissertation research
how this structure could be efficiently exploited. Essentially
he developed a way in which to implicitly represent and use a
structured space of hypotheses. The space was called a version
space.
The basic idea
is to start an inductive learning task with two of the possible
hypotheses. But these are a very special pair of hypotheses.
They are the most general hypothesis (corresponding to
the top of the lattice structures that we have discussed) and
the most specific hypothesis (corresponding to the bottom
of the lattice structures that we have discussed). Then, positive
examples will always be consistent with the most general hypothesis,
but will be inconsistent with the most specific hypothesis. Consequently,
this most specific hypothesis will be made more general. A negative
example will be consistent with the most specific hypothesis,
but inconsistent with the most general hypothesis. Consequently,
this most general hypothesis will be made more specific. Thus,
at any point in the training sequence the learner will maintain
two hypothesis; and, the true hypothesis must lie somewhere in
the area of the hypothesis space that connects these two hypotheses.
If at some point the most general hypothesis is the same as the
most specific hypothesis, then the learner has arrived at a unique
definition of the concept. Thus, with a method of this sort the
definition of a concept is usually not uniquely determined until
enough training instances have been observed that force the learner
to a unique determination of the correct hypothesis.
The algorithm
that generates the new hypothesis in response to the training
instances won't be explained here. However, an example of the
process is illustrated below. The concept that is being learned
in this example is again the concept of an 'arch'. The six training
examples that are used are shown below. The first is a positive
example of the concept, and, except for instance number 5, the
other instances are all negative examples.
|