Example of Version Space Approach to Concept Learning

     We have argued that if the hypothesis space can be structured then this structure can be exploited in hypothesis revision. But you may have noted that the space of hypotheses, even for trivial examples, is very large. To explicitly represent the structure of this space, even for rather simple concepts, can involve representing millions of possible hypotheses! (For example, if we use 5 dimensions where each dimension can take on one of two values than there are 2 to the 5 or 32 possible instances. The size of this power set is 2 to the 32 or 4,294,967,296. This is a large space to explicitly represent and hold in memory.) This raises some serious doubts about the usefulness of these ideas.

     However, recall that when we looked at graph search similar concerns were raised. There we noted that if the search space has some minimal algebraic structure, then the space could be implicitly represented and searched using the general method of generate and test. A similar story can be told in this case. Whenever mathematics provides a constructive method for solving some problem in a domain we know that there exists some computational algorithm that can realize this constructive method.

     Tom Mitchell, a computer scientist, demonstrated in his dissertation research how this structure could be efficiently exploited. Essentially he developed a way in which to implicitly represent and use a structured space of hypotheses. The space was called a version space.

     The basic idea is to start an inductive learning task with two of the possible hypotheses. But they are a very special two hypotheses. They are the most general hypothesis (corresponding to the top of the lattice structures that we have discussed) and the most specific hypothesis (corresponding to the bottom of the lattice structures that we have discussed). Then, positive examples will always be consistent with the most general hypothesis, but will be inconsistent with the most specific hypothesis. Consequently, this most specific hypothesis will be made more general. A negative example will be consistent with the most specific hypothesis, but inconsistent with the most general hypothesis. Consequently, this most general hypothesis will be made more specific. Thus at any point in the training sequence the learner will maintain two hypothesis; and, the true hypothesis must lie somewhere in the area of the hypothesis space that connects these two hypothesis. If at some point the most general hypothesis is the same as the most specific hypothesis, then the learner has arrived at a unique definition of the concept. Thus, with a method of this sort the definition of a concept is usually not uniquely determined. It is only when enough training instances have been observed that this unique determination is possible.

     We won't explain the algorithm that generates the new hypothesis in response to the training instances. However, an example of the process is illustrated below. The concept that is being learned in this example is the concept of an "arch". The six training examples that are used are shown below. The first is a positive example of the concept, and, except for instance number 5, the other instances are all negative examples.

   
       In the figure below, a geometric representation of the concept instance is shown in the upper left of the figure together with an indication of whether the example is positive or negative. A representation of the example in a concept language is shown to the right. Below the dashed line is shown the most general and most specific representation of the concept that holds after the current example is considered. Note that at this point the most general hypothesis is the most general possible while the most specific is simply a representation of the current example  
 
   
     The animation below shows the way in which these two hypotheses are altered during the training sequence.
 
 
   
 
       Note that a positive example may affect the most specific hypothesis by making it more general. Conversely, a negative example may affect the most general hypothesis by making it more specific.  The "true" hypothesis is bounded by these two hypothesis. The representation of these bounds together with an appropriate revision strategy allows the space of possible hypotheses that are consistent with the current training sequence to be implicitly represented. If the most specific and most general hypothesis are equal, then the concept has been uniquely identified from the training sequence. If these cross (that is, the most general becomes more specific than the most specific), then there is no hypothesis in this concept language consistent with the training sequence.  
 

     Note, that what is required in order to realize this type of representation of a concept space is a representation language where the expressions in the concept language can be partially ordered, and where the partial order satisfies the property of a lattice.

     Think of various commonsense concepts. Is this property rare or common in the language used to describe these concepts?

 

Induction,Concepts, Uncertainty

© Charles F. Schmidt