Defeasible Inference -- Inheritance
|
Recall that formal logic uses a single
uniform "language" within which to represent knowledge.
A representation that has this property is often referred to
as a domain independent representation. One of the advantages
of a domain independent representation language is that the procedures
that manipulate the expressions in the language can be written
so that they are domain independent. Deduction in first order
logic is the same procedure whether we are reasoning about knowledge
about cooking or knowledge about Euclidean geometry. But, it
turns out that the algorithms that implement the deductive procedures
of first order logic are generally not very efficient. As the
number of facts that must be considered increases, the number
of possible inferences increase very rapidly. Consequently, the
time and space required for the computation of a deduction can
become very large.
It
seems to generally be true that we can design a domain dependent
representation language that is more efficient in reasoning about
certain aspects of the domain than a domain independent representation
language. What is lost, of course, is generality and in the case
of logic we usually also lose the guarantee that if a proof for
some statement exists then it will be found. (Newell reflected
on this tradeoff between what he referred to as weak methods
and strong methods back in the 60's. Weak methods are very general
but usually rather inefficient; whereas strong methods are usually
quite efficient but are limited in generality because of their
domain dependence. We will see this distinction again when we
look at the question of what constitutes expertise.)
One
suggestion to account for human "deductive-like" reasoning
is that we utilize not a domain independent representation language
but domain dependent languages. And, AI researchers have developed
domain dependent reasoning strategies and carefully investigated
the advantages and disadvantages of their use.
|
| You may recall the figure on the right
from our discussion of search. In this case, I adopted a very
simple and restricted language for use in representing the block
problem. And, it would be quite easy to write special procedures
for this language that could carry out certain types of deductively
valid inferences. For example, it would be trivial to write a
procedure to "deduce that block C is to the 'right of' block
A. Now note that I have used this as an example of a domain dependent
language. But the "data-type" of this language is that
of a string (i.e., a sequence of characters). |
 |
| In
what sense, are strings a domain dependent language for the "Blocks
domain?!" Well, what people really mean by domain dependent
is this: if the syntax of the language rather directly reflects
a property of the domain that we wish to reason about; then we
tend to refer to that representation language as domain dependent.
Here the string, a left to right structure, mirrors the left
to right structure in the domain. And, I used the '#' to delimit
stacks and the order that I wrote the stack down to mirror the
"on top of" relation. These syntactic properties
of the "data-type" can then be exploited to yield efficient,
but limited procedures that yield deductively valid inferences. |
|
Trees, Hierarchies, and
Inheritance
| One
of the earliest and most thoroughly investigated special representational
language can be thought of as a kind of graph known as a tree.
This type of graph was used to represent information about classes,
instances of the classes, and properties of the classes. You
will probably remember that we used trees to represent search.
Trees are much beloved graphs because they have some properties
that can be exploited when writing algorithms to search the trees. |
|
The figure to the right illustrates the
way in which a variety of class information, in this case information
about animals, could be organized. The organization is shown
as a tree where Animal is the root node.
Various
classes of animals are shown in blue and these classes are organized
in a hierarchy that reflects the class inclusion relations among
these classes. For example, both Birds and Mammals are subclasses
of Animal; Mammal is a superclass of Elephant; and so on. In
white, at the terminal nodes of this tree, are shown instances
or individuals that are members of the class. For example, both
jumbo and clyde are members of the class Elephant. Finally, in
yellow are shown various properties that are characteristic of
a class. For example, Birds fly; and so on.
|
 |
|
 |
Contrast
this representation with the figure to the left. This figure
contains the same information but expresses it as a set of logical
implications. Note
that more information is explicitly stated in this figure.
There would be even more except that I got lazy and simply entered
'...' to indicate that much more would have to be listed to make
it complete. Note also that on the left are some of the logical
implications that involve negation. Again, many more would have
to be listed to complete the list. Finally, although I did impose
some organization on the presentation of the information in this
figure; this organization is not required nor depended on when
we carry out deduction. The statements could be written down
in any order. Note, that this isn't true for the tree representation.
Comparing these two figures should
help you see that the tree structure is a succinct way in which
to organize this type of information. And, because it is organized
in his way, an inference procedure can take advantage of the
fact that lower entities in the tree inherit the properties of
their ancestors in the tree. For example, "jumbo is an elephant",
"jumbo is grey", "jumbo is a mammal", "jumbo
bears live young", and so on.
|
|
| The picture to the right marks in red
the nodes along the path that must be identified in this structure
in order to determine that "jumbo is grey." The procedure
simply searches for jumbo in the terminal elements of the tree.
If jumbo is not found then jumbo would be sought in the set of
parents of each of these terminal elements. In this case, jumbo
is a terminal element. Consequently, the procedure focuses only
on this element and begins tracing the paths that move upward
from jumbo. The first node encountered is Elephant which does
not equal "grey", so the procedure continues to the
next connected nodes which are {Mammal, Grey}. Consequently,
at this point the procedure has derived from this structure and
its associated procedure that "jumbo is grey;" and
the process terminates. |
 |
| Note, that this pattern is reminiscent
of the repeated application of modus ponens. If we had tried
to derive that "jumbo is white," then the procedure
would have continued until it had reached the top of the structure.
At this point there would be no new places to look. Consequently,
the procedure has failed to derive that "jumbo is white,"
and based on this failure derives that "jumbo is not white."
This type of inference is known as Negation by Failure. |
|
|
This is really quite encouraging.
A simple data structure allows us to define a highly efficient
procedure for making inferences. But, are these inferences "deductively
valid"? The answer is ...it depends. The procedure followed
is most definitely not a deductive procedure in the technical
sense. Negation by Failure is certainly not a deductive rule
of inference. In order to do deductive inference we would have
to list all the information as illustrated in the second figure
and then employ some deductive procedure such as resolution theorem
proving.
Whether all of the inferences that a domain dependent procedure
allows us to make are deductively valid will simply depend on
the procedure. There is no way to know in advance; and it may
be quite hard to firmly answer the question even when the procedure
is known. What will generally be true is that inference procedures
that depend on the map between properties of the data structure
and properties of the domain will not provide all deductively
valid inferences. This is because the map focuses the procedure
on a subset of the deductively valid inferences. For example,
the procedure outlined above would not yield the deductively
valid statement "jumbo or clyde implies not-flies".
|
|
Defeasible Inference
 |
An entirely different issue arises with the type of knowledge
that we are considering here. Notice the nodes circled in red.
This tree representation has been used to represent that both
penguins and robins are birds, and that birds fly.
Notice that our procedure could derive that tweety is a penguin
and also that tweety flies. The figure to the left shows the
problem. In general, Birds fly and of course Penguins are Birds.
But they are an exception. They swim very nicely, but they just
don't fly.
We need some way in which to represent these facts which yields
the appropriate conclusion and blocks the inappropriate conclusions
about tweety.
|
| This turns out to be a rather serious
problem because it raises the question of the exact nature of
this type of knowledge. In standard logics either a statement
holds universally or it doesn't. But here we have knowledge that
is quite general, but not without exception. And, if you think
about it a bit, this seems to be a characteristic of much of
our knowledge But, how can we reason with knowledge that is not
universal but must be qualified with terms like 'normally p'
or 'typically p'? Or is there a simple way in which to handle
these exceptions within first order logic and/or within the graph
language of a tree structure as depicted above? |
|
Exceptions: General Axioms are often inadequate.
| Because of exceptions,
the general axiom that states that if x is a bird then x flies
leads to an inconsistency as illustrated in the axioms on the
figure to the right below. |
| An alternative is
to throw out the general axiom and explicitly list the exceptions
as illustrated above. However, other exceptions come to mind...what
if a wing is broken, what if an oil spill has covered the bird,
what if the bird is still too young to fly, ....etc. The fact
that we can come up with so many exceptional conditions, makes
us suspect that there isn't a closed set of exceptions. Perhaps,
first order axioms are simply not the way to capture this type
of knowledge. But how can exceptions be handled while retaining
the ability to represent this type knowledge? |
 |
|
Default Theory
| One suggestion for
representing and reasoning with common sense knowledge was developed
by Ray Reiter and is known as default theory. |
 |
The idea is to reason in first order
logic but to have available a set of default rules which are
used only if an inference can not be obtained within the first
order formulation. The general framework of this proposal is
depicted on the left.
Probably
the simplest place to start is with the example rule shown at
the bottom. One kind of knowledge that we seem to possess and
use is knowledge about what is typically the case. Here, we have
the "generalization" that "Typically an American
adult owns a car." Clearly this is not the same as the logical
statement, "For all x if x is an American and x is an adult,
then x owns a car."
Consequently,
such knowledge has to be used with care or we will constantly
be introducing contradictions into our beliefs. The default rule
consists of three components. First, there is the prerequisite.
Think of this as some kind of foundation or evidence for the
rule. And this foundation must be logically derived from your
beliefs. You must be able to prove that "John Doe"
is both an American and an adult to employ this rule. Then, the
second component requires that you not be able to prove that
"John Doe" does not own something which is a
car. This is the consistency test. If the prerequisite
and the consistency component have been satisfied, then the third
component of the rule, the consequent, may be asserted.
|
|
|
It turns out that only under very special circumstances can
this way of formulating this type of knowledge be used efficiently.
Notice that checking whether some belief is consistent with our
current beliefs amounts to trying to prove its negation. And,
deduction is generally an intractable problem.
Also, note that a default conclusion isn't the same as a deduction.
It may be defeated by susbsequent information and consequently
the use of Default theories results in a logic that is non-monotonic.
For example, we may run into John Doe and he may tell us that
he doesn't own a car. Rather than arguing with John by pointing
out to him that he is an American and an adult and therefore
should, by our default theory, own a car; we should probably
just throw out the statement that "John Doe owns a car"
and substitute the statement "John Doe does not own a car".
|
|