|
Of
course, we don't exactly know how the human mind accomplishes
this task. But, we can study a simplification of this task and
thereby:
- obtain a clear understanding
of the nature of this problem; and
- demonstrate the way in which
assumptions about the world can simplify what is otherwise an
intractable problem.
The
work we will discuss is based on research carried out by David
Waltz. His work was more extensive than what we will present
here. We will limit our discussion of his work to scenes where
no more that three lines come together at any vertex and we will
ignore shadows and cracks. Waltz actually considered a more general
case, but the story is more simply told if we limit ourselves
to scenes that only involve trihedral vertices.
We
take two rectangles as shown in the figure below on the left
and join them to obtain the set of lines and vertices shown in
the figure on the right. This figure will serve as our simple
example object for use in explaining this work.
|
 |
|
 |
Two Rectangle Composed
|
|
Example Figure
|
|
|
|
Example Figure - Vertices Only
|
|
Example Figure - Lines Only
|
|
|
The
next figure above on the left shows the example figure without
the lines and only the vertices. Note that the three lighter
colored vertices involve two lines and the rest involve three...these
are trihedral vertices. Notice that even though these vertices
are not identical, it appears that there are only a few types
represented here.
The
next figure above on the right, shows the example figure as only
the set of lines that connect the vertices. The vertices themselves
are deleted. Now it turns out that for the simple case that we
are considering, the lines must be either boundary lines
or interior lines. For a boundary line, the object
lies on one side of the line but not on the other. If we think
of walking clockwise around the object on its boundary lines,
then we can distinguish two types of boundary lines based on
the direction that we are moving. For an interior line, the object
lies on both sides. The lighter colored lines in this figure
will turn out to be interior lines. We can also distinguish two
types of interior lines; those that are convex and those that
are concave. The middle white line above is an example of a concave
line and the remaining interior lines are convex.
Now,
pick up some square or rectangular object, move it around and
rotate it while focusing on a particular corner. You will notice
that the vertex changes and may even disappear as you rotate
the object. There is a systematic way (using a Euclidean three-space
and moving the observer relative to this space) to exhaustively
identify the types of vertices that can occur in the world we
are considering.
The
figure below illustrates all of the possibilities. Notice that
there are only 4 types of junctions or vertices. This does not
mean that they are physically the same. For example, the size
of the angle of an L vertex can vary enormously. But,
for purposes of interpreting scenes composed of objects it is
important to see these as equivalent.
|
|
|
Notice
that I have labeled this image as depicting the physically
possible vertices; and there are only 18 of them! What
are the logical possibilities? The L vertex has
two lines, and each line could be labeled in one of 4 ways. Thus,
there are 16 logically possible L vertices. The other 3 types
of vertices each have 3 lines. Again, each could be labeled in
one of 4 ways. Thus, each of these vertices admits 64 possibilities.
Then, the total logical set of possibilities is 16 + (3 x 64)
or 208. Thus, by building in knowledge about what is physically
possible...by making the so-called "rigid body assumption"
we can drastically reduce the possible labelings that must be
considered.
Now,
recall the obvious fact that any line connects two vertices.
And, of course, a line can only be labeled in one way. Thus,
a decision about how to label one particular vertex affects the
other vertex which shares the line with that particular vertex.
When making one decision influences another decision, we say
that there is a dependency between the decisions or that
one decision constrains the other. Sometimes, as in the
traffic light example you worked out, the constraint is so strong
that knowing one decision completely determines another. Sometimes
it is weaker, and knowing one decision precludes others, but
does not uniquely determine the other.
|
|
|
| The figure above shows our example object
with each vertex identified as to its type. Below the object
is a vertex by vertex table where a cell entry is 1 if the two
vertices share a line. I have also referred to this as a communication
matrix. This term is used because in communication if I can communicate
something to you, then it may be that you will communicate my
message to someone else. If you do, then I have indirectly
communicated with that person. Things are analogous here. For
example, referring to the matrix and picture above, notice that
L1 shares a line with A1 but not with L2. But A1 shares a line
with L2. Consequently, a decision about how to label L1 will
affect L2 even though there is not a direct connection. The algorithm
that works out all of these effects is called a constraint
propagation algorithm. The animation below will give you
some feel for how it proceeds. Note I have not attempted to be
completely accurate to the algorithm in this animation, but simply
to give you an intuitive feel for how it works. You will notice
that on some steps the algorithm will fill in the labels in blue
and on other steps it will fill in the labels in green. These
latter steps, those that involve green, represent the case where
the label is simply propagated along a line from one vertex to
the next. |
|
|
|
|
The
point at which the algorithm begins is arbitrary. In this example
figure, the algorithm would arrive at a unique labeling for each
line in the figure. This is what typically happens, and it can
happen quite rapidly because of the small number of possible
labelings for any vertex and the way in which the labelings constrain
each other because of the rigid body assumption mentioned earlier.
Intuitively, what this assumption says is that if the world is
made up of rigid objects, then the shape of one object can't
really depend on the shape of another object...there are no constraints
to propagate beyond object boundaries. Note this is not the case
with non-rigid objects such as flags or pools of water. Toss
a rock into a pool of water, and the effect is propagated indefinitely.
We
know from the study of dependency networks and the analysis of
algorithms for propagation of dependency that generally this
class of problems is computationally complex (or intractable
to use a more technical term)....that is, if you have a large
problem then it may take a very long time to see if there
is a consistent way in which to assign values to the network
of dependencies.
The
algorithm works efficiently for partitioning a physical scene
into objects because the rigid-body assumption holds most of
the time. And, it appears that this knowledge is "built
into" our way of reasoning about such scenes. Thus, this
is an example where the mind is biased, but biased in a way that
allows it to efficiently, and usually accurately, interpret physical
scenes.
|
| What about all those crazy pictures at
the beginning? Well, what I was attempting to do was to create
pictures that were complex enough to either defeat your ability
to maintain focus on a coherent set of vertices or create pictures
that were ambiguous, that is, had more than one consistent labelings.
The picture below depicts a "ribbon" of vertices. If
you focus in the center, you will see that there are not enough
constraints to force you to a single interpretation. You can
see a line as convex or concave and it can shift back and forth.
Your mind can give this picture more than one consistent interpretation |
|
|
| There are some pictures that you can
give no consistent interpretation. But your mind has a
hard time determining exactly why. Many of the constraints hold
"locally" but there is not a consistent global interpretation.
These types of figures are often referred to as impossible
figures...because they are. The impossible triangle that was
used earlier is shown again below on the left. It is also shown
on the right but with circles used to impose "local"
views of the figure. If you look at each of these views, you
will see that each is perfectly fine. It is when the constraints
get "passed" to the next view that we recognize the
impossibility of the figure. |
|
|
| And,
just in case this all hasn't seemed impossible enough....here
are two more figures. The impossible prong on the right and an
impossible nut on the left! |
|
|
|