Partitioning the World into Objects - Using Physical Constraints

     The middle drawing below consists of about 80 lines and 73 vertices (points where lines join). The vertices are indicated in the drawing on the right with a blue circle. In the middle drawing no attempt has been made to make it appear three-dimensional. It is a flat, two-dimensional collection of lines that intersect. Nonetheless, you have probably organized this jumble of lines and vertices in a way that is consistent with the view that a collection of three-dimensional objects are depicted in this drawing. The drawing on the left includes the lines that might complete the various objects. The grouping of the lines and vertices of the middle figure into a set of objects results in a partitioning of the set of lines. A partition is a grouping of a set of things into subsets such that no element occurs in more than one subset and each element occurs in one of the subsets. How might this partitioning be accomplished? The set of possible partitionings is enormous-even for this trivial example.

 Completed "Objects"

 Basic Lines and Vertices

 Vertices Explicitly Depicted

     Of course, we don't exactly know how the human mind accomplishes this task. But, we can study a simplification of this task and thereby:

  • obtain a clear understanding of the nature of this problem; and
  • demonstrate the way in which assumptions about the world can simplify what is otherwise an intractable problem.

     The work we will discuss is based on research carried out by David Waltz. His work was more extensive than what we will present here. We will limit our discussion of his work to scenes where no more that three lines come together at any vertex and we will ignore shadows and cracks. Waltz actually considered a more general case, but the story is more simply told if we limit ourselves to scenes that only involve trihedral vertices.

     We take two rectangles as shown in the figure below on the left and join them to obtain the set of lines and vertices shown in the figure on the right. This figure will serve as our simple example object for use in explaining this work.

   

 Two Rectangle Composed

 

 Example Figure

     

 Example Figure - Vertices Only

 

 Example Figure - Lines Only

     The next figure above on the left shows the example figure without the lines and only the vertices. Note that the three lighter colored vertices involve two lines and the rest involve three...these are trihedral vertices. Notice that even though these vertices are not identical, it appears that there are only a few types represented here.

     The next figure above on the right, shows the example figure as only the set of lines that connect the vertices. The vertices themselves are deleted. Now it turns out that for the simple case that we are considering, the lines must be either boundary lines or interior lines. For a boundary line, the object lies on one side of the line but not on the other. If we think of walking clockwise around the object on its boundary lines, then we can distinguish two types of boundary lines based on the direction that we are moving. For an interior line, the object lies on both sides. The lighter colored lines in this figure will turn out to be interior lines. We can also distinguish two types of interior lines; those that are convex and those that are concave. The middle white line above is an example of a concave line and the remaining interior lines are convex.

     Now, pick up some square or rectangular object, move it around and rotate it while focusing on a particular corner. You will notice that the vertex changes and may even disappear as you rotate the object. There is a systematic way (using a Euclidean three-space and moving the observer relative to this space) to exhaustively identify the types of vertices that can occur in the world we are considering.

     The figure below illustrates all of the possibilities. Notice that there are only 4 types of junctions or vertices. This does not mean that they are physically the same. For example, the size of the angle of an L vertex can vary enormously. But, for purposes of interpreting scenes composed of objects it is important to see these as equivalent.

 

 

     Notice that I have labeled this image as depicting the physically possible vertices; and there are only 18 of them! What are the logical possibilities? The L vertex has two lines, and each line could be labeled in one of 4 ways. Thus, there are 16 logically possible L vertices. The other 3 types of vertices each have 3 lines. Again, each could be labeled in one of 4 ways. Thus, each of these vertices admits 64 possibilities. Then, the total logical set of possibilities is 16 + (3 x 64) or 208. Thus, by building in knowledge about what is physically possible...by making the so-called "rigid body assumption" we can drastically reduce the possible labelings that must be considered.

     Now, recall the obvious fact that any line connects two vertices. And, of course, a line can only be labeled in one way. Thus, a decision about how to label one particular vertex affects the other vertex which shares the line with that particular vertex. When making one decision influences another decision, we say that there is a dependency between the decisions or that one decision constrains the other. Sometimes, as in the traffic light example you worked out, the constraint is so strong that knowing one decision completely determines another. Sometimes it is weaker, and knowing one decision precludes others, but does not uniquely determine the other.

 

 
     The figure above shows our example object with each vertex identified as to its type. Below the object is a vertex by vertex table where a cell entry is 1 if the two vertices share a line. I have also referred to this as a communication matrix. This term is used because in communication if I can communicate something to you, then it may be that you will communicate my message to someone else. If you do, then I have indirectly communicated with that person. Things are analogous here. For example, referring to the matrix and picture above, notice that L1 shares a line with A1 but not with L2. But A1 shares a line with L2. Consequently, a decision about how to label L1 will affect L2 even though there is not a direct connection. The algorithm that works out all of these effects is called a constraint propagation algorithm. The animation below will give you some feel for how it proceeds. Note I have not attempted to be completely accurate to the algorithm in this animation, but simply to give you an intuitive feel for how it works. You will notice that on some steps the algorithm will fill in the labels in blue and on other steps it will fill in the labels in green. These latter steps, those that involve green, represent the case where the label is simply propagated along a line from one vertex to the next.

 

 

     The point at which the algorithm begins is arbitrary. In this example figure, the algorithm would arrive at a unique labeling for each line in the figure. This is what typically happens, and it can happen quite rapidly because of the small number of possible labelings for any vertex and the way in which the labelings constrain each other because of the rigid body assumption mentioned earlier. Intuitively, what this assumption says is that if the world is made up of rigid objects, then the shape of one object can't really depend on the shape of another object...there are no constraints to propagate beyond object boundaries. Note this is not the case with non-rigid objects such as flags or pools of water. Toss a rock into a pool of water, and the effect is propagated indefinitely.

     We know from the study of dependency networks and the analysis of algorithms for propagation of dependency that generally this class of problems is computationally complex (or intractable to use a more technical term)....that is, if you have a large problem then it may take a very long time to see if there is a consistent way in which to assign values to the network of dependencies.

     The algorithm works efficiently for partitioning a physical scene into objects because the rigid-body assumption holds most of the time. And, it appears that this knowledge is "built into" our way of reasoning about such scenes. Thus, this is an example where the mind is biased, but biased in a way that allows it to efficiently, and usually accurately, interpret physical scenes.

     What about all those crazy pictures at the beginning? Well, what I was attempting to do was to create pictures that were complex enough to either defeat your ability to maintain focus on a coherent set of vertices or create pictures that were ambiguous, that is, had more than one consistent labelings. The picture below depicts a "ribbon" of vertices. If you focus in the center, you will see that there are not enough constraints to force you to a single interpretation. You can see a line as convex or concave and it can shift back and forth. Your mind can give this picture more than one consistent interpretation

 

 
     There are some pictures that you can give no consistent interpretation. But your mind has a hard time determining exactly why. Many of the constraints hold "locally" but there is not a consistent global interpretation. These types of figures are often referred to as impossible figures...because they are. The impossible triangle that was used earlier is shown again below on the left. It is also shown on the right but with circles used to impose "local" views of the figure. If you look at each of these views, you will see that each is perfectly fine. It is when the constraints get "passed" to the next view that we recognize the impossibility of the figure.

     
     And, just in case this all hasn't seemed impossible enough....here are two more figures. The impossible prong on the right and an impossible nut on the left!

         

Productive Thinking...the Gestalt Emphasis

© Charles F. Schmidt