(Aug 06 3-5pm) Meeting Summary: Wainwright / Jordan, ch 8, p195-213

August 6, 2009

(Aug 06 3-5pm) Meeting Summary: Wainwright / Jordan, ch 8, p195-213

Filed under: Uncategorized — umassliving @ 5:47 pm
Tags: meeting summary, Spring + Summer 2009

Discussed the variational formulation of mode computation in theorem 8.1 and the zero-temperature limit of the Bethe variational principle. The resulting IP can be transformed to a LP over $\mathbb{L}(G)$ (more precisely, its dual). Gary provided a description of the dual of a LP.

Covered the max-product algorithm for Gaussians and the process of converting the standard Gaussian in exponential form to a pairwise MRF. Gary pointed out the type in 8.15 (see pre-meeting overview comments).

Covered LP relaxation and how it provides an approximation to the exact LP. Discussed the relationship between the extreme points in $\mathbb{M}(G)$ and $\mathbb{L}(G)$ and the definition of weakly and strongly persistent solutions (Gary won the debate, after Manju sided with him).

Covered the main point of section 8.4.2, that max-product does not solve the relaxed LP in the case of graphs with cycles (contrary to the analogous result for sum-product).

Comments (2)

2 Comments »

I’ve been thinking a little more about the question of number of extreme points versus number of half-space constraints. I showed an example of how removing a half-space constraint in 2D can lead to an additional extreme point, but my example relied on half-space constraints that weren’t needed in the original problem but were required in the relaxed problem to maintain two of the extreme points after removing one constraint.

Carl provided an example in 3D where all the half-space constraints were necessary in the original problem, but removing the bottom half-space constraint led to an additional extreme point below the original polyhedron. To recap his example, first image a standard pyramid. Now add three additional half-space constraints that each chop off one of the three bottom corners of the pyramid. Construct these new half-space constraints such that they intersect the bottom face of the original pyramid to form a triangle. This is the original polyhedron. Now form a relaxed space by removing the bottom face of the original pyramid. The three half-space constraints that each chopped up a corner of the pyramid now meet at an extreme point below the original pyramid. The key is that the three extreme points on the bottom face of the pyramid that formed a triangle still are extreme points of the relaxed polyhedron.

I think this is the key to understand what’s going on. In $d$ -dimensions, $d$ linearly independent constraints are needed to define an extreme point (think of inverting a matrix of constraints, the requirement is that the matrix be full rank). So in 2D, you need two non-degenerate lines, and in 3D you need three non-degenerate planes. What happened in the pyramid example was that the three extreme points forming a triangle on the bottom plane of the original pyramid were actually defined by four different constraints that happened to intersect the same point, i.e. one of the constraints was a linear combination of the other three. To see this, take any one of those three points. It clearly lies on the bottom face of the original pyramid, it also clearly lies on one of the side faces of the original pyramid. In addition to that, it lies on two of the three new constraints that chopped off the corners. That’s why removing the bottom face didn’t cause any of those extreme points to disappear – there were still three active constraints and hence they were still extreme points.

In 2D, you can’t quite get the same effect. If you have three half-planes that intersect at the same point in 2D, then one of them is unnecessary since the polygon is just defined by the other two. That obviously isn’t the case in 3D, a simple example being a pyramid with a square base. The four sides meet at the same point, but each of them plays an active role in defining the pyramid.

The basic intuition is then, that with a set of half-space constraints in $d$ -dimensions, if you take any $d$ of them, they will potentially define an extreme point. It will not be an extreme point if the constraints are not linearly independent, and it will not define a point on the polytope if the point falls outside the feasible set because of another constraint, say constraint $c$ . In the latter case, if you remove constraint $c$ , then it will be an extreme point that defines a vertex of the polytope, so you will have added a new extreme point. In order to not lose extreme points by the removal of $c$ , you need that, for every extreme point that $c$ participated in, there were actually more constraints than necessary, so those extreme points don’t disappear.

If you were to just define half-space constraints uniformly at random, then it would be extremely unlikely for $d$ +1 of them to intersect at the same point. (I guess the event space in which that would happen would have measure zero?) I guess though, that for the marginal polytope, these constraints are very much not random, and that’s why you can get the behavior that by removing constraints, you add extreme points without losing extreme points.

Comment by Gary — August 8, 2009 @ 4:11 pm | Reply
I made the following comment in the pre-meeting notes for this section:

One thing that I think is worth doing is taking the second form of the dual function $\mathcal{Q}$ in equation (8.9), and verifying that it matches the form of the dual function given in Proposition 8.2.

If you do that, you’ll find that one set of the Lagrange multipliers $\lambda_{ts}$ are missing. Why?

The important thing to note is that, in the proof, the first thing that they do is to add directions to the edges to form a directed tree after choosing some root arbitrarily, and in the new decomposition of the inner product $\langle \theta, \mu \rangle$ , there no longer are any mean parameters for nodes $\mu_s$ except for the root note. Instead, the mean parameters for all the other nodes are just defined implicitly as $\mu_s(x_s) = \sum_{x_t} \mu_{st}(x_s, x_t)$ . Therefore, the constraint associated with $\lambda_{ts}$ is automatically satisfied and no longer necessary.

Comment by Gary — August 8, 2009 @ 4:19 pm | Reply

RSS feed for comments on this post. TrackBack URI

	Gary on (Sep 25 12-2pm) Pre-meeting Ov…
	Erik on (Sep 25 12-2pm) Pre-meeting Ov…
	Erik on (Sep 25 12-2pm) Pre-meeting Ov…
	David on (Sep 18 12-2pm) Pre-meeting Ov…
	Erik on (Sep 18 12-2pm) Pre-meeting Ov…

UMass LIVING

August 6, 2009