The main points that will be covered in these subsections are

**(1) Extending the LP view of mode finding that we saw for max-product on tree graphs to reweighted-max-product.**

**(2) Examples of first order LP relaxations.**

**(3) Higher order LP approximations and how they can be more accurate.**

**(1) Extending the LP view of mode finding that we saw for max-product on tree graphs to reweighted-max-product.**

The tree-reweighted Bethe variational problem uses the same polytope that defines the first order LP in the first half of this section. Thus, naturally, the tree-reweighted Bethe variational problem (and hence the reweighted max-product) has the same connection to first order LP as the regular Bethe problem (or regular max-product).

The fixed point of the reweighted max-product algorithm can serve as the solution to the LP in the reweighted case. This holds when equations 8.23 are satisfied by the *pseudo-max-marginals*.
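The tree case can be made concrete with a small numerical sketch (the potentials below are made up, and a generic LP solver stands in for message passing): on a tree, the first order LP over the locally consistent pseudomarginals is tight, so the LP optimum is integral and equals the exact MAP value.

```python
import numpy as np
from scipy.optimize import linprog

# First-order (local polytope) LP relaxation of MAP for a tiny two-node
# binary MRF. On a tree this relaxation is tight, so the LP optimum is
# integral and matches brute-force MAP. All potentials are made up.
theta1 = np.array([0.2, 1.0])          # theta_1(x1)
theta2 = np.array([0.5, 0.1])          # theta_2(x2)
theta12 = np.array([[0.0, 0.3],
                    [0.9, 0.0]])       # theta_12(x1, x2)

# Variable order:
# mu1(0), mu1(1), mu2(0), mu2(1), mu12(00), mu12(01), mu12(10), mu12(11)
c = -np.concatenate([theta1, theta2, theta12.ravel()])  # linprog minimizes

A_eq, b_eq = [], []
# normalization of node marginals
A_eq.append([1, 1, 0, 0, 0, 0, 0, 0]); b_eq.append(1)
A_eq.append([0, 0, 1, 1, 0, 0, 0, 0]); b_eq.append(1)
# marginalization: sum_x2 mu12(x1, x2) = mu1(x1)
A_eq.append([-1, 0, 0, 0, 1, 1, 0, 0]); b_eq.append(0)
A_eq.append([0, -1, 0, 0, 0, 0, 1, 1]); b_eq.append(0)
# marginalization: sum_x1 mu12(x1, x2) = mu2(x2)
A_eq.append([0, 0, -1, 0, 1, 0, 1, 0]); b_eq.append(0)
A_eq.append([0, 0, 0, -1, 0, 1, 0, 1]); b_eq.append(0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1))
lp_value = -res.fun

# Exact MAP by enumeration, for comparison.
exact = max(theta1[x1] + theta2[x2] + theta12[x1, x2]
            for x1 in (0, 1) for x2 in (0, 1))
```

With these numbers the LP returns the integral pseudomarginal corresponding to (x1, x2) = (1, 0), matching the enumerated MAP value.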

**(2) Examples of first order LP relaxations.**

Section 8.4.4 discusses examples where an LP problem is formulated from a graph problem. Example 8.3 shows how the error-control coding problem may be cast as an LP problem.

Steps in the process:

(a) Convert the error-control coding problem’s graph to a pairwise Markov random field, creating auxiliary variables to handle any interactions involving more than two variables. Appendix E.3 helps in understanding this step.

(b) Identify the constraints that need to be enforced in the model (Equations 8.24, which may be rewritten as 8.25)

(c) Solve the LP problem (Feldman)
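The steps above can be sketched on a toy code (a single parity check on 3 bits; the costs below are made-up numbers playing the role of negative log-likelihood ratios, and `scipy` stands in for a real LP solver). For each odd-sized subset F of the check's neighborhood, a Feldman-style constraint forbids the corresponding non-codeword corner:

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP decoding sketch: a single parity check over bits {0, 1, 2}, so
# valid codewords are the even-parity strings 000, 011, 101, 110.
# Costs gamma_i (made up) favor bits 0 and 2 being 1.
gamma = np.array([-2.0, 0.5, -1.0])

# For each odd subset F of the check's bits:
#   sum_{i in F} f_i - sum_{i not in F} f_i <= |F| - 1
A_ub = np.array([[ 1, -1, -1],    # F = {0}
                 [-1,  1, -1],    # F = {1}
                 [-1, -1,  1],    # F = {2}
                 [ 1,  1,  1]],   # F = {0, 1, 2}
                dtype=float)
b_ub = np.array([0, 0, 0, 2], dtype=float)

res = linprog(gamma, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
decoded = np.round(res.x)   # the even-parity codeword (1, 0, 1)
```

For a single check these inequalities describe the parity polytope exactly, so the LP lands on an integral codeword here.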

Example 8.4 shows the same for the (a) Ising model, (b) (Maximum Weight) Independent Set problem, (c) (Minimum Weight) Vertex Cover problem, and (d) Max-cut problem. These are all intractable problems, but a first order LP approximate solution is possible. Example 8.5 solves another classic graph problem, the Maximum Weight Matching problem, using an LP: Equation 8.29 is the exact problem, and Equation 8.30 is the LP version.
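As a concrete sketch of the matching LP (a made-up bipartite instance, with `scipy` as the solver): maximize the total weight of selected edges, subject to each vertex being covered at most once. On bipartite graphs the constraint matrix is totally unimodular, so the LP optimum is integral, i.e. an actual matching.

```python
import numpy as np
from scipy.optimize import linprog

# LP relaxation of maximum weight matching on a small bipartite graph
# (edges and weights are made up). Constraints: each vertex is incident
# to at most one selected edge; 0 <= x_e <= 1.
edges = [("u1", "v1", 2.0), ("u1", "v2", 3.0),
         ("u2", "v1", 4.0), ("u2", "v2", 1.0)]
verts = ["u1", "u2", "v1", "v2"]
w = np.array([e[2] for e in edges])

A_ub = np.zeros((len(verts), len(edges)))
for col, (u, v, _) in enumerate(edges):
    A_ub[verts.index(u), col] = 1
    A_ub[verts.index(v), col] = 1
b_ub = np.ones(len(verts))

res = linprog(-w, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
lp_value = -res.fun   # matching {u1-v2, u2-v1}, total weight 7.0
```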

Other important points in this subsection:

(a) The first order LP relaxation of an integer program with binary variables has a strong consistency property. This is not true for higher order variables.

(b) MWIS and Max-cut are submodular maximization problems, and MWVC is a supermodular minimization problem. Thus, all three are not *regular binary QPs*. They are intractable in general, so LP approximate solutions are needed for all three.
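The looseness of the first order relaxation for these hard problems can be seen on a tiny made-up MWIS instance (a triangle with unit weights, solved with `scipy`): the LP prefers the fractional point (1/2, 1/2, 1/2) with value 1.5, while the best true independent set is a single vertex, value 1.

```python
import numpy as np
from scipy.optimize import linprog

# First order LP relaxation of MWIS on a triangle (made-up toy instance):
# maximize sum_i w_i x_i subject to x_i + x_j <= 1 for each edge.
w = np.array([1.0, 1.0, 1.0])
edges = [(0, 1), (1, 2), (0, 2)]

A_ub = np.zeros((len(edges), 3))
for row, (i, j) in enumerate(edges):
    A_ub[row, i] = 1
    A_ub[row, j] = 1
b_ub = np.ones(len(edges))

res = linprog(-w, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
lp_value = -res.fun
# lp_value = 1.5 at the fractional point (1/2, 1/2, 1/2), strictly above
# the integral optimum of 1.0 — on odd cycles this relaxation is loose.
```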

**(3) Higher order LP approximations and how they can be more accurate.**

Section 8.5 discusses higher order LP relaxations obtained from marginal polytopes defined on hypergraphs. So far, we have used polytopes defined by the pairwise interactions in the graph. By considering polytopes defined on hypergraphs instead, we can enforce more constraints, which means the LP relaxations will be of higher order.

Higher order relaxation LPs are more accurate than (or at least as accurate as) lower order relaxation LPs for the same graphical model, because the higher order constraints include the lower order ones.

Equation 8.34 is significant: when the tree-width of the hypergraph for the graphical model is small enough, the solution of the higher order LP coincides with the exact solution on the original marginal polytope.

Finally, Example 8.6 shows how a second order relaxation adds constraints that make the solution “tighter” than a first order relaxation. This is done by solving the problem from Example 8.3 in two ways:

(a) binary MRF, which is a first order relaxation LP

(b) hypergraph MRF, which is a second order relaxation LP

Comparing the two, we see that one of the marginals has no constraints in (a), while Equation 8.38, applied in case (b), enforces the constraint on it.

Considering one particular solution, we see it is valid in (a), but not valid in (b). Thus, adding higher order constraints results in more accurate solutions.

*Lifting and Projecting*

The introduction of additional parameters in order to achieve higher order relaxations is called *lifting*; the lifted polytope can be mapped back down to the lower-dimensional space by *projecting*. Example 8.7 illustrates this. Lifting by introducing additional parameters results in inequalities 8.41, and these can be projected back onto the original pairwise parameters using the simple Fourier-Motzkin elimination method. Some of the resulting inequalities correspond to the *cycle inequalities* or *triangle inequalities* (this was alluded to earlier in Example 4.1).
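A small numerical sketch of what a cycle inequality buys (a standard-style counterexample on a binary triangle; the pseudomarginal numbers are made up): every edge marginal insists its endpoints disagree, which is locally consistent but impossible on an odd cycle, and the projected inequality detects this.

```python
import numpy as np

# A pseudomarginal on a binary triangle that lies in the first order
# (local) polytope but not in the true marginal polytope. Each node
# marginal is uniform, and each edge marginal puts all its mass on
# disagreement: mu_e(0,1) = mu_e(1,0) = 1/2.
node = np.array([0.5, 0.5])
edge = np.array([[0.0, 0.5],
                 [0.5, 0.0]])

# Local consistency holds: each edge marginalizes to the node marginals.
assert np.allclose(edge.sum(axis=1), node)
assert np.allclose(edge.sum(axis=0), node)

# Cycle inequality for a 3-cycle, with d_e the disagreement probability
# on edge e:  d_12 + d_23 + d_13 <= 2.
d = edge[0, 1] + edge[1, 0]      # = 1.0 on every edge here
cycle_lhs = 3 * d                # = 3.0 > 2: the inequality is violated,
# so no true joint distribution can produce these pseudomarginals.
```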

There was some question during the meeting about the meaning and importance of supermodular and submodular functions.

The Wikipedia page has more information on this – http://en.wikipedia.org/wiki/Supermodular_function – as well as this introduction – http://www.ima.umn.edu/optimization/seminar/queyranne.pdf

It seems like supermodularity is important in the same way that convexity is important – it comes up frequently in real problems and has a structure that lends itself to theoretical results and guarantees.

There seem to be several interpretations of supermodularity. The most general seems to be the first definition given on the Wikipedia page. In addition, if the function is smooth, supermodularity becomes equivalent to the condition ∂²f/∂z_i∂z_j ≥ 0 for all i ≠ j. This in turn has an interpretation in economics in terms of whether an increase in one person’s choice of z_i leads to an increase in the marginal payoff for other people.

In addition to smooth functions, supermodularity can be defined on lattices and on sets; in the latter case, rather than component-wise max and min you get union and intersection. For finite sets, supermodularity can be rewritten in the form given at the bottom of the Wikipedia page, which lends itself to an interpretation of diminishing returns for submodular functions.
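The diminishing-returns form is easy to check by brute force on small ground sets. A minimal sketch (the coverage function and its covering map are made up; coverage is a classic submodular function, while |S|² is supermodular and fails the test):

```python
from itertools import combinations

# Brute-force submodularity check via diminishing returns: for all
# S subset-of T and x not in T,
#   f(S + {x}) - f(S)  >=  f(T + {x}) - f(T).
def is_submodular(f, ground):
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for S in subsets:
        for T in subsets:
            if S <= T:
                for x in ground:
                    if x not in T:
                        if f(S | {x}) - f(S) < f(T | {x}) - f(T):
                            return False
    return True

# Made-up coverage function: f(S) = number of items covered by the
# elements of S. Coverage functions are submodular.
covers = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}
ground = list(covers)

def coverage(S):
    out = set()
    for s in S:
        out |= covers[s]
    return len(out)
```

Running `is_submodular(coverage, ground)` returns True, while `is_submodular(lambda S: len(S) ** 2, ground)` returns False, since |S|² has increasing (supermodular) marginal gains.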

Lastly, this paper, Learning Associative Markov Networks, B. Taskar, V. Chatalbashev and D. Koller. – http://www.seas.upenn.edu/~taskar/pubs/mmamn.ps – is an example of a lot of the ideas we’ve been talking about, including associated (supermodular) potentials, linear programming formulation, solving the dual problem, etc, as well as other things we haven’t directly covered in the reading, like maximum margin learning and kernels.

Comment by Gary — August 16, 2009 @ 2:34 pm