This will be the main paper: What Energy Functions Can Be Minimized via Graph Cuts? Kolmogorov, Zabih

This additional paper may help in understanding graph cuts: Fast Approximate Energy Minimization via Graph Cuts. Boykov, Veksler, Zabih.

There is no paper set yet for this week. Some possibilities:

Variable Elimination, Constructing the Pseudo-tree, Min-fill, and Mini-bucket heuristic

A two week sequence of:

- (CVPR 2008 best paper) Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann – http://www.kyb.mpg.de/publications/pdfs/pdf5070.pdf
- (ECCV 2008 best student paper) Matthew B. Blaschko and Christoph H. Lampert. Learning to Localize Objects with Structured Output Regression. – http://www.kyb.mpg.de/publications/attachments/ECCV2008-Blaschko_5247%5B0%5D.pdf

The first paper would be related to our previous reading, in that they do search with branch and bound.

Anything else on this page – https://umassliving.wordpress.com/2009/09/03/fall-2009-organization/#comments

Post your suggestions soon, with the aim of settling on a paper by mid-day Monday.

Max-product can be used to compute the MPE. However, inference can also be done using sum-product, taking for each variable the value that maximizes its computed marginal distribution. If our variables are letters in a particular word, then max-product gives us the word with highest probability, while sum-product gives us each individual letter with highest probability (marginalizing over the possible assignments to the other letters).
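To make the distinction concrete, here is a tiny brute-force sketch (the two-letter "word" and its probabilities are made up) showing that the most probable word and the per-letter marginal maximizers can disagree:

```python
import itertools

# Toy joint distribution over a two-letter "word"; the numbers are
# invented for illustration. joint[(x1, x2)] = P(word = x1 + x2).
joint = {
    ('a', 'a'): 0.30, ('a', 'b'): 0.05,
    ('b', 'a'): 0.25, ('b', 'b'): 0.40,
}

# MPE / max-product answer: the single most probable word.
mpe = max(joint, key=joint.get)

# Sum-product answer: maximize each letter's marginal separately,
# marginalizing over the other letter.
letters = ['a', 'b']
marg1 = {x: sum(joint[(x, y)] for y in letters) for x in letters}
marg2 = {y: sum(joint[(x, y)] for x in letters) for y in letters}
lettermax = (max(marg1, key=marg1.get), max(marg2, key=marg2.get))

print(mpe)        # ('b', 'b'), with probability 0.40
print(lettermax)  # ('b', 'a'): the marginals prefer a different word
```

Here the most probable word is "bb", but maximizing each letter's marginal independently gives "ba", which as a whole word is less probable (0.25) than "bb" (0.40).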

There was the question on the requirements of being a pseudo-tree and how to create a pseudo-tree. The key was that edges in the original graph, but not in the pseudo-tree, must be back-arcs, that is, they must connect a node to its ancestor in the tree. This is required so that it is possible to compute the relevant CPTs when traversing a single branch.
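The back-arc condition is easy to check mechanically; here is a small sketch (the function name and example graphs are mine) that verifies every non-tree edge connects a node to one of its ancestors:

```python
# Sketch: check the pseudo-tree condition that every graph edge not in
# the tree must be a back-arc (connect a node to an ancestor).
def is_pseudo_tree(graph_edges, parent):
    """parent maps each node to its tree parent (None for the root)."""
    def ancestors(v):
        anc = set()
        while parent[v] is not None:
            v = parent[v]
            anc.add(v)
        return anc
    for u, v in graph_edges:
        if parent.get(u) == v or parent.get(v) == u:
            continue  # a tree edge
        if v not in ancestors(u) and u not in ancestors(v):
            return False  # a cross edge: forbidden
    return True

# Chain tree 0 - 1 - 2 with an extra graph edge (0, 2): since 0 is an
# ancestor of 2, the edge is a back-arc and the tree qualifies.
print(is_pseudo_tree([(0, 1), (1, 2), (0, 2)], {0: None, 1: 0, 2: 1}))

# Star tree with root 0 and children 1, 2: graph edge (1, 2) connects
# two siblings, a cross edge, so this is not a valid pseudo-tree.
print(is_pseudo_tree([(0, 1), (0, 2), (1, 2)], {0: None, 1: 0, 2: 0}))
```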

There was also the question of how AOBF differs from A*. It appears that they are the same, the only difference being that AOBF provides the framework for taking advantage of conditional independencies in the original graph.

A subquestion came up about exactly how AOBF is done. We noted that the heuristic gives an overestimate of the remaining probabilities to be set, and that the bottom-up propagation revises the initial overestimates. The marked arcs indicate, given the current estimates, which path is best and should be taken (although there will be ties, for instance when choosing which child to expand at an AND node, in which case a path can be chosen arbitrarily). The key is that the marked arcs may change if later expansions revise the overestimates accordingly.

More later.

http://www.ics.uci.edu/%7Ecsp/r144.pdf

The motivation for reading this paper is as follows. In our work on both OCR and on Scene Text Recognition, I have been somewhat less focused on parameter setting, and somewhat more focused on finding the MAP solution, or the Most Probable Explanation (MPE) solution.

There are many ways to do this, including:

1) Loopy Belief Propagation, or Max Product; (problems: convergence not guaranteed; if it converges, still don’t know if we have the right solution)

2) Linear programming on a relaxed version of the marginal polytope, using techniques of Wainwright, or the cutting plane inequalities we just read about in Jaakkola and Sontag, currently under investigation by Gary.

3) A* search as being investigated by David and Jackie at the moment.

4) Graph cut methods. These work on binary labelings with submodular potentials, and can be extended to multi-class labelings using α-expansion moves. For instance, see http://www.cs.cornell.edu/~rdz/Papers/KZ-PAMI04.pdf

and probably others.
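Since item 4 above is the topic of the main paper, here is a minimal sketch of the standard s-t min-cut construction for a binary pairwise energy with submodular terms, checked against brute-force enumeration. The function names and the toy energy are my own, not from the paper:

```python
import itertools
from collections import deque, defaultdict

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a capacity map cap[u][v]."""
    flow = 0.0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:            # BFS for an augmenting path
            u = q.popleft()
            for v, c in list(cap[u].items()):
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:            # recover the s -> t path
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:                       # update the residual graph
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

def min_energy_graphcut(unary, pairwise):
    """Exact minimum of a binary pairwise energy. unary[i] = (cost of
    x_i=0, cost of x_i=1); pairwise[(i,j)] = ((A, B), (C, D)) with the
    submodularity condition A + D <= B + C. Convention: x_i = 1 means
    node i ends up on the sink side; the n-link i -> j with weight
    B + C - A - D is cut exactly when x_i = 0 and x_j = 1."""
    cap = defaultdict(lambda: defaultdict(float))
    const, coef = 0.0, defaultdict(float)
    for i, (c0, c1) in unary.items():           # theta_i = c0 + (c1 - c0) x_i
        const += c0
        coef[i] += c1 - c0
    for (i, j), ((A, B), (C, D)) in pairwise.items():
        lam = B + C - A - D
        assert lam >= 0, "pairwise term must be submodular"
        const += A                              # theta_ij = A + (C-A) x_i
        coef[i] += C - A                        #   + (D-C) x_j
        coef[j] += D - C                        #   + lam (1-x_i) x_j
        cap[i][j] += lam
    for i, c in coef.items():
        if c >= 0:
            cap['s'][i] += c                    # cost paid when x_i = 1
        else:
            cap[i]['t'] += -c                   # cost paid when x_i = 0
            const += c
    return const + max_flow(cap, 's', 't')

def brute_force(unary, pairwise, n):
    return min(sum(unary[i][b[i]] for i in range(n)) +
               sum(pw[b[i]][b[j]] for (i, j), pw in pairwise.items())
               for b in itertools.product([0, 1], repeat=n))

# A 3-node chain with Ising-like smoothness costs.
unary = {0: (1.0, 3.0), 1: (2.0, 0.0), 2: (4.0, 1.0)}
pairwise = {(0, 1): ((0.0, 2.0), (2.0, 0.0)),
            (1, 2): ((0.0, 1.5), (1.5, 0.0))}
print(min_energy_graphcut(unary, pairwise),
      brute_force(unary, pairwise, 3))          # both give the same minimum
```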

I’m trying to learn more about the A* version, and in particular, compare it to the Jaakkola Sontag approach. I’m hoping this paper illuminates graph-search type methods of solving the MAP assignment problem.

This week we’ll be covering the following paper from NIPS 2007 (best student paper) – Sontag and Jaakkola, “New Outer Bounds on the Marginal Polytope”.

Aside from the paper, we will also need to deal with two organizational issues, when to have our regular meeting for the rest of fall semester, and what topics we want to cover (at least decide upon the first topic to cover). See the previous organizational post for details.

**Paper Overview:**

This paper presents a new variational algorithm for inference in discrete (binary and non-binary) MRFs. Recall that the main difficulty in exact inference is in explicitly characterizing the marginal polytope and in exactly computing the conjugate dual. The main contribution of this paper is a new, tighter outer bound on the marginal polytope. For the conjugate dual, their algorithm utilizes existing approximations such as the log-determinant and tree-reweighted (TRW) approximations.

A broader goal of this paper is in highlighting an emerging connection between polyhedral combinatorics and probabilistic inference. To this end, their outer bound for the marginal polytope is based on a cutting-plane algorithm (Algorithm 1 on page 3). A key aspect of cutting-plane algorithms is having an efficient mechanism for detecting violated constraints (Step 5 of Algorithm 1). One contribution of this paper is using the cycle inequalities, for which efficient separation algorithms are known (Section 2, page 5). A second main contribution is extending these inequalities and the separation mechanism to non-binary MRFs (Section 4). Notice that the extension to non-pairwise MRFs is trivial, since any non-pairwise MRF can easily be converted to a pairwise one by introducing auxiliary variables (as described in Appendix E.3 of Wainwright & Jordan). The first two pages of the paper provide a nice and succinct summary of most of the points we covered in Wainwright & Jordan.
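To get a feel for the separation step, here is a brute-force sketch for binary cycle inequalities (the function name and the toy numbers are mine, and the paper uses an efficient shortest-path separation rather than enumeration):

```python
import itertools

# For a cycle whose edge pseudomarginals are disagree[k] = P(endpoints
# of edge k take different values), every odd-sized subset F of the
# cycle's edges must satisfy the cycle inequality
#   sum_{k not in F} disagree[k] + sum_{k in F} (1 - disagree[k]) >= 1.
def most_violated_cycle_inequality(disagree):
    n = len(disagree)
    best, best_F = float('inf'), None
    for r in range(1, n + 1, 2):                      # odd |F| only
        for F in itertools.combinations(range(n), r):
            lhs = sum((1 - d) if k in F else d
                      for k, d in enumerate(disagree))
            if lhs < best:
                best, best_F = lhs, F
    return best, best_F

# Pseudomarginals claiming all three edges of a triangle disagree with
# probability 0.9: locally consistent, but no true joint distribution
# can make all three pairs of an odd cycle disagree simultaneously.
val, F = most_violated_cycle_inequality([0.9, 0.9, 0.9])
print(val, F)   # about 0.3 < 1 with F = all three edges: a violated
                # inequality, so this cutting plane would be added
```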

The experiments in this paper show the improvement in their inference procedure on computing marginals and MAP estimation for protein side-chain prediction.

Remember, only mark red (do not have time) if it is absolutely a conflict you can’t change, like a class. Otherwise mark less desirable choices with yellow (could make time if absolutely necessary).

The time for the meetings for the rest of fall semester will be Friday, 12-2pm.

To summarize the end of our last meeting, we’ve decided to search through relevant conferences (NIPS, ICML, CVPR, ICCV, ECCV, BMVC) for best papers, orals, or tutorials which we would like to cover over the fall. For individual papers, we would first spend a few meetings covering the fundamental theory necessary to understand the papers as preparation.

One possible fallback, if this doesn’t work out, is to cover convex optimization.

Post ideas and links in the comments.

One thing to keep in mind is that, if we feel like the format of at least two hours of reading plus a two-hour meeting doesn’t leave enough time to prepare each week, we can move to a format of at least three hours of reading plus a one-hour meeting, where, as Moe emphasized in the last meeting, two or three hours of reading doesn’t need to mean reading exclusively on your own. Discussion with others in advance of the meeting is encouraged.

However, this paper showed that, using only certain SOC constraints associated with edges of the MRF, the SOCP was equivalent to the QP, and both were dominated by the LP. We discussed the notion of dominance, and why it makes sense that, for the minimization problem, one relaxation dominates a second relaxation if the first has a **greater** minimum value than the second. One key to understanding this is to see that every relaxation uses a feasible set that is a superset of the original problem’s, thus the original problem has a minimum value at least as great as that of every relaxation (why?) and hence dominates every relaxation, as would be expected.
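A toy numeric illustration of dominance (my own construction, not from the paper): relaxing a two-point feasible set first to its convex hull and then to the whole unit square can only decrease the minimum, so the tighter relaxation dominates the looser one:

```python
# Minimize a linear objective over a two-point set S, over the segment
# conv(S) (a tight relaxation, sampled), and over the unit square
# (a loose relaxation). Each feasible set contains the previous one's,
# so the minima are nonincreasing as the relaxation gets looser.
def minimize(c, points):
    return min(c[0] * x + c[1] * y for x, y in points)

c = (1.0, 2.0)
S = [(0.0, 1.0), (1.0, 0.0)]
segment = [(t, 1.0 - t) for t in [i / 100 for i in range(101)]]
square = [(i / 10, j / 10) for i in range(11) for j in range(11)]

print(minimize(c, square), minimize(c, segment), minimize(c, S))
# nondecreasing left to right: the original problem dominates both
# relaxations, and the tighter relaxation dominates the looser one
```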

It took me a little while to prove the facts stated in the paper on page 95 below equation (46) and at the beginning of page 96. The set identity on page 95 can be proved by considering the different cases of positive and negative values of the terms involved. The first inequality on page 96 is easy to prove (in fact I think it’s an equality). The second inequality can be proven by using the fact that the geometric mean is always less than or equal to the arithmetic mean.

The final section of the paper deals with ways of moving forward, given the theoretical results of the paper. One thing to do is to simply add the linear marginalization constraints to the SOCP, rather than using the SOC constraints on edges. Cycle inequality SOC constraints can be added to the SOCP (to get SOCP-C), but linear cycle inequalities can also be added to the LP. The paper ends by giving additional SOC constraints on cliques (SOCP-Q), which results in a feasible region that is not a superset of the corresponding LP feasible region.

The takeaway message is that the relaxation using only the straightforward SOC constraints on edges is actually dominated by the LP. In general, SOCP is seen as a balance between LP and SDP, but in order to do better than the LP, additional SOC constraints must be used.

The Ravikumar Lafferty QP paper is a good related paper on this general topic. One thing that is not immediately clear is why, in the QP paper, the LP performs substantially worse, despite the results in the paper we read. It could be that the LP or QP used in the Ravikumar paper does not match those in the paper we read.

A related paper is Kumar, Torr, and Zisserman, “Solving Markov Random Fields using Second Order Cone Programming Relaxations”, CVPR 2006, which may be useful as a quick introduction as well as an example of how to apply these methods to a vision problem (object recognition), but the main focus of the meeting will be on the JMLR paper.

The paper begins by describing the integer programming formulation of MAP estimation, equivalently energy minimization, hence the integer program and all relaxations are minimization problems in the paper. Next, the paper gives three relaxations of the integer program – LP-S, the linear programming relaxation (the main one we dealt with in Wainwright); QP-RL, a quadratic programming relaxation; and SOCP-MS, a second order cone programming relaxation. One of the focuses of the paper is to compare these different relaxations to one another, and it describes how to do so in the next section, through the notion of dominance. The paper goes on to prove that LP-S dominates SOCP-MS, and that SOCP-MS and QP-RL are equivalent.

The next focus of the paper is to prove that for a class of relaxations (on trees and certain types of cycles) LP-S strictly dominates SOCP-MS (and consequently QP-RL). The paper then gives two additional types of constraints that can be added to the SOCP relaxation, such that the new SOCP relaxation dominates LP-S for some problems and has a feasibility region that is not a superset of the LP-S feasibility region.

The majority of the paper is in the proofs, so beyond understanding the basic set-up, I think that’s where we should be spending our time. In particular, if there are any steps that you do not follow, post them in the comments and we can try to work through it during the meeting.

**Going from SDP to SOCP relaxation:**

One fact that seems to be used without an explicit proof is that $X \succeq 0$ if and only if $A \bullet X \ge 0$ for every positive semidefinite $A$, where $A \bullet X = \sum_{ij} A_{ij} X_{ij}$ is the Frobenius inner product on matrices. Here’s a quick proof, where the second direction uses part of the CVPR paper.

First, note that if $A \bullet X \ge 0$ for every positive semidefinite $A$, then $X \succeq 0$, since for any vector $v$, the matrix $v v^T$ is positive semidefinite and $v v^T \bullet X = v^T X v$. Therefore, $v^T X v \ge 0$ for every $v$, which is exactly $X \succeq 0$.

To go the reverse direction, note that any symmetric positive semidefinite matrix can be written as $A = U U^T$ using the Cholesky decomposition. Let $A = U U^T$, and let $X$ be any positive semidefinite matrix. Then $A \bullet X = \mathrm{tr}(U U^T X) = \sum_i u_i^T X u_i \ge 0$, where $u_i$ is the $i$th column of $U$ (not row, as stated in the CVPR paper), and the last inequality uses the fact that $X \succeq 0$.
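A quick numeric sanity check of this step (the 2 x 2 matrices are my own toy example, not from either paper): for symmetric psd A = U U^T with columns u_i, the Frobenius inner product A . X equals the sum of the quadratic forms u_i^T X u_i, and is nonnegative when X is psd.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]          # psd
X = [[3.0, 1.0], [1.0, 1.0]]          # psd (det = 2 > 0, diagonal > 0)

# Frobenius inner product, summed entrywise.
frob = sum(A[i][j] * X[i][j] for i in range(2) for j in range(2))

# 2x2 Cholesky factor L with A = L L^T; its columns play the role of u_i.
l11 = math.sqrt(A[0][0])
l21 = A[1][0] / l11
l22 = math.sqrt(A[1][1] - l21 * l21)
cols = [(l11, l21), (0.0, l22)]

# Sum of quadratic forms u^T X u over the columns u of L.
quad_sum = sum(u[0] * (X[0][0] * u[0] + X[0][1] * u[1]) +
               u[1] * (X[1][0] * u[0] + X[1][1] * u[1]) for u in cols)

print(frob, quad_sum)   # equal (both 10, up to rounding) and nonnegative
```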

Recall that in the SDP relaxation, our positive semidefinite constraint was $X - x x^T \succeq 0$. From the CVPR paper, we then have, for any positive semidefinite matrix $A = U U^T$, $A \bullet (X - x x^T) \ge 0$, that is, $\|U^T x\|^2 = x^T A x \le A \bullet X$.

Finally, note that the final inequality is equivalent to the following SOCP constraint: $\|v\| \le t$, where $v = (2 U^T x, 1 - A \bullet X)$ and $t = 1 + A \bullet X$, which can be verified by squaring both sides.
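The squaring argument can be checked numerically; this sketch (my own, independent of the paper's notation) samples random u and t >= 0 and confirms that the quadratic constraint ||u||^2 <= t and the cone constraint ||(2u, 1 - t)|| <= 1 + t always agree, since squaring the latter gives 4||u||^2 + (1 - t)^2 <= (1 + t)^2, i.e. 4||u||^2 <= 4t.

```python
import math, random

def quad_form(u, t):
    """The quadratic constraint ||u||^2 <= t."""
    return sum(x * x for x in u) <= t

def soc_form(u, t):
    """The second-order cone constraint ||(2u, 1 - t)|| <= 1 + t."""
    norm = math.sqrt(sum(x * x for x in u))
    return math.hypot(2 * norm, 1 - t) <= 1 + t

random.seed(0)
for _ in range(1000):
    u = [random.uniform(-2, 2) for _ in range(3)]
    t = random.uniform(0, 8)
    assert quad_form(u, t) == soc_form(u, t)
print("quadratic and SOC forms agree on all samples")
```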

At this point, the SOCP relaxation is equivalent to the SDP relaxation. However, the SOCP relaxation is made more tractable (at the cost of removing constraints) by only considering a subset of positive semidefinite matrices $A$. The typical set used consists of matrices that are all zeros except for a $2 \times 2$ submatrix corresponding to two vertices connected by an edge in the MRF.

This section describes how to use moment matrices and conic programming, namely semidefinite programming (SDP) and second-order cone programming (SOCP), to construct variational relaxations.

We spent a good amount of time going over the background information in the section. First we discussed the definition of a moment matrix and the property that any valid moment matrix is positive semidefinite. Next we looked at the definitions of two different bases, the multinomial basis and the indicator basis, and went through the lemma that shows that it is always possible to convert between the two. Lastly, we looked at the new definition of the marginal polytope for hypergraphs in terms of the multinomial basis.
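A tiny illustration of the psd property (my own example, not from the chapter): for a single binary variable x with p = P(x = 1), the moment matrix in the basis (1, x) is [[1, p], [p, p]], since E[x^2] = E[x] = p, and it is psd exactly when p really is a probability.

```python
# M = [[1, p], [p, p]] is the moment matrix of one binary variable.
# For this 2x2 symmetric matrix with M[0][0] = 1, psd is equivalent to
# det M = p - p^2 = p(1 - p) >= 0, i.e. 0 <= p <= 1.
def moment_matrix_is_psd(p):
    return p - p * p >= 0

for p in [0.0, 0.3, 1.0, 1.4, -0.2]:
    print(p, moment_matrix_is_psd(p))   # True only for p in [0, 1]
```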

Next we went over how the Lasserre sequence of moment matrices provides a nested hierarchy of outer bounds on the marginal polytope. For any hypergraph on $m$ nodes, the relaxation at level $t = m$ provides an exact characterization of the marginal polytope.

The last section discussed an alternate second-order cone relaxation technique, which we deferred discussing until next week’s meeting.
