Sum-of-squares: proofs, beliefs, and algorithms — Boaz Barak and David Steurer

Optimality of sum-of-squares

In this lecture, we show that sum-of-squares achieves the best possible approximation guarantees for every constraint satisfaction problems (CSPs) among all polynomial-size semidefinite programming (SDP) relaxations (Lee, Raghavendra, and Steurer 2015).

The first step is to formalize the notion of SDP relaxations for constraint satisfaction problem. For simplicity, we restrict ourselves to Boolean CSPs, that is, every variable takes values from the alphabet \(\{0,1\}\). There are many equivalent for definitions for the kind of relaxations we want to study. It turns out that the most convenient definition for the purpose of our proof is in terms of certain certificates of nonnegativity, which generalize the notion of sum-of-squares certificates.

Let \(U\) be a linear subspace of the set of real-valued functions on \(\bits^n\). We say that \(f\from \bits^n\to \R\) has a \(U\)-certificate (of nonnegativity) if there exists functions \(g_1,\ldots,g_r\in U\) such that \[ f = g_1^2 + \cdots + g_r^2\,. \]

We can view the task of approximating a boolean constraint satisfaction problem in terms of certifying the nonnegativity of a collection of functions. For example, we can consider nonnegative functions of the form \[ x\mapsto \frac{\max f_G}{0.878} - f_G(x), \] where \(G\) ranges over all graphs and \(f_G(x)\) is the function that counts the number of edges cut by the bipartition \(x\). The fact that degree-2 sos achieves approximation ratio \(0.878\) for Max Cut is equivalent to the fact that this set of functions has degree-2 sos certificates of nonnegativity.

The following definition formalizes what it means for a set of nonnegative functions to have small subspace certifcates of nonnegativity.

Let \(\cF\) be a family of real-valued functions on hypercubes. For \(n\in \N\), let \(\cF_n\) be the restriction of \(\cF\) to functions on \(\bits^n\). For a function \(s\from \N\to \N\), we write \(\cF\in {\mathrm{\small SIZE}_{\mathrm{SDP}}}(s)\) if there exists a constant \(C\) such that for every \(n\in \N\), there exists a subspace \(U\subseteq \R^{\bits^n}\) of dimension at most \(C\cdot s(n)\) such that every function \(f\in \cF_n\) has a \(U\)-certificate of nonnegativity.

Since subspace certificates generalize sos certificates, any family \(\cF\) of real-valued functions on hypercubes with degree-\(d\) sos certificates satisfies \(\cF\in {\mathrm{\small SIZE}_{\mathrm{SDP}}}(n^d)\).

The following theorem gives a partial converse to that statement. For a function \(f\from \bits^m\to \R\), we define \({\mathrm{extensions}}(f)\) to be the set of functions \(g\from \bits^n\to \R\) with \(n\ge m\) such that \(g(x)=f(x_S)\) for some \(S\subseteq [n]\) with \(\card{S}=m\). In other words, \({\mathrm{extensions}}(f)\) consists of all functions on hypercubes that compute \(f\) on a subset of input bits.

Let \(f\from \bits^m\to\R\). Suppose \({\mathrm{extensions}}(f)\in {\mathrm{\small SIZE}_{\mathrm{SDP}}}(n^d)\) for some \(d\in \N\). Then, \(f\) has a degree-\(10d\) sos certificate of nonnegativity.

For the theorem, it is important to pass to extensions because every nonnegative function \(f\) on the hypercube satisfies \(\{f\} \in {\mathrm{\small SIZE}_{\mathrm{SDP}}}(1)\).To see that every nonnegative function satisfies \(\{f\} \in {\mathrm{\small SIZE}_{\mathrm{SDP}}}(1)\) consider the \(1\)-dimensional subspace spanned by the function \(\sqrt f\)..

Reference:optimality-of-sum-of-squares shows the optimality of sos for CSPs because the families \(\cF\) of functions that correspond to CSPs are closed under extensions, so that \({\mathrm{extensions}}(\cF)=\cF\). For families \(\cF\) with this closure property, the above theorem says that \(\cF\in {\mathrm{\small SIZE}_{\mathrm{SDP}}}(n^d)\) implies that \(\cF\) has degree-\(10d\) sos certficiates.

We remark that the proof of Reference:optimality-of-sum-of-squares also gives some bounds for super constant \(d\) (i.e., \(d\) is allowed to depend on \(n\)). However, the resulting bounds appear to be far from tight.

Positive matrix functions

The following lemma gives a characterization of SDP size that is algebraically more concise. This alternative characterization will be convenient for the purposes of the proof of Reference:optimality-of-sum-of-squares. We say that a matrix function \(Q\from \bits^n\to \R^{N\times N}\) is positive if \(Q(x)\succeq 0\) for all \(x\in \bits^n\).

Let \(U\) be a linear subspace of functions on \(\bits^n\) of dimension \(N\). Let \(\cF_U\) be the set of functions with a \(U\)-certificate of nonnegativity. Then, there exists a positive matrix function \(Q\from \bits^n\to \R^{N\times N}\) such that \[ \cF_U = \{ x\mapsto \Tr P \cdot Q(x) \mid P\succeq 0\}\,. \]

To get some intuition about this lemma, observe that functions of the form \(x\mapsto \Tr P\cdot Q(x)\) are indeed nonnegative because the product of two positive semidefinite matrices has nonnegative trace. To prove this inequality note that \(\Tr A B = \norm{\sqrt A \sqrt B}^2_F\ge 0\) for all \(A,B\succeq 0\).

Let \(h_1,\ldots,h_N\) be a basis for \(U\) and let \(h\from \bits^n\to \R^N\) be the vector-valued function \(h=(h_1,\ldots,h_N)\). Note that every function \(g\in U\) has a representation \(g(x)=\iprod{h(x),v}\) for \(v\in \R^N\). Choose \(Q\) such that \(Q(x)=\dyad{h(x)}\). Let \(f\) be any function with a \(U\)-certificate of nonnegativity so that \(f=g_1^2+\dots+g_r^2\) for \(g_1,\ldots,g_r\in U\). Let \(v_1,\ldots,v_r\) be the coordinates of these functions so that \(g_i(x)=\iprod{h(x),v_i}\) for all \(i\in [r]\). Then, \[ f(x) = g_1(x)^2+\dots+g_r(x)^2 = \sum_{i=1}^r \iprod{h(x),v_i}^2 = \Tr \Paren{\sum_{i=1}^r \dyad{v_i}} Q(x)\,. \] We conclude \(f\in \cF_U\) as desired.

References

Grigoriev, Dima. 2001. “Linear Lower Bound on Degrees of Positivstellensatz Calculus Proofs for the Parity.” Theoret. Comput. Sci. 259 (1-2): 613–22. doi:10.1016/S0304-3975(00)00157-2.

Lee, James R., Prasad Raghavendra, and David Steurer. 2015. “Lower Bounds on the Size of Semidefinite Programming Relaxations.” In STOC, 567–76. ACM.