Wolsey’s generalization of the greedy Set-Cover algorithm to a large class of problems.

It is natural to ask what general classes of problems the greedy Set-Cover algorithm generalizes to. Here we describe one such class, due to Wolsey (1982), that captures many, but not all, such problems.

Click for background material…

The greedy Set-Cover algorithm

Minimizing a linear cost subject to a submodular constraint

The class is called choosing a subset to minimize a linear cost function subject to a submodular constraint. It contains all problems of the form

\[ \min _ S \Big\{ \sum _{s \in S} c_ s ~ \Big|~ S\subseteq C, f(S) \ge k\Big\} \]

where $C$ is a universe of elements and $f:2^ C\rightarrow {\mathbb R}$ is a monotone submodular function (mapping subsets to reals).¹

The generalized algorithm starts with $S=\emptyset$ , then repeatedly chooses an element $s\in C$ to add to $S$ , where $s$ maximizes $[f(S\cup \{ s\} ) - f(S)]/c_ s$ , stopping when $f(S)$ achieves the maximum possible value $f(C)$ .

Performance guarantees.

Wolsey proves a number of performance guarantees for the algorithm . The most satisfactory guarantee holds only when the submodular function is integer-valued. In this case, the approximation ratio is ${\rm H}(d)$ , where $d$ is $\max \{ f(\{ s\} ) - f(\emptyset )~ |~ s\in C\}$ , the largest possible increase in $f$ from adding a single element. The proof is primal-dual, based on an abstract linear-program relaxation of the problem.

For more general submodular functions, Wolsey bounds the approximation ratio (somewhat weakly) by $1+\ln \min (\lambda _1, \lambda _2, \lambda _3)$ , where

$\lambda _1$ is $\displaystyle \max _{s,t} \Big\{ \frac{f(\{ s\} ) - f(\emptyset )}{f(S_ t\cup \{ s\} ) - f(S_ t)} ~ \Big|~ {f(S_ t\cup \{ s\} ) - f(S_ t) \gt 0} \Big\}$
where $S_ t$ contains the first $t$ elements chosen by the algorithm and $s$ ranges over $s\in C$ . This generalizes Wolsey’s result for integer-valued $f$ .
$\lambda _2 = \sigma _1 / \sigma _ T$ , where $T$ is the number of iterations and $\sigma _ t$ is the value of the quantity that is maximized in iteration $t$ of the algorithm (that is, $\sigma _ t = \max _ s [f(S_ t\cup \{ s\} )-f(S_ t)]/c_ s$ ).
$\lambda _3$ is $\frac{f(C) - f(\emptyset )}{f(C) - f(S_{T-1})}$ .

These latter bounds aren’t fully satisfying.

Examples

The class of problems is somewhat abstract. Here are some examples:

Set Partial Cover: Given an integer $k$ and a collection $C$ of sets over a universe ${\cal U}$ , where each set $s\in C$ has a cost $c_ s$ , choose a minimum-cost collection of the sets so that their union contains at least $k$ elements. The approximation ratio is ${\rm H}(d)$ , where $d$ is the maximum size of any single set $s\in C$ .

Explanation…
Here the submodular function is the minimum of $k$ and the size of the union of the chosen sets. The problem requires choosing a minimum-cost collection of sets that makes this function take value at least $k$ . The greedy algorithm repeatedly chooses a set $s\in C$ that maximizes the increase in this function, divided by the cost $c_ s$ .
Set Multicover: Given a collection $C$ of sets, where each set $s\in C$ has a cost $c_ s$ , and an integer requirement $r_ e$ for each element $e$ , choose a minimum-cost collection of the sets so that each element $e$ is contained in at least $r_ e$ of the chosen sets. Each set can only be chosen once. The approximation ratio is ${\rm H}(d)$ , where $d$ is the maximum size of any single set $s\in C$ .

Explanation…
The underlying submodular function is the sum, over the elements $e$ , of the number of chosen sets covering $e$ (or $r_ e$ , whichever is smaller). The greedy algorithm repeatedly chooses a not-yet-chosen set $s\in C$ to maximize the increase in this function, divided by the cost $c_ s$ .
Given a collection $C$ of sets of vectors of ${\mathbb R}^ n$ , where each set $s\in C$ has a cost $c_ s$ , choose a minimum-cost collection of the sets so that their union spans ${\mathbb R}^ n$ . The approximation ratio is ${\rm H}(d)$ , where $d$ is the maximum rank of any single set $s\in C$ .

Explanation…
The submodular function is the vector rank of the union of the sets chosen so far. The greedy algorithm repeatedly chooses a set $s\in C$ that maximizes the increase in this quantity, divided by the cost $c_ s$ .
Given a collection $C$ of sets of edges in a graph $G=(V,E)$ , choose a minimum-cost collection of the sets so that their union connects the graph. The approximation ratio is ${\rm H}(d)$ , where $d$ is the maximum size of any set $s\in C$ . If $d=1$ the problem is MST, the algorithm gives 1-approximation (it reduces to Kruskal’s algorithm).

Explanation…
The submodular function is $|V|$ minus the number of connected components in the subgraph formed by the edges in chosen sets. The greedy algorithm repeatedly chooses a set $s$ that maximizes the increase in this quantity, divided by the cost $c_ s$ .

An open problem

One of the most natural generalizations of Set Cover is Integer Covering with upper bounds on the variables, that is, $\min \{ c\cdot x ~ |~ Ax\ge b, ~ x_ j\in \{ 0,1\} \}$ where each $A_{ij},b_ i\in {\mathbb R}_+$ . (To model a more general constraint $x_ j\in \{ 0,1,\ldots ,u_ j\}$ for a variable $x_ j$ , just replace each variable $x_ j$ with $u_ j$ copies.)

A nice open question is whether the ${\rm H}(d)$ approximation ratio (or perhaps a $2{\rm H}(d)$ ratio) can be shown for any natural greedy algorithm for this problem, where $d$ is the maximum number of constraints in which any variable appears. An $O(\log d)$ -approximation algorithm is known for the problem, but it is not a greedy algorithm [1].

For this problem, it is natural to define the underlying submodular function by $f(x) = \sum _ i \min (A_ i x, b_ i)$ or $g(x) = \sum _ i \min (A_ i x/b_ i, 1)$ . But for these functions, none of Wolsey’s three quantities above $(\lambda _1,\lambda _2,\lambda _3)$ are bounded independent of $A$ and $b$ .

The question is resolved positively for for two slightly easier problems:

$\min \{ \, c\cdot x ~ |~ Ax\ge 1,~ x_ j\in {\mathbb N}_{\ge 0}\}$ (no upper bounds on the variables), and
$\min \big \{ \, c\cdot x ~ |~ Ax\ge b,~ x_ j\in \{ 0,1\} \big \}$ where $A_{ij}\in \{ 0,1\}$ . (This one is Set Multicover, mentioned previously, for which the function $f$ above is integer-valued, so Wolsey’s result applies.)

The set-cover problem (wikipedia)
Submodular functions (wikipedia)
Using randomized rounding to derive greedy algorithms

Bibliography

[1]	S. G. Kolliopoulos and N. E. Young. Approximation algorithms for covering/packing integer programs. Journal of Computer and System Sciences, 71(4):495–505, 2005. Preliminary version appeared in FOCS’01.
[2]	G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, New York, 1988.
[3]	L. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.

Footnotes

$f$ is submodular if $f(S\cup \{ b\} ) - f(S) \ge f(S\cup \{ a,b\} ) - f(S\cup \{ a\} )$ for any $S\subset C$ and $a,b\in C$ . That is, adding $b$ to the current set $S$ gives at least as much benefit as would adding $b$ after adding $a$ . Intuitively, adding $a$ cannot make it better to add $b$ . The key property for the analysis of the greedy algorithm is the following: if adding a set $A$ of elements would increase $f$ by at least $\delta$ , then there must be at least one element in $A$ that, if added by itself, would increase $f$ by at least $\delta /|A|$ .

Notes on algorithms

Lecture notes on algorithms

Set Cover / Wolsey’s generalization

Minimizing a linear cost subject to a submodular constraint

Examples

An open problem

Related

Bibliography

Footnotes