Wolsey’s generalization of the greedy Set Cover algorithm to a large class of problems.
It is natural to ask what general classes of problems the greedy Set Cover algorithm generalizes to. Here we describe one such class, due to Wolsey (1982), that captures many, but not all, such problems.
Minimizing a linear cost subject to a submodular constraint
The class is called choosing a subset to minimize a linear cost function subject to a submodular constraint. It contains all problems of the form
\min \Big\{ \sum _{s\in S} c_ s ~:~ S\subseteq C,~ f(S) = f(C) \Big\} ,
where C is a universe of elements, each element s\in C has a cost c_ s, and f:2^ C\rightarrow {\mathbb R} is a monotone submodular function (mapping subsets to reals).^{1}
The generalized algorithm starts with S=\emptyset , then repeatedly chooses an element s\in C to add to S, where s maximizes [f(S\cup \{ s\} ) - f(S)]/c_ s, stopping when f(S) achieves the maximum possible value f(C).
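The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not Wolsey’s presentation: the function names and the small Set Cover instance at the end are assumptions made for the example, with f given as a set-function oracle.

```python
# A minimal sketch of the generic greedy algorithm.
# f is a monotone submodular set-function oracle; cost[s] is the cost c_s > 0.
def greedy_submodular_cover(C, f, cost):
    S = set()
    target = f(C)                      # the maximum possible value, f(C)
    while f(S) < target:
        best, best_ratio = None, 0.0
        for s in C - S:                # pick s maximizing marginal gain per unit cost
            ratio = (f(S | {s}) - f(S)) / cost[s]
            if ratio > best_ratio:
                best, best_ratio = s, ratio
        if best is None:               # cannot happen for monotone f with f(S) < f(C)
            break
        S.add(best)
    return S

# Plain Set Cover is the special case f(S) = |union of the sets chosen in S|.
sets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {4, 5}}
f = lambda S: len(set().union(*(sets[s] for s in S))) if S else 0
cover = greedy_submodular_cover(set(sets), f, {'a': 1, 'b': 1, 'c': 1})
```

On this instance the greedy rule first takes 'a' (gain 3 per unit cost), then 'c' (gain 2), stopping once all five elements are covered.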
Wolsey proves a number of performance guarantees for the algorithm [3, 2]. The most satisfactory guarantee holds only when the submodular function is integer-valued. In this case, the approximation ratio is {\rm H}(d), where d is \max \{ f(\{ s\} ) - f(\emptyset ) ~:~ s\in C\} , the largest possible increase in f from adding a single element. The proof is primal-dual, based on an abstract linear-program relaxation of the problem.
For more general submodular functions, Wolsey bounds the approximation ratio (somewhat weakly) by 1+\ln \min (\lambda _1, \lambda _2, \lambda _3), where

\lambda _1 is \displaystyle \max _{s,t} \Big\{ \frac{f(\{ s\} ) - f(\emptyset )}{f(S_ t\cup \{ s\} ) - f(S_ t)} ~\Big|~ f(S_ t\cup \{ s\} ) - f(S_ t) > 0 \Big\} ,
where S_ t contains the first t elements chosen by the algorithm and s ranges over C. This bound generalizes Wolsey’s result for integer-valued f.
\lambda _2 = \sigma _1 / \sigma _ T, where T is the number of iterations and \sigma _ t is the value of the quantity that is maximized in iteration t of the algorithm (that is, \sigma _ t = \max _ s [f(S_ t\cup \{ s\} ) - f(S_ t)]/c_ s).

\lambda _3 is \frac{f(C) - f(\emptyset )}{f(C) - f(S_{T-1})}.
These latter bounds are less satisfying, as the quantities \lambda _1, \lambda _2, \lambda _3 can be large.
Examples
The class of problems is somewhat abstract. Here are some examples:

Set Partial Cover: Given an integer k and a collection C of sets over a universe {\cal U}, where each set s\in C has a cost c_ s, choose a minimum-cost collection of the sets so that their union contains at least k elements. The approximation ratio is {\rm H}(d), where d is the maximum size of any single set s\in C.

Set Multicover: Given a collection C of sets, where each set s\in C has a cost c_ s, and an integer requirement r_ e for each element e, choose a minimum-cost collection of the sets so that each element e is contained in at least r_ e of the chosen sets. Each set can only be chosen once. The approximation ratio is {\rm H}(d), where d is the maximum size of any single set s\in C.

Given a collection C of sets of vectors in {\mathbb R}^ n, where each set s\in C has a cost c_ s, choose a minimum-cost collection of the sets so that their union spans {\mathbb R}^ n. The approximation ratio is {\rm H}(d), where d is the maximum rank of any single set s\in C.

Given a collection C of sets of edges in a graph G=(V,E), where each set s\in C has a cost c_ s, choose a minimum-cost collection of the sets so that their union connects the graph. The approximation ratio is {\rm H}(d), where d is the maximum size of any set s\in C. If d=1, the problem is the minimum spanning tree problem, and the algorithm gives a 1-approximation (it reduces to Kruskal’s algorithm).
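To make the first example concrete, here is a small self-contained run of the greedy rule on a Set Partial Cover instance, using the submodular function f(S) = \min (k, |\bigcup _{s\in S} s|). The instance (three sets, k=4) is made up for illustration.

```python
# Set Partial Cover as a submodular-cover instance: f(S) = min(k, |union of S|).
# The sets, costs, and k below are an illustrative instance.
sets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {5, 6, 7, 8}}
cost = {'a': 1, 'b': 1, 'c': 2}
k = 4  # must cover at least k elements

def covered(S):
    return set().union(*(sets[s] for s in S)) if S else set()

def f(S):
    # Truncating the coverage count at k keeps f monotone and submodular.
    return min(k, len(covered(S)))

# The generic greedy rule: add the element with the best marginal gain per unit cost,
# stopping once f reaches its maximum value f(C) = k.
S = set()
while f(S) < f(set(sets)):
    S.add(max(set(sets) - S, key=lambda s: (f(S | {s}) - f(S)) / cost[s]))
```

Here the greedy picks 'a' (gain 3 per unit cost), then 'b' (gain 1 per unit cost, beating 'c'), and stops with 4 elements covered at total cost 2.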
An open problem
One of the most natural generalizations of Set Cover is Integer Covering with upper bounds on the variables, that is, \min \{ c\cdot x ~:~ Ax\ge b, ~ x_ j\in \{ 0,1\} \} where each A_{ij},b_ i\in {\mathbb R}_+. (To model a more general constraint x_ j\in \{ 0,1,\ldots ,u_ j\} for a variable x_ j, just replace each variable x_ j with u_ j copies.)
A nice open question is whether the {\rm H}(d) approximation ratio (or perhaps a 2{\rm H}(d) ratio) can be shown for any natural greedy algorithm for this problem, where d is the maximum number of constraints in which any variable appears. An O(\log d)-approximation algorithm is known for the problem, but it is not a greedy algorithm [1].
For this problem, it is natural to define the underlying submodular function by f(x) = \sum _ i \min (A_ i x, b_ i) or g(x) = \sum _ i \min (A_ i x/b_ i, 1). But for these functions, none of Wolsey’s three quantities above (\lambda _1,\lambda _2,\lambda _3) are bounded independent of A and b.
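The two candidate potential functions can be written out directly. The 2×2 instance below is made up for illustration; only the definitions of f and g come from the text.

```python
# The two candidate potentials for min{ c·x : Ax >= b, x_j in {0,1} }.
# A and b are an illustrative instance (not from the text).
A = [[2, 1],
     [1, 3]]
b = [2, 3]

def row_times(x, i):
    # The i-th coordinate of Ax for a 0/1 vector x.
    return sum(Aij * xj for Aij, xj in zip(A[i], x))

def f(x):
    # f(x) = sum_i min(A_i x, b_i): truncate each constraint's progress at b_i.
    return sum(min(row_times(x, i), b[i]) for i in range(len(b)))

def g(x):
    # g(x) = sum_i min(A_i x / b_i, 1): each constraint's fractional progress.
    return sum(min(row_times(x, i) / b[i], 1) for i in range(len(b)))
```

Both reach their maximum exactly when x is feasible (f(x) = \sum _ i b_ i, g(x) = number of constraints), so the generic greedy applies; the difficulty is that Wolsey’s bounds on the resulting ratio depend on A and b.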
The question is resolved positively for two slightly easier problems:

\min \{ \, c\cdot x ~:~ Ax\ge 1,~ x_ j\in {\mathbb N}_{\ge 0}\} (no upper bounds on the variables), and

\min \big \{ \, c\cdot x ~:~ Ax\ge b,~ x_ j\in \{ 0,1\} \big \} where A_{ij}\in \{ 0,1\} . (This one is Set Multicover, mentioned previously, for which the function f above is integer-valued, so Wolsey’s result applies.)
Related

The set-cover problem (Wikipedia)

Submodular functions (Wikipedia)
Bibliography
Footnotes
 f is submodular if f(S\cup \{ b\} ) - f(S) \ge f(S\cup \{ a,b\} ) - f(S\cup \{ a\} ) for any S\subset C and a,b\in C. That is, adding b to the current set S gives at least as much benefit as would adding b after adding a. Intuitively, adding a cannot make it better to add b. The key property for the analysis of the greedy algorithm is the following: if adding a set A of elements would increase f by at least \delta , then there must be at least one element in A that, if added by itself, would increase f by at least \delta /|A|.
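The key property follows from submodularity by a telescoping and averaging argument; a sketch, writing A = \{ a_1,\ldots ,a_ m\} :

```latex
f(S\cup A) - f(S)
  \;=\; \sum_{j=1}^{m} \Big[ f(S\cup\{a_1,\dots,a_j\}) - f(S\cup\{a_1,\dots,a_{j-1}\}) \Big]
  \;\le\; \sum_{j=1}^{m} \Big[ f(S\cup\{a_j\}) - f(S) \Big],
```

where each term of the inequality uses submodularity: adding a_ j to the larger set S\cup \{ a_1,\ldots ,a_{j-1}\} gains no more than adding a_ j to S alone. So if the left-hand side is at least \delta , the m terms on the right average at least \delta /m, and some single element a_ j satisfies f(S\cup \{ a_ j\} ) - f(S) \ge \delta /|A|.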