Set Cover / greedy algorithm

A brief review of the greedy algorithm for the Set-Cover problem.

 



The algorithm

  input: collection S of sets over universe U, costs c: S\rightarrow {\mathbb R}_+
  output: set cover C
1. Let C \leftarrow \emptyset .
2. Repeat until C is a set cover:
3. Find a set s\in S maximizing the number of elements in s
not yet covered by any set in C, divided by the cost c_ s.
4. Add s to C.
5. Return C.

Performance guarantee

The following performance guarantee was proved in the 1970’s:

Theorem ([3, 4, 1]).

The greedy set-cover algorithm returns a set cover of cost at most {\rm H}(d)\, {\textsc{opt}}, where {\textsc{opt}} is the minimum cost of any set cover, d = \max _{s\in S} |s| is the maximum set size, and {\rm H}(d)\approx 0.58+\ln d is the dth Harmonic number.

The guarantee actually holds with respect to the optimum fractional set cover. The proof is typically given as a charging argument, or by a primal-dual argument (constructing a related dual solution to bound opt).

The logarithmic approximation guarantee is the best possible in the following sense: if P\ne NP, in the worst case, no polynomial-time algorithm guarantees a cover of cost o({\textsc{opt}}\log n), where n=|U| is the number of elements to be covered [2].