Set Cover / alterations

Yet another approximation ratio for Chvátal’s greedy set-cover algorithm.

We show that the rounding scheme and algorithm give (1+\ln (n/{\textsc{opt}}))-approximate solutions, matching bounds of Srinivasan [2, Thm.5.2(b)] and Slavik [1]. For unweighted set cover, this bound compares favorably to the {\rm H}(d)-approximation ratio, because n/{\textsc{opt}} is always at most d (the maximum set size). For the weighted case the bounds are incomparable. The proof is related to the so-called method of alterations.

The algorithm

We prove the following theorem by applying the method of conditional probabilities to the localized rounding scheme:

Theorem ([1]).

Assuming the cost of each set is at most 1, Chvátal’s algorithm returns a cover of cost at most 1 + {\textsc{opt}}\big (1+\ln (n/{\textsc{opt}})\big ).
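Chvátal's greedy rule (repeatedly pick the set minimizing cost per newly covered element) can be sketched as follows. The instance encoding — sets as frozensets with a parallel cost list — is an illustrative assumption, not part of the original text.

```python
def greedy_set_cover(universe, sets, cost):
    """Chvátal's greedy algorithm (sketch): return indices of a cover of `universe`.

    At each step, choose the set minimizing cost per newly covered element.
    """
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Among sets covering something new, pick the one with the
        # smallest cost / (number of newly covered elements) ratio.
        best = min(
            (i for i in range(len(sets)) if sets[i] & uncovered),
            key=lambda i: cost[i] / len(sets[i] & uncovered),
        )
        cover.append(best)
        uncovered -= sets[best]
    return cover
```

For example, on universe {1, 2, 3, 4} with sets {1, 2, 3}, {3, 4}, {4} of unit cost, the first pick covers three elements at ratio 1/3, and one more set finishes the cover.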

To prove the theorem, we first analyze the rounding scheme:


Lemma.

Assume the cost of each set is at most 1. With non-zero probability, the cost of the random set cover C is less than 1+\big [1+\ln \big (n/(c\cdot x)\big )\big ]\, c\cdot x.

Proof of lemma.

Define (with foresight) \alpha = c\cdot x \, \ln \big (n/(c\cdot x)\big ), so that n\, e^{-\alpha /(c\cdot x)} = c\cdot x. Let n_t denote the number of elements remaining uncovered after t samples. Let the stopping time T be the number of samples until the cumulative cost of the sampled sets first falls in the range [\alpha ,\alpha +1), so that the total cost of the T sets chosen in this first stage is less than \alpha +1. In the second stage, n_T elements are initially not yet covered; each set then chosen covers at least one new element and costs at most 1, so the total cost of the second stage is at most n_T. Thus the total cost of all sets chosen is less than \alpha +1+n_T. To finish the proof, we show that with positive probability n_T is at most c\cdot x, in which case the upper bound \alpha +1+n_T on the total cost is at most c\cdot x\,\ln \big (n/(c\cdot x)\big ) + 1 + c\cdot x, as desired.
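The two-stage process in the proof can be simulated as a sketch. This chunk does not spell out the rounding scheme itself, so two details below are assumptions: each sample draws a set with probability proportional to x_s, and the second-stage alteration covers each leftover element with a cheapest set containing it.

```python
import math
import random

def round_with_alteration(universe, sets, cost, x, rng):
    """Sketch of the two-stage rounding: sample sets until cumulative cost
    reaches [alpha, alpha + 1), then patch up the leftover elements.

    Assumption (not from the text): samples are i.i.d. with probability
    proportional to x_s; leftovers are covered by a cheapest containing set.
    """
    cx = sum(c * xs for c, xs in zip(cost, x))   # c.x, cost of fractional cover
    alpha = cx * math.log(len(universe) / cx)    # threshold from the proof
    chosen, total_cost = [], 0.0
    uncovered = set(universe)
    # First stage: stop at the first time the cumulative cost is >= alpha;
    # since each set costs at most 1, the total then lies in [alpha, alpha + 1).
    while total_cost < alpha:
        i = rng.choices(range(len(sets)), weights=x)[0]
        chosen.append(i)
        total_cost += cost[i]
        uncovered -= sets[i]
    # Second stage (alteration): each leftover element costs at most 1 to fix,
    # so the extra cost is at most n_T.
    for e in list(uncovered):
        if e in uncovered:
            i = min((j for j in range(len(sets)) if e in sets[j]),
                    key=lambda j: cost[j])
            chosen.append(i)
            total_cost += cost[i]
            uncovered -= sets[i]
    return chosen, total_cost
```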


Method of conditional probabilities

To prove the theorem, we show that applying the method of conditional probabilities to the lemma yields the algorithm.

Proof of theorem.

Define T, n_t, and \alpha as in the proof of the lemma. Following that proof, in any outcome where n_T is at most c\cdot x, the performance guarantee holds. We show that the algorithm chooses each of the first T sets so as to keep the conditional expectation of \ln n_T below \ln (c\cdot x). It does so by keeping the following pessimistic estimator from increasing:

\[ \phi _ t ~ =~ \ln (n_ t) \, -\, \frac{\alpha -C_ t}{c\cdot x}, \]

where (abusing notation) C_t denotes the cost of the first t sampled sets.
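That the greedy choice never increases \phi _ t can be checked numerically. The toy instance and the fractional solution x below are illustrative assumptions (x is an integral feasible cover, so c\cdot x is just that cover's cost).

```python
import math

# Toy instance (illustrative assumption, not from the text).
universe = set(range(6))
sets = [frozenset({0, 1, 2}), frozenset({2, 3}),
        frozenset({3, 4, 5}), frozenset({0, 5})]
cost = [1.0, 0.6, 1.0, 0.8]
x = [1.0, 0.0, 1.0, 0.0]                 # a feasible (integral) fractional cover
cx = sum(c * xi for c, xi in zip(cost, x))       # c.x = 2.0
alpha = cx * math.log(len(universe) / cx)        # threshold from the lemma

def phi(n_t, C_t):
    # Pessimistic estimator: phi_t = ln(n_t) - (alpha - C_t) / (c.x)
    return math.log(n_t) - (alpha - C_t) / cx

uncovered = set(universe)
C_t = 0.0
values = [phi(len(uncovered), C_t)]
while uncovered:
    # Greedy choice: minimize cost per newly covered element.
    i = min((j for j in range(len(sets)) if sets[j] & uncovered),
            key=lambda j: cost[j] / len(sets[j] & uncovered))
    C_t += cost[i]
    uncovered -= sets[i]
    if uncovered:                        # phi is defined while elements remain
        values.append(phi(len(uncovered), C_t))

non_increasing = all(b <= a + 1e-9 for a, b in zip(values, values[1:]))
```

Note that the initial value is \phi _0 = \ln n - \alpha /(c\cdot x) = \ln (c\cdot x), so keeping \phi from increasing is exactly what caps \ln n_T at \ln (c\cdot x).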