Yet another approximation ratio for Chvátal’s greedy set-cover algorithm.
We show that the rounding scheme and algorithm give (1+\ln (n/{\textsc{opt}}))-approximate solutions, matching bounds of Srinivasan [2, Thm.5.2(b)] and Slavik [1]. For unweighted set cover, this bound compares favorably to the {\rm H}(d)-approximation ratio, because n/{\textsc{opt}} is always at most d (the maximum set size). For the weighted case the bounds are incomparable. The proof is related to the so-called method of alterations.
The algorithm
We prove the following theorem by applying the method of conditional probabilities to the localized rounding scheme:
Assuming the cost of each set is at most 1, Chvátal’s algorithm returns a cover of cost at most 1 + {\textsc{opt}}\big (1+\ln (n/{\textsc{opt}})\big ).
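For concreteness, here is a minimal sketch of Chvátal's greedy rule: repeatedly choose the set minimizing cost per newly covered element. The instance encoding (dictionaries keyed by set name) and all identifiers are illustrative choices, not taken from the original.

```python
def greedy_set_cover(universe, sets, cost):
    """Chvátal's greedy rule: repeatedly pick the set minimizing
    cost per newly covered element, until everything is covered.
    `sets` maps each set's name to its elements; `cost` maps each
    set's name to its (positive) cost."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Among sets covering at least one new element, pick the one
        # with minimum cost / (number of newly covered elements).
        best = min(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: cost[s] / len(sets[s] & uncovered),
        )
        cover.append(best)
        uncovered -= sets[best]
    return cover
```

Ties are broken arbitrarily here (by iteration order); the analysis does not depend on the tie-breaking rule.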
To prove the theorem, we first analyze the rounding scheme:
Assume the cost of each set is at most 1. With non-zero probability, the cost of the random set cover C is less than 1+\big [1+\ln \big (n/(c\cdot x)\big )\big ]\, c\cdot x.
Define (with foresight) \alpha = \ln \big (n/(c\cdot x)\big ) \, c\cdot x. Let n_t denote the number of elements remaining uncovered after t samples. Let the stopping time T be the number of samples until the cumulative cost of the sampled sets first falls in the range [\alpha ,\alpha +1); since each set costs at most 1, the cumulative cost cannot jump past this range, so the total cost of the T sets chosen in this first stage is less than \alpha +1. In the second stage, n_T elements are initially not yet covered, and each set chosen covers at least one new element and costs at most 1, so the total cost of the second stage is at most n_T. Thus the total cost of all sets chosen is less than \alpha +1+n_T. To finish the proof, we show that with positive probability n_T is at most c\cdot x, in which case the upper bound \alpha +1+n_T on the total cost is at most \ln \big (n/(c\cdot x)\big )\, c\cdot x + 1 + c\cdot x, as desired.
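The two-stage rounding scheme in the proof can be sketched as follows, under an assumed encoding: x is a feasible fractional set cover (for each element, the x-values of the sets containing it sum to at least 1), and each first-stage sample draws one set with probability proportional to its x-value. The function and variable names are illustrative.

```python
import math
import random

def sample_round(universe, sets, cost, x, seed=0):
    """Two-stage rounding from the lemma's proof (illustrative sketch).
    Stage 1: sample sets with probability proportional to x_s until the
    cumulative cost of sampled sets reaches alpha (so it lands in
    [alpha, alpha+1), as each set costs at most 1).
    Stage 2: cover the remaining elements deterministically; each extra
    set covers at least one new element and costs at most 1."""
    rng = random.Random(seed)
    cx = sum(cost[s] * x[s] for s in sets)        # c . x
    alpha = cx * math.log(len(universe) / cx)     # as defined in the proof
    names = list(sets)
    weights = [x[s] for s in names]

    chosen, total = [], 0.0
    uncovered = set(universe)
    while total < alpha:                          # stage 1
        s = rng.choices(names, weights=weights)[0]
        chosen.append(s)
        total += cost[s]
        uncovered -= sets[s]
    while uncovered:                              # stage 2
        s = max(names, key=lambda t: len(sets[t] & uncovered))
        chosen.append(s)
        uncovered -= sets[s]
    return chosen
```

Stage 2 here picks a maximum-coverage set for concreteness; the proof only requires that each second-stage set cover at least one new element.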
Method of conditional probabilities
To prove the theorem, we show that applying the method of conditional probabilities to the lemma yields the algorithm.
Define T, n_t, and \alpha as in the proof of the lemma. Following that proof, in any outcome where n_T is at most c\cdot x, the performance guarantee holds. We show that the algorithm chooses each of the first T sets so as to keep the conditional expectation of \ln n_T at most \ln (c\cdot x). It does so by keeping the following pessimistic estimator from increasing:
\[ \phi_t \;=\; \ln n_t \;+\; \frac{C_t - \alpha}{c\cdot x}, \]
where (abusing notation) C_t denotes the cost of the first t sampled sets. Note that \phi_0 = \ln n - \alpha/(c\cdot x) = \ln (c\cdot x), while at the stopping time C_T \ge \alpha, so \phi_T \ge \ln n_T.
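As a numerical sanity check, the following sketch verifies on a toy instance that the greedy rule never increases an estimator of the form \phi_t = \ln n_t + (C_t-\alpha)/(c\cdot x). The exact form of the estimator, the instance, and all names here are assumptions for illustration, chosen to be consistent with the surrounding definitions rather than copied from the original.

```python
import math

def phi(n_t, C_t, alpha, cx):
    """Assumed pessimistic estimator: ln n_t + (C_t - alpha)/(c.x)."""
    return (math.log(n_t) if n_t else float("-inf")) + (C_t - alpha) / cx

def greedy_keeps_phi_nonincreasing(universe, sets, cost, x):
    """Run Chvátal's greedy and check that phi never increases."""
    cx = sum(cost[s] * x[s] for s in sets)
    alpha = cx * math.log(len(universe) / cx)
    uncovered, C = set(universe), 0.0
    cur = phi(len(uncovered), C, alpha, cx)
    while uncovered:
        # greedy step: minimize cost per newly covered element
        s = min((t for t in sets if sets[t] & uncovered),
                key=lambda t: cost[t] / len(sets[t] & uncovered))
        uncovered -= sets[s]
        C += cost[s]
        nxt = phi(len(uncovered), C, alpha, cx)
        if nxt > cur + 1e-12:
            return False
        cur = nxt
    return True
```

The check succeeds because, by feasibility of x, some set always covers new elements at rate at least n_t/(c\cdot x) per unit cost, and the greedy choice does at least as well.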