Yet another approximation ratio for Chvátal’s greedy set-cover algorithm.
We show that the rounding scheme and algorithm give (1+\ln (n/{\textsc{opt}}))-approximate solutions, matching bounds of Srinivasan [2, Thm.5.2(b)] and Slavik [1]. For unweighted set cover, this bound compares favorably to the {\rm H}(d)-approximation ratio, because n/{\textsc{opt}} is always at most d (the maximum set size). For the weighted case the bounds are incomparable. The proof is related to the so-called method of alterations.
The algorithm
We prove the following theorem by applying the method of conditional probabilities to the localized rounding scheme:
Assuming the cost of each set is at most 1, Chvátal’s algorithm returns a cover of cost at most 1 + {\textsc{opt}}\big (1+\ln (n/{\textsc{opt}})\big ).
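For concreteness, here is a minimal sketch of Chvátal's greedy rule: repeatedly choose the set minimizing cost per newly covered element. The instance encoding (dictionaries keyed by set name) and all identifiers are illustrative choices, not taken from the original.

```python
def greedy_set_cover(universe, sets, cost):
    """Chvátal's greedy rule: repeatedly pick the set minimizing
    cost per newly covered element, until everything is covered.
    `sets` maps each set's name to its elements; `cost` maps each
    set's name to its (positive) cost."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Among sets covering at least one new element, pick the one
        # with minimum cost / (number of newly covered elements).
        best = min(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: cost[s] / len(sets[s] & uncovered),
        )
        cover.append(best)
        uncovered -= sets[best]
    return cover
```

Ties are broken arbitrarily here (by iteration order); the analysis does not depend on the tie-breaking rule.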
To prove the theorem, we first analyze the rounding scheme:
Assume the cost of each set is at most 1. With non-zero probability, the cost of the random set cover C is less than 1+\big [1+\ln \big (n/(c\cdot x)\big )\big ]\, c\cdot x.
Define (with foresight) \alpha = \ln \big (n/(c\cdot x)\big ) \, c\cdot x. Let n_t denote the number of elements remaining uncovered after t samples. Let the stopping time T be the number of samples until the cumulative cost of the sampled sets first falls in the range [\alpha ,\alpha +1); since each set costs at most 1, the cumulative cost cannot jump past this range, so the total cost of the T sets chosen in this first stage is less than \alpha +1. In the second stage, n_T elements are initially not yet covered, and each set chosen covers at least one new element and costs at most 1, so the total cost of the second stage is at most n_T. Thus the total cost of all sets chosen is less than \alpha +1+n_T. To finish the proof, we show that with positive probability n_T is at most c\cdot x, in which case the upper bound \alpha +1+n_T on the total cost is at most \ln \big (n/(c\cdot x)\big )\, c\cdot x + 1 + c\cdot x, as desired.
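The two-stage rounding scheme in the proof can be sketched as follows, under an assumed encoding: x is a feasible fractional set cover (for each element, the x-values of the sets containing it sum to at least 1), and each first-stage sample draws one set with probability proportional to its x-value. The function and variable names are illustrative.

```python
import math
import random

def sample_round(universe, sets, cost, x, seed=0):
    """Two-stage rounding from the lemma's proof (illustrative sketch).
    Stage 1: sample sets with probability proportional to x_s until the
    cumulative cost of sampled sets reaches alpha (so it lands in
    [alpha, alpha+1), as each set costs at most 1).
    Stage 2: cover the remaining elements deterministically; each extra
    set covers at least one new element and costs at most 1."""
    rng = random.Random(seed)
    cx = sum(cost[s] * x[s] for s in sets)        # c . x
    alpha = cx * math.log(len(universe) / cx)     # as defined in the proof
    names = list(sets)
    weights = [x[s] for s in names]

    chosen, total = [], 0.0
    uncovered = set(universe)
    while total < alpha:                          # stage 1
        s = rng.choices(names, weights=weights)[0]
        chosen.append(s)
        total += cost[s]
        uncovered -= sets[s]
    while uncovered:                              # stage 2
        s = max(names, key=lambda t: len(sets[t] & uncovered))
        chosen.append(s)
        uncovered -= sets[s]
    return chosen
```

Stage 2 here picks a maximum-coverage set for concreteness; the proof only requires that each second-stage set cover at least one new element.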
Method of conditional probabilities
To prove the theorem, we show that applying the method of conditional probabilities to the lemma yields the algorithm.
Define T, n_t, and \alpha as in the proof of the lemma. Following that proof, in any outcome where n_T is at most c\cdot x, the performance guarantee holds. We show that the algorithm chooses each of the first T sets so as to keep the conditional expectation of \ln n_T at most \ln (c\cdot x). It does so by keeping the following pessimistic estimator from increasing:
\[ \phi_t \;=\; \ln n_t \;+\; \frac{C_t - \alpha}{c\cdot x}, \]
where (abusing notation) C_t denotes the cost of the first t sampled sets. Note that \phi_0 = \ln n - \alpha/(c\cdot x) = \ln (c\cdot x), while at the stopping time C_T \ge \alpha, so \phi_T \ge \ln n_T.
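As a numerical sanity check, the following sketch verifies on a toy instance that the greedy rule never increases an estimator of the form \phi_t = \ln n_t + (C_t-\alpha)/(c\cdot x). The exact form of the estimator, the instance, and all names here are assumptions for illustration, chosen to be consistent with the surrounding definitions rather than copied from the original.

```python
import math

def phi(n_t, C_t, alpha, cx):
    """Assumed pessimistic estimator: ln n_t + (C_t - alpha)/(c.x)."""
    return (math.log(n_t) if n_t else float("-inf")) + (C_t - alpha) / cx

def greedy_keeps_phi_nonincreasing(universe, sets, cost, x):
    """Run Chvátal's greedy and check that phi never increases."""
    cx = sum(cost[s] * x[s] for s in sets)
    alpha = cx * math.log(len(universe) / cx)
    uncovered, C = set(universe), 0.0
    cur = phi(len(uncovered), C, alpha, cx)
    while uncovered:
        # greedy step: minimize cost per newly covered element
        s = min((t for t in sets if sets[t] & uncovered),
                key=lambda t: cost[t] / len(sets[t] & uncovered))
        uncovered -= sets[s]
        C += cost[s]
        nxt = phi(len(uncovered), C, alpha, cx)
        if nxt > cur + 1e-12:
            return False
        cur = nxt
    return True
```

The check succeeds because, by feasibility of x, some set always covers new elements at rate at least n_t/(c\cdot x) per unit cost, and the greedy choice does at least as well.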