Randomly rounding a fractional LP solution to an integer solution.
The idea, introduced by Raghavan and Thompson in 1987, is to use the probabilistic method to round the solution of a linear program, converting it into an approximately optimal integer solution [3]. It’s a broadly useful technique. For many problems, randomized rounding yields algorithms with optimal approximation ratios (assuming P\neq NP). This note describes randomized rounding and gives a few examples.
Trivial example: Vertex Cover
Here is a very simple randomized-rounding scheme for Vertex Cover:
1. Given any graph G, compute a min-cost fractional vertex cover x^*.
2. Return the random vertex cover \tilde x where \tilde x_ v = 1 with probability \min (2x^*_ v,1).
The algorithm is a 2-approximation algorithm in expectation. Indeed, the rounded solution \tilde x is a cover because, for any edge, one of its two endpoints v must have x^*_ v \ge 1/2, which forces v into the cover \tilde x. By inspection, the expected cost of \tilde x is at most twice the cost of x^*, which, since the cost of x^* is a lower bound on the cost of the integer optimum, is at most 2\,{\textsc{opt}}.
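The two steps above can be sketched in Python. This is a minimal illustration, not part of the original text: the LP-solving step is stubbed out with the always-feasible half-integral solution x_v = 1/2 (a real implementation would call an LP solver).

```python
import random

def fractional_vertex_cover(edges):
    # Stand-in for step 1 (hypothetical): a real implementation would solve
    #   minimize sum_v x_v  subject to  x_u + x_v >= 1 for each edge {u, v}.
    # Here we return the always-feasible half-integral solution x_v = 1/2.
    vertices = {v for edge in edges for v in edge}
    return {v: 0.5 for v in vertices}

def round_vertex_cover(x_star):
    # Step 2: include vertex v with probability min(2 * x_v, 1).
    # Any v with x_v >= 1/2 is included with probability 1, so every edge
    # is covered by at least one endpoint, as argued above.
    return {v for v, xv in x_star.items() if random.random() < min(2 * xv, 1)}

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cover = round_vertex_cover(fractional_vertex_cover(edges))
assert all(u in cover or v in cover for u, v in edges)
```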
Relation to the probabilistic method in combinatorics
If you’re already familiar with the probabilistic method, the new element here is that the fractional solution x^* is used as the basis for the probability distribution. This allows the cost of the rounded solution to be related to the cost of x^*, and via that (since x^* is an optimal solution to a relaxation) to the cost of opt. In this way, one obtains the per-input worst-case performance guarantees necessary for approximation algorithms.
Standard randomized-rounding scheme for Set Cover
Next is a more sophisticated example of randomized rounding: a standard randomized-rounding scheme for Set Cover. This example is comparable to one in [4, Sec. 14.2]. For more examples, see [1, Ch. 4].
Let x^* be a minimum-cost fractional set cover (computed, say, by linear programming). Consider first the most natural scheme for rounding x^*:
For each set s\in S independently, take \tilde x_ s = 1 with probability x^*_ s, otherwise take \tilde x_ s = 0.
With this rounding scheme, the expected cost of the chosen sets is \sum _ s c_ s x^*_ s, the cost of the fractional cover. This is good. Unfortunately the coverage is not good in the worst case, when the set weights are relatively small, for then the probability that an element e is not covered is about

\prod _{s\ni e} (1 - x^*_ s) \;\approx \; \exp \Big ({-}\sum _{s\ni e} x^*_ s\Big ) \;\approx \; e^{-1}.

In expectation, a constant fraction of the elements will be left uncovered. To increase the coverage, the standard rounding scheme scales up the rounding probabilities by an appropriate factor \lambda \gt 1:
For each set s\in S independently, take \tilde x_ s = 1 with probability p_ s = \min (\lambda x^*_ s, 1), otherwise take \tilde x_ s = 0.
Scaling the probabilities increases the expected cost by \lambda , but makes coverage of all elements likely.
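To make the scaled scheme concrete, here is a hedged Python sketch (the set system, costs, and fractional solution passed in are illustrative inputs, not part of the original text). Since the analysis below only guarantees success with positive probability, the sketch simply retries until both the cost bound and full coverage hold.

```python
import math
import random

def scaled_round(sets, costs, x_star, elements, max_tries=1000):
    # sets: dict mapping set name -> frozenset of the elements it contains.
    # x_star: fractional set cover (assumed given, e.g. from an LP solver).
    lam = math.log(2 * len(elements))                  # lambda = ln(2n)
    lp_cost = sum(costs[s] * x_star[s] for s in sets)  # c . x*
    for _ in range(max_tries):
        # Independently include each set s with probability min(lambda*x_s, 1).
        chosen = {s for s in sets if random.random() < min(lam * x_star[s], 1.0)}
        covered = set().union(*(sets[s] for s in chosen)) if chosen else set()
        cost = sum(costs[s] for s in chosen)
        # Accept iff neither "bad event" occurred.
        if covered >= set(elements) and cost <= 2 * lam * lp_cost:
            return chosen
    return None
```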
The next step is to show that this rounding scheme works with positive probability.
Fix \lambda = \ln (2n), where n is the number of elements. With nonzero probability, the rounding scheme returns a set cover \tilde x of cost at most 2\ln (2 n)\, c\cdot x^* (and thus, of cost O(\log n) times {\textsc{opt}}).
(Note: with care the O(\log n) can be reduced to \ln (n)+O(\log \log n).)
The output \tilde x of the random rounding scheme has the desired properties as long as none of the following “bad” events occur:

the cost c\cdot \tilde x is too high: it exceeds 2\lambda c\cdot x^*, or

\tilde x is not a cover: it fails to cover one of the n elements e.
Regarding the cost, the expectation of each \tilde x_ s is at most \lambda x_ s^*. By linearity of expectation, the expectation of c\cdot \tilde x is at most \sum _ s c_ s\lambda x_ s^*=\lambda c\cdot x^*. Thus, by the Markov bound, the probability of the first bad event (the cost is too high) is at most 1/2.
For the remaining bad events (coverage of each element e), note that, for any given element e, since \sum _{s\ni e} x^*_ s \ge 1, the probability that e is not covered is

\prod _{s\ni e} (1 - p_ s) \;\lt \; \prod _{s\ni e} e^{-\lambda x^*_ s} \;=\; \exp \Big ({-}\lambda \sum _{s\ni e} x^*_ s\Big ) \;\le \; e^{-\lambda } \;=\; \frac{1}{2n}.

(The first step uses p_ s = \min (\lambda x^*_ s,1) and 1-z\le e^{-z}, which is strict for z\ne 0. The last step uses the definition of \lambda .) Thus, for each of the n elements, the probability that the element is not covered is less than 1/(2n). By the naive union bound, the probability that one of the n+1 bad events happens is less than 1/2 + n/(2n)=1. Thus, with positive probability there are no bad events and \tilde x is a set cover of cost at most 2\lambda \, c\cdot x^*.
Deconstructing the Set Cover existence proof to find Q
The argument above shows the existence of a set cover of cost O(\log (n)c\cdot x^*). The next step is to apply the method of conditional probabilities to the existence proof. The resulting algorithm will consider each set s\in S in turn, and set \tilde x_ s to either 0 or 1. Instead of setting \tilde x_ s randomly based on x^*, it will set \tilde x_ s to keep the conditional probability of failure (defined with respect to the random experiment) below 1.
Although the concept is simple, implementing it requires a careful calculation bounding the conditional probability of failure as the scheme proceeds. The original existence proof implicitly bounds the (unconditioned) probability of failure by the expectation of

Q \;=\; \frac{c\cdot \tilde x}{2\lambda \, c\cdot x^*} \;+\; n_ m,

where n_ m is the number of elements left uncovered at the end; that is, n_ m is the size of the set U_ m=\{ \, e \mid \forall s\ni e.~ \tilde x_ s = 0 \, \} .
The form of Q mirrors the probabilistic proof in a systematic way.^{1} The first term in Q comes from applying the Markov bound to bound the probability of the first bad event. It contributes at least 1 to Q if the cost is too high. The second term n_ m counts the number of bad events of the second kind (uncovered elements). It contributes at least 1 to Q if the coverage fails. Since failure implies Q\ge 1, by the Markov bound, the probability of failure is at most E[Q]. Finally (as shown by calculation in the proof of the lemma) E[Q] \lt 1.
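The quantity Q can be computed directly for any outcome of the rounding. The following sketch (with illustrative inputs, and taking \lambda = \ln(2n) as in the lemma) makes the observation "failure implies Q \ge 1" concrete.

```python
import math

def Q(chosen, sets, costs, x_star, elements):
    # Q = (cost of rounded solution) / (2 * lambda * c.x*)  +  n_m,
    # where n_m is the number of elements left uncovered.
    lam = math.log(2 * len(elements))
    lp_cost = sum(costs[s] * x_star[s] for s in sets)
    cost_term = sum(costs[s] for s in chosen) / (2 * lam * lp_cost)
    n_m = sum(1 for e in elements if not any(e in sets[s] for s in chosen))
    return cost_term + n_m

# If any element is uncovered, the n_m term alone pushes Q to 1 or more;
# if the cost exceeds 2*lambda*c.x*, the first term alone exceeds 1.
```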
Algorithm via the method of conditional probabilities
The algorithm will choose each set to keep the conditional expectation of Q, given the choices made so far, below 1. Consider the state after the tth choice. Let S_ t=(s_1,s_2,\ldots ,s_ t) denote the sets considered so far — for each set s\in S_ t, the value of \tilde x_ s has been set to 0 or 1; the values for the remaining sets have not yet been determined. The conditional expectation of Q, given this state, is

\phi _ t \;=\; \frac{\sum _{s\in S_ t} c_ s \tilde x_ s + \sum _{s\notin S_ t} c_ s p_ s}{2\lambda \, c\cdot x^*} \;+\; \sum _ e \; \prod _{s\in S_ t,\, s\ni e} (1-\tilde x_ s) \prod _{s\notin S_ t,\, s\ni e} (1-p_ s).
The algorithm will keep \phi _ t (the conditional expectation of Q) below 1 by setting each \tilde x_{s_ t} to minimize the resulting value of \phi _ t. Let s’=s_ t denote the tth set. When the algorithm sets \tilde x_{s’}, each \tilde x_ s for s\in S_{t-1} has already been set. Considered as a function of just \tilde x_{s’} (with all other quantities fixed), the quantity \phi _ t is linear in \tilde x_{s’}. Let U= \{ \, e \mid \prod _{s\in S_{t-1}} (1-[e\in s]\, \tilde x_ s) = 1\, \} denote the set of elements not covered before s’ is considered. The coefficient of \tilde x_{s’} in \phi _ t is

\frac{c_{s’}}{2\lambda \, c\cdot x^*} \;-\; \sum _{e\in s’\cap U} \; \prod _{s\notin S_ t,\, s\ni e} (1-p_ s).
To minimize \phi _ t, if this coefficient is positive, then the algorithm sets \tilde x_{s’} to 0. Otherwise it sets \tilde x_{s’} to 1:
input: set system S, universe U, cost vector c  
output: set cover \tilde x  
1.  Compute mincost fractional set cover x^*. 
2.  Take \lambda \leftarrow \ln (2n) and set p_ s \leftarrow \min (\lambda x^*_{s},1) for each s. 
3.  For each s’\in S do: 
4.  Let S \leftarrow S – \{ s’\} .
…S contains the not-yet-decided sets.

5.  If \displaystyle \frac{c_{s’}}{2\lambda c\cdot x^*} \gt \sum _{e\in s’\cap U} \prod _{s\in S, s\ni e}(1-p_ s)
then set \tilde x_{s’}\leftarrow 0, else set \tilde x_{s’}\leftarrow 1 and U\leftarrow U – s’. 
6.  Return \tilde x. 
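The steps above can be rendered in Python. This is a sketch under illustrative assumptions: x^* is taken as given rather than computed in step 1, and sets are represented as a dict from set names to element sets.

```python
import math

def conditional_probabilities_cover(sets, costs, x_star, elements):
    # sets: dict name -> set of elements; x_star: fractional cover (given).
    lam = math.log(2 * len(elements))                  # lambda = ln(2n)
    lp_cost = sum(costs[s] * x_star[s] for s in sets)  # c . x*
    p = {s: min(lam * x_star[s], 1.0) for s in sets}
    undecided = set(sets)      # sets whose value of x~ is not yet fixed
    uncovered = set(elements)  # U: elements not covered by chosen sets so far
    cover = set()
    for s_prime in list(sets):
        undecided.discard(s_prime)
        # Expected coverage benefit of taking s': for each uncovered element
        # of s', the probability that no not-yet-decided set would cover it.
        benefit = sum(
            math.prod(1 - p[s] for s in undecided if e in sets[s])
            for e in sets[s_prime] & uncovered
        )
        # Take s' exactly when the coefficient of x~_{s'} in phi_t is <= 0.
        if costs[s_prime] / (2 * lam * lp_cost) <= benefit:
            cover.add(s_prime)
            uncovered -= sets[s_prime]
    return cover
```

Note that the algorithm is deterministic: the randomness of the original scheme survives only in the probabilities p_s used to evaluate the comparison in each step.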
By ensuring \phi _ t \le \phi _{t-1} at each step, the algorithm guarantees an outcome with Q\le \textrm{E}[Q]\lt 1, so that \tilde x has cost at most 2\ln (2n)\, c\cdot x^* and covers all elements:
The algorithm above returns a set cover \tilde x of cost at most 2\ln (2n) times the minimum cost of any (fractional) set cover.
Randomized rounding without solving the linear program
The algorithm’s choice for each \tilde x_{s’} depends intrinsically on the individual values within x^*. (For example, if each element e\in s’ is contained in some not-yet-considered set s with p_{s}=1, then s’ will not be added to the cover even if p_{s’}=1.) This dependence is not surprising, but it is not inevitable. By starting with a different randomized-rounding scheme, the method of conditional probabilities can yield an algorithm that, in each step, makes a choice that is the same for any feasible x^*. Thus, the algorithm can compute the approximate cover \tilde x without first computing x^*. You can read more about that here.
Related
Bibliography
Footnotes
(1) For an application of the Markov bound, \Pr [X\ge {}c] {\, {\le }\, }\textrm{E}[X]/c, the pessimistic estimator will be the conditional expectation of X/c.
(2) For an application of the naive union bound (\Pr [A\text { or } B] {\, {\le }\, }\Pr [A] + \Pr [B]) the pessimistic estimator will be the sum of the pessimistic estimators of the individual events.
(3) For a bad event whose probability is computed (or estimated) directly, the pessimistic estimator will be the conditional probability (or the corresponding estimate).
Can you please explain this line in the last line of page 5: “By ensuring \phi_t \le \phi_{t-1} at each step, the algorithm guarantees an outcome with Q \le E[Q] \lt 1, so that \tilde x has cost at most 2\ln(2n)\, c\cdot x^* and covers all elements.”
That line summarizes the reasoning in the previous pages. To break it down, the proof of the lemma on page 2 explains why the randomized-rounding scheme (with nonzero probability) returns $\tilde x$ of at most that cost ($2\ln(2n) \,c\cdot x^*$). The discussion on page 4 explains that to find an outcome where $\tilde x$ achieves that cost bound, it’s enough to find an outcome where $Q$ (as defined there) is less than 1, and why $E[Q]< 1$. The discussion on page 5 explains why $\phi_t$ is the conditional expectation of $Q$ given the first $t$ choices. That page also sketches why the deterministic algorithm there, in each iteration $t$, makes a choice that ensures that $\phi_t$ is at most $\phi_{t-1}$, so that the algorithm keeps the conditional expectation of $Q$ below 1, and thus reaches an outcome where $Q<1$, so that $\tilde x$ achieves the desired cost bound. If these comments don't make much sense to you, try going over the background material, linked to at the top. Or, if you have a question about any one of these steps, please post it.