Pessimistic estimators in the method of conditional probabilities.

In applying the method of conditional probabilities, exact conditional probabilities (or expectations) are sometimes hard to compute. Pessimistic estimators can be used instead. We illustrate the idea by example, using Turán’s theorem.

Click for background material…

Existence proof (Turán’s theorem)

Recall that Turán’s theorem states that any $n$ -vertex graph contains an independent set $S$ of size at least $n/(\bar d + 1)$ , where $\bar d$ is the average degree.

We assume familiarity with the proof. It considers the vertices in random order, taking them greedily (taking each vertex unless a neighbor has already been taken). The expected size of the resulting independent set is at least $n/(\bar d+1)$ , because a vertex $v$ with degree $d_ v$ has at least a $1/(d_ v+1)$ chance of being taken, and $\sum _ v 1/(d_ v+1) \ge n/(\bar d + 1)$ .

Applying the method of conditional probabilities

To guarantee a successful outcome, instead of choosing the next vertex randomly from the remaining vertices at each step, the algorithm should choose the vertex deterministically, so as to keep the conditional expectation of the size of the final set at or above $n/(\bar d + 1)$ . The next step is to analyze this conditional expectation.

If some vertex $v$ is added to the independent set at some point, then no neighbors of $v$ will be added after that. Imagine for the analysis that, when $v$ is chosen, $v$ and all its neighbors are deleted from the graph.

Suppose that some $t$ vertices have been considered so far. Let $S_ t$ contain those vertices that have been added to the independent set so far. Let $V_ t$ denote the remaining (undeleted) vertices.

All vertices in $S_ t$ will of course end up in the independent set. Each vertex $v\in V_ t$ will be added to $S$ provided it is considered before any of its remaining neighbors (in $V_ t$ ). This will happen with probability at least $1/(d_ v+1)$ , where $d_ v$ is the degree of $v$ in the original graph. (Indeed, it will happen with probability at least $1/(d’_ v+1)$ , where $d’_ v \le d_ v$ is the degree in the remaining graph.) By linearity of expectation, the conditional expectation of the size $|S|$ of the final independent set $S$ , given $S_ t$ and $V_ t$ , is at least

\[ \Phi _ t ~ =~ |S_ t| + \sum _{v\in V_ t} \frac{1}{d_ v+1}. \]

This is a pessimistic estimator

We’ve calculated above that $\Phi _ t$ is a lower bound on the conditional expectation that we care about. To keep the conditional expectation high enough (at or above $n/(\bar d + 1)$ ), it suffices to keep $\Phi _ t$ high enough. The original proof proved that $\Phi _0$ is high enough. So, to keep $\Phi _ t$ high enough, it suffices to make each choice to ensure $\Phi _{t+1} \ge \Phi _ t$ .

Is this always possible? That is, is there always some choice that ensures $\Phi _{t+1} \ge \Phi _ t$ ? In this case, yes, it is. As we will verify, $\Phi$ is a sub-martingale — in expectation, it is non-decreasing in each step: for each $t$ , the conditional expectation of $\Phi _{t+1}$ , given the current value of $\Phi _ t$ , is at least $\Phi _ t$ . This ensures that there exists a choice that ensures $\Phi _{t+1} \ge \Phi _ t$ .

Verifying the sub-martingale property

Since $\Phi _ t$ is not a conditional expectation per se, one has to verify that is a sub-martingale.¹ Here, as usual, this verification is a routine calculation, reusing the calculations from the original proof.

Suppose the first $t$ vertices have been considered. If the next vertex considered is, say $v$ , how will $\Phi _{t+1}$ compare to $\Phi _ t$ ?

In the case that $v$ is not in $V_ t$ (that is, $v$ is a neighbor of a vertex in $S$ ) then $v$ is not added to $S$ , and by inspection of its definition, $\Phi _{t+1}=\Phi _ t$ .

In the case that $v$ is in $V_ t$ , vertex $v$ is added to $S$ , so $|S_{t+1}|=1+|S_ t|$ . That contributes an increase of 1 to $\Phi$ . But $\Phi$ also decreases because $v$ ’s neighbors no longer contribute: for each remaining vertex $w\in V_ t$ , let “ $[w\in N(v)]$ ” denote 1 if $w$ is $v$ or a neighbor of $v$ , and 0 otherwise. When $v$ is added, the increase in the pessimistic estimator, $\Phi _{t+1} - \Phi _ t$ , is

\begin{equation} \label{turan:eqn} 1 – \sum _{w\in V_ t} \frac{[{w}\in N(v)]}{d_{w}+1}. \end{equation}

The conditional expectation of $[w\in N(v)]$ , given that $v\in V_ t$ , is at most $(d_ w+1)/|V_ t|$ . Thus, the expectation of (\ref{turan:eqn}) above is at least

\[ 1 ~ -~ \sum _{w\in V_ t} \frac{(d_ w+1)/|V_ t|}{d_ w+1} ~ ~ =~ ~ 0. \]

We’ve verified that $\Phi$ is a sub-martingale.

Many ways to keep the pessimistic estimator from decreasing

To guarantee success, the algorithm should choose the next vertex to ensure $\Phi _{t+1} \ge \Phi _ t$ . We’ve verified that this is always possible. More concretely, what choices will ensure this?

If the algorithm next considers some $v$ that has already been deleted (that is, $v\not\in V_ t$ and $v$ has a neighbor in $S$ already), then the vertex $v$ will not be taken in the independent set. By inspection, $\Phi _{t+1}$ equals $\Phi _ t$ . So, considering any deleted vertex (and not taking it) keeps the pessimistic estimator from decreasing.

Otherwise, the algorithm next considers some $v\in V_ t$ . In this case (considering the random experiment), the algorithm must add $v$ to $S$ . As shown above, the increase $\Phi _{t+1}-\Phi _ t$ is at least (\ref{turan:eqn}).

Next are a few different ways to make the choice to ensure that the increase is non-negative. They all lead to minor variations of the same basic greedy algorithm.

Maximizing the pessimistic estimator

Perhaps the most obvious way is, at each step, to choose $v\in V_ t$ so as to maximize $\Phi _{t+1}$ . Since $\Phi$ is a sub-martingale, this ensures $\Phi _{t+1}\ge \Phi _ t$ . More concretely, this gives the following algorithm:

Assign each vertex $v$ a fixed weight $1/(d_ v+1)$ . Repeat until no vertices remain: choose a vertex $v$ minimizing the total weight of $v$ and its current neighbors; add $v$ to $S$ and delete $v$ and its neighbors. Return $S$ .

In order of increasing original degree

Here’s another way to choose $v$ that works: Choose any $v\in V_ t$ whose original degree $d_ v$ is at most the minimum, over $v$ ’s remaining neighbors $w$ , of the original degree $d_ w$ of $w$ . (For example, consider the vertices in order of increasing degree in the original graph.)

By calculation, the lower bound (\ref{turan:eqn}) is non-negative: when a vertex $v$ is added, at most $d_ v+1$ vertices $w$ in $V_ t$ are deleted from the graph, and each of those has $1/(d_ w+1) \le 1/(d_ v+1)$ .

Here is one such algorithm:

Consider the vertices in order of increasing degree. Add each vertex if, when it is considered, none of its neighbors have been added.

In order of increasing current degree

Alternatively, the algorithm can choose vertices based on their degree, say $d’_ v$ , in the remaining graph, instead of the degree $d_ v$ in the original graph:

Choose any $v\in V_ t$ whose current degree $d’_ v$ is at most the minimum, over $v$ ’s remaining neighbors $w$ , of the current degree $d’_ w$ of $w$ .

This choice keeps the increase (\ref{turan:eqn}) non-negative because there are $d’_ v+1$ vertices $w\in V_ t$ that have $w\in N(v)$ , and each of those has $1/(d_ w+1)\le 1/(d_ v+1)\le 1/(d’_ v+1)$ .

Many pessimistic estimators

We’ve seen above that a given pessimistic estimator can justify a variety of algorithms. As the next exercise illustrates, a given existence proof can also give rise to multiple pessimistic estimators. This is useful in designing algorithms.

Exercise 1.

Let $\tilde d_ w$ denote the number of not-yet-considered neighbors of $w$ . (Note $d’_ w \le \tilde d_ w \le d_ w$ .) Define $\tilde\Phi _ t = |S_ t| + \sum _{w\in V_ t} 1/(\tilde d_ w+1)$ . Show that this is also a valid pessimistic estimator. What algorithms can you derive using it?

Exercise 2.

Instead of defining $Q$ to be the size of the independent set $|S|$ , define $Q$ to be the number of vertices $v$ such that $v$ is considered before any of its neighbors. Apply the method of conditional probabilities with this $Q$ . What algorithms can you derive this way?

Exercise 3.

Which of the algorithms in this section do you think might find the largest independent sets in practice? Why?

Exercise 4.

Each algorithm above produces an independent set of size at least $n/(\bar d + 1)$ . Describe examples of graphs for which the algorithms produce independent sets that are much smaller than the maximum possible independent sets.

Wikipedia

the method of conditional probabilities

Bibliography

[1]	P. Erdős. On the graph-theorem of Turán. Math. Lapok., 21:249–251, 1970.

Footnotes

If, for some random variable $Q$ , each $\Phi _ t$ was exactly the conditional expectation of $Q$ , then $\Phi$ would necessarily be a sub-martingale (in fact a martingale).

Notes on algorithms

Lecture notes on algorithms

pessimistic estimators (Turáns theorem)