Submodular Functions:
Theory and Applications I

Set function

Ground set \( V = \{\text{cake}, \text{dog}, \text{ball}, \ldots\} \)

\( F: 2^{V} \rightarrow \mathbb{R} \)

\( F(\{ cake, dog \}) = \) cost, utility, probability, ...

Examples

\( \max\limits_{S} F(S) \)

Assumptions

\( F(\varnothing ) = 0 \)

\( F \) is a black box "oracle"
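A set function is just a map from subsets of the ground set to the reals, and the "oracle" assumption means we only get to evaluate it. A minimal sketch, using a hypothetical weighted-coverage function (the ground set and the `covers` data are made-up illustration values):

```python
# Ground set, as in the slides (illustrative).
V = {"cake", "dog", "ball"}

# Hypothetical: each element "covers" some topics.
covers = {
    "cake": {"food", "party"},
    "dog": {"pet", "party"},
    "ball": {"toy", "party"},
}

def F(S):
    """Oracle for F(S) = number of distinct topics covered by S."""
    covered = set()
    for e in S:
        covered |= covers[e]
    return len(covered)

print(F(set()))            # normalisation: F(empty set) = 0
print(F({"cake", "dog"}))  # covers {food, party, pet}, so 3
```

Coverage functions like this one are a standard example of a submodular set function; the oracle view is that an algorithm may call `F` but never inspect `covers` directly.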

V = variables to observe
F(S) = "information"

V = images (sentences, ...)
F(S) = "representation"

V = columns of a matrix
F(S) = "how representative the columns in S are" (matrix approximation)

\( \min\limits_{S} F(S) \)

V = data points
F(S) = "coherence/separation"

V = data points
F(S) = "coherence/matching"

V = coordinates
F(S) = "coherence"
(Genes, signal noise reduction, ...)

Why are convex functions so interesting?

Occur in many models

Preserved under operations and transformations

Sufficient mathematical structure for an elegant theory

Prove something about one, you prove something about many

Efficient minimisation

Why are submodular functions so interesting?

The same reasons convex functions are!

Definitions

Diminishing gains
(more intuitive)

For all \(A \subseteq B \) and \( s \notin B \)


\( F(A \cup \{s\}) - F(A) \geq F(B \cup \{s\}) - F(B) \)

Adding a sensor to A = {1, 2} is more
informative than adding one to B = {1, 2, 3, 4}

Economies of scale
The marginal cost of adding a drink decreases as the order grows:
cost(fries + drink) − cost(fries) ≥ cost(fries + two drinks) − cost(fries + drink)
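The diminishing-gains inequality can be checked directly on a toy coverage function (all element/topic data below is hypothetical, chosen so that element 5 overlaps more with the larger set):

```python
# Hypothetical: each element covers some topics.
covers = {1: {"a"}, 2: {"a", "b"}, 3: {"b", "c"}, 4: {"c"}, 5: {"c", "d"}}

def F(S):
    """Coverage function: number of distinct topics covered by S."""
    return len(set().union(*(covers[e] for e in S))) if S else 0

A = {1, 2}          # A is a subset of B
B = {1, 2, 3, 4}
s = 5               # the new element, s not in B

gain_A = F(A | {s}) - F(A)  # marginal gain of s at the small set
gain_B = F(B | {s}) - F(B)  # marginal gain of s at the large set
print(gain_A, gain_B)       # gain at A is at least the gain at B
```

Here element 5 brings two new topics to A but only one to B, matching the sensor intuition above: the larger the set already is, the less a new element adds.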

Union-intersection
(easier to show)

For all \(T, S \subseteq V \)


\(F(S) + F(T) \geq F(S \cup T) + F(S \cap T) \)
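Because the union-intersection form quantifies over all pairs of subsets, it is easy to verify exhaustively on a small ground set. A brute-force sketch, again with a made-up coverage function:

```python
from itertools import combinations

# Hypothetical coverage data over a 4-element ground set.
covers = {1: {"a"}, 2: {"a", "b"}, 3: {"b", "c"}, 4: {"c", "d"}}

def F(S):
    """Coverage function: number of distinct topics covered by S."""
    return len(set().union(*(covers[e] for e in S))) if S else 0

V = list(covers)
subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]

# Check F(S) + F(T) >= F(S | T) + F(S & T) for every pair of subsets.
ok = all(F(S) + F(T) >= F(S | T) + F(S & T)
         for S in subsets for T in subsets)
print(ok)
```

This is exactly why the union-intersection form is "easier to show": it is a single inequality per pair, with no marginal gains to track.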


V = random variables we can observe
F(S) = "reduction in uncertainty": \( F(S) = H(Y) - H(Y \mid X_S) = I(Y; X_S) \)

Entropy

\( X_1, \cdots, X_n \) discrete random variables

\( F(S) = H(X_S) = \) joint entropy of variables indexed by S
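Joint entropy is the classic submodular function. A minimal sketch computing \( H(X_S) \) from an explicit joint pmf; the distribution is made up for illustration (two independent fair bits \(X_1, X_2\) and a dependent bit \(X_3 = X_1 \wedge X_2\)):

```python
import math
from itertools import product

# Hypothetical joint distribution over (X1, X2, X3) with X3 = X1 AND X2.
pmf = {}
for x1, x2 in product([0, 1], repeat=2):
    outcome = (x1, x2, x1 & x2)
    pmf[outcome] = pmf.get(outcome, 0.0) + 0.25

def H(S):
    """Joint entropy in bits of the variables indexed by S (subset of {0,1,2})."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in sorted(S))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

print(H({0}))        # one fair bit: 1.0
print(H({0, 1, 2}))  # X3 is determined by X1, X2: still 2.0
```

Diminishing gains appears here too: adding \(X_3\) to \(\{X_1\}\) still reduces uncertainty, but adding it to \(\{X_1, X_2\}\) adds nothing, since \(X_3\) is then fully determined.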

Linear independence

V = columns in matrix
F(S) = rank(S)

Vectors in S are linearly independent
\(\iff \) F is modular on S: F(S) = |S|
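Matrix rank as a function of column subsets is another standard submodular function. A self-contained sketch with a made-up matrix whose third column is deliberately the sum of the first two (exact `Fraction` arithmetic avoids floating-point rank issues):

```python
from fractions import Fraction

# Hypothetical matrix; the ground set V = {0, 1, 2, 3} indexes its columns.
# Column 2 = column 0 + column 1, a deliberate linear dependence.
M = [[1, 0, 1, 2],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]

def rank(cols):
    """Rank of the submatrix on the given columns, via Gaussian elimination."""
    rows = [[Fraction(M[r][c]) for c in cols] for r in range(len(M))]
    r = 0
    for c in range(len(cols)):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

F = lambda S: rank(sorted(S))

print(F({0, 1}))     # independent columns: F is modular here, F(S) = |S| = 2
print(F({0, 1, 2}))  # column 2 is dependent: rank stays 2, not 3
```

On \(\{0, 1\}\) the columns are independent and \(F(S) = |S|\) (modular); adding the dependent column 2 gains nothing, which is the rank-function face of diminishing gains.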

Statistical dependence

\( X_i,\ i \in S, \) statistically independent
\( \iff \) H is modular on S: \( H(X_S) = \sum\limits_{e \in S} H(X_e) \)

The more dependent the variables, the more submodular
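The independence side of this equivalence can be checked numerically: for a joint pmf of two independent fair bits (a made-up example), the joint entropy equals the sum of the marginal entropies exactly.

```python
import math
from itertools import product

# Hypothetical joint pmf: two independent fair bits X1, X2.
pmf = {(x1, x2): 0.25 for x1, x2 in product([0, 1], repeat=2)}

def H(S):
    """Joint entropy in bits of the variables indexed by S (subset of {0,1})."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in sorted(S))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Independence <=> modularity: H(X_{1,2}) = H(X_1) + H(X_2).
print(H({0, 1}), H({0}) + H({1}))
```

With dependent variables the joint entropy drops strictly below the sum of marginals, which is the sense in which "more dependent" means "more submodular".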