Index

Dynamic Programming

Principle of Optimality and Recurrences

If a solution is optimal, then every portion of it must also be optimal with respect to all the other possible choices for that particular portion. Consequently, a partial solution that is not optimal should not be extended any further.

Memoization Tables

Memoization tables are a way to store the intermediate results of a Dynamic Programming problem. They speed up computation by saving the result of an operation the first time it is computed, so the same operation is never run more than once on the same input values.
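As a minimal sketch, here is a memoization table for Fibonacci numbers. This is a stand-in example, not one taken from the lecture. The names `fib`, `fibMemo` and `memo` are illustrative, and a `0` entry is assumed to mean "not computed yet".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// memo[n] caches fib(n) after its first computation; 0 means "not computed yet"
// (safe, since fib(n) > 0 for every n >= 1 and n < 2 never touches the table).
std::uint64_t fibMemo(int n, std::vector<std::uint64_t>& memo) {
    if (n < 2) return static_cast<std::uint64_t>(n);  // base cases fib(0), fib(1)
    if (memo[n] != 0) return memo[n];                  // reuse a stored result
    memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
    return memo[n];
}

// Entry point: builds an empty table of size n + 1 (n must be in [0, 93] to fit in 64 bits).
std::uint64_t fib(int n) {
    assert(n >= 0 && n <= 93);
    std::vector<std::uint64_t> memo(n + 1, 0);
    return fibMemo(n, memo);
}
```

Each `fib(k)` is computed once and then read from the table, so the number of calls grows linearly with `n` instead of exponentially.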

Shortest Path Using Dynamic Programming

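$\mathrm{cost}(i, j)$ is the cost of an optimal path from node $j$ at stage $i$ to the sink
$c(j, k)$ is the cost associated with the $(j, k)$ edge

$$\mathrm{cost}(i, j) = \min_{k \in V_{i + 1},\ (j, k) \in E} \{c(j, k) + \mathrm{cost}(i + 1, k)\}$$

where $V_{i + 1}$ is the set of nodes at stage $i + 1$, and the cost from the sink to itself is $0$.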

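Why does it work? By the principle of optimality, if the optimal path from $j$ goes through $k$, then its remainder from $k$ to the sink must itself be optimal. So we can reuse the optimal partial solutions of stage $i + 1$ to build the optimal solution for stage $i$, and eventually the global solution.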
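A sketch of the backward recurrence, assuming the nodes are numbered stage by stage so that every edge goes from a lower to a higher index and the last node is the sink. The `Edge` struct and `shortestToSink` are illustrative names, and the table is indexed by node only because each node's stage is implied by its index.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Illustrative edge type: an edge j -> to with cost c(j, to).
struct Edge {
    int to;
    long long w;
};

// cost[j] = cost of an optimal path from node j to the sink (the last node).
// Assumes every edge goes from a lower index to a higher one (stage order).
std::vector<long long> shortestToSink(const std::vector<std::vector<Edge>>& adj) {
    assert(!adj.empty());
    const long long INF = std::numeric_limits<long long>::max() / 4;  // "no path"
    const int n = static_cast<int>(adj.size());
    std::vector<long long> cost(n, INF);
    cost[n - 1] = 0;                      // the sink reaches itself for free
    for (int j = n - 2; j >= 0; --j)      // later stages are solved first
        for (const Edge& e : adj[j])
            if (cost[e.to] != INF)        // min over successors k of c(j, k) + cost(k)
                cost[j] = std::min(cost[j], e.w + cost[e.to]);
    return cost;
}
```

Because nodes are processed from the sink backwards, each $\mathrm{cost}(i + 1, k)$ is already in the table when $\mathrm{cost}(i, j)$ needs it.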

0/1 Knapsack

$\mathrm{val}[i, j]$ is the maximum value that can be transported with a weight limit of $j$ $(j \leq W)$, using only objects numbered from $1$ to $i$.
The optimal solution is given by $\mathrm{val}[n, W]$.

$$\mathrm{val}[i, j] = \max(\mathrm{val}[i - 1, j],\ \mathrm{val}[i - 1, j - w_{i}] + v_{i})$$

$$\mathrm{val}[i, 0] = 0$$

$$\mathrm{val}[0, j] = 0, \quad j \geq 0$$

$$\mathrm{val}[i, j] = -\infty, \quad j < 0$$

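Here, $\mathrm{val}[i - 1, j]$ means not including object $i$, and $\mathrm{val}[i - 1, j - w_{i}] + v_{i}$ means including object $i$ (if $w_{i} > j$, this term is $-\infty$, so object $i$ is never taken when it does not fit).
Each of the $(n + 1) \times (W + 1)$ entries is computed in constant time, which gives a time complexity of $O(nW)$.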
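A bottom-up sketch of the table, assuming positive integer weights. The vectors `w` and `v` are 0-indexed (an assumption of this example), so object $i$ of the notes is stored at `w[i - 1]` and `v[i - 1]`. The $-\infty$ case is handled by skipping the "include" option when object $i$ does not fit.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// val[i][j] = maximum value using objects 1..i with weight limit j.
// Object i of the notes is stored at w[i - 1], v[i - 1]; weights are assumed positive.
int knapsack(const std::vector<int>& w, const std::vector<int>& v, int W) {
    assert(w.size() == v.size() && W >= 0);
    const int n = static_cast<int>(w.size());
    // Row 0 and column 0 stay 0: val[0, j] = 0 and val[i, 0] = 0.
    std::vector<std::vector<int>> val(n + 1, std::vector<int>(W + 1, 0));
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= W; ++j) {
            val[i][j] = val[i - 1][j];                    // object i left out
            if (j - w[i - 1] >= 0)                        // otherwise the term is -infinity
                val[i][j] = std::max(val[i][j],
                                     val[i - 1][j - w[i - 1]] + v[i - 1]);  // object i taken
        }
    }
    return val[n][W];
}
```

With weights `{1, 3, 4, 5}`, values `{1, 4, 5, 7}` and $W = 7$, the best choice is the objects of weight 3 and 4, for a value of 9.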

Longest Common Sub-Sequence (LCS)

Given 2 sequences $X$ and $Y$, $Z$ is a common sub-sequence if $Z$ is a sub-sequence of both $X$ and $Y$. The LCS problem asks for a longest such $Z$.
Example:

$X$ = abefcghd
$Y$ = eagbcfdh
$Z$ = abcd
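A brute force solution, which compares sub-sequences of $X$ against sub-sequences of $Y$, is impractical: with $|X| = n$ and $|Y| = m$ there are up to $2^{n + m}$ pairs to check.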

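Let $c[i, j]$ be the length of an LCS of the prefixes $x_{1} \dots x_{i}$ and $y_{1} \dots y_{j}$:

$$c[i, j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ c[i - 1, j - 1] + 1 & \text{if } i, j > 0 \text{ and } x_{i} = y_{j} \\ \max(c[i - 1, j],\ c[i, j - 1]) & \text{if } i, j > 0 \text{ and } x_{i} \neq y_{j} \end{cases}$$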

Each entry is computed in constant time, so filling the table takes $O(n \times m)$, and $c[n, m]$ is the length of the LCS.
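A sketch that fills the $c$ table and then walks back from $c[n, m]$ to recover one LCS. Strings are 0-indexed in C++, so $x_{i}$ is `x[i - 1]`; the function name `lcs` and the tie-breaking rule in the walk-back (move up when the two neighbours are equal) are choices of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// c[i][j] = length of an LCS of the prefixes x[0..i) and y[0..j).
std::string lcs(const std::string& x, const std::string& y) {
    const std::size_t n = x.size(), m = y.size();
    std::vector<std::vector<int>> c(n + 1, std::vector<int>(m + 1, 0));  // row/column 0 = 0
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            c[i][j] = (x[i - 1] == y[j - 1]) ? c[i - 1][j - 1] + 1
                                             : std::max(c[i - 1][j], c[i][j - 1]);
    // Walk back from c[n][m], collecting matched characters in reverse order.
    std::string z;
    std::size_t i = n, j = m;
    while (i > 0 && j > 0) {
        if (x[i - 1] == y[j - 1]) {
            z.push_back(x[i - 1]);
            --i;
            --j;
        } else if (c[i - 1][j] >= c[i][j - 1]) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(z.begin(), z.end());
    return z;
}
```

For the example above this returns a common sub-sequence of length 4, but not necessarily abcd: an LCS is not unique in general, and the tie-breaking rule decides which one is reported.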

Floyd-Warshall Algorithm for APSP (All-Pairs Shortest Paths)

The goal is to find the shortest path between every pair of nodes.

If all the weights are non-negative, we can run Dijkstra's Algorithm once with each vertex as the source. With a binary heap this has complexity $O(V E \log V)$, and if the graph is dense ($E = O(V^{2})$) the complexity is $O(V^{3} \log V)$.
If there are negative weights, we can run the Bellman-Ford-Moore algorithm once with each vertex as the source. This has complexity $O(V^{2} E)$, and if the graph is dense the complexity is $O(V^{4})$.
Although these approaches work, we need to find algorithms with better performance.

Floyd-Warshall Algorithm:

Start by creating an adjacency matrix following these rules:

$$w_{ij} = \begin{cases} 0 & \text{if } i = j \\ \text{weight of edge } (i, j) & \text{if } i \neq j \text{ and } (i, j) \in E \\ \infty & \text{if } i \neq j \text{ and } (i, j) \notin E \end{cases}$$

In short, the diagonal is 0; if there is an edge from $i$ to $j$, use the weight of that edge; otherwise, use $\infty$.

Now, we need a matrix to use as a memoization table. Number the vertices $1, \dots, n$ and use a 3D matrix of size $(n + 1) \times n \times n$, where $dp_{ij}^{(k)}$ is the length of the shortest path from $i$ to $j$ whose intermediate nodes all belong to $\{1, 2, \dots, k\}$ ($k = 0$ means no intermediate nodes are allowed).
In order to populate the matrix, use the following rules:

$$dp_{ij}^{(k)} = \begin{cases} w_{ij} & \text{if } k = 0 \\ \min(dp_{ij}^{(k - 1)},\ dp_{ik}^{(k - 1)} + dp_{kj}^{(k - 1)}) & \text{if } k \geq 1 \end{cases}$$

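Let's break down the second case:

$dp_{ij}^{(k - 1)}$ is the shortest path from $i$ to $j$ without using $k$ as an intermediate node
$dp_{ik}^{(k - 1)}$ is the shortest path from $i$ to $k$
$dp_{kj}^{(k - 1)}$ is the shortest path from $k$ to $j$

The sum of the last two is the best path from $i$ to $j$ that does go through $k$, so the minimum picks whichever option is shorter.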

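This gives a space complexity of $O(V^{3})$, but it is possible to improve it.
Since state $k$ depends only on state $k - 1$, we can drop one dimension and update a single $n \times n$ matrix in place, reducing the space complexity to $O(V^{2})$. This is safe because the entries $dp_{ik}$ and $dp_{kj}$ do not change during round $k$ (routing through $k$ cannot shorten a path that starts or ends at $k$, since $dp_{kk} = 0$ when there are no negative cycles).
In doing so, we can also simplify the rules to:

$$dp_{ij} \leftarrow w_{ij} \text{ initially, then for each } k: \quad dp_{ij} \leftarrow \min(dp_{ij},\ dp_{ik} + dp_{kj})$$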

Another matrix that is needed is a $next$ matrix, an $n \times n$ 2D matrix used to reconstruct the paths. It is initialized by iterating over $w$: if there is an edge from $i$ to $j$, then $next_{ij} = j$; otherwise $next_{ij} = -1$ (no path known yet).

How to run the algorithm?

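// dp starts as a copy of w, and INF is the value used for ∞ in w
for (int k = 0; k < n; k++){
	for (int i = 0; i < n; i++){
		for (int j = 0; j < n; j++){
			// skip if i cannot reach k or k cannot reach j (also avoids INF + INF overflow)
			if (dp[i][k] == INF || dp[k][j] == INF) continue;
			// is going from i to j through k shorter than the best path so far?
			if (dp[i][k] + dp[k][j] < dp[i][j]){
				dp[i][j] = dp[i][k] + dp[k][j];
				next[i][j] = next[i][k]; // the first hop towards k is now the first hop towards j
			}
		}
	}
}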

If the graph has negative cycles, run the triple loop a second time: if a better path from $i$ to $j$ is still detected, that path can be made arbitrarily short by going around a negative cycle, so set $dp_{ij} = -\infty$ and $next_{ij} = -1$.
The time complexity is $O(V^{3})$.
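Putting the steps together, a self-contained sketch: `APSP`, `floydWarshall`, `reconstructPath`, `INF` and `NEG_INF` are illustrative names, and large finite sentinels stand in for $\pm\infty$. One deliberate variation: in the second (negative-cycle) pass, this sketch marks $(i, j)$ when some $k$ reachable from $i$ that reaches $j$ has $dp_{kk} < 0$, a common way of implementing "a better path is still detected".

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Finite sentinels for +/- infinity, divided by 4 so that adding two cannot overflow.
const long long INF = std::numeric_limits<long long>::max() / 4;
const long long NEG_INF = std::numeric_limits<long long>::min() / 4;

struct APSP {
    std::vector<std::vector<long long>> dp;  // shortest distances
    std::vector<std::vector<int>> next;      // next hop on a shortest path, -1 if none
};

// w: adjacency matrix built with the rules above (0 on the diagonal, INF for no edge).
APSP floydWarshall(const std::vector<std::vector<long long>>& w) {
    const int n = static_cast<int>(w.size());
    APSP r{w, std::vector<std::vector<int>>(n, std::vector<int>(n, -1))};
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (w[i][j] != INF) r.next[i][j] = j;  // direct edge (or i == j)

    // First pass: the in-place O(V^2)-space recurrence.
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (r.dp[i][k] != INF && r.dp[k][j] != INF &&
                    r.dp[i][k] + r.dp[k][j] < r.dp[i][j]) {
                    r.dp[i][j] = r.dp[i][k] + r.dp[k][j];
                    r.next[i][j] = r.next[i][k];
                }

    // Second pass: if i reaches k, k reaches j, and k lies on a negative
    // closed walk (dp[k][k] < 0), the i -> j distance is unbounded below.
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (r.dp[i][k] != INF && r.dp[k][j] != INF && r.dp[k][k] < 0) {
                    r.dp[i][j] = NEG_INF;
                    r.next[i][j] = -1;
                }
    return r;
}

// Vertices on a shortest s -> t path, or empty if t is unreachable or the
// s -> t distance is affected by a negative cycle.
std::vector<int> reconstructPath(const APSP& r, int s, int t) {
    std::vector<int> path;
    if (r.next[s][t] == -1) return path;
    for (int at = s; at != t; at = r.next[at][t]) {
        if (at == -1) return {};
        path.push_back(at);
    }
    path.push_back(t);
    return path;
}
```

For example, with edges 0→1 (4), 0→2 (1), 2→1 (2) and 1→3 (1), the shortest path from 0 to 3 is 0, 2, 1, 3 with length 4.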

Exhaustive Search and Branch-and-Bound

There is not much to say about this.
Exhaustive search tries every possible solution. With bounding (pruning), we stop exploring a branch as soon as it is clear that it cannot lead to a solution.
Example:
Sum of subsets
Given an integer $M$ and a set of positive integers $E$ , find all subsets of $E$ whose sum is equal to $M$ .
Using backtracking, we can build a binary state-space tree where each level decides whether a certain element is used or not.
We can use a bounding function to stop searching if the current sum has already exceeded $M$, or if the current sum plus the sum of the remaining elements is smaller than $M$.
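A backtracking sketch with exactly those two bounds, assuming all elements are positive. `sumOfSubsets` and `searchSubsets` are illustrative names; the left branch uses the current element and the right branch skips it.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Backtracking over the binary state-space tree: at depth k we decide whether
// E[k] is used. Assumes all elements of E are positive.
void searchSubsets(const std::vector<int>& E, int M, std::size_t k, int current,
                   int remaining, std::vector<int>& chosen,
                   std::vector<std::vector<int>>& found) {
    if (current == M) {  // adding more positive elements would overshoot
        found.push_back(chosen);
        return;
    }
    // Bounding: already above M, or even all remaining elements cannot reach M.
    if (current > M || current + remaining < M || k == E.size()) return;
    chosen.push_back(E[k]);  // left branch: use E[k]
    searchSubsets(E, M, k + 1, current + E[k], remaining - E[k], chosen, found);
    chosen.pop_back();       // right branch: skip E[k]
    searchSubsets(E, M, k + 1, current, remaining - E[k], chosen, found);
}

std::vector<std::vector<int>> sumOfSubsets(const std::vector<int>& E, int M) {
    std::vector<std::vector<int>> found;
    std::vector<int> chosen;
    const int total = std::accumulate(E.begin(), E.end(), 0);
    searchSubsets(E, M, 0, 0, total, chosen, found);
    return found;
}
```

With $E = \{5, 10, 12, 13, 15, 18\}$ and $M = 30$, the search finds three subsets: $\{5, 10, 15\}$, $\{5, 12, 13\}$ and $\{12, 18\}$.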

NP and NP-Complete Problems

The Halting Problem

Let's imagine we have a machine (or program). Let's call this machine $H$. $H$ takes a description of a program ($p$) and an input ($i$) for that program. Then, $H$ says whether program $p$ with input $i$ is going to halt (finish) or not (run forever).
Now, let's extend $H$, calling the result $H^{'}$. $H^{'}$ runs $H$ on its arguments: if $H$ returns true, then $H^{'}$ runs forever, and if $H$ returns false, $H^{'}$ halts.
Let's run $H^{'}$, feeding it itself as its arguments, so $p$ is $H^{'}$ and $i$ is also $H^{'}$. This generates a contradiction:

If $H$ says that the program halts, then $H^{'}$ will not halt, thus $H$ produced the wrong answer.
If $H$ says that the program runs forever, then $H^{'}$ will halt, thus $H$ produced the wrong answer.

Either way $H$ is wrong, so no such machine $H$ can exist: the halting problem is undecidable.

Decision vs Optimization Problems

In decision problems, the answer is yes/no, while in optimization problems, the answer is the best feasible solution, one that minimizes or maximizes some objective.
Abstract example:

Optimization problem: Find the minimum value $k$ so that some property holds.
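Decision problem: Given some value $k$, does the property hold?
Their difficulty is closely related, because a solution to one can be used to solve the other: an optimization algorithm directly answers the decision question, and a decision algorithm can find the optimal $k$ by searching over the possible values of $k$ (for example, with binary search).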
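A sketch of the "decision to optimization" direction: given a hypothetical decision procedure `decide(k)` that is monotone in $k$ (once the property holds, it keeps holding for larger $k$), binary search finds the minimum $k$ with only $O(\log(hi - lo))$ calls. `minimizeWithDecision` is an illustrative name, and monotonicity is an assumption this trick depends on.

```cpp
#include <cassert>
#include <functional>

// decide(k) is a hypothetical decision procedure for "does the property hold for k?",
// assumed monotone: if it holds for k, it also holds for every larger k.
// Returns the smallest k in [lo, hi] with decide(k) == true, or hi + 1 if there is none
// (hi + 1 is assumed not to overflow).
long long minimizeWithDecision(long long lo, long long hi,
                               const std::function<bool(long long)>& decide) {
    long long answer = hi + 1;
    while (lo <= hi) {
        const long long mid = lo + (hi - lo) / 2;
        if (decide(mid)) {
            answer = mid;  // property holds: remember it and try smaller k
            hi = mid - 1;
        } else {
            lo = mid + 1;  // property fails: the answer must be larger
        }
    }
    return answer;
}
```

For instance, with the property "$k^{2} \geq 50$" over $[0, 100]$, the search returns 8 after a handful of calls.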

P Problems (Polynomial Time)

P Problems are decision problems that can be solved in polynomial time ($O(n^{k})$ for some constant $k$, where $n$ is the input size).

NP Problems (Nondeterministic Polynomial Time)

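A decision problem is in NP if a proposed solution (a certificate) can be checked for validity in polynomial time ($O(n^{k})$). Equivalently, it can be solved in polynomial time by a non-deterministic algorithm that "guesses" the solution.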
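For example, the sum-of-subsets problem from the previous section is in NP: finding a subset may require exhaustive search, but checking a proposed one is fast. A minimal verifier sketch, where the certificate format (a list of 0-based indices into $E$) and the name `verifySubsetSum` are assumptions of this example:

```cpp
#include <cassert>
#include <vector>

// Polynomial-time verifier for the sum-of-subsets decision problem.
// The certificate is a list of 0-based indices into E; it is valid if the
// indices are in range, distinct, and the chosen elements sum to M.
// Runs in O(|E| + |indices|) time.
bool verifySubsetSum(const std::vector<int>& E, long long M,
                     const std::vector<int>& indices) {
    std::vector<bool> used(E.size(), false);
    long long sum = 0;
    for (int idx : indices) {
        if (idx < 0 || idx >= static_cast<int>(E.size()) || used[idx])
            return false;  // out of range or element used twice
        used[idx] = true;
        sum += E[idx];
    }
    return sum == M;
}
```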

NP Hard Problems

A problem is NP-hard if every NP problem can be reduced to it in polynomial time, that is, an algorithm for it can be transformed into one for solving any NP problem. So, NP-hard problems are at least as hard as any NP problem.

NP Complete Problems

A problem is NP-Complete if it is both in NP (it has a polynomial-time non-deterministic algorithm) and NP-hard.

Polynomial time reductions

Suppose that $F$ and $G$ are two problems. A polynomial time reduction (or just reduction) from $F$ to $G$ transforms any instance of $F$, in polynomial time, into an instance of $G$ with the same answer. It shows that $F$ is no harder than $G$, which means that a polynomial time algorithm for $G$ implies a polynomial time algorithm for $F$.

Satisfiability (SAT)

CNF Satisfiability serves as a base problem to relate all NP problems together, in the sense that if it can be solved in polynomial time, then all of them can (this is the Cook-Levin theorem).
The CNF-SAT problem takes a set of Boolean variables and a propositional formula in conjunctive normal form (an AND of clauses, each clause an OR of variables or negated variables) and asks whether some assignment of values makes the formula true. It is an NP-Complete problem.
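To make the exponential search concrete, a brute-force CNF-SAT sketch. Literals are encoded DIMACS-style as signed integers ($+v$ for $x_{v}$, $-v$ for $\neg x_{v}$), an assumption of this example, and variables are assumed to be numbered $1 \dots n$. `satisfies` is the polynomial-time check that puts SAT in NP; `bruteForceSat` tries all $2^{n}$ assignments, so its running time is exponential in the number of variables.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Literal encoding (assumed, DIMACS-style): +v means x_v, -v means NOT x_v, v >= 1.
using Clause = std::vector<int>;
using Formula = std::vector<Clause>;  // AND of clauses, each clause an OR of literals

// Polynomial-time check of one assignment (value[v] is the value of x_v).
bool satisfies(const Formula& f, const std::vector<bool>& value) {
    for (const Clause& clause : f) {
        bool clauseTrue = false;
        for (int lit : clause) {
            const bool x = value[std::abs(lit)];
            if (lit > 0 ? x : !x) {
                clauseTrue = true;
                break;
            }
        }
        if (!clauseTrue) return false;  // one false clause falsifies the formula
    }
    return true;
}

// Exhaustive search over all 2^n assignments of x_1..x_n.
// On success, value holds a satisfying assignment.
bool bruteForceSat(const Formula& f, int n, std::vector<bool>& value) {
    assert(n >= 0 && n < 64);
    value.assign(n + 1, false);  // index 0 unused
    for (unsigned long long mask = 0; mask < (1ULL << n); ++mask) {
        for (int v = 1; v <= n; ++v)
            value[v] = ((mask >> (v - 1)) & 1ULL) != 0;
        if (satisfies(f, value)) return true;
    }
    return false;
}
```

For example, $(x_{1} \lor x_{2}) \land (\neg x_{1} \lor x_{3}) \land (\neg x_{2} \lor \neg x_{3})$ is satisfiable, while $(x_{1}) \land (\neg x_{1})$ is not.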

Now, we can relate SAT to other problems, meaning that if we find a polynomial solution to SAT, then all the other problems can also be solved in polynomial time.
Can we relate 0/1 Knapsack to SAT?
At first glance, they might seem different, but in reality they are very similar problems:
Let's imagine we have 3 items that we can use. As with SAT, there are $2^{n}$ possibilities, since each item is either used ($1$) or not used ($0$). This generates a state tree very similar to the previous tree.
This suggests that SAT reduces to (the decision version of) 0/1 Knapsack:

$$\text{SAT} \leq_{p} \text{0/1 Knapsack}$$

or

$$\text{SAT} \propto \text{0/1 Knapsack}$$

Since SAT is NP-Complete, this reduction makes 0/1 Knapsack NP-hard, and since a proposed selection of items can be checked in polynomial time (it is in NP), 0/1 Knapsack is also NP-Complete.

Clique Problem

What is a clique?
A clique is a graph or subgraph in which every pair of vertices is connected by an edge.

Finding the size of the maximum clique, or deciding whether there is a clique of a given size $k$, is an NP-Hard problem, but how do we prove it?
Abdul Bari explained it in a very simple way: https://www.youtube.com/watch?v=qZs767KQcvE&t=10s
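For contrast with the proof, a brute-force sketch that finds a maximum clique on an adjacency matrix. `maxClique` and `extendClique` are illustrative names. It extends the current clique only with later vertices adjacent to every vertex already chosen, so in the worst case it explores exponentially many subsets.

```cpp
#include <cassert>
#include <vector>

// Exhaustive search: try to extend the current clique with each vertex
// v >= start that is adjacent to every vertex already in the clique.
void extendClique(const std::vector<std::vector<bool>>& adj, int start,
                  std::vector<int>& clique, std::vector<int>& best) {
    if (clique.size() > best.size()) best = clique;  // new largest clique found
    for (int v = start; v < static_cast<int>(adj.size()); ++v) {
        bool fits = true;
        for (int u : clique)
            if (!adj[u][v]) {
                fits = false;
                break;
            }
        if (!fits) continue;
        clique.push_back(v);
        extendClique(adj, v + 1, clique, best);
        clique.pop_back();
    }
}

std::vector<int> maxClique(const std::vector<std::vector<bool>>& adj) {
    std::vector<int> clique, best;
    extendClique(adj, 0, clique, best);
    return best;
}
```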

Approximation Algorithms

I'm not going to prove the algorithms here; the lecture slides have the proofs.
The important concept is the approximation factor $\rho(n)$:

$$\max\left(\frac{C}{C^{*}}, \frac{C^{*}}{C}\right) \leq \rho(n)$$

where $C$ is the cost of the solution produced by the approximation algorithm and $C^{*}$ is the cost of the optimal solution.

So, a 2-approximation algorithm always produces a solution within a factor of 2 of the optimal solution: at most twice the optimal cost for a minimization problem, or at least half the optimal value for a maximization problem.
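As one classic example, possibly different from the algorithms in the lecture slides, the maximal-matching heuristic for minimum vertex cover has $\rho(n) = 2$. A minimal sketch with the illustrative name `approxVertexCover` (C++17, for the structured binding):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Maximal-matching heuristic for minimum vertex cover. Each picked edge has no
// endpoint in the cover yet, so the picked edges share no vertices; any cover
// needs at least one endpoint of each, hence |C| <= 2 |C*|.
std::vector<int> approxVertexCover(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<bool> inCover(n, false);
    std::vector<int> cover;
    for (const auto& [u, v] : edges) {
        if (inCover[u] || inCover[v]) continue;  // edge already covered
        inCover[u] = true;                       // take both endpoints
        inCover[v] = true;
        cover.push_back(u);
        cover.push_back(v);
    }
    return cover;
}
```

On the path 0-1-2-3 (edges given in that order) it returns all 4 vertices while the optimum $\{1, 2\}$ has 2, so the factor of 2 is actually reached.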

Linear Programming

I highly recommend this video: https://www.youtube.com/watch?v=E72DWgKP_1Y. It explains the concept of linear programming in a very easy way.

Integer Linear Programming

A linear programming problem is an integer linear programming problem if our values ($x_{1}$, $x_{2}$, ...) need to be integers.
We can start by solving it like a normal linear programming problem (ignoring the fact that the values need to be integers).
Usually, the optimal solution found this way has values that are not integers (if that's not the case, then the solution found is also the solution for the integer problem).
Now, we need to create two branches for a non-integer value $x_{i}$: one rounds it down by adding the constraint $x_{i} \leq \lfloor x_{i} \rfloor$, and the other rounds it up by adding $x_{i} \geq \lceil x_{i} \rceil$. Usually, we branch on the biggest value.
Example:

A further explanation of the problem can be found here: https://www.youtube.com/watch?v=upcsrgqdeNQ&t=2s