Editor's note: I'm not going to write full notes for 126. I'm just going to be putting down specifics on what I found confusing from the material.
Also, when I do include data structures they'll just be there as a brief description accompanying the important algorithms that are linked to them.
In the analysis of algorithms we are looking at the efficiency of said algorithms. In particular, we are interested in asymptotic analysis, which is looking at how the running time (or memory use) \(f(n)\) changes as a function of the number of inputs, \(n\).
The most common asymptotic measure is big-O, which is usually quoted for the worst case. However, there are also big-Omega and big-Theta, and all three are abstractions of the actual running time function, which we shall call \(f(n)\).
Big-O.
Big-O gives an upper bound, which is why it is the usual tool for worst-case analysis. Formally, \(f(n) = O(g(n))\) if \(\exists c > 0 : c \cdot g(n) \geq f(n) \; \forall n \geq n_0\), where \(n_0 \in \mathbb{N}\) is some constant threshold.
Basically, we say \(f(n)\) is on the order of some \(g(n)\) (written \(O(g(n))\)) if you can scale \(g\) by a positive constant so that it sits at or above \(f\) once \(n\) passes a certain threshold.
We generally look for the smallest possible function \(g\) such that this is true, which gives the tightest (most informative) upper bound.
Example: Say our running time \(f(n) = 3n^2 + 20\log n + 4\). We can note that the part of the function that increases the quickest is the \(n^2\) term (ignoring coefficients), and so if we scale up the \(n^2\) term enough, say to \(23n^2\), we will always be greater than \(f(n)\) past some point (possibly even at the start). Thus, we get a big-O of \(O(n^2)\).
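For a concrete pick of constants (my numbers, not the lecturer's): for \(n \geq 1\) we have \(\log n \leq n^2\) and \(1 \leq n^2\), so
\[ 3n^2 + 20\log n + 4 \leq 3n^2 + 20n^2 + 4n^2 = 27n^2 \quad \forall n \geq 1, \]
i.e. \(c = 27\) and \(n_0 = 1\) witness \(f(n) = O(n^2)\).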
Big-Omega.
Big-Omega gives a lower bound, the mirror image of big-O (and the usual tool for best-case analysis). Formally, \(f(n) = \Omega(g(n))\) if \(\exists c > 0 : c \cdot g(n) \leq f(n) \; \forall n \geq n_0\), where \(n_0 \in \mathbb{N}\) is some constant. See something familiar?
In layman's terms, \(f(n)\) is on the order \(\Omega(g(n))\) if there is some positive scale factor that will put \(g\) always below or equal to \(f\) after a certain threshold \(n_0\).
Example: Say our \(f(n) = 2n^3 + 4n + \log n\). Since \(f(n)\) is always at least its slowest-growing term, \(\Omega(\log n)\) is a valid bound; so is \(\Omega(1)\), or anything that grows slower than log, but those get less and less useful. In fact, because every term here is nonnegative, \(f(n) \geq 2n^3\), so the tightest (most useful) lower bound is \(\Omega(n^3)\).
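To witness that tight bound with explicit constants (again my own):
\[ 2n^3 + 4n + \log n \geq 2n^3 \quad \forall n \geq 1, \]
i.e. \(c = 2\) and \(n_0 = 1\) give \(f(n) = \Omega(n^3)\).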
Big-Theta.
Big-Theta gives a tight bound: it pins the growth rate down from both sides at once (despite the common shorthand, this is not the same thing as average-case analysis). Formally, \(f(n) = \Theta(g(n))\) if \(\exists c_1, c_2 > 0 : c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \; \forall n \geq n_0\), where \(n_0 \in \mathbb{N}\).
Here we have two constants, \(c_1\) and \(c_2\), and what this means is that we need a function \(g\) where, if we scale it by two different values, \(f\) will be squeezed between the two scaled copies of \(g\).
If a function is both \(O(h(n))\) and \(\Omega(h(n))\) (same \(h(n)\)!) then that function is \(\Theta(h(n))\).
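Reusing the big-O example from above (constants again chosen by me): for \(f(n) = 3n^2 + 20\log n + 4\),
\[ 3n^2 \leq f(n) \leq 27n^2 \quad \forall n \geq 1, \]
so \(c_1 = 3\), \(c_2 = 27\), \(n_0 = 1\) witness \(f(n) = \Theta(n^2)\).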
To summarise:
\(O(g(n))\): upper bound; \(f\) grows no faster than \(g\).
\(\Omega(g(n))\): lower bound; \(f\) grows at least as fast as \(g\).
\(\Theta(g(n))\): tight bound; \(f\) grows at the same rate as \(g\).
PQ sort, or Priority Queue sort, sorts data using priority queues.
A priority queue is a queue storing key-value pairs, where the key denotes the priority. It has the following ADT:
interface PriorityQueue<K, V> {
    void insert(K k, V v);
    V removeMin(); // removes and returns a value with the smallest key
    V min();       // returns that value without removing it
    int size();
    boolean isEmpty();
}
This sort sorts items by inserting them into the PQ, using their values as the ordering keys. Then, items are removed successively with removeMin().
The running time of said sort depends on the implementation of the PQ.
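As a minimal sketch (not from the lectures), here is PQ sort written against Java's built-in java.util.PriorityQueue, which happens to be a binary min-heap, so this version runs in \(O(n \log n)\):

import java.util.List;
import java.util.PriorityQueue;

class PQSort {
    // Sorts s in ascending order by draining it through a min-priority queue.
    static void pqSort(List<Integer> s) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : s) pq.add(x);   // phase 1: insert everything
        for (int i = 0; i < s.size(); i++) {
            s.set(i, pq.poll());     // phase 2: removeMin back into s, in order
        }
    }
}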
Selection Sort.
A variation of a PQ sort using an unsorted list: insert just appends, \(O(1)\), but each removeMin has to scan the entire list for the minimum, \(O(n)\), so \(n\) removals cost \(O(n^2)\) overall (see the sketch below).
Thus Selection Sort is \(O(\textrm{horrible})\) (\(O(n^2)\)).
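Concretely, the same idea is usually written as an in-place scan-and-swap; a sketch (my rendering, not the course code):

// In-place selection sort: the unsorted suffix plays the role of the
// unsorted-list PQ, and each pass is one removeMin.
static void selectionSort(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
        int min = i;
        for (int j = i + 1; j < a.length; j++)
            if (a[j] < a[min]) min = j;          // scan for the minimum: O(n)
        int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
    }
}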
Insertion Sort.
A variation of a PQ sort using a sorted list (sorted on insertion): here each insert has to walk the list to find its position, \(O(n)\), while removeMin just pops the front, \(O(1)\). This is also \(O(\textrm{horrible})\) (\(O(n^2)\)).
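As with selection sort, here is an in-place rendering (mine, not the course code):

// In-place insertion sort: the sorted prefix plays the role of the
// sorted-list PQ, and each insert shifts larger keys right: O(n).
static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int k = a[i], j = i - 1;
        while (j >= 0 && a[j] > k) {
            a[j + 1] = a[j];  // shift right until k's slot is found
            j--;
        }
        a[j + 1] = k;
    }
}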
Heaps are binary tree structures obeying the following rules:
Heap-Order Property: for every node \(p\) other than the root, the key stored at \(p\) is greater than or equal to the key stored at \(p\)'s parent (min-heap convention).
Complete Binary Tree Property: every level of the tree is completely filled, except possibly the last, which is filled from left to right.
Heaps can be used to implement PQs also.
Insertion. Let \(k\) be the key of the element to insert. The new node goes into the next available position (the leftmost open slot in the bottom level, keeping the tree complete); this may break the heap-order property, which upheap then restores.
Upheap. Restores the heap-order property by swapping \(k\) upwards along the path of insertion until it reaches the root, or a node whose parent's key is \(\leq k\).
A heap will always have a maximum height of \(O(\log n)\), so this operation is quite efficient.
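A sketch of upheap over the array layout from the Array-Based Heap section below (helper names are mine):

// Upheap in an array-based min-heap: the parent of cell i is cell (i - 1) / 2.
static void upheap(int[] heap, int i) {
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (heap[parent] <= heap[i]) break;   // heap order restored: stop
        int tmp = heap[i]; heap[i] = heap[parent]; heap[parent] = tmp;
        i = parent;                           // keep bubbling up
    }
}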
Removal. A PQ's removeMin corresponds to removing the root key. The root key is replaced by the key of the last node (the rightmost node in the bottom level), that node is deleted to keep the tree complete, and downheap then restores the heap-order property.
Downheap. Let \(p\) be the root. While the key at \(p\) is greater than that of one of its children, swap it with the smaller child's key and continue from that child.
This process terminates when the key being downheaped either reaches a leaf, or is no greater than either of its children. This is \(O(\log n)\).
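And the matching downheap sketch, under the same array-layout assumption:

// Downheap from the root of an array-based min-heap holding n elements;
// the children of cell i are cells 2i + 1 and 2i + 2.
static void downheap(int[] heap, int n) {
    int i = 0;
    while (2 * i + 1 < n) {                   // while i has at least one child
        int child = 2 * i + 1;
        if (child + 1 < n && heap[child + 1] < heap[child])
            child++;                          // take the smaller child
        if (heap[i] <= heap[child]) break;    // heap order restored: stop
        int tmp = heap[i]; heap[i] = heap[child]; heap[child] = tmp;
        i = child;
    }
}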
Array-Based Heap. Let us have \(n\) elements. Element \(p\) is stored at cell \(f(p)\) such that \[ f(p) = \begin{cases} 0 & p \textrm{ is the root}, \\ 2f(q) + 1 & p \textrm{ is the left child of } q, \\ 2f(q) + 2 & p \textrm{ is the right child of } q. \end{cases} \]
Heapsort sorts a sequence \(S\) of \(n\) elements that have a total order relation. Each element is inserted into a heap-based priority queue and then dequeued and put back into \(S\) in that order.
Since each of the \(n\) insertions and \(n\) removals is \(O(\log n)\), this takes \(O(n)\) space and \(O(n \log n)\) time.
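For completeness, a common in-place variant (max-heap flavour, so the array ends up ascending; this is my addition, not the PQ-based version described above). Note the bottom-up heap build is actually \(O(n)\), slightly better than \(n\) repeated insertions:

// In-place heapsort: build a max-heap, then repeatedly move the max to the end.
static void heapSort(int[] a) {
    int n = a.length;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n); // build heap: O(n)
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp; // move current max into place
        siftDown(a, 0, end);                         // re-heapify the prefix: O(log n)
    }
}

// Downheap for a max-heap over a[0..n).
static void siftDown(int[] a, int i, int n) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) child++; // larger child
        if (a[i] >= a[child]) break;
        int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
        i = child;
    }
}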