Editor's note: I'm not going to write full notes for 126. I'm just going to be putting down specifics on what I found confusing from the material.
Also, when I do include data structures they'll just be there as a brief description accompanying the important algorithms that are linked to them.
In the analysis of algorithms we are looking at the efficiency of said algorithms. In particular, we are interested in asymptotic analysis, which is looking at how the running time (or memory use) \(f(n)\) changes as a function of the number of inputs, \(n\).
The most common asymptotic measure is big-O, which is usually quoted for the worst case. However, there are also big-Omega and big-Theta, and all three are abstractions of the actual running time function, which we shall call \(f(n)\).
Big-O.
Big-O gives an upper bound, which is why it is the usual tool for worst-case analysis. Formally, \(f(n) = O(g(n))\) if \(\exists c > 0 : c \cdot g(n) \geq f(n) \; \forall n \geq n_0\), where \(n_0 \in \mathbb{N}\) is some constant threshold.
Basically, we say \(f(n)\) is on the order of some \(g(n)\) (written \(O(g(n))\)) if you can scale \(g\) by a positive constant so that it sits at or above \(f\) once \(n\) passes a certain threshold.
We generally look for the smallest possible function \(g\) such that this is true, which gives the tightest (most informative) upper bound.
Example: Say our running time \(f(n) = 3n^2 + 20\log n + 4\). We can note that the part of the function that increases the quickest is the \(n^2\) term (ignoring coefficients), and so if we scale up the \(n^2\) term enough, say to \(23n^2\), we will always be greater than \(f(n)\) past some point (possibly even at the start). Thus, we get a big-O of \(O(n^2)\).
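For a concrete pick of constants (my numbers, not the lecturer's): for \(n \geq 1\) we have \(\log n \leq n^2\) and \(1 \leq n^2\), so
\[ 3n^2 + 20\log n + 4 \leq 3n^2 + 20n^2 + 4n^2 = 27n^2 \quad \forall n \geq 1, \]
i.e. \(c = 27\) and \(n_0 = 1\) witness \(f(n) = O(n^2)\).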
Big-Omega.
Big-Omega gives a lower bound, the mirror image of big-O (and the usual tool for best-case analysis). Formally, \(f(n) = \Omega(g(n))\) if \(\exists c > 0 : c \cdot g(n) \leq f(n) \; \forall n \geq n_0\), where \(n_0 \in \mathbb{N}\) is some constant. See something familiar?
In layman's terms, \(f(n)\) is on the order \(\Omega(g(n))\) if there is some positive scale factor that will put \(g\) always below or equal to \(f\) after a certain threshold \(n_0\).
Example: Say our \(f(n) = 2n^3 + 4n + \log n\). Since \(f(n)\) is always at least its slowest-growing term, \(\Omega(\log n)\) is a valid bound; so is \(\Omega(1)\), or anything that grows slower than log, but those get less and less useful. In fact, because every term here is nonnegative, \(f(n) \geq 2n^3\), so the tightest (most useful) lower bound is \(\Omega(n^3)\).
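To witness that tight bound with explicit constants (again my own):
\[ 2n^3 + 4n + \log n \geq 2n^3 \quad \forall n \geq 1, \]
i.e. \(c = 2\) and \(n_0 = 1\) give \(f(n) = \Omega(n^3)\).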
Big-Theta.
Big-Theta gives a tight bound: it pins the growth rate down from both sides at once (despite the common shorthand, this is not the same thing as average-case analysis). Formally, \(f(n) = \Theta(g(n))\) if \(\exists c_1, c_2 > 0 : c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \; \forall n \geq n_0\), where \(n_0 \in \mathbb{N}\).
Here we have two constants, \(c_1\) and \(c_2\), and what this means is that we need a function \(g\) where, if we scale it by two different values, \(f\) will be squeezed between the two scaled copies of \(g\).
If a function is both \(O(h(n))\) and \(\Omega(h(n))\) (same \(h(n)\)!) then that function is \(\Theta(h(n))\).
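Reusing the big-O example from above (constants again chosen by me): for \(f(n) = 3n^2 + 20\log n + 4\),
\[ 3n^2 \leq f(n) \leq 27n^2 \quad \forall n \geq 1, \]
so \(c_1 = 3\), \(c_2 = 27\), \(n_0 = 1\) witness \(f(n) = \Theta(n^2)\).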
To summarise:
\(O(g(n))\): upper bound; \(f\) grows no faster than \(g\).
\(\Omega(g(n))\): lower bound; \(f\) grows at least as fast as \(g\).
\(\Theta(g(n))\): tight bound; \(f\) grows at the same rate as \(g\).
PQ sort, or Priority Queue sort, sorts data using priority queues.
A priority queue is a queue storing key-value pairs, where the key denotes the priority. It has the following ADT:
interface PriorityQueue<K, V> {
    void insert(K k, V v);
    V removeMin(); // removes and returns a value with the smallest key
    V min();       // returns that value without removing it
    int size();
    boolean isEmpty();
}
This sort sorts items by inserting them into the PQ, using their values as the ordering keys. Then, items are removed successively with removeMin().
The running time of said sort depends on the implementation of the PQ.
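As a minimal sketch (not from the lectures), here is PQ sort written against Java's built-in java.util.PriorityQueue, which happens to be a binary min-heap, so this version runs in \(O(n \log n)\):

import java.util.List;
import java.util.PriorityQueue;

class PQSort {
    // Sorts s in ascending order by draining it through a min-priority queue.
    static void pqSort(List<Integer> s) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : s) pq.add(x);   // phase 1: insert everything
        for (int i = 0; i < s.size(); i++) {
            s.set(i, pq.poll());     // phase 2: removeMin back into s, in order
        }
    }
}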
Selection Sort.
A variation of a PQ sort using an unsorted list: insert just appends, \(O(1)\), but each removeMin has to scan the entire list for the minimum, \(O(n)\), so \(n\) removals cost \(O(n^2)\) overall (see the sketch below).
Thus Selection Sort is \(O(\textrm{horrible})\) (\(O(n^2)\)).
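Concretely, the same idea is usually written as an in-place scan-and-swap; a sketch (my rendering, not the course code):

// In-place selection sort: the unsorted suffix plays the role of the
// unsorted-list PQ, and each pass is one removeMin.
static void selectionSort(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
        int min = i;
        for (int j = i + 1; j < a.length; j++)
            if (a[j] < a[min]) min = j;          // scan for the minimum: O(n)
        int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
    }
}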
Insertion Sort.
A variation of a PQ sort using a sorted list (sorted on insertion): here each insert has to walk the list to find its position, \(O(n)\), while removeMin just pops the front, \(O(1)\). This is also \(O(\textrm{horrible})\) (\(O(n^2)\)).
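As with selection sort, here is an in-place rendering (mine, not the course code):

// In-place insertion sort: the sorted prefix plays the role of the
// sorted-list PQ, and each insert shifts larger keys right: O(n).
static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int k = a[i], j = i - 1;
        while (j >= 0 && a[j] > k) {
            a[j + 1] = a[j];  // shift right until k's slot is found
            j--;
        }
        a[j + 1] = k;
    }
}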
Heaps are binary tree structures obeying the following rules:
Heap-Order Property: for every node \(p\) other than the root, the key stored at \(p\) is greater than or equal to the key stored at \(p\)'s parent (min-heap convention).
Complete Binary Tree Property: every level of the tree is completely filled, except possibly the last, which is filled from left to right.
Heaps can be used to implement PQs also.
Insertion. Let \(k\) be the key of the element to insert. The new node goes into the next available position (the leftmost open slot in the bottom level, keeping the tree complete); this may break the heap-order property, which upheap then restores.
Upheap. Restores the heap-order property by swapping \(k\) upwards along the path of insertion until it reaches the root, or a node whose parent's key is \(\leq k\).
A heap will always have a maximum height of \(O(\log n)\), so this operation is quite efficient.
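A sketch of upheap over the array layout from the Array-Based Heap section below (helper names are mine):

// Upheap in an array-based min-heap: the parent of cell i is cell (i - 1) / 2.
static void upheap(int[] heap, int i) {
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (heap[parent] <= heap[i]) break;   // heap order restored: stop
        int tmp = heap[i]; heap[i] = heap[parent]; heap[parent] = tmp;
        i = parent;                           // keep bubbling up
    }
}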
Removal. A PQ's removeMin corresponds to removing the root key. The root key is replaced by the key of the last node (the rightmost node in the bottom level), that node is deleted to keep the tree complete, and downheap then restores the heap-order property.
Downheap. Let \(p\) be the root. While the key at \(p\) is greater than that of one of its children, swap it with the smaller child's key and continue from that child.
This process terminates when the key being downheaped either reaches a leaf, or is no greater than either of its children. This is \(O(\log n)\).
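And the matching downheap sketch, under the same array-layout assumption:

// Downheap from the root of an array-based min-heap holding n elements;
// the children of cell i are cells 2i + 1 and 2i + 2.
static void downheap(int[] heap, int n) {
    int i = 0;
    while (2 * i + 1 < n) {                   // while i has at least one child
        int child = 2 * i + 1;
        if (child + 1 < n && heap[child + 1] < heap[child])
            child++;                          // take the smaller child
        if (heap[i] <= heap[child]) break;    // heap order restored: stop
        int tmp = heap[i]; heap[i] = heap[child]; heap[child] = tmp;
        i = child;
    }
}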
Array-Based Heap. Let us have \(n\) elements. Element \(p\) is stored at cell \(f(p)\) such that \[ f(p) = \begin{cases} 0 & p \textrm{ is the root}, \\ 2f(q) + 1 & p \textrm{ is the left child of } q, \\ 2f(q) + 2 & p \textrm{ is the right child of } q. \end{cases} \]
Heapsort sorts a sequence \(S\) of \(n\) elements that have a total order relation. Each element is inserted into a heap-based priority queue and then dequeued and put back into \(S\) in that order.
Since each of the \(n\) insertions and \(n\) removals is \(O(\log n)\), this takes \(O(n)\) space and \(O(n \log n)\) time.
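For completeness, a common in-place variant (max-heap flavour, so the array ends up ascending; this is my addition, not the PQ-based version described above). Note the bottom-up heap build is actually \(O(n)\), slightly better than \(n\) repeated insertions:

// In-place heapsort: build a max-heap, then repeatedly move the max to the end.
static void heapSort(int[] a) {
    int n = a.length;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n); // build heap: O(n)
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp; // move current max into place
        siftDown(a, 0, end);                         // re-heapify the prefix: O(log n)
    }
}

// Downheap for a max-heap over a[0..n).
static void siftDown(int[] a, int i, int n) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) child++; // larger child
        if (a[i] >= a[child]) break;
        int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
        i = child;
    }
}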