|
return to content page/computer science home
Topics:
- Complexity Classes
- Induction
- Sorting
- Hashing
- Binary Search Trees
- Red-black Trees
- B-trees
- Graphs
- Dynamic Programming
- P and NP
- Asymtotic Notation
- f = O(g) : g is an upper bound on f
iff there are positive integers c and n0 such that for all n ≥ n0, then f(n) ≤ c*g(n)
- f = Ω(g) : g is a lower bound on f
iff there are positive integers c and n0 such that for all n ≥ n0, then f(n) ≥ c*g(n)
- f = Θ(g) : g and f's rates of growth are equivalent
iff f = O(g) and f = Ω(g)
- f = o(g) : g is an upper bound, and f != Θ(g)
iff f = O(g) and limn→∞f(n)/g(n) = 0
- O(1) - Constant
insertion into a hash table
rotation in a red-black tree
- O(log n) - Logarithmic
binary search, and insertion/deletion into/from bst's and rbt's
searching a red-black tree
- O(n) - Linear
the merge in merge sort, and partition in quicksort (best and average case)
non-comparing sorts: counting, radix and bucket
dynamic programming approach to assembly line scheduling
- O(n log n)
merge sort (n + log n) and quicksort
- O(n2) - Quadratic
insertion sort
- O(2n) - Exponential
brute-force (trial and error) method for assembly line scheduling
- O(n!) - Factorial
generate and test algorithms - eg. brute-force method of finding a Hamilton cycle
return to top
(Basis)prove for some n0
(Step)suppose for k (or if you prefer ni) it is true (the Induction Hypothesis), then prove it is true for k+1 (using the IH)
(Conclusion) thus it is true for every n such that n ≥ n0
return to top
- Insertion Sort
Algorithm:
- starting from the second-to-left and moving to the right of the array:
-    make a pointer to, or copy of, the current value (temp)
-    until you find a value that is ≤ temp, do
-      move each value up in the array
-    then place temp in last occupied position
Analysis: f(n) = n(n-1)/2 → O(n2)
- Merge Sort
Algorithm (recursive):
- if only one element left in array to sort, return
- mergesort left half
- mergesort right half
- merge sorted halves together
Algorithm for the merge part:
- merge array A and array B into array Z
- initialize a, b, z to 0
- while elements still left in arrays A and B
- if A[a] < B[b], copy A[a] into Z[z], incrementing a and z
- else copy B[b] into Z[z], incrementing b and z
- copy rest of unfinished array into Z
Analysis:
- Time to mergesort a single element takes one comparison so T(1) = 1
- else time is equal to time to mergesort two halves (T(n/2) each) plus the time to merge those two halves together which is linear (n) so T(n) = 2*T(n/2) + n
- Solve: (ie. convert to a non-recurrence definition)
| T(1) | = | 1 | | T(n/2) | = | 2T(n/2/2) + n/2 |
| T(n) | = | 2T(n/2) + n | | | = | 2T(n/4) + n/2 |
| | = | 2(2T(n/4) + n/2) + n | | T(n/4) | = | 2T(n/4/2) + n/4 |
| | = | 4T(n/4) + n + n | | | = | 2T(n/8) + n/4 |
| | = | 8T(n/8) + n + n + n |
| | = | 2kT(n/2k) + kn |
| | = | nT(1) + log2n * n | (using 2k = n) |
| | = | n + nlog2n |
- Proof:
| (BASIS) | let n = 1; then T(n) = 1 (by definition of T), and n + nlogn = 1 + 1*0 = 1 |
| so works for n = 1 |
| (STEP) | assume works for some value k ie. T(2k) = 2k + 2klog22k (Induction Hypothesis) |
| does k+1 work? ie. is T(2k+1) = 2k+1 + 2k+1log22k+1? |
| Well, | T(2k+1) | = | 2T(2k) + 2k+1 | (by definition of T) |
| | | = | 2(2k + 2klog22k) + 2k+1 | (by Ind. Hyp.) |
| | | = | 2k+1 + 2k+1k + 2k+1 | (as log22k = k) |
| | | = | 2k+1 + 2k+1(k + 1) |
| | | = | 2k+1 + 2k+1log2(k + 1) |
- For small values of n (eg less than 43) drop to insertion sort
- Quicksort
Algorithm:
- if array has only one element (or less) in it then return (sorted)
- middle_index = partition(array, low_index, high_index)
- quicksort(array, low_index, middle_index)
- quicksort(array, middle_index + 1, high_index)
The Partition Algorithm:
- let the pivot_value = array[begin], and set i and j to be on the outside of the array
- loop:
- decrement j till array[j] ≤ pivot_value
- increment i till array[i] ≥ pivot_value
- if i still less than j, simply swap values array[i] and array[j]
- else return j
Analysis:
- n + nlog n in the best case
- ½n2 + ½n in the worst case - occurs when the pivot value is either the largest or smallest in the array so:
- alternative ways of choosing pivot to minimize worst case scenario:
- pick a random value in between the begin and end values
- take the median of the begin, end and mid values
- or could first check for sorted values
- again drop to insertion sort for small n
- Counting Sort
the first non-comparing sort - sorting in linear time
Algorithm:
- initialize all values of count array (of size k) to 0
- walking through (j++) the array to sort, count_array[array[j]]++
- from i = 1 to k+1, count_array[i] += count_array[i-1]
- working backwards from the end of the original array (j = n-1; j--):
sorted_array[count_array[array[j]]] = array[j], then decrement count ie. count_array[array[j]]--
Analysis: 2(k+n), so O(n)
assumes input are integers in a small range
- Radix Sort
use a stable sort (eg. counting sort) to sequentially sort each digit
eg. sorting 64-bit numbers at 16-bits per pass, so need 4 passes (constant) so O(n) still
disadvantage: requires some uniformity of number
- Bucket Sort
assumes uniform distribution
- for n keys, create n buckets of even subintervals
- calculate bucket key goes into: first f(key)→ [0,1) range (ie. scale), then bucket = n*f(key)
- sort buckets with insertion sort
return to top
- Key Transformation
first convert key (eg. a word) into a natural number, then apply the hashing function
- Universal Hashing (to avoid collisions)
- choose a prime, p, such that p > all keys
- then ha,b(key) = ((a*key + b) % p) % size_table
- choose a and b randomly, such that both < p-1, and > 0
- Perfect Hashing (to avoid/deal with collisions)
- a hash table of hash tables, so O(1) in the worst case
- using universal hashing, can get no collisions in the secondary hash tables
- choose the primary hash function as above, using trial and error to get good values of a and b
- keep count, ni, of number of items for each slot
- p is as for the primary hash function, size_table = ni2 and ai and bi again chosen randomly and checked to ensure NO collisions
- Chaining (to deal with collisions)
- good for an unknown number of keys, and if going to delete keys often
- less efficient insert and search
- list management overhead
- retrieval speed not impeded by deletes
- Linear Probing
- set stepsize (eg. 1) to check from each current location
- Quadratic Probing
- h(key, num_collisions) = (h(key) + num_collisions2)%table_size
ie. we start back at our initial slot and look 1, 2, 4, 8 etc. positions out from there
- Double Hashing
- uses a second hash function to determine the stepsize for each key once a collision has occured
return to top
- Binary Trees to Binary Search Trees
trees better than hash tables for:
- lots of deletions (pointer based so good for dynamic data structures)
- traversals
definition:
- and empty tree
- or a root node containing a key, and data fields, and left and right subtrees such that all key values in the TL are less than key, and all key values in the TR are greater than key
- BST Search
O(log n)
- if T is empty → return "item not found"
- compare x to the key value in T
- if equal, return T
- if x < key, search TL else search TR
- BST Insertion
O(log n)
- if T is empty, make root node with key = key_insert
- else compare key_insert to the key value in T
- if key_insert < key, insert into the TL
- else insert into the TR
- Min and Max Keys
min = find left-most node, max = find right-most node
- Predecessor and Successor
pred = the next smallest key
- if T has a TL, find the max of that
- else find first ancestor in which T is in its right subtree
succ = the next largest key
- if T has a TR, find the min of that
- else find first ancestor in which T is in its left subtree
- BST Deletion
- search till we find the item, k, at the root then
- delete root
- case 1: root has an empty TL or TR, so just replace the root with that child
- case 2: both children non-empty: replace root by it's successor, then recursively delete the successor from the TR
- Tree Traversal
- Inorder
uses: sort (similar to quicksort - we have pivots), print in order
- TL
- root
- TR
- Preorder
uses: get a picture of tree structure (write to file in preorder, so can remake the exact same bst again by inserting in this order), incremental depth
- root
- TL
- TR
- Postorder
uses: compiler for arithmetic (gives equation in Polish notation, operators at nodes, varibles/numbers at leaves)
- TL
- TR
- root
return to top
- Properties of RBTs
- root is black, and dummy leaves are black, all other nodes are either red or black
- no red nodes adjacent to each other (ie. if parent red, both children must be black)
- all paths from a node to the leaves contain the same number of black nodes
- Black-height and Height
black-height = number of black nodes from a node n, to a leaf (inc. leaf but not node itself)
height = count up from (dummy) leaves
- RBT Rotation

- RBT Insertion
- insert as per BST insertion
- colour node x red
- if parent[x] also red, have a violation so fix:
- case 1: x's uncle red
solution: push black down from grandparent to parent and uncle

- case 2: x's uncle black, x is an inside child
solution: rotate x up and x's parent down to get case 3

- case 3: x's uncle black, x is an outside child
solution: swap colour of parent and grandparent, rotate grandparent down and parent up

- RBT Deletion
let z be the node to be deleted
y = the node that gets spliced out
= z (if z is a leaf node) or succ[z]
x be the node that replaces the spliced out node, y
- carry out deletion as per bst deletion, only leave colour field of z the same
- if spliced out node, y (z or succ[z]) is black, give x an extra black and fix (first take care of trivial case where x is red, simply colour black) when x is black:
- case 1: x's sibling is red
solution: swap colour between x's sibling and x's parent, rotate sibling up and parent down

- case 2: x's sibling is black, sibling's children both black
solution: push extra black from x, and black from sibling (so sibling gets coloured red), up to parent

- case 3: x's sibling is black, sibling's closest child is red (and furtherest child black)
solution: swap colour between sibling and closest child, rotate up sibling's closest child and down sibling

- case 4: x's sibling is black, sibling's furtherest child is red
solution: swap colour of sibling and its furtherest child, rotate sibling up and parent down, then push black from x to parent

return to top
- B-tree Properties
minimum degree, t, so
- number of keys is t-1 to 2t-1 (except root can have less)
- number of children is t to 2t (except leaf nodes) eg. b-tree where t=2 is called a 2-3-4 tree as each node can have 2, 3 or 4 children
- B-tree Insertion
- find the appropriate leaf node
- if not full, simply insert the new key
- else if full (ie. already has 2t-1 keys in it), split:
- put the middle key into the parent (if node was the root, create a new root as parent)
- leave smallest t-1 keys in existing node, put biggest t-1 keys into new sibling node
- insert key into appropriate child leaf
- B-tree Deletion
- on way to node containing key to delete (and that node too), strengthen nodes with num_keys < t (ie. nodes that only have t-1 keys), by either:
- borrow from sibling, via parent
- or merge with sibling and intermediate parent key
- once we've found our (strengthened, so num_keys > t-1) node from which to delete a key:
- case 1: node is a leaf node → simply delete the key
- case 2: node is an internal node, and there are at least t keys in node containing the key's pred
→ swap key with pred and recursively delete
- case 3: node is an internal node, and there are at least t keys in the node containing the key's succ
→ swap key with succ and recursively delete
- case 4: → merge key down and out with preceding and succeding child nodes to give one large node from which to delete the key
return to top
- Implementing Graphs
- Adjacency Matrix
- n×n Boolean array A such that if there is an edge from i to j then A[i][j] = 1, and 0 if there is no edge
- mirror line across diagonal for undirected graphs
- used when graph is dense (num_edges close to n2)
- most efficient implementation for determining if an edge between two given vertices exist (O(1) cf. adj. list this is O(n))
- Adjacency List
- array of linked lists, where each list contains all vertices that have edges to the vertex that the array position represents
- used when graph is sparse (num_edges < n)
- most efficient implementation for graph traversal, and finding all vertices adjacent to some vertex j
- Breadth-first Search
- finds the shortest paths from the source node, s, to all other vertices
- Algorithm:
- colour all nodes white, except s: colour grey and enqueue(s)
- while still vertices in the queue
- dequeue vertex, u, from front of queue
- for each vertex, v, adjacent to u, that is white
- colour v grey, enqueue(v), and set distance[v] = distance[u] + 1
- colour u black
- Depth-first Search
- Algorithm
use timestamps: d = discovered time, f = finished time
DFS leads to backtracking
- colour all vertices white, and set time = 0
- for each vertex, u, that is not white do DFS_visit(u)
DFS_visit(u) algorithm (there is a global variable, time):
- colour u grey
- increment time, then set d[u] = this new time
- for each vertex, v, adjacent to u, and that is white:
- colour u black, increment timestamp again, and set f[u] = current time
- Applications
- edge classification
- T = tree edges (grey to white/destination - what we actually follow)
- B = back edges (grey to grey - when a newly discovered vertex is pointing back to it's own ancestor)
- F = forward edges (grey to black - when a vertex is pointing to an already finished decendent)
- C = cross edges (grey to black - black vertex was coloured black by another route - not including the grey vertex)
a directed, or undirected, graph is acyclic iff DFS produces no back edges
- topological sort of a dag
dag = directed acyclic graph (has no directed cycles)
Algorithm (equiv. to sort in order of decreasing finishing times):
- while doing DFS, as finish (ie. colour black) a vertex, add to the front of the list
- connected components of an undirected graph
before calling a new DFS_visit from the main DFS algorithm, set a new connected_component_number
- strongly connected components
sets of vertices in which any two are mutually reachable
Algorithm:
- call DFS on the graph to calculate finishing times for each vertex
- find the transpose of the graph (will have the same strongly connected components)
- call DFS on this transposed graph, calling vertices in order of decreasing finishing time
- each tree of the second DFS is a strongly connected component
- Heaps
use as a priority queue, where value at top/front of the queue is used next
min-heaps and max-heaps - bit like a nearly complete binary tree which can be implemented as an array such that for a node in cell i (cell 0 being empty):
- left child in cell 2i, right child in cell 2i+1
- parent in cell floor(i/2)
Insertion into heaps:
- add item to end of heap/array
- heapify: compare value with parent and percolate up (swapping with parent) as necessary
Extraction from top of heap:
- remove root
- replace by rightmost leaf, val
- heapify: compare val with children and percolate down (swapping with largest child if a max-heap, and smallest child if a min-heap)
insert and extract both O(log n) in the worst case
- Prim's Algorithm
for finding the minimum spanning tree in an undirected weighted graph from some source vertex, s
- initialize all vertices priority values to ∞ (except put that of s to 0), and set all pred[vertex] = -1
- insert all these vertices into a priority queue
- while the priority queue (a min-heap) is non-empty:
- extract the minimum vertex, u, and for each of it's adjacent vertices, u, still in the queue:
- if weight(u, v) < priority[v] then set priority[v] = weight(u, v) and pred[v] = u
- Dijkstra's Algorithm
for finding the single source shortest path from a source vertex, s, to every other vertex in a directed weighted graph
- initialize all vertices distance_from_start, d to ∞ (except put that of s to 0)
- insert all these vertices into a priority queue, based on d
- while the priority queue (a min-heap) is non-empty:
- extract the minimum vertex, u, and for each of it's adjacent vertices, u:
- RELAX: if d[u] + weight(u, v) < d[v] then set d[v] = d[u] + weight(u, v)
return to top
- Elements of Dynamic Programming
- subproblem optimality
- recursion
- memoising (to avoid inefficiency of subproblem overlap eg. Sage)
- bottom-up
- The Fractional Knapsack Problem
easy - just use a greedy algorithm: first put in as much of the item with the best value/cost ratio, then the next best etc.
- The 0-1 Knapsack Problem
- create a num_items × increm_weight value matrix V[k,w] and initialize first row to 0
- for each item, k:
- initialize V[k,0] = 0 (ie. eventually sets first column to zero)
- for each value of the incrementing cost/weight, w:
- if wk > w (ie. item k can't fit) then set V[k,w] = V[k-1, w]
- else if V[k-1,w] > valuek + V[k-1,w-wk] (better value without including item k) then set V[k,w] = V[k-1,w]
- else (better value with item k) V[k,w] = valuek + V[k-1,w-wk]
O(n!) really but here we've got it running at O(nW) time
- Assembly-line Scheduling
brute-force trial-and-error approach is O(2n) (as 2n ways in which stations on line 1 can be chosen)
- calc. optimum timeline1[j] = min(timeline1[j-1] + aline1,j, timeline2[j-1] + aline1,j + transfer_timeline2,j-1)
- repeat for line 2
- find the minimum overall time and the path taken to get it
O(n)
return to top
P = polynomial: solution can be found in O(nk) time
NP = non-deterministic polynomial: solution can be checked in polynomial time
NP-complete: the set of problems whose solutions can be checked in polynomial time, and who are all special cases of each other, and none has had a solution found in polynomial time (and we believe none exists - but this hasn't been proven yet)
NP-complete problems:
- the Hamilton Cycle Problem (HCP) is O(n!) to solve by making every possible permutation of the n vertices and then seeing if there are edges between them all (O(n) to check solution)
- 0-1 Knapsack (solved in O(nW) where W is the total weight allowed) and Bin-packing
- the Timetabling Problem - preventing clashes
return to top
|