CS 241: Data Structures—Final Exam Review Notes
About the exam

the final exam will take place on Friday, May 4th, from 2-5pm, in the usual lecture room (Ford 204)

the exam will be designed so that you should be able to complete it in one-and-a-half to two hours

you will not have access to books, notes, computers, networks or neighbors during the exam (you may use scratch paper, which will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
In preparing for the exam, you should consult:

your textbook (see especially chapters and sections mentioned below)

your notes from lecture and lab—these are likely to be especially helpful

the home page for the course and various links, examples, etc., provided there

… and also especially the lab assignments linked from the home page, as well as your own programs

graded quizzes you completed during lecture (3 of them, hopefully)
Format: the exam will consist of several kinds of questions; typically:

10-15 True-or-False questions at 1-2 points each

8-10 short answer questions at 4-6 points each

2-4 longer answer questions at 8-15 points each

These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
The final exam is comprehensive!

… but the emphasis will be on material from after the midterm

here are the main subject areas again (with book chapters):
General background to Data Structures and modularity

Chapter 1, section 1.1; also Chapter 3, section 3.1

Math review and recursion

Chapter 1, sections 1.2-1.3

Java review and new features

Chapter 1, sections 1.4-1.5

Review—see external notes!
Algorithm analysis and asymptotics (O notation)

Chapter 2, sections 2.1-2.4
Arrays, ArrayLists and linked lists

Chapter 3, sections 3.2-3.5

Stacks and Queues

Chapter 3, sections 3.6-3.7

Infix and Postfix expressions

Chapter 3, section 3.6.3

Introduction to Applets and GUIs

see the online examples and resources linked near the top of the course home page
Trees

See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)
Tree definitions and terminology

A tree is a collection structure which holds elements in positions called nodes; a tree is a linked structure, but unlike a linked list, each node may have several linked nodes

nodes: the “places” where data items are kept in the tree

links: the connections between nodes (represented as references/“pointers” in Java)

path: a sequence of links between nodes (and the nodes along the way, depending on context)

root: the top node of a tree

sub-tree: a node in a tree considered as the root of its own tree

children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)

parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)

height: the number of links on the path between the root of a tree and its furthest descendant

internal nodes: nodes which have at least one child

leaf or external node: a node which has no children (or all null children, in a Java implementation)

fringe: the set of all leaves of a tree, considered as a sequence

ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, say from left to right

sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.); such paths would be called cycles

in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
General trees

in a general tree, any node may have an arbitrary number of children

we could represent the children of a node in a general tree using any (ordered) collection, but a linked list might be best

we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list); see the sketch below
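as a minimal sketch (the class and field names here are illustrative, not from the text):

    // Each node links to its first (leftmost) child and to its next
    // sibling; a node's children thus form a linked list threaded
    // through their nextSibling fields.
    class GeneralTreeNode<E> {
        E element;
        GeneralTreeNode<E> firstChild;   // leftmost child, or null
        GeneralTreeNode<E> nextSibling;  // next sibling to the right, or null
    }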
Binary trees

when every node in a tree has at most two children and they are distinguished as left- and right-children

in other words, when every node has a left and a right child, either of which may be null
Expression trees

since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)

expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix

postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite

postfix and prefix forms never need parentheses

you should be able to convert between different forms of expressions for the exam!! (see the worked example below)

the various tree traversals (see below) yield these forms: preorder gives prefix, inorder gives infix, postorder gives postfix

except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right sub-trees
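for example, the infix expression (a + b) * c corresponds to a tree with * at the root, a + node (with children a and b) on the left and c on the right; its postfix form is a b + c * and its prefix form is * + a b c, neither of which needs parentheses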
Tree traversals

a tree traversal is a process of visiting all the nodes (or their elements) in a tree

typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order

by varying the relative order of node visitation and sub-tree traversal, we get different standard orders (see the sketch after this list):

left-right pre-order: visit node, traverse left, traverse right

left-right inorder: traverse left, visit node, traverse right

left-right postorder: traverse left, traverse right, visit node

… and similarly for three more right-to-left orders (6 possible permutations altogether)
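as a sketch of the recursive pattern (assuming a simple BinaryNode class with element, left and right fields, and a visit helper that processes one element):

    void inorder(BinaryNode<E> t) {
        if (t == null) return;   // empty sub-tree: nothing to do
        inorder(t.left);         // traverse left sub-tree
        visit(t.element);        // visit the node itself (move this line
        inorder(t.right);        //   first for pre-order, last for postorder)
    }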
if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead

if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)
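a minimal sketch of the stack-based version, using java.util.ArrayDeque (this one produces pre-order; replacing the stack with a queue, using add and remove, yields level-order):

    import java.util.ArrayDeque;
    import java.util.Deque;

    void preorderWithStack(BinaryNode<E> root) {
        Deque<BinaryNode<E>> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            BinaryNode<E> t = stack.pop();
            visit(t.element);
            // push right first so the left sub-tree comes off the stack first
            if (t.right != null) stack.push(t.right);
            if (t.left != null) stack.push(t.left);
        }
    }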
Binary search trees

a binary search tree is a binary tree whose elements support some ordering, and where:

all the nodes to the left of the root hold values that are less than the one in the root

all the nodes to the right of the root hold values that are greater than the one in the root

similarly for all sub-trees (i.e., the same rule holds throughout the tree)

equal values can be handled in several ways (keep counts in the nodes, or just ignore duplicates if duplication is not important)
binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key

insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:

compare the value being inserted (or sought) to the element in the root, then go left or right as appropriate (see the sketch below)

the height of a bushy tree is O(log n), where n is the total number of elements

in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
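a minimal sketch of the lookup logic (assuming elements are Comparable and nodes have element, left and right fields):

    boolean contains(E x, BinaryNode<E> t) {
        if (t == null) return false;               // fell off the tree: not there
        int cmp = x.compareTo(t.element);
        if (cmp < 0) return contains(x, t.left);   // smaller: go left
        if (cmp > 0) return contains(x, t.right);  // larger: go right
        return true;                               // equal: found it
    }

insertion follows the same path and attaches a new node at the point where the search falls off the tree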
Deletion from BSTs

deletion from a BST depends on some cases:

deletion of a leaf is easy: just set the reference in the parent to null

deletion of a node with a single child is almost as easy: just re-set the reference in the parent to circumvent the deleted node

if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree; this will be in the “inside corner” of one of the sub-trees (the extreme left of the right sub-tree, or vice versa); replace the element in the deleted node with the one from that inside-corner node, then delete that node (which is easier, since it has at most one child)
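a sketch of the whole case analysis (assuming a findMin helper that returns the leftmost node of a sub-tree; the method returns the new root of the sub-tree):

    BinaryNode<E> remove(E x, BinaryNode<E> t) {
        if (t == null) return null;                   // not found: nothing to do
        int cmp = x.compareTo(t.element);
        if (cmp < 0) t.left = remove(x, t.left);
        else if (cmp > 0) t.right = remove(x, t.right);
        else if (t.left != null && t.right != null) {
            // two children: copy the "inside corner" value up,
            // then delete that easier node instead
            t.element = findMin(t.right).element;
            t.right = remove(t.element, t.right);
        } else {
            // leaf or single child: splice the node out
            t = (t.left != null) ? t.left : t.right;
        }
        return t;
    }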
Balanced trees (AVL)

since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced

there are many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger

for AVL trees, the difference in height between the sub-trees of a node must be at most one (throughout the tree)

for the exam, you should be able to recognize this property of a tree

when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations

there are several cases to consider, with symmetries; some require single and some require double rotations (see the sketch below)

see the textbook pages 123-136 for details, examples and pictures
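as a small sketch, here is the single rotation for the left-left case (field names assumed as before; a full AVL implementation would also update stored heights):

    // k2 is the out-of-balance node; its left sub-tree is too tall
    BinaryNode<E> rotateWithLeftChild(BinaryNode<E> k2) {
        BinaryNode<E> k1 = k2.left;
        k2.left = k1.right;   // k1's right sub-tree slides under k2
        k1.right = k2;        // k2 becomes k1's right child
        return k1;            // k1 is the new root of this sub-tree
    }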
Other kinds of trees

there are many other kinds of trees that maintain balance using rotations: red-black trees, splay trees, etc.

splay trees in particular use a policy of bringing sought items to the root using rotations (with the understanding that they are likely to be sought more often later)
Sorting

See Chapter 7 of the text, especially sections 7.1-7.8
Introduction to sorting

sorting is one of the most common operations

ordered data supports binary search (fast!)

comparison interfaces in Java (see the sketch below):

Comparable—internal/“natural” ordering: this.compareTo(that)

Comparator—external ordering: compare(this, that)

both return an integer: negative (this is less), zero (equal), positive (this is greater)
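a small sketch contrasting the two interfaces (the Student class here is made up for illustration):

    import java.util.Comparator;

    class Student implements Comparable<Student> {
        String name;
        int id;

        // Comparable: the internal / "natural" ordering (here, by id)
        public int compareTo(Student that) {
            return Integer.compare(this.id, that.id);
        }

        // Comparator: an external ordering, packaged outside compareTo
        static final Comparator<Student> BY_NAME =
            (a, b) -> a.name.compareTo(b.name);
    }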
Naïve sorting

selection sort: the next item in order is selected and placed physically next in the result

insertion sort: the next physical item is inserted into its proper place in the ordered result

bubble sort: repeatedly swap adjacent items, “bubbling up” to the boundary

average- vs. best- vs. worst-case behavior

time bounds: O(n²) (see the sketch below)
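a minimal sketch of insertion sort on an int array (O(n²) in the average and worst cases, but O(n) on already-sorted input):

    void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int tmp = a[i];                    // next physical item
            int j = i;
            while (j > 0 && a[j - 1] > tmp) {  // shift larger items right
                a[j] = a[j - 1];
                j--;
            }
            a[j] = tmp;                        // drop it into its proper place
        }
    }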
Better sorting

tree sort: put the items into a binary search tree, then traverse it

heap sort: put the items into a heap, then pull them all out (with repeated deleteMin)

quick sort: split the items around a pivot, then recursively sort the two parts

merge sort: split the items into two halves, recursively sort them, then merge the results back together (see the sketch below)

time bounds: O(n log n) in the average case (but quick sort and tree sort can degrade to O(n²) on already-sorted data)

hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n) comparisons in the worst case
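a sketch of merge sort over int arrays, showing the easy-split/hard-join shape (it allocates fresh arrays for simplicity; in-place variants exist):

    import java.util.Arrays;

    int[] mergeSort(int[] a) {
        if (a.length <= 1) return a;                    // base case
        int mid = a.length / 2;                         // easy split: two halves
        int[] left  = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        int[] out = new int[a.length];                  // hard join: merge
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }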
A rational reconstruction of (many) sorts

hard-easy division: either hard-split/easy-join (e.g., quick sort) or easy-split/hard-join (e.g., merge sort); you have to do the work somewhere

one-many versus half-and-half splits: the latter is better, since we get the power of divide-and-conquer (logarithmically many levels, hence O(n log n) times)
Aside on priority queues and heaps

a priority queue is like a queue, but we can remove items based on highest priority (deleteMin)

it’s easy to make either add or deleteMin O(1) and the other O(n), using a linear list

but we can get O(log n) add (O(1) on average) and O(log n) deleteMin by using a tree-like structure …

… stored in an array! (see the sketch below)
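a sketch of the array trick for a min-heap (0-based indexing: the children of slot i live at 2i+1 and 2i+2, and its parent at (i-1)/2, so no link fields are needed):

    // add: place the new item at the end, then "percolate" it up;
    // size is the number of items currently in the heap (capacity
    // management is omitted from this sketch)
    void add(int[] heap, int size, int x) {
        int i = size;
        heap[i] = x;
        while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
            int parent = (i - 1) / 2;      // swap upward while smaller
            int tmp = heap[i];
            heap[i] = heap[parent];
            heap[parent] = tmp;
            i = parent;
        }
    }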
Hashing

See Chapter 5 of the text, especially sections 5.1-5.4
Introduction

the Dictionary interface: look values up by key

often the key is a field in a larger value object

can be implemented in O(log n) time using a BST (see above)

… but it would be better to have O(1)!

solution: keep the items in an array

but under what index?
basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)

we want a “messy”, random-seeming (but repeatable!) process which distributes keys well

unfortunately, it may happen that two keys collide under the hash function

use the modulus operator (%) to limit the hash results to the range 0 through table size minus 1 (see the sketch below)
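a minimal sketch of the index computation (Math.floorMod is used because % can return a negative result for a negative hash code):

    int indexFor(Object key, int tableSize) {
        // hashCode() does the chop-and-mash part; floorMod limits
        // the result to the range 0 .. tableSize-1
        return Math.floorMod(key.hashCode(), tableSize);
    }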
what do we want from a hash function?

speed!

coverage: it should be able to reach all slots of the table, with an even distribution

as few collisions as possible
collision resolution strategies

separate chaining: keep a list of items hanging off each array entry

internal addressing/probing (also known as open addressing): try to insert at nearby slots if the preferred one is filled

… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)

linear probing: try successive entries (suffers from primary clustering); see the sketch below

quadratic probing: square the “attempt number” to get an offset (suffers from secondary clustering)
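a sketch of a linear-probe lookup over a table of String keys (null marks an empty slot; this sketch assumes the table never fills up, and a real implementation also needs deletion markers and resizing):

    int findSlot(String[] table, String key) {
        int i = Math.floorMod(key.hashCode(), table.length);
        while (table[i] != null && !table[i].equals(key))
            i = (i + 1) % table.length;   // try successive entries, wrapping
        return i;   // the slot holding key, or the empty slot where it belongs
    }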
best bets for hashing

well-chosen table size (about twice as much room as there is data, and use a prime number)

use a well-distributed hash function