Topic
V CS 241: Data Structures—Final Exam Review Notes
V Important notes!
* click on the triangles to the left to expand or contract sections!
* use the text size feature in your browser to make this text larger and more readable!
V About the exam
* the final exam will take place on Friday, May 4th, from 2-5pm, in the usual lecture room (Ford 204)
* the exam will be designed so that you should be able to complete it in one-and-a-half to two hours
* you will not have access to books, notes, computers, networks or neighbors during the exam
(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
V In preparing for the exam, you should consult:
* your textbook (see especially chapters and sections mentioned below)
* your notes from lecture and lab—these are likely to be especially helpful
* the home page for the course and various links, examples, etc., provided there
* … and also especially the lab assignments linked from the home page, as well as your own programs
* graded quizzes you completed during lecture (3 of them, hopefully)
V Format: the exam will consist of several kinds of questions; typically:
* 10-15 True-or-False questions at 1-2 points each
* 8-10 short answer questions at 4-6 points each
* 2-4 longer answer questions at 8-15 points each
* These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V The final exam is comprehensive!
* But the emphasis will be on material from after the midterm
* For more information on the midterm, see the midterm review notes
V … but here are the main subject areas again (with book chapters):
V General background to Data Structures and modularity
* Chapter 1, section 1.1; also Chapter 3, section 3.1
V Math review and recursion
* Chapter 1, sections 1.2-1.3
V Java review and new features
* Chapter 1, sections 1.4-1.5
V Review—see external notes!
V Algorithm analysis and asymptotics (O notation)
* Chapter 2, sections 2.1-2.4
V Arrays, ArrayLists and linked lists
* Chapter 3, sections 3.2-3.5
V Stacks and Queues
* Chapter 3, sections 3.6-3.7
V Infix and Postfix expressions
* Chapter 3, section 3.6.3
V Introduction to Applets and GUIs
V See online examples and resources
* online examples at course homepage, near the top
V Trees
* See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)
V Tree definitions and terminology
* A tree is a collection structure which can hold elements in positions called nodes;
a tree is a linked structure, but unlike a linked list, each node may link to several other nodes
* nodes: the “places” where data items are kept in the tree
* links: the connections between nodes (represented as references/“pointers” in Java)
* path: a sequence of links between nodes (and the nodes along the way, depending on context)
* root: the top node of a tree
* sub-tree: a node in a tree considered as a root of its own tree
* children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)
* parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)
* height: the number of links between the root of a tree and its furthest descendant
* internal nodes: nodes which have at least one child
* leaf or external node: nodes which have no children (or all null children, in a Java implementation)
* fringe: the set of all leaves of a tree, considered as a sequence
* ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say
* sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles
* in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
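* as a concrete point of reference, here is a minimal Java sketch of a (binary) tree node, used in the later sketches below; the class and field names are illustrative, not from the text, and the height method follows the definition above:
      // a minimal binary tree node: an element plus links to child nodes
      class TreeNode<E> {
          E element;            // the data item kept at this node
          TreeNode<E> left;     // link to the left child (null if absent)
          TreeNode<E> right;    // link to the right child (null if absent)
          TreeNode(E element) { this.element = element; }

          // height: the number of links from the root down to its furthest descendant
          static <E> int height(TreeNode<E> t) {
              if (t == null) return -1;   // conventional height of an empty tree
              return 1 + Math.max(height(t.left), height(t.right));
          }
      }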
V General trees
* in a general tree, any node may have an arbitrary number of children
* we could represent children in a general tree using any (ordered) collection, but a linked list might be best
* we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)
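* a sketch of the first-child/next-sibling idea in Java (hypothetical names): each node keeps just two links, yet can have any number of children:
      class GeneralTreeNode<E> {
          E element;
          GeneralTreeNode<E> firstChild;    // leftmost child, or null at a leaf
          GeneralTreeNode<E> nextSibling;   // next child of the same parent (the
                                            // sibling links form a linked list)
          GeneralTreeNode(E element) { this.element = element; }
      }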
V Binary trees
* when every node in a tree has at most two children and they are distinguished as left- and right-children
* in other words, when every node has a left and right child, either of which may be null
V Expression trees
* since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as
binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)
V expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix
* postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite
* postfix and prefix forms never need parentheses
* you should be able to convert between different forms of expressions for the exam!!
V various tree traversals (see below) yield similar representations of expressions
* except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees
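* a sketch of expression-tree printing (hypothetical ExprNode class): postfix falls straight out of a postorder traversal, while infix must add parentheses around each operator application:
      class ExprNode {
          String token;           // an operator at internal nodes, a literal at leaves
          ExprNode left, right;   // both null at a leaf
          ExprNode(String token, ExprNode left, ExprNode right) {
              this.token = token; this.left = left; this.right = right;
          }
          String toPostfix() {    // postorder: left, right, then the operator
              if (left == null) return token;
              return left.toPostfix() + " " + right.toPostfix() + " " + token;
          }
          String toInfix() {      // inorder, with parentheses before and after the sub-trees
              if (left == null) return token;
              return "(" + left.toInfix() + " " + token + " " + right.toInfix() + ")";
          }
      }
for example, the tree for (3 + 4) * 5 prints as “3 4 + 5 *” in postfix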
V Tree traversals
* a tree traversal is a process visiting all the nodes (or elements thereof) in a tree
* typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order
V by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:
* left-right pre-order: visit node, traverse left, traverse right
* left-right inorder: traverse left, visit node, traverse right
* left-right postorder: traverse left, traverse right, visit node
* … and similarly for the three right-to-left mirror orders (6 possible permutations altogether)
* if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead
* if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)
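* a Java sketch of these ideas, using the TreeNode class from the terminology section above:
      import java.util.ArrayDeque;
      class Traversals {
          // recursive left-right inorder; moving the visit before or after the
          // recursive calls gives preorder or postorder instead
          static <E> void inorder(TreeNode<E> t) {
              if (t == null) return;
              inorder(t.left);
              System.out.println(t.element);      // "visit" the node
              inorder(t.right);
          }
          // preorder with an explicit stack (the shape an Iterator would use);
          // push right before left so that left is visited first
          static <E> void preorder(TreeNode<E> root) {
              ArrayDeque<TreeNode<E>> stack = new ArrayDeque<>();
              if (root != null) stack.push(root);
              while (!stack.isEmpty()) {
                  TreeNode<E> t = stack.pop();
                  System.out.println(t.element);
                  if (t.right != null) stack.push(t.right);
                  if (t.left != null) stack.push(t.left);
              }
          }
          // substituting a queue (add/remove for push/pop) yields level-order
      }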
V Binary search trees
V a binary search tree is a binary tree whose elements support some ordering, and where:
* all the nodes to the left of the root hold values that are less than the one in the root
* all the nodes to the right of the root hold values that are greater than the one in the root
* similarly for all sub-trees (i.e., same rule throughout the tree)
* equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)
* binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key
V insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:
* compare value being inserted (or sought) to the element in the root, go left or right as appropriate
* height of a bushy tree is O(log n) where n is the total number of elements
* in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
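* a sketch of BST lookup and insertion in Java (using the TreeNode class above; duplicates are simply ignored here):
      class BSTOps {
          // insert x into the sub-tree rooted at t; returns the (possibly new) root
          static <E extends Comparable<? super E>> TreeNode<E> insert(TreeNode<E> t, E x) {
              if (t == null) return new TreeNode<>(x);
              int cmp = x.compareTo(t.element);
              if (cmp < 0)      t.left  = insert(t.left, x);
              else if (cmp > 0) t.right = insert(t.right, x);
              return t;                             // cmp == 0: duplicate, ignore
          }
          // lookup: go left or right at each node, as the comparison directs
          static <E extends Comparable<? super E>> boolean contains(TreeNode<E> t, E x) {
              while (t != null) {
                  int cmp = x.compareTo(t.element);
                  if (cmp == 0) return true;
                  t = (cmp < 0) ? t.left : t.right;
              }
              return false;
          }
      }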
V Deletion from BSTs
V deletion from a BST depends on some cases:
* deletion of a leaf is easy: just set the reference in the parent to null
* deletion of a node with a single child is almost as easy: just re-set the reference in the parent to circumvent the deleted node
* if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree;
this will be in the “inside corner” of one of the sub-trees (extreme left of right sub-tree or vice versa);
replace the element in the deleted node with that from the inside-corner node, then delete that node (which is easier)
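* a sketch of the deletion cases in Java (same TreeNode class; findMin locates the “inside corner” of the right sub-tree):
      class BSTRemove {
          static <E extends Comparable<? super E>> TreeNode<E> remove(TreeNode<E> t, E x) {
              if (t == null) return null;                      // not found: nothing to do
              int cmp = x.compareTo(t.element);
              if (cmp < 0)      t.left  = remove(t.left, x);
              else if (cmp > 0) t.right = remove(t.right, x);
              else if (t.left != null && t.right != null) {    // two children:
                  t.element = findMin(t.right).element;        // copy up the next-larger element,
                  t.right = remove(t.right, t.element);        // then delete its old (easier) node
              } else {                                         // leaf or single child:
                  t = (t.left != null) ? t.left : t.right;     // circumvent the deleted node
              }
              return t;
          }
          static <E> TreeNode<E> findMin(TreeNode<E> t) {
              while (t.left != null) t = t.left;               // extreme left of the sub-tree
              return t;
          }
      }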
V Balanced trees (AVL)
* since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced
* many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger
* for AVL trees, the difference in height between sub-trees must be at most one (throughout the tree)
* for the exam, you should be able to recognize this property of a tree
* when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations
* there are several cases to consider, with symmetries, some requiring single and some requiring double rotations
* see the textbook pages 123-136 for details, examples and pictures
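* for a flavor of the mechanics, here is one single-rotation case as a Java sketch (the left-left case, using the TreeNode class above; a full AVL implementation also tracks heights and handles the symmetric and double-rotation cases):
      // single rotation with the left child: used when the left sub-tree
      // of the left child has grown too tall; k1 becomes the new sub-tree root
      static <E> TreeNode<E> rotateWithLeftChild(TreeNode<E> k2) {
          TreeNode<E> k1 = k2.left;
          k2.left = k1.right;    // k1's right sub-tree moves across, under k2
          k1.right = k2;         // k2 drops down to become k1's right child
          return k1;
      }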
V Other kinds of trees
* there are many other kinds of trees that maintain balance using rotations: red-black trees, splay trees, etc.
* splay trees in particular use a policy of bringing sought items to the root using rotations
(with the understanding that they are likely to be sought more often later)
V Sorting
* See Chapter 7 of the text, especially sections 7.1-7.8
V Introduction to sorting
* one of the most common operations
* ordered data supports binary search (fast!)
V comparison interfaces in Java
* Comparable—internal/“natural”: this.compareTo(that)
* Comparator—external: compare(this, that)
* both return an integer: negative (this is less), zero, positive
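* a small runnable contrast of the two interfaces (the length-based order is just an example):
      import java.util.Arrays;
      import java.util.Comparator;
      public class CompareDemo {
          public static void main(String[] args) {
              String[] words = { "pear", "fig", "apple" };
              // Comparable: String's internal/"natural" order
              Arrays.sort(words);
              System.out.println(Arrays.toString(words));   // [apple, fig, pear]
              // Comparator: an external order supplied separately
              Arrays.sort(words, Comparator.comparingInt(String::length));
              System.out.println(Arrays.toString(words));   // [fig, pear, apple]
          }
      }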
V Naïve sorting
* selection sort: next ordered item selected and added physically next in the result
* insertion sort: next physical item inserted into proper place in the ordered result
* bubble sort: repeatedly swap adjacent out-of-order items, so the largest “bubbles up” to the boundary
* average- vs. best- vs. worst-case
* time bounds: O(n²)
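* insertion sort as a Java sketch, to make the O(n²) shape concrete (two nested loops; the inner loop does up to i shifts):
      static void insertionSort(int[] a) {
          for (int i = 1; i < a.length; i++) {   // a[0..i-1] is already sorted
              int tmp = a[i];                    // next physical item to place
              int j = i;
              while (j > 0 && a[j - 1] > tmp) {
                  a[j] = a[j - 1];               // shift larger items right
                  j--;
              }
              a[j] = tmp;                        // drop it into its proper place
          }
      }
note the best case: already-sorted input makes the inner loop exit immediately, giving O(n)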
V Better sorting
* tree sort: put items into a binary search tree, then traverse
* heap sort: put items into a heap, then pull them all out (with repeated deleteMin)
* quick sort: split items using pivot, then recursively sort the two parts
* merge sort: split items into two halves, recursively sort, then merge back together
* time bounds: O(n log n) in the average case (but quick sort and tree sort can degrade to O(n²) on already-sorted data)
* hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
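* merge sort as a Java sketch: easy split (two halves), hard join (the merge does the real work):
      import java.util.Arrays;
      class MergeSortDemo {
          static int[] mergeSort(int[] a) {
              if (a.length <= 1) return a;                    // base case: already sorted
              int mid = a.length / 2;
              int[] left  = mergeSort(Arrays.copyOfRange(a, 0, mid));
              int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
              return merge(left, right);
          }
          static int[] merge(int[] l, int[] r) {
              int[] out = new int[l.length + r.length];
              int i = 0, j = 0, k = 0;
              while (i < l.length && j < r.length)            // take the smaller head each time
                  out[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
              while (i < l.length) out[k++] = l[i++];         // copy whatever remains
              while (j < r.length) out[k++] = r[j++];
              return out;
          }
      }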
V A rational reconstruction of (many) sorts
* hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
* one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)
V Aside on priority queues and heaps
* a priority queue is like a queue, but we can remove items based on highest priority (deleteMin)
* it’s easy to make either add or deleteMin O(1) and the other O(n), using a linear list
* but we can get O(log n) add (O(1) on average) and O(log n) deleteMin by using a tree-like structure
* … stored in an array!
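* a minimal sketch of that array trick for a binary min-heap: the children of the node at index i live at indices 2i+1 and 2i+2, so no links are needed (no resizing or empty-heap checks here):
      import java.util.ArrayList;
      class MinHeap {
          private final ArrayList<Integer> a = new ArrayList<>();
          void add(int x) {                        // percolate up: O(log n) worst case
              a.add(x);
              int i = a.size() - 1;
              while (i > 0 && a.get((i - 1) / 2) > a.get(i)) {
                  swap(i, (i - 1) / 2);
                  i = (i - 1) / 2;
              }
          }
          int deleteMin() {                        // percolate down: O(log n)
              int min = a.get(0);
              a.set(0, a.get(a.size() - 1));       // move the last item to the root
              a.remove(a.size() - 1);
              int i = 0;
              while (2 * i + 1 < a.size()) {
                  int child = 2 * i + 1;           // pick the smaller child
                  if (child + 1 < a.size() && a.get(child + 1) < a.get(child)) child++;
                  if (a.get(i) <= a.get(child)) break;
                  swap(i, child);
                  i = child;
              }
              return min;
          }
          private void swap(int i, int j) {
              int tmp = a.get(i); a.set(i, a.get(j)); a.set(j, tmp);
          }
      }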
V Hashing
* See Chapter 5 of the text, especially sections 5.1-5.4
V Introduction
V Dictionary interface: look values up by key
* often the key is a field in a larger value object
* can be implemented in O(log n) time using a BST (see above)
* … but it would be better to have O(1)!
* solution: keep items in an array
* but under what index?
* basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
* we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
* unfortunately, it may happen that two keys collide under the hash function
* use the modulus operator (%) to limit hash results to the range 0 through table size minus 1
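* the key-to-index step as a one-line Java sketch (indexFor is a hypothetical helper; Math.abs guards against negative hash codes, and is safe here because % is applied first):
      // hash the key, then use % to bring the result into 0 .. tableLength-1
      static int indexFor(String key, int tableLength) {
          return Math.abs(key.hashCode() % tableLength);
      }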
V what do we want from a hash function?
* speed!
* coverage: the function should be able to reach all slots (not limited in magnitude or in which slots it can hit), with an even distribution
* lack of collisions (as much as possible)
V collision resolution strategies
* separate chaining: use a list of items hanging off each array entry
* open addressing (probing): try to insert at nearby slots if the preferred one is filled
* … but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
* linear probing: try successive entries (but primary clustering)
* quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
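* a minimal Java sketch of separate chaining (a set of strings; no resizing, and the names are illustrative):
      import java.util.LinkedList;
      class ChainedHashSet {
          private final LinkedList<String>[] table;
          @SuppressWarnings("unchecked")
          ChainedHashSet(int size) {
              table = new LinkedList[size];
              for (int i = 0; i < size; i++) table[i] = new LinkedList<>();
          }
          private LinkedList<String> chainFor(String key) {
              return table[Math.abs(key.hashCode() % table.length)];   // hash, then mod
          }
          void add(String key) {
              LinkedList<String> chain = chainFor(key);   // colliding keys simply
              if (!chain.contains(key)) chain.add(key);   // extend the same list
          }
          boolean contains(String key) {
              return chainFor(key).contains(key);
          }
      }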
V best bets for hashing
* well-chosen table size (about twice as much as there is data, and use a prime number)
* use a well-distributed hash function