V
CS 241: Data Structures (in Python)—Midterm Exam review—Fall 2020
V
Important note!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
>
About the exam
V
Course introduction, basic concepts [see textbook Sections 1.1-1.3]
V
using a medieval-guild analogy: this course helps you to progress from apprentice to journeyman, the achievement celebrated with the attainment of a professional tool-set
*
(in this analogy, your capstone-project / senior thesis is your “masterpiece”)
V
Functions, algorithms, and programs
V
a (mathematical) function is a set of input-output pairs, without regard to how they are computed
*
the inputs and outputs may be numbers, but also data structures, etc.
*
an algorithm is a technique for computing a function: a step-by-step method, but still somewhat abstract
*
a program is a realization of an algorithm in a specific language, with choices of variable names, etc.
*
these three concepts form a hierarchy, with functions the most abstract and programs the most concrete
*
the relationships are many-to-one between algorithms and functions, and between programs and algorithms
*
functions have no notion of running time; algorithms have (abstract, O-notation style) running times; and programs have real-world (clock) running times
V
Modularity, abstraction, and reuse [see textbook Section 1.3.2 and also those for objects below]
*
large software projects are often written and maintained by groups of people
*
we break problems & programs down into manageable pieces called modules, with a clear separation between the behavior seen from the outside, and the implementation “under the hood”
*
we try to minimize the channels of communication or interface between modules for a simpler design—the interface is constrained by the intended behavior
*
a collection of data and some operations (an interface) that display some regular behavior is sometimes called an abstract data type [for math majors, think: “abstract algebra”]
*
different modules which realize or implement the same interface and behavior can be “plugged”, “un-plugged” and “re-plugged” at will, without changing overall behavior
*
ideally, we unplug a slower implementation and plug in a faster one, each implementing the same interface
*
a common language of specification, implementation, and analysis (big-O notation) supports understanding and communication about these artifacts and their properties
*
the use of standard techniques is valued so that people can understand and modify each other's code
*
standard “best practices” solutions to oft-seen problems can involve “tricks” (innovations) that would be hard to come up with on your own
*
ultimately, you can learn deeper principles that will help you adapt these solutions to new situations, and to find innovative techniques of your own
V
Review of Python [see textbook Section 1.4]
*
basic types: bool, int, float, None
*
pairs and tuples: (x,y)
*
lists: [1,2,3,'four']
*
loops: for and while
*
if-else statements (and elif)
*
ranges: range(0,n) starts at 0, stops at n-1, inclusive
*
sequences: includes lists, ranges, strings
*
dictionaries: matched-up key/value pairs (quick lookup) {'a':97, 'b':98, c:'99'}
*
functions: code which consolidates an oft-used pattern, but abstracted with parameters (see also methods below)
*
mutability of data structures and issues of reference—see example in textbook, page 11
V
Classes and methods in Python [see textbook Section 1.4.6]
*
sometimes we see a small cohort of related data items used to collectively manifest an abstraction
*
we can abstract out patterns of code used to manipulate the data as functions (see above)
*
but we can also collect the data together in a “package” called an object (a “scoped environment” of definitions)
V
we can also include the manipulating functions, now re-cast as methods which modify the object’s data; each takes an implicit “self” parameter representing the object-as-a-whole
*
parts of the object are accessed (in the internal scope) via the “dot” notation (.)
*
a class is a template that describes the data and method (and function) “packages”
V
it may include standard methods such as
*
__init__(self, …)
*
__str__(self)
*
__repr__(self)
*
one class may extend another (called sub-classing) via parameterization; the methods of the super-class (parameter class) may be accessed or even over-ridden, and new ones introduced
*
used properly, objects can increase code safety by consolidating access to the data cohort in one place, rather than scattering the data and its manipulation around in other code
*
objects, classes, and methods are the main way we manifest abstract data types in Python (as well as other object-oriented languages)
V
Algorithm analysis and Big-O notation [see textbook Section 2.2]
V
we want a way to describe and compare the running times of algorithms, but:
*
it should be insensitive to particular hardware/machines
*
it should emphasize the behavior that obtains as the problems (or input data sets) grow large
V
we abstract away the details of a function (e.g., a polynomial) representing the running time
*
we eliminate low-order terms: just keep the highest-order term
*
we eliminate constant factors (this has a downside, though)
*
although big-O notation is very useful, it is still a coarse comparison—always consider things like actual constant factors (and data sizes) when assessing an algorithm in practice
V
technically, we say that “f is O(g)” when there is a constant factor c and an initial point n0
*
f(n) is less than or equal to c·g(n) for all n > n0
*
idea: we can pick a constant that, when multiplied by the high-order term, will dominate all the low order terms
V
the hierarchy of O-notation functions typically seen is:
*
1 < log(n) < n < n log(n) < n2 < n3 < ... < cn
*
note that we say “O(1)” for a complexity which is constant (independent of n), since O-notation ignores constant factors (and “0” would cancel out anything else)
V
how do we measure times for O-notation?
*
count individual operations as steps
*
ignore constant amounts of steps: look for loops or traversals that somehow depend on n
*
single loops up to n are O(n) [times the body of the loop]
*
nested loops each to n are O(n2) [times the body of the loop]
*
nested loops with the inner one bounded by the index are still O(n2)—see the “triangle” of summation
*
logs arise from breaking problems into pieces, then each piece into more pieces, etc. (think of the “guess a number between 1 and 100” game)
*
The anagram example in section 2.2.2 is a great example of 4 algorithms of varying complexity—note especially that the preferred solution is O(1)
V
Efficiency of built-in Python data structures [see textbook Section 2.3]
*
you should know the basic behaviors of list and dictionary functions and methods for the exam!
*
See the tables on pages 73 and 76 (and surrounding discussion)
*
if you have a good grasp of how lists are implemented, based on the memory model we discussed in class, you should be able to grasp intuitively how these operations come by their stated complexity (running time in big-O notation)
*
Note especially that the complexity of pop() is O(1), whereas that of pop(i) is O(n)
V
Stacks and (linear) abstract data types [see textbook Sections 3.1-3.3]
*
a stack supports insertion (push), deletion (pop), and a check for emptiness, but with a specific “discipline”
V
push and pop behave in a LIFO way: last in, first out
*
(think of a stack of dinner plates on a spring at Goudy)
V
applications of stacks:
*
matching parentheses (and brackets and braces) in program syntax
*
keeping track of pending method calls
*
converting and evaluating forms of (linguistic, algebraic) expressions
V
implementing a stack with a list
*
push items by appending them to the end (which is O(1))
*
pop items by removing them (also called pop) from the end (which is O(1))
*
check emptiness by comparing to the empty list (or finding length of 0)
V
Numbers, numeral bases, and conversion [see textbook Section 3.3.6]
*
you should be able to convert (small numbers) between base 2 and 10 for the exam!
*
See examples on pages 93-94
*
numbers are abstract concepts, not associated with any particular representation or base
*
numerals are string-like representations of numbers
*
in the usual positional notation, we use a string of digits for a base b
V
the value (number associated with the numeral) is equal to the sum of the values of the digits, each multiplied by an increasing power of the base
*
… starting from a power of 0, and running from right to left
*
we say: one’s place (power 0), b’s place (for base b), b2’s place, etc.
*
Remember: the log of a number (in base b) is approximately the number of digits its numeral has (in base b)
*
in our culture we use base 10, with what are called Arabic digits 0-9 (except by the Arabs, who call them Hindu digits …)
*
computers use base 2, with digits 0 and 1, also called bits (for “binary digits”)
*
people often abbreviate base 2 as base 16, called hexadecimal
V
we can convert a number (in our programming language) into a representation in a base b by:
*
starting a running total as the number
*
taking the total mod (%) the base b, and pushing the result (as a digit character)
*
taking the integer division (//) of the total by the base and setting that to be the new total
*
… continuing until we hit a total of zero (why will this always happen?)
*
popping the digits off successively and appending to the right of a string (thus reversing them)
V
Expressions, bracketing, and conversion [see textbook Sections 3.3.4-3.3.7]
V
we can balance parentheses (as well as brackets, braces, etc.) in a string using a simple stack algorithm
*
we scan the string from the left to the right, starting with an empty stack
*
when we see a left parenthesis (etc.), we push it on the stack
*
when we see a right parenthesis (etc.), we pop the stack and match that (left) item to the (right) one found (round to round, square to square, etc.)—if they don’t match in shape, it is an error!
*
if the stack is empty before we try to pop, it is an error (unmatched right paren)
*
if the stack is non-empty when we are done, it is an error (unmatched left paren)
*
we can process other characters (letters, digits, punctuation) by just ignoring them, at least for the purposes of parenthesis matching
V
Infix form
*
algebraic expressions are usually written with binary operators (like +, - and *), constants (numerals) and variables
*
when we write expressions, we can use parentheses to show structure
V
we leave excess parentheses out according to the usual PEMDAS “order of operations”
*
multiplication and division have higher precedence than addition and subtraction
*
when it matters, we associate to the left: a+b+c really means (a+b)+c, even though the value is unaffected by the order
*
… but sometimes we need to use parentheses, when the default order of operations does not match our intent
*
we can always put extra parentheses in for emphasis
*
this usual way of writing expressions is called infix form
V
Postfix form
*
an alternative style, postfix form, puts the operators after their arguments: 2 3 +
*
in postfix form we never need parentheses
*
postfix form is based on the ideas of Polish logicians and is sometimes called Polish notation
V
Prefix form
*
another style, prefix form, puts the operators before their arguments: + 2 3
*
in prefix form we never need parentheses
V
Converting from infix to postfix
*
you should be able to convert between different forms of expressions for the exam!
*
See examples on pages 97-98
*
you should be able to do this informally (“by eye” and a bit of figuring) for short expressions
*
… but there is an algorithm for longer ones:
V
we use a stack to convert infix to postfix:
*
proceed through the expression from left to right
*
for a constant or variable, simply pass it through by appending to the (accumulating) output
*
for an operator, push it on the stack
*
… but, before pushing: pop any operators of equal or lower precedence and append to output
*
for parentheses, treat as for the matching algorithm, but popping intervening operators and appending to the output
V
Evaluating postfix
V
we can use a stack to evaluate (variable-free) postfix forms:
*
proceed through the expression from left to right
*
for a constant, push it on the stack
*
for an operator, pop off two arguments and apply the operator, then push the result back on
*
be careful of the argument order: the first one popped is the right argument!
*
when done, we should always have exactly one number on the stack (and have never failed to pop two items)
V
What will not be on the exam
*
No circuit elements (textbook Section 1.4.6.2)—still, you should understand a bit about sub-classes and inheritance, which this section illustrates
*
No exception handling (since we did not practice it in lab)
*
There will be nothing about queues on the exam (we didn’t quite reach it in the textbook)