using a medieval-guild analogy: this course helps you to progress from apprentice to journeyman, the achievement celebrated with the attainment of a professional tool-set
(in this analogy, your capstone-project / senior thesis is your “masterpiece”)
Functions, algorithms, and programs
a (mathematical) function is a set of input-output pairs, without regard to how they are computed
the inputs and outputs may be numbers, but also data structures, etc.
an algorithm is a technique for computing a function: a step-by-step method, but still somewhat abstract
a program is a realization of an algorithm in a specific language, with choices of variable names, etc.
these three concepts form a hierarchy, with functions the most abstract and programs the most concrete
the relationships are many-to-one between algorithms and functions, and between programs and algorithms
functions have no notion of running time; algorithms have (abstract, O-notation style) running times; and programs have real-world (clock) running times
Modularity, abstraction, and reuse [see textbook Section 1.3.2 and also those for objects below]
large software projects are often written and maintained by groups of people
we break problems & programs down into manageable pieces called modules, with a clear separation between the behavior seen from the outside, and the implementation “under the hood”
we try to minimize the channels of communication or interface between modules for a simpler design—the interface is constrained by the intended behavior
a collection of data and some operations (an interface) that display some regular behavior is sometimes called an abstract data type[for math majors, think: “abstract algebra”]
different modules which realize or implement the same interface and behavior can be “plugged”, “un-plugged” and “re-plugged” at will, without changing overall behavior
ideally, we unplug a slower implementation and plug in a faster one, each implementing the same interface
a common language of specification, implementation, and analysis (big-O notation) supports understanding and communication about these artifacts and their properties
the use of standard techniques is valued so that people can understand and modify each other's code
standard “best practices” solutions to oft-seen problems can involve “tricks” (innovations) that would be hard to come up with on your own
ultimately, you can learn deeper principles that will help you adapt these solutions to new situations, and to find innovative techniques of your own
Review of Python [see textbook Section 1.4]
basic types: bool, int, float, None
pairs and tuples: (x,y)
lists: [1,2,3,'four']
loops: for and while
if-else statements (and elif)
ranges: range(0,n) starts at 0, stops at n-1, inclusive
functions: code which consolidates an oft-used pattern, but abstracted with parameters (see also methods below)
mutability of data structures and issues of reference—see example in textbook, page 11
Classes and methods in Python [see textbook Section 1.4.6]
sometimes we see a small cohort of related data items used to collectively manifest an abstraction
we can abstract out patterns of code used to manipulate the data as functions (see above)
but we can also collect the data together in a “package” called an object (a “scoped environment” of definitions)
we can also include the manipulating functions, now re-cast as methods which modify the object’s data; each takes an implicit “self” parameter representing the object-as-a-whole
parts of the object are accessed (in the internal scope) via the “dot” notation (.)
a class is a template that describes the data and method (and function) “packages”
it may include standard methods such as
__init__(self, …)
__str__(self)
__repr__(self)
one class may extend another (called sub-classing) via parameterization; the methods of the super-class (parameter class) may be accessed or even over-ridden, and new ones introduced
used properly, objects can increase code safety by consolidating access to the data cohort in one place, rather than scattering the data and its manipulation around in other code
objects, classes, and methods are the main way we manifest abstract data types in Python (as well as other object-oriented languages)
Algorithm analysis and Big-O notation [see textbook Section 2.2]
we want a way to describe and compare the running times of algorithms, but:
it should be insensitive to particular hardware/machines
it should emphasize the behavior that obtains as the problems (or input data sets) grow large
we abstract away the details of a function (e.g., a polynomial) representing the running time
we eliminate low-order terms: just keep the highest-order term
we eliminate constant factors (this has a downside, though)
although big-O notation is very useful, it is still a coarse comparison—always consider things like actual constant factors (and data sizes) when assessing an algorithm in practice
technically, we say that “f is O(g)” when there is a constant factor c and an initial point n0
f(n) is less than or equal to c·g(n) for all n > n0
idea: we can pick a constant that, when multiplied by the high-order term, will dominate all the low order terms
the hierarchy of O-notation functions typically seen is:
1 < log(n) < n < n log(n) < n2 < n3 < ... < cn
note that we say “O(1)” for a complexity which is constant (independent of n), since O-notation ignores constant factors (and “0” would cancel out anything else)
how do we measure times for O-notation?
count individual operations as steps
ignore constant amounts of steps: look for loops or traversals that somehow depend on n
single loops up to n are O(n) [times the body of the loop]
nested loops each to n are O(n2) [times the body of the loop]
nested loops with the inner one bounded by the index are still O(n2)—see the “triangle” of summation
logs arise from breaking problems into pieces, then each piece into more pieces, etc. (think of the “guess a number between 1 and 100” game)
The anagram example in section 2.2.2 is a great example of 4 algorithms of varying complexity—note especially that the preferred solution is O(1)
Efficiency of built-in Python data structures [see textbook Section 2.3]
you should know the basic behaviors of list and dictionary functions and methods for the exam!
See the tables on pages 73 and 76 (and surrounding discussion)
if you have a good grasp of how lists are implemented, based on the memory model we discussed in class, you should be able to grasp intuitively how these operations come by their stated complexity (running time in big-O notation)
Note especially that the complexity of pop() is O(1), whereas that of pop(i) is O(n)
Stacks and (linear) abstract data types [see textbook Sections 3.1-3.3]
a stack supports insertion (push), deletion (pop), and a check for emptiness, but with a specific “discipline”
push and pop behave in a LIFO way: last in, first out
(think of a stack of dinner plates on a spring at Goudy)
applications of stacks:
matching parentheses (and brackets and braces) in program syntax
keeping track of pending method calls
converting and evaluating forms of (linguistic, algebraic) expressions
implementing a stack with a list
push items by appending them to the end (which is O(1))
pop items by removing them (also called pop) from the end (which is O(1))
check emptiness by comparing to the empty list (or finding length of 0)
Numbers, numeral bases, and conversion [see textbook Section 3.3.6]
you should be able to convert (small numbers) between base 2 and 10 for the exam!
See examples on pages 93-94
numbers are abstract concepts, not associated with any particular representation or base
numerals are string-like representations of numbers
in the usual positional notation, we use a string of digits for a base b
the value (number associated with the numeral) is equal to the sum of the values of the digits, each multiplied by an increasing power of the base
… starting from a power of 0, and running from right to left
we say: one’s place (power 0), b’s place (for base b), b2’s place, etc.
Remember: the log of a number (in base b) is approximately the number of digits its numeral has (in base b)
in our culture we use base 10, with what are called Arabic digits 0-9 (except by the Arabs, who call them Hindu digits …)
computers use base 2, with digits 0 and 1, also called bits (for “binary digits”)
people often abbreviate base 2 as base 16, called hexadecimal
we can convert a number (in our programming language) into a representation in a base b by:
starting a running total as the number
taking the total mod (%) the base b, and pushing the result (as a digit character)
taking the integer division (//) of the total by the base and setting that to be the new total
… continuing until we hit a total of zero (why will this always happen?)
popping the digits off successively and appending to the right of a string (thus reversing them)
Expressions, bracketing, and conversion [see textbook Sections 3.3.4-3.3.7]
we can balance parentheses (as well as brackets, braces, etc.) in a string using a simple stack algorithm
we scan the string from the left to the right, starting with an empty stack
when we see a left parenthesis (etc.), we push it on the stack
when we see a right parenthesis (etc.), we pop the stack and match that (left) item to the (right) one found (round to round, square to square, etc.)—if they don’t match in shape, it is an error!
if the stack is empty before we try to pop, it is an error (unmatched right paren)
if the stack is non-empty when we are done, it is an error (unmatched left paren)
we can process other characters (letters, digits, punctuation) by just ignoring them, at least for the purposes of parenthesis matching
Infix form
algebraic expressions are usually written with binary operators (like +, - and *), constants (numerals) and variables
when we write expressions, we can use parentheses to show structure
we leave excess parentheses out according to the usual PEMDAS “order of operations”
multiplication and division have higher precedence than addition and subtraction
when it matters, we associate to the left: a+b+c really means (a+b)+c, even though the value is unaffected by the order
… but sometimes we need to use parentheses, when the default order of operations does not match our intent
we can always put extra parentheses in for emphasis
this usual way of writing expressions is called infix form
Postfix form
an alternative style, postfix form, puts the operators after their arguments: 2 3 +
in postfix form we never need parentheses
postfix form is based on the ideas of Polish logicians and is sometimes called Polish notation
Prefix form
another style, prefix form, puts the operators before their arguments: + 2 3
in prefix form we never need parentheses
Converting from infix to postfix
you should be able to convert between different forms of expressions for the exam!
See examples on pages 97-98
you should be able to do this informally (“by eye” and a bit of figuring) for short expressions
… but there is an algorithm for longer ones:
we use a stack to convert infix to postfix:
proceed through the expression from left to right
for a constant or variable, simply pass it through by appending to the (accumulating) output
for an operator, push it on the stack
… but, before pushing: pop any operators of equal or lower precedence and append to output
for parentheses, treat as for the matching algorithm, but popping intervening operators and appending to the output
Evaluating postfix
we can use a stack to evaluate (variable-free) postfix forms:
proceed through the expression from left to right
for a constant, push it on the stack
for an operator, pop off two arguments and apply the operator, then push the result back on
be careful of the argument order: the first one popped is the right argument!
when done, we should always have exactly one number on the stack (and have never failed to pop two items)
What will not be on the exam
No circuit elements (textbook Section 1.4.6.2)—still, you should understand a bit about sub-classes and inheritance, which this section illustrates
No exception handling (since we did not practice it in lab)
There will be nothing about queues on the exam (we didn’t quite reach it in the textbook)