Midterm review F2020

CS 241: Data Structures (in Python)—Midterm Exam review—Fall 2020

Important note!

click on the triangles to the left to expand or contract sections!

use the text size feature in your browser to make this text larger and more readable!

About the exam

Course introduction, basic concepts [see textbook Sections 1.1-1.3]

using a medieval-guild analogy: this course helps you to progress from apprentice to journeyman, the achievement celebrated with the attainment of a professional tool-set

(in this analogy, your capstone-project / senior thesis is your “masterpiece”)

Functions, algorithms, and programs

a (mathematical) function is a set of input-output pairs, without regard to how they are computed

the inputs and outputs may be numbers, but also data structures, etc.

an algorithm is a technique for computing a function: a step-by-step method, but still somewhat abstract

a program is a realization of an algorithm in a specific language, with choices of variable names, etc.

these three concepts form a hierarchy, with functions the most abstract and programs the most concrete

the relationships are many-to-one between algorithms and functions, and between programs and algorithms

functions have no notion of running time; algorithms have (abstract, O-notation style) running times; and programs have real-world (clock) running times

Modularity, abstraction, and reuse [see textbook Section 1.3.2 and also those for objects below]

large software projects are often written and maintained by groups of people

we break problems & programs down into manageable pieces called modules, with a clear separation between the behavior seen from the outside, and the implementation “under the hood”

we try to minimize the channels of communication or interface between modules for a simpler design—the interface is constrained by the intended behavior

a collection of data and some operations (an interface) that display some regular behavior is sometimes called an abstract data type [for math majors, think: “abstract algebra”]

different modules which realize or implement the same interface and behavior can be “plugged”, “un-plugged” and “re-plugged” at will, without changing overall behavior

ideally, we unplug a slower implementation and plug in a faster one, each implementing the same interface

a common language of specification, implementation, and analysis (big-O notation) supports understanding and communication about these artifacts and their properties

the use of standard techniques is valued so that people can understand and modify each other's code

standard “best practices” solutions to oft-seen problems can involve “tricks” (innovations) that would be hard to come up with on your own

ultimately, you can learn deeper principles that will help you adapt these solutions to new situations, and to find innovative techniques of your own

Review of Python [see textbook Section 1.4]

basic types: bool, int, float, None

pairs and tuples: (x,y)

lists: [1,2,3,'four']

loops: for and while

if-else statements (and elif)

ranges: range(0,n) starts at 0, stops at n-1, inclusive

sequences: includes lists, ranges, strings

dictionaries: matched-up key/value pairs (quick lookup) {'a':97, 'b':98, c:'99'}

functions: code which consolidates an oft-used pattern, but abstracted with parameters (see also methods below)

mutability of data structures and issues of reference—see example in textbook, page 11

Classes and methods in Python [see textbook Section 1.4.6]

sometimes we see a small cohort of related data items used to collectively manifest an abstraction

we can abstract out patterns of code used to manipulate the data as functions (see above)

but we can also collect the data together in a “package” called an object (a “scoped environment” of definitions)

we can also include the manipulating functions, now re-cast as methods which modify the object’s data; each takes an implicit “self” parameter representing the object-as-a-whole

parts of the object are accessed (in the internal scope) via the “dot” notation (.)

a class is a template that describes the data and method (and function) “packages”

it may include standard methods such as

__init__(self, …)

__str__(self)

__repr__(self)

one class may extend another (called sub-classing) via parameterization; the methods of the super-class (parameter class) may be accessed or even over-ridden, and new ones introduced

used properly, objects can increase code safety by consolidating access to the data cohort in one place, rather than scattering the data and its manipulation around in other code

objects, classes, and methods are the main way we manifest abstract data types in Python (as well as other object-oriented languages)

Algorithm analysis and Big-O notation [see textbook Section 2.2]

we want a way to describe and compare the running times of algorithms, but:

it should be insensitive to particular hardware/machines

it should emphasize the behavior that obtains as the problems (or input data sets) grow large

we abstract away the details of a function (e.g., a polynomial) representing the running time

we eliminate low-order terms: just keep the highest-order term

we eliminate constant factors (this has a downside, though)

although big-O notation is very useful, it is still a coarse comparison—always consider things like actual constant factors (and data sizes) when assessing an algorithm in practice

technically, we say that “f is O(g)” when there is a constant factor c and an initial point n0

f(n) is less than or equal to c·g(n) for all n > n0

idea: we can pick a constant that, when multiplied by the high-order term, will dominate all the low order terms

the hierarchy of O-notation functions typically seen is:

1 < log(n) < n < n log(n) < n2 < n3 < ... < cn

note that we say “O(1)” for a complexity which is constant (independent of n), since O-notation ignores constant factors (and “0” would cancel out anything else)

how do we measure times for O-notation?

count individual operations as steps

ignore constant amounts of steps: look for loops or traversals that somehow depend on n

single loops up to n are O(n) [times the body of the loop]

nested loops each to n are O(n2) [times the body of the loop]

nested loops with the inner one bounded by the index are still O(n2)—see the “triangle” of summation

logs arise from breaking problems into pieces, then each piece into more pieces, etc. (think of the “guess a number between 1 and 100” game)

The anagram example in section 2.2.2 is a great example of 4 algorithms of varying complexity—note especially that the preferred solution is O(1)

Efficiency of built-in Python data structures [see textbook Section 2.3]

you should know the basic behaviors of list and dictionary functions and methods for the exam!

See the tables on pages 73 and 76 (and surrounding discussion)

if you have a good grasp of how lists are implemented, based on the memory model we discussed in class, you should be able to grasp intuitively how these operations come by their stated complexity (running time in big-O notation)

Note especially that the complexity of pop() is O(1), whereas that of pop(i) is O(n)

Stacks and (linear) abstract data types [see textbook Sections 3.1-3.3]

a stack supports insertion (push), deletion (pop), and a check for emptiness, but with a specific “discipline”

push and pop behave in a LIFO way: last in, first out

(think of a stack of dinner plates on a spring at Goudy)

applications of stacks:

matching parentheses (and brackets and braces) in program syntax

keeping track of pending method calls

converting and evaluating forms of (linguistic, algebraic) expressions

implementing a stack with a list

push items by appending them to the end (which is O(1))

pop items by removing them (also called pop) from the end (which is O(1))

check emptiness by comparing to the empty list (or finding length of 0)

Numbers, numeral bases, and conversion [see textbook Section 3.3.6]

you should be able to convert (small numbers) between base 2 and 10 for the exam!

See examples on pages 93-94

numbers are abstract concepts, not associated with any particular representation or base

numerals are string-like representations of numbers

in the usual positional notation, we use a string of digits for a base b

the value (number associated with the numeral) is equal to the sum of the values of the digits, each multiplied by an increasing power of the base

… starting from a power of 0, and running from right to left

we say: one’s place (power 0), b’s place (for base b), b2’s place, etc.

Remember: the log of a number (in base b) is approximately the number of digits its numeral has (in base b)

in our culture we use base 10, with what are called Arabic digits 0-9 (except by the Arabs, who call them Hindu digits …)

computers use base 2, with digits 0 and 1, also called bits (for “binary digits”)

people often abbreviate base 2 as base 16, called hexadecimal

we can convert a number (in our programming language) into a representation in a base b by:

starting a running total as the number

taking the total mod (%) the base b, and pushing the result (as a digit character)

taking the integer division (//) of the total by the base and setting that to be the new total

… continuing until we hit a total of zero (why will this always happen?)

popping the digits off successively and appending to the right of a string (thus reversing them)

Expressions, bracketing, and conversion [see textbook Sections 3.3.4-3.3.7]

we can balance parentheses (as well as brackets, braces, etc.) in a string using a simple stack algorithm

we scan the string from the left to the right, starting with an empty stack

when we see a left parenthesis (etc.), we push it on the stack

when we see a right parenthesis (etc.), we pop the stack and match that (left) item to the (right) one found (round to round, square to square, etc.)—if they don’t match in shape, it is an error!

if the stack is empty before we try to pop, it is an error (unmatched right paren)

if the stack is non-empty when we are done, it is an error (unmatched left paren)

we can process other characters (letters, digits, punctuation) by just ignoring them, at least for the purposes of parenthesis matching

Infix form

algebraic expressions are usually written with binary operators (like +, - and *), constants (numerals) and variables

when we write expressions, we can use parentheses to show structure

we leave excess parentheses out according to the usual PEMDAS “order of operations”

multiplication and division have higher precedence than addition and subtraction

when it matters, we associate to the left: a+b+c really means (a+b)+c, even though the value is unaffected by the order

… but sometimes we need to use parentheses, when the default order of operations does not match our intent

we can always put extra parentheses in for emphasis

this usual way of writing expressions is called infix form

Postfix form

an alternative style, postfix form, puts the operators after their arguments: 2 3 +

in postfix form we never need parentheses

postfix form is based on the ideas of Polish logicians and is sometimes called Polish notation

Prefix form

another style, prefix form, puts the operators before their arguments: + 2 3

in prefix form we never need parentheses

Converting from infix to postfix

you should be able to convert between different forms of expressions for the exam!

See examples on pages 97-98

you should be able to do this informally (“by eye” and a bit of figuring) for short expressions

… but there is an algorithm for longer ones:

we use a stack to convert infix to postfix:

proceed through the expression from left to right

for a constant or variable, simply pass it through by appending to the (accumulating) output

for an operator, push it on the stack

… but, before pushing: pop any operators of equal or lower precedence and append to output

for parentheses, treat as for the matching algorithm, but popping intervening operators and appending to the output

Evaluating postfix

we can use a stack to evaluate (variable-free) postfix forms:

proceed through the expression from left to right

for a constant, push it on the stack

for an operator, pop off two arguments and apply the operator, then push the result back on

be careful of the argument order: the first one popped is the right argument!

when done, we should always have exactly one number on the stack (and have never failed to pop two items)

What will not be on the exam

No circuit elements (textbook Section 1.4.6.2)—still, you should understand a bit about sub-classes and inheritance, which this section illustrates

No exception handling (since we did not practice it in lab)

There will be nothing about queues on the exam (we didn’t quite reach it in the textbook)