My personal notes on the book Computer Science Distilled: Learn the Art of Solving Computational Problems.
1. Basics
- Dump all important stuff on paper.
- Probability
  - The formula: $$P(event) = \frac{\#\ of\ ways\ event\ can\ happen}{\#\ of\ possible\ outcomes}$$
  - Independent events: $$P(A \cap B) = P(A) \cdot P(B)$$
  - Mutually exclusive events: $$P(A \cup B) = P(A) + P(B)$$
  - Complementary events: $$P(A) + P(\bar{A}) = 1$$
  - The Gambler's Fallacy: past events never affect the outcome of an independent event.
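A quick sanity check of these formulas in Python (the dice example is my own, not from the book):

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)  # sample space of one fair six-sided die

# P(event) = (# of ways event can happen) / (# of possible outcomes)
p_even = Fraction(sum(1 for o in die if o % 2 == 0), len(die))
print(p_even)  # 1/2

# Independent events: P(A and B) = P(A) * P(B), here for two even rolls.
pairs = list(product(die, die))
p_both_even = Fraction(sum(1 for a, b in pairs if a % 2 == 0 and b % 2 == 0), len(pairs))
assert p_both_even == p_even * p_even  # 1/4

# Mutually exclusive events: P(A or B) = P(A) + P(B), here rolling a 1 or a 2.
p_one_or_two = Fraction(sum(1 for o in die if o in (1, 2)), len(die))
assert p_one_or_two == Fraction(1, 6) + Fraction(1, 6)

# Complementary events: P(A) + P(not A) = 1.
p_odd = Fraction(sum(1 for o in die if o % 2 == 1), len(die))
assert p_even + p_odd == 1
```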
2. Complexity
- Time complexity, denoted by T(n), gives the number of operations the algorithm performs when processing an input of size n.
- When an algorithm can have different values of T(n) for the same value n, we resort to cases:
- Best case
- Worst case
- Average case
- In general, the most important case is the worst case, since it provides a guaranteed baseline.
- We can find the time complexity of an algorithm by counting the number of basic operations it requires for a hypothetical input of size n.
- We can approximate T(n) by its fastest-growing term, called the dominant term. For example, $$T(n) = n^2 + n - 2$$ has dominant term $$n^2$$, so $$T(n) = O(n^2)$$ (see the sketch after this list).
- The Big-O notation
- This notation is used to express the dominant term of an algorithm’s cost function in the worst case. A function with a fastest-growing term of $$2^n$$ or “weaker” is $$O(2^n)$$
- Space complexity measures the working storage for an algorithm.
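A small illustration of counting basic operations (my own example, not from the book): the nested loop below compares every pair, so T(n) = n(n - 1)/2, whose dominant term is n²/2, giving O(n²). Doubling the input roughly quadruples the count.

```python
def count_pair_comparisons(items):
    """Count comparisons made while checking every pair for duplicates."""
    count = 0
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            count += 1  # one basic operation: comparing items[i] with items[j]
    return count

print(count_pair_comparisons(range(10)))  # 45
print(count_pair_comparisons(range(20)))  # 190, roughly 4x for 2x the input
```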
3. Strategy
- The iterative strategy consists of using loops to repeat a process until a condition is met.
- The recursive strategy consists of using functions that delegate work to clones of themselves.
- The brute force strategy solves problems by inspecting all of the problem's possible solution candidates.
- The backtrack strategy involves rolling back the previous step and continuing the search. It works best in problems where the solution is a sequence of choices and making a choice restrains subsequent choices.
- A heuristic (method) leads to a solution without guaranteeing it is the best or optimal one.
- Greedy approach: the opposite of backtracking; it consists of never coming back to previous choices.
- The divide and conquer strategy divides a problem over and over into subproblems, until they become easy enough to solve. The subproblem solutions are combined to obtain the original problem solution.
- Dynamic programming is identifying repeated subproblems in order to compute them only once.
- Memoization allows the reuse of partial calculations (see the sketch after this list).
- The branch and bound strategy provides a solution in the form of a sequence of choices. It works by detecting and discarding bad choices early.
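A minimal dynamic programming sketch (Fibonacci is my own example, not the book's): the naive recursion recomputes the same subproblems exponentially many times; memoizing each partial result means every fib(n) is computed only once.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoization: cache and reuse partial calculations
def fib(n):
    if n < 2:  # base cases stop the recursion
        return n
    return fib(n - 1) + fib(n - 2)  # repeated subproblems, now served from the cache

print(fib(100))  # 354224848179261915075; the unmemoized version would never finish
```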
4. Data
- Abstractions are interfaces that let us omit details.
- An Abstract Data Type (ADT) is the specification of a group of operations that make sense for a given data type (a stack/queue sketch follows this list).
- Common abstractions
- Primitive data types
- Stack
- Queue
- Priority queue
- List
- Sorted list
- Map
- Set
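For instance, a stack only promises that the last item pushed is the first one popped; how it is laid out in memory is a detail the abstraction omits. A minimal sketch of the stack and queue ADTs in Python (my own example):

```python
from collections import deque

# Stack (LIFO): the last item pushed is the first popped.
stack = []
stack.append('a')
stack.append('b')
print(stack.pop())  # 'b'

# Queue (FIFO): the first item enqueued is the first dequeued.
queue = deque()
queue.append('a')
queue.append('b')
print(queue.popleft())  # 'a'
```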
- Structures
- Data structures describe how data is to be organized and accessed in the computer’s memory.
- Array: the simplest way to store stuff in computer memory. It consists of allocating sequential space in memory and writing your data sequentially in that space.
- Linked list: data is stored in a chain of cells that don't need to occupy sequential memory addresses. Memory for each cell is allocated as needed.
- Doubly linked list
- Tree: they are similar to lists, but are more suitable to hierarchical data.
- Binary search tree: it's a special type of tree that can be efficiently searched (see the sketch after this list).
- Red-black tree
- AVL tree
- B-tree
- Binary heap: it's a special type of binary tree, in which we can find the smallest (or biggest) item instantly ($$O(1)$$), because it's always the root node of the tree.
- Graph: data is freely arranged as nodes and edges; it can represent almost any kind of data.
- Hash table: it’s a data structure that allows finding items in $$O(1)$$ time.
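A minimal binary search tree sketch (my own code, assuming no duplicate keys): smaller keys go left, bigger keys go right, so each comparison discards half of a balanced tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # subtree of keys smaller than self.key
        self.right = None  # subtree of keys bigger than self.key

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    if root is None or root.key == key:
        return root
    return search(root.left if key < root.key else root.right, key)

root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = insert(root, k)
print(search(root, 6) is not None)  # True
print(search(root, 7) is not None)  # False
```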
5. Algorithms
- Sorting
- Insertion sort: it's very efficient at sorting nearly sorted datasets, regardless of size (sketched below).
- Quicksort and mergesort, both $$O(n \log n)$$, are efficient for large datasets.
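A minimal insertion sort sketch (my own code): each pass takes the next item and shifts it left until it sits in place within the already-sorted prefix, which is why nearly sorted input needs almost no shifting.

```python
def insertion_sort(items):
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift bigger items one slot right to open a gap for `current`.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items

print(insertion_sort([3, 1, 2, 5, 4]))  # [1, 2, 3, 4, 5]
```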
- Searching
- Binary search: $$O(\log n)$$ search in sorted arrays (sketched below).
- Hash table: $$O(1)$$ search.
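A minimal binary search sketch (my own code, assuming the input array is sorted): each comparison halves the remaining range, hence $$O(\log n)$$.

```python
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid     # found: return the index
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1              # not present

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```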
- Graphs
- They can be searched in two ways: DFS and BFS.
- Depth-first search (DFS): we keep following edges, going deeper and deeper in the graph; when we reach a node that has no edges to any new nodes, we go back to the previous node and continue the process. We use a stack to keep track of the trail, pushing a node when we explore it, and popping a node when we need to go back.
- Breadth-first search (BFS): it explores the graph per level, first the neighbors of your start node, then its neighbors' neighbors, and so on. To keep track of nodes we visit, we use a queue: once we explore a node, we enqueue its children, then dequeue the next node to explore.
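A minimal sketch of both traversals on an adjacency-list graph (my own code): the only structural difference is the stack versus the queue.

```python
from collections import deque

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}  # adjacency list

def dfs(start):
    visited, stack = [], [start]         # stack: go deep before going wide
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.append(node)
            stack.extend(reversed(graph[node]))  # popping later = backtracking
    return visited

def bfs(start):
    visited, queue = [], deque([start])  # queue: explore level by level
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.append(node)
            queue.extend(graph[node])    # enqueue children of the explored node
    return visited

print(dfs('A'))  # ['A', 'B', 'D', 'C']
print(bfs('A'))  # ['A', 'B', 'C', 'D']
```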
- Operations research
- It involves defining an objective to "maximize" or "minimize".
- Linear optimization problems are problems where the objective can be modeled using linear equations.
- The simplex method solves an optimization problem given its objective function and the inequalities that model its constraints. Example:
  - Objective: maximize z = 8x + 12y
  - Budget constraint: 10x + 20y <= 140
  - Space constraint: 6x + 8y <= 72
  - We can't have negative items: x >= 0, y >= 0
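The same example solved in Python (my own sketch; note that scipy's linprog minimizes, so the objective is negated to maximize it):

```python
from scipy.optimize import linprog

# Maximize z = 8x + 12y  <=>  minimize -8x - 12y.
result = linprog(
    c=[-8, -12],                    # negated objective coefficients
    A_ub=[[10, 20],                 # budget constraint: 10x + 20y <= 140
          [6, 8]],                  # space constraint:   6x + 8y <= 72
    b_ub=[140, 72],
    bounds=[(0, None), (0, None)],  # x >= 0, y >= 0
)

print(result.x)     # optimal plan: x = 8.0, y = 3.0
print(-result.fun)  # maximum z = 100.0
```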