Computer Science Distilled: Learn the Art of Solving Computational Problems

My personal notes on the book Computer Science Distilled: Learn the Art of Solving Computational Problems.

1. Basics

  • Dump all important stuff on paper.
  • Probability
    • The formula:
      \(P(\text{event}) = \frac{\text{\# of ways the event can happen}}{\text{\# of possible outcomes}}\)

    • Independent events: \(P(A \cap B) = P(A) \cdot P(B)\)
    • Mutually exclusive events: \(P(A \cup B) = P(A) + P(B)\)
    • Complementary events: \(P(A) + P(\bar{A}) = 1\)
    • The Gambler’s Fallacy: past events never affect the outcome of an independent event. (A quick check of these rules appears after this list.)
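
A minimal sketch in Python, checking these rules by enumerating the 36 equally likely outcomes of two dice (the events `a` and `b` are just illustrative):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def p(event):
    """P(event) = # of ways the event can happen / # of possible outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

a = lambda o: o[0] == 6    # first die shows 6
b = lambda o: o[1] == 6    # second die shows 6

# Independent events: P(A and B) = P(A) * P(B) = 1/6 * 1/6 = 1/36.
assert p(lambda o: a(o) and b(o)) == p(a) * p(b)

# Complementary events: P(A) + P(not A) = 1.
assert p(a) + p(lambda o: not a(o)) == 1
```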

2. Complexity

  • Time complexity, denoted by T(n), gives the number of operations an algorithm performs when processing an input of size n.
    • When an algorithm can have different values of T(n) for the same value n, we resort to cases:
      • Best case
      • Worst case
      • Average case
    • In general, the most important case is the worst case, since it provides a guaranteed baseline.
  • We can find the time complexity of an algorithm by counting the number of basic operations it requires for a hypothetical input of size n (a counting sketch appears after this list).
    • We can approximate T(n) by its fastest-growing term, called the dominant term: \(T(n) = n^2 + n - 2 \approx O(n^2)\)
  • The Big-O notation
    • This notation expresses the dominant term of an algorithm’s cost function in the worst case. A function whose fastest-growing term is \(2^n\) or “weaker” is \(O(2^n)\).
  • Space complexity measures the working storage for an algorithm.
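
As a sketch of counting basic operations, consider a naive duplicate check (the function name `has_duplicate` is illustrative); it shows where a quadratic dominant term comes from:

```python
def has_duplicate(items):
    """Naive duplicate check: compares every pair of items."""
    n = len(items)
    comparisons = 0                    # count the basic operation: a comparison
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
            if items[i] == items[j]:
                print(f"{comparisons} comparisons for n = {n}")
                return True
    print(f"{comparisons} comparisons for n = {n}")
    return False

# Worst case (no duplicates): T(n) = n(n-1)/2 comparisons, i.e. O(n^2).
has_duplicate(list(range(8)))          # prints "28 comparisons for n = 8"
```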

3. Strategy

  • The iterative strategy consists of using loops to repeat a process until a condition is met.
  • The recursive strategy consists of using functions that delegate work to “clones of themselves”.
  • The brute force strategy solves problems by inspecting all of the problem’s possible solution candidates.
  • The backtracking strategy involves rolling back the last choice when it leads to a dead end and continuing the search. It works best in problems where the solution is a sequence of choices and making a choice restrains subsequent choices.
  • A heuristic (method) leads to a solution without guaranteeing it is the best or optimal one.
    • Greedy approach: the opposite of backtracking; it consists of never coming back to previous choices.
  • The divide and conquer strategy divides a problem over and over into subproblems, until they become easy enough to solve. The subproblem solutions are combined to obtain the original problem solution.
  • Dynamic programming consists of identifying repeated subproblems in order to compute them only once.
    • Memoization allows the reuse of partial calculations (see the Fibonacci sketch after this list).
  • Branch and bound strategies provide a solution in the form of a sequence of choices. They work by detecting and discarding bad choices.
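
A minimal sketch of recursion plus memoization, using Fibonacci as the classic repeated-subproblem example:

```python
from functools import lru_cache

def fib_naive(n):
    """Plain recursion: the same subproblems are recomputed over and over,
    so the number of calls grows exponentially."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Dynamic programming via memoization: each subproblem is computed
    exactly once, so this runs in O(n)."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(90))    # instant; fib_naive(90) would not finish in practice
```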

4. Data

  • Abstractions are interfaces that let us omit details.
  • An Abstract Data Type (ADT) is the specification of a group of operations that make sense for a given data type.
  • Common abstractions
    • Primitive data types
    • Stack
    • Queue
    • Priority queue
    • List
    • Sorted list
    • Map
    • Set
  • Structures
    • Data structures describe how data is to be organized and accessed in the computer’s memory.
    • Array: the simplest way to store stuff in computer memory. It consists of allocating sequential space in memory and writing your data sequentially in that space.
    • Linked list: data is stored in a chain of cells that don’t need to be at sequential memory addresses. Memory for each cell is allocated as needed.
    • Doubly linked list
    • Tree: similar to lists, but more suitable for hierarchical data.
    • Binary search tree: it’s a special type of tree that can be efficiently searched.
      • Red-black tree
      • AVL tree
      • B-tree
    • Binary heap: it’s a special type of binary tree, in which we can find the smallest (or biggest) item instantly (\(O(1)\)), because it’s always the root node of the tree (a `heapq` sketch appears after this list).
    • Graph: data is freely arranged as nodes and edges; it can represent almost any kind of data.
    • Hash table: it’s a data structure that allows finding items in \(O(1)\) time.
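
A minimal sketch of binary-heap behavior using Python’s built-in `heapq` module, which maintains a min-heap over a plain list:

```python
import heapq

heap = []
for value in [42, 7, 19, 3, 25]:
    heapq.heappush(heap, value)    # O(log n): insertion keeps the heap property

print(heap[0])              # 3: the smallest item is always at the root, O(1)
print(heapq.heappop(heap))  # 3: removing the root is O(log n)
print(heap[0])              # 7: the next smallest item takes its place
```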

5. Algorithms

  • Sorting
    • Insertion sort: it’s very efficient at sorting nearly sorted datasets, regardless of size.
    • Quicksort and mergesort, both \(O(n \log n)\), are efficient for large datasets.
  • Searching
    • Binary search: \(O(\log n)\) search in sorted arrays (a sketch appears after this list).
    • Hash table: \(O(1)\) search.
  • Graphs
    • They can be searched in two ways: DFS and BFS.
    • Depth-first search (DFS): we keep following edges, going deeper and deeper in the graph; when we reach a node that has no edges to any new nodes, we go back to the previous node and continue the process. We use a stack to keep track of the trail, pushing a node when we explore it, and popping a node when we need to go back.
    • Breadth-first search (BFS): it explores the graph level by level, first the neighbors of your start node, then its neighbors’ neighbors, and so on. To keep track of nodes we visit, we use a queue: once we explore a node, we enqueue its children, then dequeue the next node to explore (a BFS sketch appears after this list).
  • Operations research
    • It involves defining an objective to “maximize” or “minimize”.
    • Linear optimization problems are problems where the objective can be modeled using linear equations.
    • The simplex method solves an optimization problem given its objective function and the equations (or inequalities) that model its constraints (a numerical check appears after this list). Example:
      • Objective (to maximize): \(z = 8x + 12y\)
      • Budget constraint: \(10x + 20y \le 140\)
      • Space constraint: \(6x + 8y \le 72\)
      • We can’t have negative items: \(x \ge 0,\ y \ge 0\)
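
A minimal binary search sketch over a sorted array, halving the search range at each step:

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent. O(log n)."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1    # the target can only be in the upper half
        else:
            hi = mid - 1    # the target can only be in the lower half
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 11))    # 4
```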
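
A minimal BFS sketch with a queue, over a small adjacency-list graph (the graph itself is made up for illustration):

```python
from collections import deque

graph = {                     # hypothetical graph as an adjacency list
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}

def bfs(start):
    """Visit nodes level by level: dequeue a node, enqueue its unvisited neighbors."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs("A"))    # ['A', 'B', 'C', 'D', 'E']
```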
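
The example above can be checked numerically; a sketch using `scipy.optimize.linprog` (which minimizes, so the objective is negated to maximize):

```python
from scipy.optimize import linprog

c = [-8, -12]        # maximize z = 8x + 12y  <=>  minimize -8x - 12y
A_ub = [[10, 20],    # budget: 10x + 20y <= 140
        [6, 8]]      # space:   6x + 8y <= 72
b_ub = [140, 72]

# bounds enforce x >= 0, y >= 0
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)    # optimal point (8, 3) with z = 100
```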