Book review – Discovering SQL: A Hands-On Guide for Beginners

I’ve been hidding behind ORMs for a long time, and I finally decided (honestly, I needed ;) to learn basic SQL from a “formal source”. After searching and sorting lots of books, I chose Discovering SQL: A Hands-On Guide for Beginners, by Alex Kriegel.

The book itself is very short, concise and it doesn’t try to teach you relational theory, only pure SQL (the almost standardized SQL). Besides basic SQL syntax, the book explains the nuances between different RDBMs (MySQL, PostgreSQL, SQL Server, IBM DB2, Oracle 11g, and others), with a little more focus on SQL Server (I guess the author is more familiarized with it). Furthermore, Alex has a nice conversational style, and that makes the book much more enjoyable.

I really liked this book, and it’s certainly a great resource to beginners or to whom wants to remember about SQL syntax. By the way, I’ve already recommended it to a disciple of mine who had never heard about SQL, and he is doing great. ;)

Book on Amazon: Discovering SQL: A Hands-On Guide for Beginners (Wrox Programmer to Programmer)

My personal notes

Breaking and entering structured information

  • Entities: the actual physical objects or abstract concepts in your requirements. E.g.: books, authors.
  • Relationships: they model the way the entities interact. E.g.: a book can have many authors.
  • Data types help keep the domain integrity.
  • Character data types: fixed fields (CHAR) are faster, but variable fields (VARCHAR) are more flexible.
  • Numeric data types: INTEGER, SMALLINT, NUMERIC (float), FLOAT, REAL, DOUBLE.
  • Date and time data types: DATE (yyyy-mm-dd), TIME (hh:mm:ss), DATETIME (yyyy-mm-dd hh:mm:ss).
  • Absence of value: NULL.
  • Boolean: true/false, t/f, yes/no, y/n, 1/0 on PostgreSQL.

Components of SQL

  • DDL - Data Definition Language: used to create, modify or destroy objects.
  • DML - Data Manipulation Language: insert, update and delete.
  • DQL - Data Query Language: select statement.
  • TCL - Transaction Control Language: commit, rollback, savepoint, etc.
  • DCL - Data Control Language: grant, revoke privileges, etc.

Refactoring database tables

  • Drop table: DROP TABLE table
  • Create table: CREATE TABLE books( id BIGINT NOT NULL, title VARCHAR(50) NOT NULL );
  • Rename table: ALTER TABLE books RENAME TO BAR
  • Add column: ALTER TABLE books ADD read CHAR(1);
  • Drop column: ALTER TABLE books DROP COLUMN read;

SQL operators: agents of change

  • ALL and ANY/SOME: compares a scalar value with a single column of values.
  • BETWEEN: finds values in range.
  • IN: matches any given value in a list.
  • EXISTS: checks for the existence of rows in a subquery.
  • LIKE: finds things that are partially know. % matches any string of zero or more chars, and _ matches any char in a string.
  • AND and OR: combines two boolean expressions.
  • NOT: negates a boolean input.

Designing a relational database

  • Primary key: unique identifier for a row of data in a table. It helps enforcing integrity.
    • A PK is not null, unique and should not be changed.
    • PKs should be meaningless, i.e., only serve as a PK and not as data.
    • Adding a PK:
CREATE TABLE books (
  id INTEGER NOT NULL,
  title VARCHAR(50) NOT NULL,
  CONSTRAINT bk_pk PRIMARY KEY (id));

ALTER TABLE books
  ADD CONSTRAINT bk_pk PRIMARY KEY (id);
  • Relationship patterns
    • Foreign keys are most likely a primary key in a child table.
    • They don’t need to be unique.
    • Relations: 1:0, N:0, 1:1, N:1, N:N
    • Adding a foreign-key
CREATE TABLE authors (
  id INTEGER NOT NULL,
  name VARCHAR(20) NOT NULL,
  CONSTRAINT au_pk PRIMARY KEY (id)
);

CREATE TABLE books (
  id INTEGER NOT NULL,
  title VARCHAR(50) NOT NULL,
  author_id INTEGER NOT NULL,
  CONSTRAINT bk_pk PRIMARY KEY (id),
  CONSTRAINT fk_authors FOREIGN KEY (author_id) REFERENCES authors(id));

ALTER TABLE books
  ADD CONSTRAINT fk_authors FOREIGN KEY (author_id) REFERENCES authors(id);
  • Domain integrity
    • Domain: range of values that an attribute for an entity (column for a table) will accept.
    • A data type declaration is a constraint.
  • Database normalization
    • The Normal Forms range from 1 to 5.
    • Form rules:
      • 1NF: each row has to be unique; there should be no repeating groups of data and all columns should contain atomic values (an indivisible piece of data). Every 1NF table must have a primary key.
      • 2NF: only columns with full dependency on the primary key are allowed.
      • 3NF: all non-key columns.
  • Specifying constraints
    • DEFAULT: CREATE TABLE users ( id INTEGER NOT NULL, name VARCHAR(50) DEFAULT ‘Bob’ );
    • UNIQUE: ALTER TABLE books ADD CONSTRAINT uq_title UNIQUE (title);
  • Working with multiple tables
    • JOIN: joins two or more tables
    • UNION: combines records return from two or more SELECT statements. The combine values must be compatible.

Overcoming the limitations of SQL

  • Numbers
    • SIGN: returns a number sign (+1 or -1).
    • ABS: returns the absolute value of a number.
    • FLOOR: rounds arguments down to the nearest integer.
    • CEIL: rounds arguments up to the nearest integer.
    • ROUND: rounds a decimal number to an integer.
  • Characters
    • CONCAT, ||: concatenates strings.
    • LENGTH: returns string length.
    • TRIM, LTRIM, RTRIM: removes whitespace.
  • Date and time
    • NOW: current date and time in MySQL/Postgres.
  • Aggregate functions
    • AVG: calculates arithmetic average of the args.
    • COUNT: returns number of records.
    • DISTINCT COUNT: returns number of distinct records.
    • MAX: returns the maximum value in a set.
    • MIN: returns the minimum value in a set.
    • SUM: returns the sum of all values in a column.
  • Conversion functions
    • CAST: converts between datatypes.
SELECT CAST('10/11/2012' AS DATETIME);
SELECT CAST(<EXPRESSION>  AS <TYPE>);
  • Miscelanious functions
    • COALESCT: returns the first argument that is not NULL.
  • Case statement
    • Returns based on conditions:
SELECT title
FROM books
WHERE type = (
  CASE publication_year
    WHEN 2000 THEN 'old'
    WHEN 2001 THEN 'new'
    ELSE NULL
  end
)

Grouping and aggregation

  • Duplicate data can be filtered using the DISTINCT keyword:
SELECT COUNT(DISTINCT publisher) as pub_count
FROM books;
  • Grouped results using GROUP BY can use aggregate functions for post-processing.
SELECT
  publisher,
  SUM(price) AS total
FROM books
GROUP BY publisher;
  • Aggregated data can be select with HAVING clauses.
SELECT
  publisher,
  AVG(price) as total
FROM books
GROUP BY publisher
  HAVING AVG(price) > 20;
  • Sorting query output
    • Results can be ordered in two ways: ascending and descending.
    • The final result set can be ordered using an aggregate function.

A query within a query

  • Subquery usage: “I do not know, but I know someone who does.”
  • Subquery in the WHERE clause:
SELECT
  title,
  publisher
FROM
  books
WHERE id IN (
  SELECT
    fk_bk_loc
  FROM
    location
  WHERE
    shelf = 5
);
  • Subquery in the select list
SELECT
  title,
  (SELECT MAX(price) FROM books) AS max_price
FROM
  books

Combining data sets

  • Joins are used to create a temporary data set from various tables.
  • Joins are made by relating foreign and primary keys.
  • INNER JOIN: it conbines data from several tables and returns a subset where data overlaps. We get data only from rows that match.
SELECT *
FROM books
INNER JOIN location
  ON location.id = books.location_id;
  • N-WAY INNER JOIN: data is combined in a N:N relation.
SELECT
  books.title,
  authors.last_name
FROM books
INNER JOIN books_authors
  ON books.id = books_authors.book_id
INNER JOIN authors
  ON authors.id = books_authors.author_id;
  • LEFT JOIN: joins a table and unmatched rows on the left table are complemented by NULL on the right.
SELECT
  title,
  shelf
FROM books
LEFT OUTER JOIN location
  ON location.fk_bok_loc = books.id;
  • RIGHT JOIN: same as a LEFT JOIN, but unmatched left rows become NULL.
SELECT
  shelf,
  book_title
FROM location
RIGHT OUTER JOIN books
  ON location.fk_book_loc = books.id;
  • VIEW: they are “frozen” queries.
CREATE VIEW new_view
AS
  <SELECT query>;

CREATE VIEW comedies
AS
  SELECT *
  FROM films
  WHERE kind = 'Comedy';

SELECT * FROM new_view;

DROP VIEW new_view;

What else is there, and why?

  • Indexes: they increate retrieval speed.
CREATE INDEX <name> ON <table>(<columns>);

DROP INDEX <name>;
  • Auto-incremented values: they enforce referential integrity (Postgres: SERIAL; MySQL: AUTO_INCREMENT).

Multiuser environment

  • Transactions: used to maintain data consistency and enforce database integrity.
  • Transactions must pass the ACID test in order to be qualified as such.
  • Information Schema: stores objects, relations and other stuff in a RDBMS.

I used to have Disqus enabled on my website, but I have disabled it because of privacy concerns.

If you feel like commenting or asking something, you can still contact me via other means.