Introduction
GROUP BY is one of the most powerful SQL clauses. It allows you to group rows that have the same values in specified columns, then apply aggregate functions to each group separately.
Sample Data
For all examples, we'll use this books table:
| id | title | author_lname | author_fname | pages | released_year |
|---|---|---|---|---|---|
| 1 | Harry Potter 1 | Rowling | J.K. | 309 | 1997 |
| 2 | Harry Potter 2 | Rowling | J.K. | 341 | 1998 |
| 3 | Harry Potter 3 | Rowling | J.K. | 435 | 1999 |
| 4 | Harry Potter 4 | Rowling | J.K. | 636 | 2000 |
| 5 | The Shining | King | Stephen | 447 | 1977 |
| 6 | It | King | Stephen | 1138 | 1986 |
| 7 | Misery | King | Stephen | 370 | 1987 |
| 8 | 1984 | Orwell | George | 328 | 1949 |
Basic Syntax
SELECT column_name, AGGREGATE_FUNCTION(column) FROM table_name GROUP BY column_name;
How GROUP BY Works: Step-by-Step
Let's break down this query:
SELECT author_lname, COUNT(*) AS books_written, SUM(pages) AS total_pages FROM books GROUP BY author_lname;
Step 1: FROM - Load All Rows
SQL first loads all rows from the books table into memory.
Result after Step 1:
| id | title | author_lname | author_fname | pages | released_year |
|---|---|---|---|---|---|
| 1 | Harry Potter 1 | Rowling | J.K. | 309 | 1997 |
| 2 | Harry Potter 2 | Rowling | J.K. | 341 | 1998 |
| 3 | Harry Potter 3 | Rowling | J.K. | 435 | 1999 |
| 4 | Harry Potter 4 | Rowling | J.K. | 636 | 2000 |
| 5 | The Shining | King | Stephen | 447 | 1977 |
| 6 | It | King | Stephen | 1138 | 1986 |
| 7 | Misery | King | Stephen | 370 | 1987 |
| 8 | 1984 | Orwell | George | 328 | 1949 |
8 rows loaded
Step 2: GROUP BY - Create Logical Groups
SQL organizes rows into groups based on unique values in author_lname.
Result after Step 2: 3 Groups Created
┌─────────────────────────────────────────────────────────────────────────────┐ │ GROUP 1: author_lname = 'Rowling' (4 rows) │ ├─────────────────────────────────────────────────────────────────────────────┤ │ id | title | author_lname | author_fname | pages | year │ │ 1 | Harry Potter 1 | Rowling | J.K. | 309 | 1997 │ │ 2 | Harry Potter 2 | Rowling | J.K. | 341 | 1998 │ │ 3 | Harry Potter 3 | Rowling | J.K. | 435 | 1999 │ │ 4 | Harry Potter 4 | Rowling | J.K. | 636 | 2000 │ └─────────────────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────────────────┐ │ GROUP 2: author_lname = 'King' (3 rows) │ ├─────────────────────────────────────────────────────────────────────────────┤ │ id | title | author_lname | author_fname | pages | year │ │ 5 | The Shining | King | Stephen | 447 | 1977 │ │ 6 | It | King | Stephen | 1138 | 1986 │ │ 7 | Misery | King | Stephen | 370 | 1987 │ └─────────────────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────────────────┐ │ GROUP 3: author_lname = 'Orwell' (1 row) │ ├─────────────────────────────────────────────────────────────────────────────┤ │ id | title | author_lname | author_fname | pages | year │ │ 8 | 1984 | Orwell | George | 328 | 1949 │ └─────────────────────────────────────────────────────────────────────────────┘
Key Point: Each group contains complete rows with ALL columns from the original table.
Step 3: SELECT - Process Each Group
SQL now processes SELECT author_lname, COUNT(*), SUM(pages) for each group:
GROUP 1 (Rowling):
author_lname = 'Rowling' COUNT(*) = 4 (4 rows in group) SUM(pages) = 309 + 341 + 435 + 636 = 1721
→ Result row: | Rowling | 4 | 1721 |
GROUP 2 (King):
author_lname = 'King' COUNT(*) = 3 (3 rows in group) SUM(pages) = 447 + 1138 + 370 = 1955
→ Result row: | King | 3 | 1955 |
GROUP 3 (Orwell):
author_lname = 'Orwell' COUNT(*) = 1 (1 row in group) SUM(pages) = 328
→ Result row: | Orwell | 1 | 328 |
Step 4: Final Result
Each group is collapsed into one row:
| author_lname | books_written | total_pages |
|---|---|---|
| Rowling | 4 | 1721 |
| King | 3 | 1955 |
| Orwell | 1 | 328 |
Summary:
| Original Rows | Groups Created | Final Rows |
|---|---|---|
| 8 | 3 | 3 |
Visual Flow Diagram
┌─────────────────────────────────────────────────────────────────┐ │ Original books table │ │ (8 rows) │ └─────────────────────────────────────────────────────────────────┘ │ ▼ GROUP BY author_lname ┌─────────────────────┼─────────────────────┐ ▼ ▼ ▼ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ Group 1 │ │ Group 2 │ │ Group 3 │ │ "Rowling" │ │ "King" │ │ "Orwell" │ │ (4 rows) │ │ (3 rows) │ │ (1 row) │ │ all columns │ │ all columns │ │ all columns │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ ▼ SELECT ▼ SELECT ▼ SELECT ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ Rowling|4|1721 │ │ King |3|1955 │ │ Orwell|1|328 │ └────────────────┘ └────────────────┘ └────────────────┘ │ │ │ └─────────────────────┼─────────────────────┘ ▼ ┌─────────────────────────────────┐ │ Final Result │ │ author_lname | count | pages │ │ Rowling | 4 | 1721 │ │ King | 3 | 1955 │ │ Orwell | 1 | 328 │ └─────────────────────────────────┘
SQL Execution Order
Written order:
SELECT → FROM → GROUP BY
Actual execution order:
FROM → GROUP BY → SELECT
This is why you can use column aliases in ORDER BY but not in WHERE.
Multiple Aggregate Functions
Since groups contain all columns, you can apply different aggregates:
SELECT author_lname, COUNT(*) AS books_written, SUM(pages) AS total_pages, AVG(pages) AS avg_pages, MIN(released_year) AS first_book, MAX(released_year) AS latest_book FROM books GROUP BY author_lname;
Result:
| author_lname | books_written | total_pages | avg_pages | first_book | latest_book |
|---|---|---|---|---|---|
| Rowling | 4 | 1721 | 430.25 | 1997 | 2000 |
| King | 3 | 1955 | 651.67 | 1977 | 1987 |
| Orwell | 1 | 328 | 328.00 | 1949 | 1949 |
GROUP BY Multiple Columns
You can group by multiple columns to create more specific groups:
SELECT author_lname, author_fname, COUNT(*) AS books_written FROM books GROUP BY author_lname, author_fname;
This creates groups based on unique combinations of author_lname AND author_fname.
The SELECT Rule
After GROUP BY, you can only SELECT:
- Columns in GROUP BY clause (the grouping keys)
- Aggregate functions applied to other columns
-- ❌ ERROR: title is not in GROUP BY and not aggregated SELECT author_lname, title FROM books GROUP BY author_lname; -- ✅ OK: title is aggregated with MAX() SELECT author_lname, MAX(title) AS last_title_alphabetically FROM books GROUP BY author_lname; -- ✅ OK: all non-aggregated columns are in GROUP BY SELECT author_lname, author_fname, COUNT(*) FROM books GROUP BY author_lname, author_fname;
Why? Because each group may contain multiple different values for non-grouped columns. SQL doesn't know which one to pick.
Key Takeaways
| Concept | Description |
|---|---|
| GROUP BY | Splits rows into logical groups based on column values |
| Groups contain | Complete rows with ALL columns from original table |
| Aggregates | Process each group separately, return one value per group |
| Final result | One row per group |
| SELECT rule | Only grouped columns or aggregate functions allowed |
| Execution order | FROM → GROUP BY → SELECT |
Common Use Cases
-- Count items per category SELECT category, COUNT(*) FROM products GROUP BY category; -- Total sales per customer SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id; -- Average rating per product SELECT product_id, AVG(rating) FROM reviews GROUP BY product_id; -- Find most recent order per customer SELECT customer_id, MAX(order_date) FROM orders GROUP BY customer_id;
