SQL GROUP BY Clause > Step by Step Guide

#sql
single

Introduction

GROUP BY is one of the most powerful SQL clauses. It allows you to group rows that have the same values in specified columns, then apply aggregate functions to each group separately.


Sample Data

For all examples, we'll use this books table:

idtitleauthor_lnameauthor_fnamepagesreleased_year
1Harry Potter 1RowlingJ.K.3091997
2Harry Potter 2RowlingJ.K.3411998
3Harry Potter 3RowlingJ.K.4351999
4Harry Potter 4RowlingJ.K.6362000
5The ShiningKingStephen4471977
6ItKingStephen11381986
7MiseryKingStephen3701987
81984OrwellGeorge3281949

Basic Syntax

SELECT column_name, AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY column_name;

How GROUP BY Works: Step-by-Step

Let's break down this query:

SELECT
    author_lname,
    COUNT(*) AS books_written,
    SUM(pages) AS total_pages
FROM books
GROUP BY author_lname;

Step 1: FROM - Load All Rows

SQL first loads all rows from the books table into memory.

Result after Step 1:

idtitleauthor_lnameauthor_fnamepagesreleased_year
1Harry Potter 1RowlingJ.K.3091997
2Harry Potter 2RowlingJ.K.3411998
3Harry Potter 3RowlingJ.K.4351999
4Harry Potter 4RowlingJ.K.6362000
5The ShiningKingStephen4471977
6ItKingStephen11381986
7MiseryKingStephen3701987
81984OrwellGeorge3281949

8 rows loaded


Step 2: GROUP BY - Create Logical Groups

SQL organizes rows into groups based on unique values in author_lname.

Result after Step 2: 3 Groups Created

┌─────────────────────────────────────────────────────────────────────────────┐
│ GROUP 1: author_lname = 'Rowling'  (4 rows)                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│ id | title                    | author_lname | author_fname | pages | year  │
│  1 | Harry Potter 1           | Rowling      | J.K.         | 309   | 1997  │
│  2 | Harry Potter 2           | Rowling      | J.K.         | 341   | 1998  │
│  3 | Harry Potter 3           | Rowling      | J.K.         | 435   | 1999  │
│  4 | Harry Potter 4           | Rowling      | J.K.         | 636   | 2000  │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ GROUP 2: author_lname = 'King'  (3 rows)                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│ id | title                    | author_lname | author_fname | pages | year  │
│  5 | The Shining              | King         | Stephen      | 447   | 1977  │
│  6 | It                       | King         | Stephen      | 1138  | 1986  │
│  7 | Misery                   | King         | Stephen      | 370   | 1987  │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ GROUP 3: author_lname = 'Orwell'  (1 row)                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│ id | title                    | author_lname | author_fname | pages | year  │
│  8 | 1984                     | Orwell       | George       | 328   | 1949  │
└─────────────────────────────────────────────────────────────────────────────┘

Key Point: Each group contains complete rows with ALL columns from the original table.


Step 3: SELECT - Process Each Group

SQL now processes SELECT author_lname, COUNT(*), SUM(pages) for each group:

GROUP 1 (Rowling):

author_lname = 'Rowling'
COUNT(*)     = 4                              (4 rows in group)
SUM(pages)   = 309 + 341 + 435 + 636 = 1721

→ Result row: | Rowling | 4 | 1721 |

GROUP 2 (King):

author_lname = 'King'
COUNT(*)     = 3                              (3 rows in group)
SUM(pages)   = 447 + 1138 + 370 = 1955

→ Result row: | King | 3 | 1955 |

GROUP 3 (Orwell):

author_lname = 'Orwell'
COUNT(*)     = 1                              (1 row in group)
SUM(pages)   = 328

→ Result row: | Orwell | 1 | 328 |


Step 4: Final Result

Each group is collapsed into one row:

author_lnamebooks_writtentotal_pages
Rowling41721
King31955
Orwell1328

Summary:

Original RowsGroups CreatedFinal Rows
833

Visual Flow Diagram

┌─────────────────────────────────────────────────────────────────┐
│                     Original books table                        │
│                        (8 rows)                                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼ GROUP BY author_lname
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Group 1     │    │  Group 2     │    │  Group 3     │
│  "Rowling"   │    │  "King"      │    │  "Orwell"    │
│  (4 rows)    │    │  (3 rows)    │    │  (1 row)     │
│  all columns │    │  all columns │    │  all columns │
└──────────────┘    └──────────────┘    └──────────────┘
        │                     │                     │
        ▼ SELECT              ▼ SELECT              ▼ SELECT
┌────────────────┐  ┌────────────────┐  ┌────────────────┐
│ Rowling|4|1721 │  │ King  |3|1955  │  │ Orwell|1|328   │
└────────────────┘  └────────────────┘  └────────────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              ▼
              ┌─────────────────────────────────┐
              │         Final Result            │
              │  author_lname | count | pages   │
              │  Rowling      | 4     | 1721    │
              │  King         | 3     | 1955    │
              │  Orwell       | 1     | 328     │
              └─────────────────────────────────┘

SQL Execution Order

Written order:

SELECT → FROM → GROUP BY

Actual execution order:

FROM → GROUP BY → SELECT

This is why you can use column aliases in ORDER BY but not in WHERE.


Multiple Aggregate Functions

Since groups contain all columns, you can apply different aggregates:

SELECT
    author_lname,
    COUNT(*) AS books_written,
    SUM(pages) AS total_pages,
    AVG(pages) AS avg_pages,
    MIN(released_year) AS first_book,
    MAX(released_year) AS latest_book
FROM books
GROUP BY author_lname;

Result:

author_lnamebooks_writtentotal_pagesavg_pagesfirst_booklatest_book
Rowling41721430.2519972000
King31955651.6719771987
Orwell1328328.0019491949

GROUP BY Multiple Columns

You can group by multiple columns to create more specific groups:

SELECT
    author_lname,
    author_fname,
    COUNT(*) AS books_written
FROM books
GROUP BY author_lname, author_fname;

This creates groups based on unique combinations of author_lname AND author_fname.


The SELECT Rule

After GROUP BY, you can only SELECT:

  1. Columns in GROUP BY clause (the grouping keys)
  2. Aggregate functions applied to other columns
-- ❌ ERROR: title is not in GROUP BY and not aggregated
SELECT author_lname, title 
FROM books 
GROUP BY author_lname;

-- ✅ OK: title is aggregated with MAX()
SELECT author_lname, MAX(title) AS last_title_alphabetically
FROM books 
GROUP BY author_lname;

-- ✅ OK: all non-aggregated columns are in GROUP BY
SELECT author_lname, author_fname, COUNT(*)
FROM books 
GROUP BY author_lname, author_fname;

Why? Because each group may contain multiple different values for non-grouped columns. SQL doesn't know which one to pick.


Key Takeaways

ConceptDescription
GROUP BYSplits rows into logical groups based on column values
Groups containComplete rows with ALL columns from original table
AggregatesProcess each group separately, return one value per group
Final resultOne row per group
SELECT ruleOnly grouped columns or aggregate functions allowed
Execution orderFROM → GROUP BY → SELECT

Common Use Cases

-- Count items per category
SELECT category, COUNT(*) FROM products GROUP BY category;

-- Total sales per customer
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;

-- Average rating per product
SELECT product_id, AVG(rating) FROM reviews GROUP BY product_id;

-- Find most recent order per customer
SELECT customer_id, MAX(order_date) FROM orders GROUP BY customer_id;
thongvmdev_M9VMOt
WRITTEN BY

thongvmdev

Share and grow together