How to read SQL queries you didn't write
Most internship work eventually involves opening a SQL file someone wrote three years ago and trying to figure out what it does. CS classes teach you to write small queries from scratch. They almost never teach you to read 200-line monsters with five JOINs, a subquery, two CTEs, and a window function. Here's the method that works.
The mental shift: SQL doesn't run in the order it's written
Here's the single insight that makes complex SQL readable. SQL is a declarative language: you describe what you want, and the database figures out how to get it. The consequence is that the keywords don't execute in the order you read them.
The actual execution order:
1. FROM (and JOIN): get the source tables, smush them together
2. WHERE: filter rows
3. GROUP BY: bucket rows
4. HAVING: filter buckets
5. SELECT: pick the columns to return (and compute expressions)
6. DISTINCT: remove duplicates
7. ORDER BY: sort the result
8. LIMIT / OFFSET: trim
So when you read a query, start from FROM and work your way through that order, not from SELECT downward. The SELECT at the top is the last logical step, even though it appears first.
The five-step reading method
For any query, regardless of complexity:
Step 1: Find the FROM block, identify the source tables
What tables is this query pulling from? List them on a piece of paper or in a comment.
SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.email
HAVING COUNT(o.id) > 0
ORDER BY order_count DESC
LIMIT 50;
Source tables: users (aliased u) and orders (aliased o). Two tables. The LEFT JOIN tells me orders is being attached to users, with users as the primary set.
Step 2: Read the WHERE clause to understand what's being filtered
What rows are being kept?
In the example: u.created_at > '2026-01-01'. Only users created after Jan 1, 2026.
Step 3: Look for GROUP BY and HAVING
If they exist, the query is doing aggregation. The result will be one row per group (per GROUP BY value), not one row per source-table row.
Here: GROUP BY u.email. So we get one row per user email. HAVING COUNT(o.id) > 0 drops users who have no orders. Notice this filters after aggregation; you can't put aggregate conditions in WHERE.
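To see why, here's a minimal sketch of the broken version. WHERE is evaluated before GROUP BY, so the aggregate doesn't exist yet when WHERE runs:
-- Broken: WHERE runs before GROUP BY, so no aggregate exists yet
SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE COUNT(o.id) > 0  -- error: aggregates are not allowed in WHERE
GROUP BY u.email;
The HAVING clause in the original query is the fix.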
Step 4: Read the SELECT list
Now you can see what's actually being returned. u.email and the count of orders. One column is from a source row, the other is computed.
Step 5: ORDER BY and LIMIT
Sort by order count descending, top 50 results.
Reading order: tables → filter → group → filter groups → pick columns → sort → trim. After a few times this becomes automatic.
JOINs: the four kinds, in plain English
Every relational query you'll read uses some flavor of JOIN. Four to know:
INNER JOIN
"Give me only the rows where both tables have a match."
SELECT u.email, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;
Users with no orders are dropped. Orders with no matching user (which would be a data integrity bug) are also dropped.
LEFT JOIN (most common in real code)
"Give me all rows from the left table, plus matching rows from the right. Fill the right side with NULL when there's no match."
SELECT u.email, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
All users included, even those with no orders. o.total will be NULL for those users. This is what most "show me users plus their orders" queries actually want.
RIGHT JOIN
Mirror image of LEFT JOIN. Almost never used in practice; people just flip the table order and use LEFT.
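For illustration, these two queries on the same schema return the same rows:
-- RIGHT JOIN: all orders, plus matching users (NULL email if no user)
SELECT u.email, o.total
FROM users u
RIGHT JOIN orders o ON o.user_id = u.id;
-- The rewrite you'll actually see: flip the tables and use LEFT JOIN
SELECT u.email, o.total
FROM orders o
LEFT JOIN users u ON u.id = o.user_id;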
FULL OUTER JOIN
All rows from both tables, NULLs where there's no match. Rare. When you do see it, the query is usually doing some kind of reconciliation: "show me what's in either table, even if it's not in the other."
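A sketch of the reconciliation pattern, using two hypothetical tables (billing_records and crm_records) that should agree on customer_id. Note that MySQL doesn't support FULL OUTER JOIN; this works in PostgreSQL and most other databases:
-- Customers present in one system but missing from the other
SELECT b.customer_id AS billing_id, c.customer_id AS crm_id
FROM billing_records b
FULL OUTER JOIN crm_records c ON c.customer_id = b.customer_id
WHERE b.customer_id IS NULL  -- in CRM but not billing
   OR c.customer_id IS NULL; -- in billing but not CRM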
CROSS JOIN
Cartesian product: every row of A paired with every row of B. An accidental CROSS JOIN (usually a forgotten ON clause) is almost always a bug. It's used deliberately for things like generating combinations.
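A sketch of the deliberate use, with hypothetical sizes and colors tables:
-- Every size paired with every color (e.g. 4 sizes x 5 colors = 20 rows)
SELECT s.size_name, c.color_name
FROM sizes s
CROSS JOIN colors c;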
The "where did this row come from" trick
When you can't figure out what a query is doing, run a simplified version that returns the joined rows before any aggregation:
-- Original query
SELECT u.email, COUNT(o.id) AS order_count
FROM users u LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.email
HAVING COUNT(o.id) > 0;
-- Simplified version to see raw joined rows
SELECT u.id, u.email, u.created_at, o.id AS order_id, o.total
FROM users u LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
LIMIT 30;
Run the second query. You see the raw shape: one row per user-order combination. Now the GROUP BY and COUNT in the original make intuitive sense. This trick is gold for understanding any aggregation query.
Practice reading and fixing real broken SQL
InternQuest's data-track missions include real-shaped SQL bugs: SQL injection in Express routes, broken filters, wrong JOINs that double-count rows. Spot the bug, write the fix. Free virtual SWE internship simulator.
Try a data mission →
Subqueries vs CTEs
Both are queries inside queries. They look different but compile to similar things.
Subquery (inline)
SELECT email
FROM users
WHERE id IN (
SELECT user_id
FROM orders
WHERE total > 1000
);
"Users who have placed an order over $1000." The inner SELECT returns a list of user_ids. The outer query uses that list to filter users.
Subqueries can also appear in the SELECT list (called scalar subqueries) or in the FROM clause (then they act like a temporary table).
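Sketches of both forms, on the same users/orders schema:
-- Scalar subquery in the SELECT list: one value computed per outer row
SELECT
  u.email,
  (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u;
-- Subquery in FROM: acts like a temporary table (most databases require an alias)
SELECT t.user_id, t.total_spent
FROM (
  SELECT user_id, SUM(total) AS total_spent
  FROM orders
  GROUP BY user_id
) AS t
WHERE t.total_spent > 1000;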
CTE (Common Table Expression)
WITH big_spenders AS (
SELECT user_id
FROM orders
WHERE total > 1000
)
SELECT u.email
FROM users u
INNER JOIN big_spenders bs ON bs.user_id = u.id;
Same query, written cleaner. CTEs are WITH name AS (...) at the top, then referenced by name in the main query. They make complex SQL much more readable because you can give each intermediate result a meaningful name.
When you see a long query with multiple CTEs, read them top to bottom: each one builds on the previous, and the final SELECT at the bottom is the answer.
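Here's a small sketch of that shape, chaining two CTEs on the same schema. Read recent_users first, then big_spenders, then the final SELECT:
WITH recent_users AS (     -- step 1: users created this year
  SELECT id, email
  FROM users
  WHERE created_at > '2026-01-01'
),
big_spenders AS (          -- step 2: builds on step 1
  SELECT ru.email, SUM(o.total) AS total_spent
  FROM recent_users ru
  INNER JOIN orders o ON o.user_id = ru.id
  GROUP BY ru.email
)
SELECT email, total_spent  -- step 3: the answer
FROM big_spenders
ORDER BY total_spent DESC;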
Window functions: the third hard concept
Window functions let you compute things like "rank" or "running total" without losing your row-level detail (the way GROUP BY would).
SELECT
user_id,
order_id,
total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS order_rank
FROM orders;
Reading this: for each user (PARTITION BY user_id), order their orders by total descending, and number them. Result: one row per order, with an order_rank column showing whether it's the user's #1 biggest order, #2, etc. (rank itself is a reserved word in some databases, which makes it a risky alias.)
Window functions you'll see most:
- ROW_NUMBER(): sequential number within the partition
- RANK() / DENSE_RANK(): like ROW_NUMBER, but ties get the same rank
- LAG() / LEAD(): peek at the previous or next row
- SUM() OVER (...): running totals
- AVG() OVER (...): moving averages
If a query has OVER (...) in it, it's a window function. Read the OVER clause carefully: PARTITION BY tells you what's being grouped, ORDER BY inside OVER tells you the row order within each group.
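Two quick sketches of the less obvious ones from that list, assuming orders has a created_at column:
-- Running total of each user's spending, in purchase order
SELECT
  user_id,
  order_id,
  total,
  SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total
FROM orders;
-- LAG: compare each order's total to the same user's previous order
SELECT
  user_id,
  order_id,
  total,
  total - LAG(total) OVER (PARTITION BY user_id ORDER BY created_at) AS change_vs_prev
FROM orders;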
Common bugs in unfamiliar SQL
JOIN multiplies rows
If users has 1 user and orders has 5 orders for that user, the joined result has 5 rows, not 1. If you then do SUM(u.balance), you'll sum the balance 5 times. This is the most common SQL bug at every level.
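A sketch of the bug and the usual fix, assuming users has a balance column: aggregate the many side first, then join one row to one row.
-- Bug: u.balance repeats once per order, so it gets summed 5 times
SELECT u.id, SUM(u.balance) AS balance, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;
-- Fix: pre-aggregate orders so the join is one-to-one
SELECT u.id, u.balance, COALESCE(oc.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
) AS oc ON oc.user_id = u.id;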
NULL in WHERE
WHERE x != 'foo' doesn't include rows where x is NULL. If you want those rows too, you have to write WHERE x != 'foo' OR x IS NULL.
NULL in aggregate
COUNT(column) ignores NULLs. COUNT(*) doesn't. Trips people up.
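A quick illustration, assuming users has a nullable phone column:
-- COUNT(*) counts rows; COUNT(phone) counts rows where phone IS NOT NULL
SELECT COUNT(*) AS all_users, COUNT(phone) AS users_with_phone
FROM users;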
GROUP BY missing column
-- This errors in strict-mode databases
SELECT user_id, email, COUNT(*)
FROM orders
GROUP BY user_id;
Every column in SELECT must either be in GROUP BY or be inside an aggregate. email isn't either, so this is ambiguous (which email? you have many rows per user_id).
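Two fixes, depending on intent (keeping the example's schema):
-- Fix 1: group by both columns (fine if email is determined by user_id)
SELECT user_id, email, COUNT(*)
FROM orders
GROUP BY user_id, email;
-- Fix 2: pick one value explicitly with an aggregate
SELECT user_id, MAX(email) AS email, COUNT(*)
FROM orders
GROUP BY user_id;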
Implicit cross join
SELECT * FROM users, orders;
The comma-separated FROM list with no JOIN condition is an implicit CROSS JOIN. If you have 1000 users and 1000 orders, the result is 1,000,000 rows. People do this by accident and wonder why their query is slow.
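What the author almost certainly meant:
-- Explicit join condition: one row per matching user-order pair
SELECT *
FROM users u
INNER JOIN orders o ON o.user_id = u.id;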
ORDER BY on a non-deterministic field
If you ORDER BY a non-unique field (say, a date), and then LIMIT 1, you'll get a random one of the tied rows each time. Add a tiebreaker: ORDER BY created_at DESC, id DESC.
Tools that pay off
- Format the query first. A wall of unformatted SQL is unreadable. Use a formatter (sqlformat.org, the formatter built into your IDE, or your database's own pretty-print) before you start reading.
- EXPLAIN / EXPLAIN ANALYZE. Most databases will show you the execution plan: which indexes get used, how rows flow. Even if you can't read execution plans yet, the structure helps.
- Run pieces of it. Keep the FROM/JOIN block, swap the SELECT list for *, and comment out everything below. Run it to see what shape you're starting with, then uncomment one clause at a time.
- Test with a small dataset. Add a LIMIT 10 at the end while exploring; remove it when you're done.
The rule that makes you look senior in code review
When you encounter SQL in a PR, ask "what's the cardinality of the result?" Specifically: how many rows do you expect this to return? Is it one row per user, one row per order, one row per user-order pair?
This single question catches more bugs than any other SQL review heuristic. If the author can't articulate the expected cardinality, the query probably has a JOIN bug they haven't noticed.
The minimum SQL literacy bar for an intern
- Read cold: simple SELECTs, basic INNER and LEFT JOINs, GROUP BY with aggregates, ORDER BY, LIMIT, simple WHERE filters.
- Read with effort: CTEs, scalar subqueries, basic window functions like ROW_NUMBER and RANK.
- Google when needed: recursive CTEs, advanced window functions (LAG/LEAD with offsets), JSON operators, full-text search, lateral joins.
- Defer to seniors: query optimization, index design, query plans for scale, partitioning strategy.
If you can read the first category fluently and recognize the second, you're already ahead of most interns.
Drill SQL-reading on real bugs
InternQuest's data-track missions include broken JOINs, missing tiebreakers, NULL bugs, and SQL injection patterns. Real intern-grade SQL, automated PR review, free virtual software engineering internship simulator.
Try a SQL mission →