How to read SQL queries you didn't write

Most internship work eventually involves opening a SQL file someone wrote three years ago and trying to figure out what it does. CS classes teach you to write small queries from scratch. They almost never teach you to read 200-line monsters with five JOINs, a subquery, two CTEs, and a window function. Here's the method that works.

The mental shift: SQL doesn't run in the order it's written

Here's the single insight that makes complex SQL readable. SQL is a declarative language: you describe what you want, and the database figures out how to get it. The consequence is that the keywords don't execute in the order you read them.

The logical processing order (the optimizer may physically reorder work, but the result behaves as if it ran in this order):

  1. FROM (and JOIN): get the source tables, smush them together
  2. WHERE: filter rows
  3. GROUP BY: bucket rows
  4. HAVING: filter buckets
  5. SELECT: pick the columns to return (and compute expressions)
  6. DISTINCT: remove duplicates
  7. ORDER BY: sort the result
  8. LIMIT / OFFSET: trim

So when you read a query, start from FROM and work your way through that order, not from SELECT downward. The SELECT at the top is the last logical step, even though it appears first.
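You can see this order in action with Python's built-in sqlite3 module. This is a minimal sketch with made-up toy tables (the users/orders schema and data are hypothetical, mirroring the example above); the key observation is that the order_count alias defined in SELECT is visible in ORDER BY, because ORDER BY runs logically after SELECT.

```python
import sqlite3

# Toy schema and data (hypothetical, for illustration only).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO users VALUES
  (1, 'a@example.com', '2026-02-01'),
  (2, 'b@example.com', '2026-03-01'),
  (3, 'c@example.com', '2025-12-01');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 3, 20.0);
""")

rows = db.execute("""
SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'    -- step 2: drops user 3
GROUP BY u.email                     -- step 3: one bucket per email
HAVING COUNT(o.id) > 0               -- step 4: drops user 2 (zero orders)
ORDER BY order_count DESC            -- step 7: alias from SELECT is visible
LIMIT 50
""").fetchall()

print(rows)  # [('a@example.com', 2)]
```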

The five-step reading method

For any query, regardless of complexity:

Step 1: Find the FROM block, identify the source tables

What tables is this query pulling from? List them on a piece of paper or in a comment.

SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.email
HAVING COUNT(o.id) > 0
ORDER BY order_count DESC
LIMIT 50;

Source tables: users (aliased u) and orders (aliased o). Two tables. The LEFT JOIN tells me orders is being attached to users, with users as the primary set.

Step 2: Read the WHERE clause to understand what's being filtered

What rows are being kept?

In the example: u.created_at > '2026-01-01'. Only users created after Jan 1, 2026.

Step 3: Look for GROUP BY and HAVING

If they exist, the query is doing aggregation. The result will be one row per group (per GROUP BY value), not one row per source-table row.

Here: GROUP BY u.email. So we get one row per user email. HAVING COUNT(o.id) > 0 drops users who have no orders. Notice this filters after aggregation; you can't put aggregate conditions in WHERE.
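The "aggregates can't go in WHERE" rule is easy to verify locally. A sketch with sqlite3 and a throwaway toy table (assumed data): the same COUNT condition is rejected in WHERE but accepted in HAVING, because WHERE runs before grouping and HAVING runs after.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO orders VALUES (1, 1), (2, 1);
""")

# Aggregate condition in WHERE: rejected, because WHERE runs before GROUP BY.
try:
    db.execute("SELECT user_id FROM orders WHERE COUNT(id) > 0 GROUP BY user_id")
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# Same condition in HAVING: fine, because HAVING runs after GROUP BY.
rows = db.execute(
    "SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(id) > 0"
).fetchall()

print(where_rejected, rows)  # True [(1,)]
```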

Step 4: Read the SELECT list

Now you can see what's actually being returned. u.email and the count of orders. One column is from a source row, the other is computed.

Step 5: ORDER BY and LIMIT

Sort by order count descending, top 50 results.

Reading order: tables → filter → group → filter groups → pick columns → sort → trim. After a few times this becomes automatic.

JOINs: the four kinds, in plain English

Every relational query you'll read uses some flavor of JOIN. Four to know:

INNER JOIN

"Give me only the rows where both tables have a match."

SELECT u.email, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

Users with no orders are dropped. Orders with no matching user (data integrity bug) are also dropped.

LEFT JOIN (most common in real code)

"Give me all rows from the left table, plus matching rows from the right. Fill the right side with NULL when there's no match."

SELECT u.email, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

All users included, even those with no orders. o.total will be NULL for those users. This is what most "show me users plus their orders" queries actually want.

RIGHT JOIN

Mirror image of LEFT JOIN. Almost never used in practice; people just flip the table order and use LEFT.

FULL OUTER JOIN

All rows from both tables, NULLs where there's no match. Rare. When you do see it, the query is usually doing some kind of reconciliation: "show me what's in either table, even if it's not in the other."

CROSS JOIN

Cartesian product: every row of A paired with every row of B. An accidental cross join (a JOIN missing its ON clause) is almost always a bug. It's used deliberately for things like generating all combinations.
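To make the row-count differences concrete, here's a sketch comparing INNER, LEFT, and CROSS joins on tiny made-up tables via sqlite3 (FULL OUTER JOIN is omitted because it needs a fairly recent SQLite, 3.39+):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);  -- user 2 has no orders
""")

def count(sql):
    return len(db.execute(sql).fetchall())

inner = count("SELECT * FROM users u INNER JOIN orders o ON o.user_id = u.id")
left  = count("SELECT * FROM users u LEFT JOIN orders o ON o.user_id = u.id")
cross = count("SELECT * FROM users u CROSS JOIN orders o")

# INNER: only matched pairs. LEFT: all 3 users, user 2 padded with NULLs.
# CROSS: every user paired with every order (3 * 3).
print(inner, left, cross)  # 3 4 9
```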

The "where did this row come from" trick

When you can't figure out what a query is doing, run a simplified version that returns the joined rows before any aggregation:

-- Original query
SELECT u.email, COUNT(o.id) AS order_count
FROM users u LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.email
HAVING COUNT(o.id) > 0;

-- Simplified version to see raw joined rows
SELECT u.id, u.email, u.created_at, o.id AS order_id, o.total
FROM users u LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
LIMIT 30;

Run the second query. You see the raw shape: one row per user-order combination. Now the GROUP BY and COUNT in the original make intuitive sense. This trick is gold for understanding any aggregation query.
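The same trick in runnable form, a sketch with sqlite3 and toy data (hypothetical schema): the simplified query exposes one row per user-order pair, including the NULL-padded row the LEFT JOIN produces for a user with no orders.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO users VALUES
  (1, 'a@example.com', '2026-02-01'),
  (2, 'b@example.com', '2026-03-01');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0);
""")

# Raw joined rows, before any aggregation: one row per user-order pair,
# plus a NULL-padded row for the user with no orders.
raw = db.execute("""
SELECT u.id, u.email, o.id AS order_id, o.total
FROM users u LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
ORDER BY u.id, o.id
LIMIT 30
""").fetchall()

print(raw)
# [(1, 'a@example.com', 10, 50.0), (1, 'a@example.com', 11, 75.0),
#  (2, 'b@example.com', None, None)]
```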

Practice reading and fixing real broken SQL

InternQuest's data-track missions include real-shaped SQL bugs: SQL injection in Express routes, broken filters, wrong JOINs that double-count rows. Spot the bug, write the fix. Free virtual SWE internship simulator.

Try a data mission →

Subqueries vs CTEs

Both are queries inside queries. They look different, but databases usually compile them to similar execution plans.

Subquery (inline)

SELECT email
FROM users
WHERE id IN (
  SELECT user_id
  FROM orders
  WHERE total > 1000
);

"Users who have placed an order over $1000." The inner SELECT returns a list of user_ids. The outer query uses that list to filter users.

Subqueries can also appear in the SELECT list (scalar subqueries, returning a single value per row) or in the FROM clause (derived tables, which act like temporary tables).

CTE (Common Table Expression)

WITH big_spenders AS (
  SELECT user_id
  FROM orders
  WHERE total > 1000
)
SELECT u.email
FROM users u
INNER JOIN big_spenders bs ON bs.user_id = u.id;

Same query, written more cleanly. A CTE is WITH name AS (...) at the top, referenced by name in the main query. CTEs make complex SQL much more readable because you can name intermediate results.

When you see a long query with multiple CTEs, read them top to bottom: each one builds on the previous, and the final SELECT at the bottom is the answer.
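You can check the equivalence of the two forms directly. A sketch with sqlite3 and toy data (assumed schema): the IN subquery and the CTE join return the same emails here. One caveat worth knowing: IN deduplicates implicitly, while the JOIN form returns one row per matching order unless you deduplicate yourself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO orders VALUES (10, 1, 1500.0), (11, 2, 40.0);
""")

# Inline subquery form.
subquery = db.execute("""
SELECT email FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000)
""").fetchall()

# CTE form: same filter, named big_spenders.
cte = db.execute("""
WITH big_spenders AS (
  SELECT user_id FROM orders WHERE total > 1000
)
SELECT u.email
FROM users u
INNER JOIN big_spenders bs ON bs.user_id = u.id
""").fetchall()

print(subquery, cte)  # [('a@example.com',)] [('a@example.com',)]
```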

Window functions: the third hard concept

Window functions let you compute things like "rank" or "running total" without losing your row-level detail (the way GROUP BY would).

SELECT
  user_id,
  order_id,
  total,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS order_rank
FROM orders;

Reading this: for each user (PARTITION BY user_id), order that user's orders by total descending, and number them. Result: one row per order, with an order_rank column showing whether it's the user's #1 biggest order, #2, and so on. (The alias is order_rank rather than rank because RANK is a reserved keyword in several databases.)

Window functions you'll see most:

  - ROW_NUMBER(), RANK(), DENSE_RANK(): number or rank rows within each partition
  - LAG() and LEAD(): read the previous or next row's value
  - SUM(...) OVER (...): running totals and moving aggregates

If a query has OVER (...) in it, it's a window function. Read the OVER clause carefully: PARTITION BY tells you what's being grouped, and ORDER BY inside OVER tells you the row order within each group.
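Window functions are easy to experiment with locally. A minimal sketch with sqlite3 and toy data (window functions need SQLite 3.25+, which ships with recent Python builds): every order keeps its own row, and the numbering restarts at 1 for each user.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 3, 20.0);
""")

rows = db.execute("""
SELECT user_id, id AS order_id, total,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS order_rank
FROM orders
ORDER BY user_id, order_rank
""").fetchall()

# One row per order; order_rank restarts at 1 within each user's partition.
for r in rows:
    print(r)
# (1, 11, 75.0, 1)
# (1, 10, 50.0, 2)
# (3, 12, 20.0, 1)
```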

Common bugs in unfamiliar SQL

JOIN multiplies rows

If users has 1 user and orders has 5 orders for that user, the joined result has 5 rows, not 1. If you then do SUM(u.balance), you'll sum the balance 5 times. This is the most common SQL bug at every level.
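Here's that bug reproduced in a sketch with sqlite3 and made-up data (one user with balance 100 and three orders), along with one possible fix: collapse orders to one row per user before joining, so the balance is only counted once.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users VALUES (1, 100.0);
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 1);  -- 3 orders, 1 user
""")

# Buggy: the join produces 3 rows for user 1, so the balance is summed 3 times.
(buggy,) = db.execute("""
SELECT SUM(u.balance) FROM users u JOIN orders o ON o.user_id = u.id
""").fetchone()

# One fix: deduplicate orders to one row per user before joining.
(fixed,) = db.execute("""
SELECT SUM(u.balance)
FROM users u
JOIN (SELECT DISTINCT user_id FROM orders) o ON o.user_id = u.id
""").fetchone()

print(buggy, fixed)  # 300.0 100.0
```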

NULL in WHERE

WHERE x != 'foo' doesn't include rows where x is NULL: a comparison against NULL evaluates to unknown, not true, so the row is filtered out. You have to write WHERE x != 'foo' OR x IS NULL if you want them.
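A quick sketch of this behavior with sqlite3 and a toy table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE t (x TEXT);
INSERT INTO t VALUES ('foo'), ('bar'), (NULL);
""")

# NULL != 'foo' evaluates to NULL (unknown), so the NULL row is filtered out.
naive = db.execute("SELECT x FROM t WHERE x != 'foo'").fetchall()

# Explicitly admit NULLs if you want them.
with_nulls = db.execute(
    "SELECT x FROM t WHERE x != 'foo' OR x IS NULL"
).fetchall()

print(naive, with_nulls)  # [('bar',)] vs. two rows including (None,)
```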

NULL in aggregate

COUNT(column) ignores NULLs. COUNT(*) doesn't. Trips people up.
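Seen side by side on a toy table (sqlite3 sketch), the difference is stark:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE t (x TEXT);
INSERT INTO t VALUES ('a'), ('b'), (NULL);
""")

# COUNT(*) counts rows; COUNT(x) skips the row where x is NULL.
star, col = db.execute("SELECT COUNT(*), COUNT(x) FROM t").fetchone()
print(star, col)  # 3 2
```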

GROUP BY missing column

-- This errors in strict-mode databases
SELECT user_id, email, COUNT(*)
FROM orders
GROUP BY user_id;

Every column in SELECT must either be in GROUP BY or be inside an aggregate. email isn't either, so this is ambiguous (which email? you have many rows per user_id).

Implicit cross join

SELECT * FROM users, orders;

The comma-separated FROM list with no JOIN condition is an implicit CROSS JOIN. If you have 1000 users and 1000 orders, the result is 1,000,000 rows. People do this by accident and wonder why their query is slow.

ORDER BY on a non-deterministic field

If you ORDER BY a non-unique field (say, a date) and then LIMIT 1, you'll get an arbitrary one of the tied rows, and which one can change between runs. Add a unique tiebreaker: ORDER BY created_at DESC, id DESC.
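A sketch of the fix with sqlite3 and toy data (three rows tied on the same date): with a unique tiebreaker, the winner is fully determined instead of being left to the engine's whims.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT);
INSERT INTO events VALUES (1, '2026-05-01'), (2, '2026-05-01'), (3, '2026-05-01');
""")

# All three rows tie on created_at, so ORDER BY created_at DESC LIMIT 1
# alone could legally return any of them, depending on engine and indexes.

# With a unique tiebreaker the winner is fully determined:
(winner,) = db.execute(
    "SELECT id FROM events ORDER BY created_at DESC, id DESC LIMIT 1"
).fetchone()

print(winner)  # 3
```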

The rule that makes you look senior in code review

When you encounter SQL in a PR, ask "what's the cardinality of the result?" Specifically: how many rows do you expect this to return? Is it one row per user, one row per order, one row per user-order pair?

This single question catches more bugs than any other SQL review heuristic. If the author can't articulate the expected cardinality, the query probably has a JOIN bug they haven't noticed.

The minimum SQL literacy bar for an intern

Read JOINs, WHERE, GROUP BY, and HAVING fluently; recognize subqueries, CTEs, and window functions on sight. If you can do the first and spot the second, you're already ahead of most interns.

Drill SQL-reading on real bugs

InternQuest's data-track missions include broken JOINs, missing tiebreakers, NULL bugs, and SQL injection patterns. Real intern-grade SQL, automated PR review, free virtual software engineering internship simulator.

Try a SQL mission →