Dataslope logoDataslope

Filtering and Slicing

Using WHERE, BETWEEN, IN, and pattern matching to carve a dataset into the exact slice a question needs — and why filtering early is the analyst's habit.

Profiling tells you what the whole dataset looks like. Most questions, though, are about a part of it: this region, this month, these categories. Carving out that part is slicing, and the tool is the WHERE clause you already know — used with an analyst's intent. This page sharpens your filtering vocabulary and explains why analysts filter as early as possible.

A slice is a question in disguise

Every business question hides a filter. "How are sales in the West?" means WHERE region = 'West'. "What happened in Q1?" means a date range. Learning to hear the filter inside a question is half of analytical SQL.

Ranges with BETWEEN

Numeric and date ranges come up constantly — a price band, a date window, an age bracket. BETWEEN reads naturally and is inclusive on both ends.

SQL
DuckDB 1.32.0

BETWEEN includes both endpoints

amount BETWEEN 100 AND 200 keeps rows where amount is 100, 200, and everything in between. If you need to exclude an endpoint, use explicit >= / < comparisons instead.

Membership with IN

When a column must match one of several values, IN is cleaner than a chain of ORs.

SQL
DuckDB 1.32.0

category IN ('Office', 'Electronics') is exactly category = 'Office' OR category = 'Electronics', but it scales gracefully to a long list.

Text patterns with LIKE and ILIKE

Analysts often filter text by shape rather than exact value: names starting with "A", emails at a domain, codes containing a substring. LIKE uses % (any run of characters) and _ (one character). DuckDB's ILIKE is the case-insensitive version.

SQL
DuckDB 1.32.0

Combining conditions — and the danger of AND/OR

Real slices combine conditions. Be careful mixing AND and OR: AND binds tighter than OR, so without parentheses the query may not mean what you intended. Make your grouping explicit.

SQL
DuckDB 1.32.0

Without the parentheses, amount > 100 AND region = 'West' OR region = 'East' would also return every East order regardless of amount — a classic, silent analytical bug.

Why analysts filter early

Filtering early is both a performance habit and a clarity habit:

  • Less data to process. A slice of 10,000 rows summarizes faster than a table of 10 million. On a column store, a tight WHERE lets the engine skip whole chunks.
  • Clearer reasoning. Once the data is reduced to "the rows this question is about," every later step — grouping, joining, averaging — is easier to think about.

The rule of thumb: slice down to what the question is about as soon as you can.

Check your understanding

QuestionSelect one

Which rows does amount BETWEEN 100 AND 200 keep?

Rows where amount is strictly between 100 and 200, excluding both.

Rows where amount is 100, 200, or anything in between (both endpoints included).

Only rows where amount equals exactly 100 or exactly 200.

No rows, because BETWEEN needs three values.

QuestionSelect one

Why is region IN ('West', 'East') preferred over region = 'West' OR region = 'East' for longer lists?

Because IN checks ranges while OR checks equality.

They are equivalent, but IN is more readable and scales cleanly as the list of values grows.

IN is faster because it ignores NULLs differently and returns more rows.

OR cannot be used with text columns.

QuestionSelect one

A query has WHERE amount > 100 AND region = 'West' OR region = 'East' with no parentheses. Why is this risky?

It is a syntax error and will not run.

AND binds tighter than OR, so it returns big West orders plus every East order, which is probably not intended.

It always returns zero rows.

OR is not allowed in a WHERE clause.

You can now carve any slice you need. Often the next move is to sort a slice and look at its top or bottom — the subject of the next page.

On this page