Sets
Unordered collections of unique, hashable items
A set is an unordered collection of unique, hashable items. Membership testing is O(1) on average, which makes sets the right tool for deduplication, fast lookups, and mathematical set operations. In data work, sets are everywhere: removing duplicate IDs, checking if a user is in an allowed list, finding the intersection of two tag lists, filtering out stop words, and more.
How sets work
Like dicts, sets are implemented as hash tables. Conceptually, a set is a dict with only keys and no values. This shared design is why both require hashable elements (or keys), and why both give O(1) average-case membership testing, insertion, and deletion.
Creating sets
The literal syntax uses curly braces, like a dict, but with bare items instead of key: value pairs.
Empty set syntax
{} is an empty dict, not an empty set. For an empty set, use set().
The set() constructor converts any iterable into a set, automatically deduplicating.
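The snippet below shows the three ways of creating a set described above: a literal, the empty-set constructor, and set() applied to an iterable. The variable names are illustrative. Results are passed through sorted() before printing because a set's display order is not guaranteed.

```python
# Set literal: curly braces with bare items, no key: value pairs
primes = {2, 3, 5, 7}

# Duplicates in a literal collapse to a single item
colors = {"red", "green", "red"}
print(len(colors))       # 2

# {} is an empty dict; set() is the only way to write an empty set
print(type({}))          # <class 'dict'>
print(type(set()))       # <class 'set'>

# set() converts any iterable into a set, dropping duplicates
ids = set([101, 102, 101, 103, 102])
print(sorted(ids))       # [101, 102, 103]

# A string is an iterable of characters
letters = set("banana")
print(sorted(letters))   # ['a', 'b', 'n']
```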
Adding and removing
.add inserts a single item.
Adding an item that already exists does nothing (sets are unique by definition).
.discard removes an item without raising an error if it is missing.
.remove removes an item but raises KeyError if it is missing.
.pop removes and returns an arbitrary element. Because sets are unordered, you cannot rely on which element you get. Calling .pop on an empty set raises KeyError.
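This sketch walks through .add, .discard, .remove, and .pop on a small set of illustrative tags. Note how .discard stays silent on a missing item while .remove raises KeyError.

```python
tags = {"python", "data"}

tags.add("sql")          # insert one item
tags.add("python")       # already present: no change, no error
print(sorted(tags))      # ['data', 'python', 'sql']

tags.discard("java")     # missing: silently does nothing
tags.discard("sql")      # present: removed
print(sorted(tags))      # ['data', 'python']

try:
    tags.remove("java")  # missing: raises KeyError
except KeyError as exc:
    print("KeyError:", exc)   # KeyError: 'java'

# .pop removes and returns some element; which one is not guaranteed
item = tags.pop()
print(item in {"data", "python"})  # True
print(len(tags))                   # 1
```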
Set algebra
This is the killer feature: classic mathematical set operations are built in with both methods and operators.
| Operation | Operator | Method |
|---|---|---|
| Union | a \| b | a.union(b) |
| Intersection | a & b | a.intersection(b) |
| Difference | a - b | a.difference(b) |
| Symmetric difference | a ^ b | a.symmetric_difference(b) |
| Subset | a <= b | a.issubset(b) |
| Superset | a >= b | a.issuperset(b) |
| Disjoint | n/a | a.isdisjoint(b) |
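The following sketch runs each row of the table on two small example sets. It also shows one practical difference between the two forms: the methods accept any iterable, while the operators need a set (or frozenset) on both sides.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b              # {1, 2, 3, 4, 5}
both = a & b               # {3, 4}
only_a = a - b             # {1, 2}
exactly_one = a ^ b        # {1, 2, 5}

# Each operator has an equivalent method
assert union == a.union(b)
assert both == a.intersection(b)
assert only_a == a.difference(b)
assert exactly_one == a.symmetric_difference(b)

print({3, 4} <= a)           # True: subset
print(a >= {1, 2})           # True: superset
print(a.isdisjoint({7, 8}))  # True: no elements in common

# Methods accept any iterable...
print(a.union([9, 10]) == {1, 2, 3, 4, 9, 10})  # True

# ...but operators raise TypeError when one side is not a set
try:
    a | [9, 10]
except TypeError:
    print("operators need sets on both sides")
```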
Real-world set operations
In data analysis, you constantly ask questions like "which users visited both pages?" (intersection), "which tags are in dataset A but not B?" (difference), "which emails appear in any of these lists?" (union). Sets make these queries one-liners.
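Here are the three questions above expressed in code. The user IDs, tags, and email addresses are made-up sample data. For the "any of these lists" case, set().union accepts several iterables at once, so the lists do not need to be converted first.

```python
# Hypothetical page-visit logs: user IDs that viewed each page
pricing_visitors = ["u1", "u2", "u3", "u2"]
signup_visitors = ["u2", "u3", "u4"]

# Which users visited both pages? (intersection)
visited_both = set(pricing_visitors) & set(signup_visitors)
print(sorted(visited_both))      # ['u2', 'u3']

# Which tags are in dataset A but not B? (difference)
tags_a = {"finance", "retail", "health"}
tags_b = {"retail", "travel"}
print(sorted(tags_a - tags_b))   # ['finance', 'health']

# Which emails appear in any of these lists? (union)
list1 = ["ann@example.com", "bob@example.com"]
list2 = ["bob@example.com", "cy@example.com"]
list3 = ["dee@example.com"]
all_emails = set().union(list1, list2, list3)
print(len(all_emails))           # 4
```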
Sets vs lists for lookup
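This is where sets shine. Membership testing in a list is O(n) because Python compares items one by one. In a set it is O(1) on average because the item's hash leads almost directly to where it would be stored.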
Rule of thumb: if you will check membership more than once or twice, convert to a set first. Building the set costs one O(n) pass, and every lookup after that is O(1) on average.
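The sketch below times the same membership check against a list and against a set, and then applies the rule of thumb to stop-word filtering. The sizes and word lists are arbitrary choices. Exact timings depend on your machine, so no numbers are shown, but the set lookup should be far faster.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: it sits at the very end

# The list scans item by item; the set hashes target and checks its slot
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")

# Build the set once, then every "not in" check is O(1) on average
stop_words = set(["the", "a", "of", "and"])
words = "the cat and the hat".split()
content = [w for w in words if w not in stop_words]
print(content)  # ['cat', 'hat']
```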
Items must be hashable
Same rule as dict keys: ints, strings, tuples of hashables, and frozensets are fine; lists, dicts, and ordinary sets are not.
The requirement is hashability, not immutability. A tuple containing a mutable object (like a list) is itself unhashable, even though tuples are normally immutable.
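This sketch checks a few candidate values by placing each one in a set literal, which forces Python to hash it. It collects the type names of the values that fail. The last candidate is a tuple that contains a list, which illustrates the point above: the tuple is immutable, but it is still unhashable.

```python
# All of these are hashable, so they can live in a set together
ok = {1, "two", (3, 4), frozenset({5})}
print(len(ok))  # 4

unhashable = []
for candidate in ([1, 2], {"k": "v"}, {1, 2}, (1, [2, 3])):
    try:
        {candidate}  # building a one-item set forces hash(candidate)
    except TypeError:
        unhashable.append(type(candidate).__name__)

print(unhashable)  # ['list', 'dict', 'set', 'tuple']
```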
frozenset
If you need an immutable, hashable set (for example to use as a dict key or another set's element), use frozenset.
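Here is a short sketch of both uses mentioned above: a frozenset as a dict key and as an element of another set. The team names and file name are hypothetical. Because element order does not matter, two frozensets with the same members are equal and hash the same.

```python
# A frozenset is hashable, so it can be a dict key...
team = frozenset({"ana", "ben"})
shared_docs = {team: ["roadmap.md"]}

# Lookup works regardless of the order the members are written in
print(shared_docs[frozenset({"ben", "ana"})])  # ['roadmap.md']

# ...or an element of another set
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2: {1, 2} and {2, 1} are the same frozenset

# It supports the non-mutating set operations but has no .add
expanded = team | {"cy"}
print(expanded == {"ana", "ben", "cy"})  # True
print(hasattr(team, "add"))              # False
```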
Challenges
Define common_interests(a, b) that takes two lists of interest strings and returns a set of the interests that appear in both, case-insensitive.
Define unique_vowels(text) that returns the set of lowercase vowels (a, e, i, o, u) that appear in text, ignoring case.
What does {1, 2, 3} & {2, 3, 4} evaluate to?
{1, 2, 3, 4}
{2, 3}
{1, 4}
{1}
Which of these creates an empty set?
{}
set()
[]
()
What is the time complexity of item in s where s is a set?
O(1) average case
O(n)
O(log n)
O(n²)
We have now seen every major collection. Next: how to direct flow through them with conditionals and loops.