Dataslope logoDataslope

Sets

Unordered collections of unique, hashable items

A set is an unordered collection of unique, hashable items. Membership testing is O(1) on average, which makes sets the right tool for deduplication, fast lookups, and mathematical set operations. In data work, sets are everywhere: removing duplicate IDs, checking if a user is in an allowed list, finding the intersection of two tag lists, filtering out stop words, and more.

Every snippet on this page runs in your browser — no setup required.

How sets work

Like dicts, sets are implemented as hash tables. A set is essentially a dict with only keys (no values). This gives O(1) average-case membership testing, insertion, and deletion.

Sets and dicts share the same underlying implementation: hash tables. A set is like a dict where you only care about the keys. This is why both require hashable elements/keys and both give O(1) average-case lookups.

Creating sets

The literal syntax uses curly braces, but without colons.

Code Block
Python 3.13.2

Empty set syntax

{} is an empty dict, not an empty set. For an empty set, use set().

Code Block
Python 3.13.2

The set() constructor converts any iterable into a set, automatically deduplicating.

Code Block
Python 3.13.2
Code Block
Python 3.13.2

Adding and removing

.add inserts a single item.

Code Block
Python 3.13.2

Adding an item that already exists does nothing (sets are unique by definition).

Code Block
Python 3.13.2

.discard removes an item without raising an error if it is missing.

Code Block
Python 3.13.2

.remove removes an item but raises KeyError if it is missing.

Code Block
Python 3.13.2

.pop removes and returns an arbitrary element. The choice is unpredictable because sets are unordered.

Code Block
Python 3.13.2

Set algebra

This is the killer feature: classic mathematical set operations are built in with both methods and operators.

OperationOperatorMethod
Uniona | ba.union(b)
Intersectiona & ba.intersection(b)
Differencea - ba.difference(b)
Symmetric differencea ^ ba.symmetric_difference(b)
Subseta <= ba.issubset(b)
Superseta >= ba.issuperset(b)
Disjointn/aa.isdisjoint(b)
Code Block
Python 3.13.2
Code Block
Python 3.13.2

Real-world set operations

In data analysis, you constantly ask questions like "which users visited both pages?" (intersection), "which tags are in dataset A but not B?" (difference), "which emails appear in any of these lists?" (union). Sets make these queries one-liners.

Sets vs lists for lookup

This is where sets shine. Membership testing in a list is O(n); in a set it is O(1).

Code Block
Python 3.13.2

Rule of thumb: if you are checking membership more than once or twice, convert to a set first.

Items must be hashable

Same rule as dict keys: ints, strings, tuples of hashables, and frozensets are fine; lists and dicts are not.

Code Block
Python 3.13.2
Code Block
Python 3.13.2

The requirement is hashability, not immutability. A tuple containing a mutable object (like a list) is itself unhashable, even though tuples are normally immutable.

Code Block
Python 3.13.2

frozenset

If you need an immutable, hashable set (for example to use as a dict key or another set's element), use frozenset.

Code Block
Python 3.13.2
Code Block
Python 3.13.2

Challenges

Challenge
Python 3.13.2
Common interests

Define common_interests(a, b) that takes two lists of interest strings and returns a set of the interests that appear in both, case-insensitive.

Challenge
Python 3.13.2
Unique vowels

Define unique_vowels(text) that returns the set of lowercase vowels (a, e, i, o, u) that appear in text, ignoring case.

QuestionSelect one

What does {1, 2, 3} & {2, 3, 4} evaluate to?

{1, 2, 3, 4}

{2, 3}

{1, 4}

{1}

QuestionSelect one

Which of these creates an empty set?

{}

set()

[]

()

QuestionSelect one

What is the time complexity of item in s where s is a set?

O(1) average case

O(n)

O(log n)

O(n²)

We have now seen every major collection. Next: how to direct flow through them with conditionals and loops.

On this page