2.10. Sets
A set is an unordered collection of unique items. Adding a value that is already present has no effect; iteration yields each value exactly once. Sets are the right tool when membership and de-duplication matter, and ordering does not.
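A minimal sketch of the uniqueness rule described above. The names `seen`, `before`, and `after` are illustrative only and do not come from the text:

```python
seen = {"a", "b"}
before = len(seen)

seen.add("a")            # "a" is already present, so the set is unchanged
after = len(seen)

print(before, after)     # 2 2
```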
2.10.1. Creating a set
Use curly braces for a non-empty set, or set() for an empty one:
colours = {"red", "green", "blue"}
empty = set()
The braces look like a dict literal; {} on its own is an empty dict, not an empty set – one of Python’s historical accidents. Use set() for the empty case.
set() also builds a set from any iterable, which is the standard way to drop duplicates from a sequence:
nums = [1, 2, 2, 3, 1, 4]
unique = set(nums)
print(unique)
Output:
{1, 2, 3, 4}
The print order may vary – sets do not promise to iterate in any particular order.
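When a predictable order is needed after de-duplicating, one option is to pass the set through `sorted()`, which returns a list. This is a sketch that assumes the items can be compared with each other; the name `unique_sorted` is illustrative:

```python
nums = [1, 2, 2, 3, 1, 4]

# set() drops the duplicates, sorted() puts the survivors in ascending order.
unique_sorted = sorted(set(nums))

print(unique_sorted)     # [1, 2, 3, 4]
```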
2.10.2. Set vs dict
Sets and dicts both store unique items in a hash table. What each item carries with it is the difference:
A dict stores key-value pairs. Looking up a key returns its value.
A set stores just the items. Looking up an item tells you whether it is there.
The choice between the two is about whether the value alongside each item means anything:
Reach for a set when no value belongs next to each item – you only care whether the item is present, or you are combining groups of unique items with union / intersection.
Reach for a dict when each item is paired with data the lookup is meant to retrieve – a config map, a cache, a counter keyed by name.
The two types share a lot of surface syntax, which is where most of the confusion comes from. The differences in one block:
| | set | dict |
|---|---|---|
| holds | unique items | unique keys, each with a value |
| populated literal | {1, 2, 3} | {"a": 1} |
| empty literal | set() | {} |
| membership test | x in s | k in d |
| fetch a value | n/a | d[k] |
| add an item | s.add(x) | d[k] = v |
| iterate | yields items | yields keys |
The asymmetry between the populated and empty literals is the gotcha worth calling out:
Braces with items in them – {1, 2, 3} – are a set literal; braces with key-value pairs – {"a": 1} – are a dict literal. The parser tells them apart by what is inside.
Braces with nothing inside – {} – are an empty dict, not an empty set. Dicts came first; the empty literal belongs to them. An empty set has no braces literal at all and must be written set().
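The literal rules above can be checked directly by asking each value for its type. This is a small sketch; the variable names are illustrative:

```python
empty_braces = {}         # empty dict, not an empty set
empty_set = set()         # the only way to write an empty set
populated = {1, 2, 3}     # items only: set literal
mapping = {"a": 1}        # key-value pairs: dict literal

kinds = [type(x).__name__
         for x in (empty_braces, empty_set, populated, mapping)]
print(kinds)              # ['dict', 'set', 'set', 'dict']
```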
A common pattern when only the keys of a dict are ever read is to switch to a set – it makes the intent obvious and trims the unused values out of memory.
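A sketch of that switch, assuming a hypothetical dict whose values are never read (the names `allowed_as_dict` and `allowed` are made up for illustration):

```python
# Before: the True values carry no information; only the keys are ever checked.
allowed_as_dict = {"alice": True, "bob": True}

# After: iterating a dict yields its keys, so set() keeps exactly the names.
allowed = set(allowed_as_dict)

print("alice" in allowed)   # True
print("carol" in allowed)   # False
```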
2.10.3. Adding and removing
set.add() – insert one item.
set.discard() – remove an item if it is present, do nothing if it is not.
set.remove() – remove an item; raise KeyError if it is missing.
set.clear() – empty the set.
s = {1, 2, 3}
s.add(4)
s.discard(99) # silent: 99 not in s
s.remove(2)
print(s)
Output:
{1, 3, 4}
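The difference between remove() and discard() only shows up when the item is missing. A minimal sketch, using a separate set named `items` (an illustrative name) so the example above is left alone:

```python
items = {1, 3, 4}

try:
    items.remove(99)              # 99 is not present, so this raises KeyError
except KeyError:
    outcome = "99 was not in the set"
else:
    outcome = "removed 99"
print(outcome)                    # 99 was not in the set

items.clear()                     # drop every remaining item
print(len(items))                 # 0
```

Prefer discard() when a missing item is not an error, and remove() when it signals a bug worth surfacing.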
2.10.4. Membership
The in operator tests for membership. On a set it is roughly constant time regardless of size – which is the main reason to choose a set over a list when you only need to ask “is this value in there”:
if "red" in colours:
    print("colour is allowed")
A list with the same contents would scan from the start each time, which is fine for ten items but slow for ten thousand.
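When the data arrives as a list and will be queried many times, one common approach is to build a set from it once and test against that. A sketch with made-up names (`allowed_list`, `allowed_set`, `requested`, `accepted`):

```python
allowed_list = ["red", "green", "blue"]
allowed_set = set(allowed_list)       # built once, reused for every lookup

requested = ["blue", "pink", "red"]

# Each "in" test below is a hash lookup rather than a scan of allowed_list.
accepted = [c for c in requested if c in allowed_set]

print(accepted)                       # ['blue', 'red']
```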
2.10.5. Set operations
Two sets can be combined with the usual mathematical operations. Each has both an operator form and a method form:
a | b or a.union(b) – everything in either set.
a & b or a.intersection(b) – only what appears in both.
a - b or a.difference(b) – in a but not in b.
a ^ b or a.symmetric_difference(b) – in one but not both.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a | b)
print(a & b)
print(a - b)
print(a ^ b)
Output:
{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}
{1, 2, 5, 6}
Both forms return a new set and leave a and b unchanged. The operator forms require a set (or frozenset) on both sides; the method forms accept any iterable on the right, not just another set (a.union([5, 6])). Pick whichever reads better in context.
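The difference between the two forms can be seen by passing a list on the right. This sketch reflects standard Python (CPython) behaviour; the names `merged` and `operator_ok` are illustrative:

```python
a = {1, 2, 3, 4}

merged = a.union([5, 6])          # method form: any iterable is accepted
print(merged == {1, 2, 3, 4, 5, 6})   # True

try:
    a | [5, 6]                    # operator form: a list is rejected
except TypeError:
    operator_ok = False
else:
    operator_ok = True
print(operator_ok)                # False

print(a == {1, 2, 3, 4})          # True: a itself was never modified
```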
2.10.6. What can go in a set
Set elements must be hashable – the same constraint as dict keys. int, float, str, bool, bytes, and tuple (when its contents are themselves hashable) all work. list and dict do not; trying to add one raises TypeError.
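A short sketch of the hashability rule: a tuple of numbers is accepted, a list is not. The names `points` and `rejected` are illustrative:

```python
points = set()
points.add((1, 2))            # a tuple of ints is hashable

try:
    points.add([3, 4])        # a list is mutable and therefore unhashable
except TypeError:
    rejected = True
else:
    rejected = False

print(points, rejected)       # {(1, 2)} True
```

Converting the list to a tuple first (tuple([3, 4])) is the usual workaround when its contents are themselves hashable.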
2.10.7. frozenset
A regular set is mutable: every call to add / remove / discard changes the object in place. That mutability disqualifies it from being hashable, so a set cannot be used as a dict key or as a member of another set.
frozenset is the immutable counterpart. It has the same lookups and operators (in, |, &, -, ^) as set, but no add / remove and no methods that mutate. Because nothing can ever change its contents, the hash of a frozenset is well-defined – so it is hashable:
primary = frozenset({"red", "green", "blue"})
secondary = frozenset({"yellow", "purple", "orange"})
palettes = {
    primary: "RGB",
    secondary: "mixed",
}
print(palettes[primary])
Output:
RGB
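The absence of mutating methods can be observed directly. In this sketch, calling add() on a frozenset fails with AttributeError because the method does not exist, while the | operator still works and produces a new object; `can_add` and `wider` are illustrative names:

```python
primary = frozenset({"red", "green", "blue"})

try:
    primary.add("cyan")           # frozenset has no add() method
except AttributeError:
    can_add = False
else:
    can_add = True
print(can_add)                    # False

wider = primary | {"cyan"}        # builds a new object; primary is untouched
print("cyan" in wider)            # True
print("cyan" in primary)          # False
```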
Construct a frozenset from any iterable – frozenset() for the empty case, frozenset(some_set) to take an immutable snapshot of an existing set:
snapshot = frozenset(s) # immutable copy of s
s.add("new") # snapshot does not change
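A self-contained version of the snapshot idea, using a separate set named `live` (an illustrative name) so that it runs on its own:

```python
live = {1, 3, 4}
snapshot = frozenset(live)    # immutable copy taken at this moment

live.add("new")               # later changes affect only the live set

print("new" in live)          # True
print("new" in snapshot)      # False
print(len(frozenset()))       # 0, the empty case
```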
Two common reasons to reach for it:
Use as a dict key or set member. Anywhere a single value cannot capture what you need, a frozenset of values can – “the set of features supported by this driver”, “the set of pins this profile uses”.
Lock down a constant. A module-level frozenset of allowed names cannot be accidentally mutated by a caller; a regular set can. Prefer frozenset for anything that is meant to be read-only after construction.
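Both uses can be sketched briefly. Everything named here (`ALLOWED_NAMES`, `is_allowed`, `profiles`, and the example strings) is hypothetical and chosen only for illustration:

```python
# Lock down a constant: callers can test membership but cannot mutate it.
ALLOWED_NAMES = frozenset({"admin", "guest"})

def is_allowed(name):
    """Return True if name is one of the permitted names."""
    return name in ALLOWED_NAMES

print(is_allowed("admin"))    # True
print(is_allowed("root"))     # False

# Use as a set member: each profile is itself a group of feature names.
profiles = {frozenset({"wifi", "ble"}), frozenset({"wifi"})}

# Equality ignores the order the items were written in.
print(frozenset({"ble", "wifi"}) in profiles)   # True
```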