5.10. Sets
A set is an unordered collection of unique items. Adding a value that is already present has no effect; iteration yields each value exactly once. Sets are the right tool when membership and de-duplication matter, and ordering does not.
5.10.1. Creating a set
Use curly braces for a non-empty set, or set() for an
empty one:
colours = {"red", "green", "blue"}
empty = set()
The braces look like a dict literal; {} on its own
is an empty dict, not an empty set – one of Python’s
historical accidents. Use set() for the empty case.
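A quick check with type() confirms which literal produces which type. The variable names are only for illustration:

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)   # dict
print(type(empty_set).__name__)      # set

# {} has dict methods, not set methods
print(hasattr(empty_braces, "add"))  # False: dicts have no add()
print(hasattr(empty_set, "add"))     # True
```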
set() also builds a set from any iterable, which is the
standard way to drop duplicates from a sequence:
nums = [1, 2, 2, 3, 1, 4]
unique = set(nums)
print(unique)
Output:
{1, 2, 3, 4}
The print order may vary – sets do not promise to iterate in any particular order.
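Output or tests sometimes need a stable order. One option is to sort the set back into a list. This is a sketch that assumes the elements can be compared with each other, which sorted() requires:

```python
nums = [1, 2, 2, 3, 1, 4]

unique = set(nums)          # duplicates dropped, order not guaranteed
ordered = sorted(unique)    # a new list in ascending order

print(len(nums), len(unique))  # 6 4
print(ordered)                 # [1, 2, 3, 4]
```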
5.10.2. Set vs dict
Sets and dicts both store unique items in a hash table. What each item carries with it is the difference:
A dict stores key-value pairs. Looking up a key returns its value.
A set stores just the items. Looking up an item tells you whether it is there.
The choice between the two is about whether the value alongside each item means anything:
Reach for a set when no value belongs next to each item – you only care whether the item is present, or you are combining groups of unique items with union / intersection.
Reach for a dict when each item is paired with data the lookup is meant to retrieve – a config map, a cache, a counter keyed by name.
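Here is a small sketch of the two kinds of lookup side by side. The names allowed and stock are made up for the example:

```python
# Set: only presence matters.
allowed = {"red", "green", "blue"}
print("red" in allowed)      # True
print("purple" in allowed)   # False

# Dict: each key carries a value that the lookup retrieves.
stock = {"red": 3, "green": 0, "blue": 7}
print(stock["blue"])         # 7
print("green" in stock)      # True: 'in' on a dict tests keys only
print(0 in stock)            # False: values are not searched
```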
The two types share a lot of surface syntax, which is where most of the confusion comes from. The differences in one block:
|  | set | dict |
|---|---|---|
| holds | unique items | unique keys, each with a value |
| populated literal | {1, 2, 3} | {"a": 1} |
| empty literal | set() | {} |
| membership test | x in s | k in d |
| fetch a value | n/a | d[k] |
| add an item | s.add(x) | d[k] = v |
| iterate | yields items | yields keys (use d.items() for pairs) |
The asymmetry between the populated and empty literals is the gotcha worth calling out:
Braces with items in them, {1, 2, 3}, are a set literal. Braces with key-value pairs, {"a": 1}, are a dict literal. The parser tells them apart by what is inside.
Braces with nothing inside, {}, are an empty dict, not an empty set. Dicts came first, so the empty literal belongs to them. An empty set has no literal form at all and must be written set().
When only the keys of a dict are ever read, a common refactor is to switch to a set. This makes the intent obvious and drops the unused values from memory.
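For example, a dict that only remembers which items have been seen can be rewritten as a set. Both functions below are hypothetical helpers written for this sketch, and they behave identically:

```python
def first_repeat_dict(items):
    seen = {}                 # values are never read
    for item in items:
        if item in seen:
            return item
        seen[item] = True     # dummy value, only the key matters
    return None


def first_repeat_set(items):
    seen = set()              # only membership matters
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None


words = ["ant", "bee", "cat", "bee", "ant"]
print(first_repeat_dict(words))  # bee
print(first_repeat_set(words))   # bee
```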
5.10.3. Adding and removing
set.add() – insert one item.
set.discard() – remove an item if it is present; do nothing if it is not.
set.remove() – remove an item; raise KeyError if it is missing.
set.clear() – empty the set.
s = {1, 2, 3}
s.add(4)
s.discard(99) # silent: 99 not in s
s.remove(2)
print(s)
Output:
{1, 3, 4}
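The difference between discard() and remove() only shows when the item is missing. This short sketch starts from a fresh set with the same contents as the output above:

```python
s = {1, 3, 4}

s.discard(99)            # missing: nothing happens

try:
    s.remove(99)         # missing: raises KeyError
except KeyError as exc:
    print("KeyError:", exc)   # KeyError: 99

s.clear()
print(s)                 # set()
```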
5.10.4. Membership
The in operator tests for membership. On a set it runs in
constant time on average, regardless of size. That is the main
reason to choose a set over a list when you only need
to ask “is this value in there”:
if "red" in colours:
print("colour is allowed")
A list with the same contents would scan from the
start each time, which is fine for ten items but slow for ten
thousand.
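A common pattern is to build the set once and then run many membership tests against it. The word list and text in this sketch are invented:

```python
stop_words = ["the", "a", "an", "of", "and"]
text = "the cat and a dog of the house"

stop = set(stop_words)   # one-off cost to build the hash table

# Each 'not in' check is now a hash lookup rather than a list scan.
kept = [word for word in text.split() if word not in stop]
print(kept)   # ['cat', 'dog', 'house']
```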
5.10.5. Set operations
Two sets can be combined with the usual mathematical operations. Each has both an operator form and a method form:
a | b or a.union(b) – everything in either set.
a & b or a.intersection(b) – only what appears in both.
a - b or a.difference(b) – in a but not in b.
a ^ b or a.symmetric_difference(b) – in one but not both.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a | b)
print(a & b)
print(a - b)
print(a ^ b)
Output:
{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}
{1, 2, 5, 6}
Both forms return a new set and leave a and b unchanged. They
differ in what they accept. The operators need a set (or
frozenset) on both sides, while the methods accept any iterable
as the argument (a.union([5, 6])). Pick whichever reads better in context.
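This sketch shows the difference in what each form accepts. A list works with the method but not with the operator:

```python
a = {1, 2, 3, 4}

print(sorted(a.union([5, 6])))               # [1, 2, 3, 4, 5, 6]
print(sorted(a.intersection(range(3, 10))))  # [3, 4]

try:
    a | [5, 6]          # the operator needs a set on both sides
except TypeError as exc:
    print("TypeError:", exc)

print(sorted(a))        # [1, 2, 3, 4]: neither form changed a
```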
5.10.6. What can go in a set
Set elements must be hashable – the same constraint as
dict keys. int, float, str,
bool, bytes, and tuple (when its
contents are themselves hashable) all work. list and
dict do not; trying to add one raises
TypeError.
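Here is the rule in practice. A tuple that holds a list fails for the same reason the list does. The values are arbitrary examples:

```python
ok = set()
ok.add(42)
ok.add("text")
ok.add((1, 2))          # tuple of hashable items: fine
print(len(ok))          # 3

# list, dict, and a tuple containing a list are all rejected
for bad in ([1, 2], {"k": 1}, (1, [2, 3])):
    try:
        ok.add(bad)
    except TypeError as exc:
        print(type(bad).__name__, "->", exc)
```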
5.10.7. frozenset
A regular set is mutable: every call to add /
remove / discard changes the object in place. That
mutability disqualifies it from being hashable, so a set
cannot be used as a dict key or as a member of
another set.
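Trying it shows the error. This is a minimal sketch. The is_hashable helper is hypothetical and written only for this example:

```python
def is_hashable(value):
    """Return True if hash(value) succeeds (illustrative helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


tags = {"a", "b"}
print(is_hashable(tags))      # False: a set is mutable

try:
    lookup = {tags: "pair"}   # set as a dict key
except TypeError as exc:
    print("dict key:", exc)

try:
    nested = {tags}           # set as a member of another set
except TypeError as exc:
    print("set member:", exc)
```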
frozenset is the immutable counterpart. It has the
same lookups and operators (in, |, &, -, ^)
as set, but no add / remove and no methods
that mutate. Because nothing can ever change its contents, the
hash of a frozenset is well-defined – so it is
hashable:
primary = frozenset({"red", "green", "blue"})
secondary = frozenset({"yellow", "purple", "orange"})
palettes = {
primary: "RGB",
secondary: "mixed",
}
print(palettes[primary])
Output:
RGB
Construct a frozenset from any iterable – frozenset() for
the empty case, frozenset(some_set) to take an immutable
snapshot of an existing set:
snapshot = frozenset(s) # immutable copy of s
s.add("new") # snapshot does not change
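Here is the same idea as a self-contained sketch. The snapshot keeps its contents after the original set changes, and it has no add() to call:

```python
s = {1, 3, 4}
snapshot = frozenset(s)   # immutable copy of s at this moment

s.add("new")
print("new" in s)          # True
print("new" in snapshot)   # False: the snapshot did not change

print(hasattr(snapshot, "add"))  # False: no mutating methods
print(sorted(snapshot | {5}))    # [1, 3, 4, 5]: a new frozenset, snapshot untouched
```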
Two common reasons to reach for it:
Use as a dict key or set member. Anywhere a single value cannot capture what you need, a frozenset of values can: “the set of features supported by this driver”, “the set of pins this profile uses”.
Lock down a constant. A caller cannot accidentally mutate a module-level frozenset of allowed names, but it can mutate a regular set. Prefer frozenset for anything that is meant to be read-only after construction.
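This sketch shows both uses. ALLOWED_COLOURS, check_colour, DRIVER_FEATURES and the driver names are hypothetical, invented for this example:

```python
# Read-only module-level constant: callers cannot mutate it.
ALLOWED_COLOURS = frozenset({"red", "green", "blue"})


def check_colour(name):
    """Return name, or raise ValueError if it is not an allowed colour."""
    if name not in ALLOWED_COLOURS:
        raise ValueError(f"unknown colour: {name!r}")
    return name


# frozenset as a dict key: map a feature combination to a driver name.
DRIVER_FEATURES = {
    frozenset({"usb", "bluetooth"}): "combo-driver",
    frozenset({"usb"}): "usb-driver",
}

# The order in which features are listed does not affect the lookup key.
wanted = frozenset(["bluetooth", "usb"])
print(DRIVER_FEATURES[wanted])   # combo-driver
print(check_colour("green"))     # green

try:
    ALLOWED_COLOURS.add("purple")    # frozenset has no add()
except AttributeError as exc:
    print("AttributeError:", exc)
```

Because the dict key is a frozenset, two callers that list the same features in a different order still hit the same entry.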