Hash Sets
A hash set is a hash table that stores only keys — no associated values. Python’s built-in set type is a hash set. Its key superpower is O(1) average-case membership testing, compared to O(n) for a list.
What Is a Hash Set?
A set is an unordered collection of unique hashable elements. It trades away ordering and duplicates in exchange for fast membership, insertion, and deletion.
# Membership: O(1) for set vs O(n) for list
haystack_list = list(range(10_000_000))
haystack_set = set(range(10_000_000))
# Both find the same answer, but set is vastly faster
print(9_999_999 in haystack_list) # True — scans up to 10M elements
print(9_999_999 in haystack_set) # True — one hash lookup
Creating Sets
# Set literal — curly braces, no colons
fruits = {"apple", "banana", "cherry"}
# From an iterable — deduplicates automatically
nums = set([1, 2, 2, 3, 3, 3])
print(nums) # {1, 2, 3}
# Empty set — must use set(), not {}
empty = set() # {} would create an empty dict
# Set comprehension
squares = {x**2 for x in range(1, 6)} # {1, 4, 9, 16, 25}
# Add and remove
fruits.add("date")
fruits.discard("banana") # no error if missing
fruits.remove("apple") # raises KeyError if missing
popped = fruits.pop() # removes an arbitrary element
Set Operations
Python sets support all standard mathematical set operations using both operators and methods.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
# Union — all elements from both
print(a | b) # {1, 2, 3, 4, 5, 6}
print(a.union(b)) # same
# Intersection — elements in both
print(a & b) # {3, 4}
print(a.intersection(b)) # same
# Difference — in a but not b
print(a - b) # {1, 2}
print(a.difference(b)) # same
# Symmetric difference — in either but not both
print(a ^ b) # {1, 2, 5, 6}
# Subset / superset checks
print({1, 2} <= a) # True — is {1,2} a subset of a?
print(a >= {1, 2}) # True — is a a superset of {1,2}?
print({1, 2}.isdisjoint({5, 6})) # True — no shared elements
Union and symmetric difference are O(m+n). Intersection and difference are O(min(m,n)). Subset check is O(m) where m is the smaller set.
frozenset
A frozenset is an immutable set — it cannot be modified after creation but can be used as a dict key or stored inside another set.
fs = frozenset([1, 2, 3])
print(2 in fs) # True
# Useful as a dict key (regular set is unhashable)
graph = {
frozenset({"A", "B"}): 4,
frozenset({"B", "C"}): 2,
}
# Supports all read-only set operations
print(fs | frozenset([4])) # frozenset({1, 2, 3, 4})
Interview Patterns
Sets solve a surprising number of interview problems instantly by replacing slow list scans with O(1) lookups.
# Pattern 1: Two-sum — O(n) instead of O(n²)
def two_sum(nums, target):
seen = set()
for n in nums:
if target - n in seen:
return True
seen.add(n)
return False
print(two_sum([2, 7, 11, 15], 9)) # True (2+7)
# Pattern 2: Remove duplicates while preserving order
def unique_ordered(lst):
seen = set()
return [x for x in lst if not (x in seen or seen.add(x))]
print(unique_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]
# Pattern 3: Longest consecutive sequence — O(n)
def longest_consecutive(nums):
num_set = set(nums)
best = 0
for n in num_set:
if n - 1 not in num_set: # start of a sequence
length = 1
while n + length in num_set:
length += 1
best = max(best, length)
return best
print(longest_consecutive([100, 4, 200, 1, 3, 2])) # 4 (1-2-3-4)
Complexity Table
| Operation | Average | Worst | Note |
|---|---|---|---|
| Add | O(1) | O(n) | Amortized; resize is rare |
| Discard / Remove | O(1) | O(n) | |
Membership (in) | O(1) | O(n) | Worst case: many collisions |
| Union | O(m+n) | O(m+n) | |
| Intersection | O(min(m,n)) | O(min(m,n)) | |
| Iteration | O(n) | O(n) | Unordered |