Data Structures

Hash Sets

● Beginner ⏱ 7 min read data-structures

A hash set is a hash table that stores only keys — no associated values. Python’s built-in set type is a hash set. Its key superpower is O(1) average-case membership testing, compared to O(n) for a list.

What Is a Hash Set?

A set is an unordered collection of unique hashable elements. It trades away ordering and duplicates in exchange for fast membership, insertion, and deletion.

Python
# Membership: O(1) for set vs O(n) for list
haystack_list = list(range(10_000_000))
haystack_set  = set(range(10_000_000))

# Both find the same answer, but set is vastly faster
print(9_999_999 in haystack_list)   # True — scans up to 10M elements
print(9_999_999 in haystack_set)    # True — one hash lookup

Creating Sets

Python
# Set literal — curly braces, no colons
fruits = {"apple", "banana", "cherry"}

# From an iterable — deduplicates automatically
nums = set([1, 2, 2, 3, 3, 3])
print(nums)    # {1, 2, 3}

# Empty set — must use set(), not {}
empty = set()  # {} would create an empty dict

# Set comprehension
squares = {x**2 for x in range(1, 6)}   # {1, 4, 9, 16, 25}

# Add and remove
fruits.add("date")
fruits.discard("banana")    # no error if missing
fruits.remove("apple")      # raises KeyError if missing
popped = fruits.pop()       # removes an arbitrary element

Set Operations

Python sets support all standard mathematical set operations using both operators and methods.

Python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Union — all elements from both
print(a | b)              # {1, 2, 3, 4, 5, 6}
print(a.union(b))         # same

# Intersection — elements in both
print(a & b)              # {3, 4}
print(a.intersection(b))  # same

# Difference — in a but not b
print(a - b)              # {1, 2}
print(a.difference(b))    # same

# Symmetric difference — in either but not both
print(a ^ b)              # {1, 2, 5, 6}

# Subset / superset checks
print({1, 2} <= a)        # True  — is {1,2} a subset of a?
print(a >= {1, 2})        # True  — is a a superset of {1,2}?
print({1, 2}.isdisjoint({5, 6}))  # True — no shared elements
💡
Set operations on two sets of sizes m and n

Union and symmetric difference are O(m+n). Intersection and difference are O(min(m,n)). Subset check is O(m) where m is the smaller set.

frozenset

A frozenset is an immutable set — it cannot be modified after creation but can be used as a dict key or stored inside another set.

Python
fs = frozenset([1, 2, 3])
print(2 in fs)    # True

# Useful as a dict key (regular set is unhashable)
graph = {
    frozenset({"A", "B"}): 4,
    frozenset({"B", "C"}): 2,
}

# Supports all read-only set operations
print(fs | frozenset([4]))    # frozenset({1, 2, 3, 4})

Interview Patterns

Sets solve a surprising number of interview problems instantly by replacing slow list scans with O(1) lookups.

Python
# Pattern 1: Two-sum — O(n) instead of O(n²)
def two_sum(nums, target):
    seen = set()
    for n in nums:
        if target - n in seen:
            return True
        seen.add(n)
    return False

print(two_sum([2, 7, 11, 15], 9))   # True (2+7)

# Pattern 2: Remove duplicates while preserving order
def unique_ordered(lst):
    seen = set()
    return [x for x in lst if not (x in seen or seen.add(x))]

print(unique_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]

# Pattern 3: Longest consecutive sequence — O(n)
def longest_consecutive(nums):
    num_set = set(nums)
    best = 0
    for n in num_set:
        if n - 1 not in num_set:    # start of a sequence
            length = 1
            while n + length in num_set:
                length += 1
            best = max(best, length)
    return best

print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4 (1-2-3-4)

Complexity Table

OperationAverageWorstNote
AddO(1)O(n)Amortized; resize is rare
Discard / RemoveO(1)O(n)
Membership (in)O(1)O(n)Worst case: many collisions
UnionO(m+n)O(m+n)
IntersectionO(min(m,n))O(min(m,n))
IterationO(n)O(n)Unordered