Fluent Python Chapter 3: Dictionaries and Sets
This chapter surveys the various mappings and sets available in the Standard library.
· 10 min read
Syntax #
Dictionary comprehensions #
dictcomps build dict
instances using key:value
pairs from any iterable.
For example:
>>> student_ids = [
(111, 'Yuki'),
(222, 'David'),
(333, 'Tom')
]
>>> nom_roll = (_id: name for _id, name in student_ids)
>>> nom_roll
{111: 'Yuki', 222: 'David', 333: 'Tom'}
The | operator for map merging #
Since Python 3.9, we are able to merge mapping using the |
and |=
operators.
>>> d1 = {"Apples": 2, "Oranges": 3}
>>> d2 = {"Apples": 1, "Oranges": 5, "Bananas": 3}
>>> d1 | d2
{'Apples': 1, 'Oranges': 5, 'Bananas': 3}
Note how the second appearance of “Oranges” overwrites the first.
Pattern matching with mappings #
The match/case
statament works with mapping patterns as well.
def get_ingredients(food: dict) -> list:
match food:
case {'type': 'Fast Food', 'api': 2, 'ingredients': [*ingredients]}:
return ingredients
case {'type': 'Fast Food', 'api': 1, 'ingredient': ingredient}:
return [ingredient]
case {'type': 'Fast Food'}:
raise ValueError(f"Invalid 'food' record: {food!r}")
case {'type': 'Japanese', 'ingredients': [*ingredients]}:
return ingredients
case _:
raise ValueError(f"Invalid record: {food!r}")
For api v2
, a food with type == 'Fast Food'
should have a list of ingredients.
For api v1
, a food with type == 'Fast Food'
should have a single ingredient.
A Japanese food should have a key ingredients
, which is a list of ingredients.
>>> f1 = dict(api=1, type='Fast Food', name='Hamburger', ingredient='Beef')
>>> get_ingredients(f1)
['Beef']
>>> f2 = {'type':'Fast Food', 'ingredient':'Beef'}
>>> get_ingredients(f2)
Traceback (most recent call last):
...
raise ValueError(f"Invalid 'food' record: {food!r}")
ValueError: Invalid 'food' record: {'type': 'Fast Food', 'ingredient': 'Beef'}
>>> f3 = "I am hungry"
>>> get_ingredients(f3)
Traceback (most recent call last):
...
raise ValueError(f"Invalid record: {food!r}")
ValueError: Invalid record: 'I am hungry'
Unlike sequence patterns, mapping patterns allow for partial matches. In the example above,
f1
has a key name
, which does not appear in any case
statement, yet it matches.
You can also capture extra key-value pairs using **
.
>>> food = {'type': 'Fast Food', 'name': 'Hamburger', 'price': '$10.00', 'calories': '700kca\
l'}
>>> match food:
... case {'type': 'Fast Food', **details}:
... print(f'Fast food details: {details}')
...
Fast food details: {'name': 'Hamburger', 'price': '$10.00', 'calories': '700kcal'}
Standard API of Mapping Types #
The collections.abc
module offers the Mapping
and MutableMapping
ABCs.
These serve as the standard interfaces for mappings, and as criteria for isinstance
tests in code.
>>> import collections.abc as abc
>>> d = {}
>>> isinstance(d, abc.Mapping)
True
>>> isinstance(d, abc.MutableMapping)
True
Using the isinstance
test with an ABC is often better than checking whether a function argument is of the concrete type dict
so that alternative mapping types can be used. (More details in Chapter 13)
Whem implementing a custom mapping, the author recommends extending collections.UserDict
or to wrap a dict
by composition, instead of subclassing these ABCs. All concrete mapping classes including collections.UserDict
encapsulate the basic dict
in their implementation, which means all keys must be hashable.
Note that in Python, an object is hashable if it has a hash code that never changes during its lifetime (implements a __hash__()
method), and can be compared to other objects (implements a __eq__()
method).
In practice, custom implementations of __hash__()
and __eq__()
should only take into account instance attributes that never change duirng the life of the object.
Common Mapping Methods #
See page 86 for a list of methods defined on dict
, collections.defaultdict
, and collections.OrderedDict
.
The way d.update(m)
handles m
is a good example of duck typing (If it quacks, it’s a duck).
If m
has a keys
method, it assumes m
is a mapping. else, it will iterate over m
, asssuming its items are (key, value)
form.
setdefault()
avoids redundant key look ups when we want to update the value of an item in place.
For example,
d.setdefault(key, []).append(new_value)
is equivalent to
if key not in d:
d[key] = []
d[key].append(new_value)
but the latter does at least two searches for the key
, three if it isn’t found, while setdefault
does it all with a single lookup.
Automatic handling of missing keys #
Sometimes we would like to have mappings that return some predetermined value when a missing key is searched. The author discusses two main approaches to achieve this.
Approach 1: defaultdict #
When using defaultdict
, we pass a callable to produce a default value when a missing key is searched using
the d[k]
syntax (which uses __getitem__
).
The callable is held in an instance attribute called default_factory
.
For example:
index = collections.defaultdict(list)
Here, if some key word
is not in the index
, the default_factory
is called to product an empty list, which is assigned to index[word]
and returned.
Note that if dd
is a defaultdict
and k
is a missing key, then dd[k]
will create a default value, but dd.get(k)
will not.
>>> import collections
>>> dd = collections.defaultdict(list)
>>> dd["key1"]
[]
>>> dd.get("key2")
>>> dd.get("key2") is None
True
Approach 2: __missing__ #
By providing a __missing__
method, the standard dict.__getitem__
(called via d[k]
), will call it whenever a key is not found.
See page 92 for an example of a custom class that implements __missing__
.
Also see page 94 for a comparison of how diffferent standard library mappings treat missing keys.
Variations of dict #
Here we list mapping types included in the standard library, besides defaultdict
collections.OrderedDict #
Since Python 3.6, the built-in dict
has been able to keep the keys ordered.
The author lists some differences between dict
and OrderedDict
, aside OrderedDict
being backward compatibele with earlier Python versions.
- the equality operation for
OrderedDict
checks for matching order. OrderedDict
was designed to be good at reordering operations - it can handle frequent reordering operations better thandict
.
See this for more info.
collections.ChainMap #
A ChainMap
holds references to each input mapping in the constructor call. Lookups will be performed on the input mappings in order, while insertions and updates are only affect the first input mapping.
>>> from collections import ChainMap
>>> d1 = dict(a=1, b=2)
>>> d2 = dict(a=3, b=4, c=5)
>>> chain = ChainMap(d1, d2)
>>> chain['a']
1
>>> chain['c']
5
>>> chain['c'] = -1
>>> d1
{'a': 1, 'b': 2, 'c': -1}
>>> d2
{'a': 3, 'b': 4, 'c': 5}
The author explains on page 96 how ChainMap
is useful when implementing interpreters for languages with nested scopes.
Also see Chapter 18 for a ChainMap
subclass used to implement an interpreter for a subset of Scheme.
collections.Counter #
Counter
maintains an integer count for each key.
For example,
>>> from collections import Counter
>>> ct = Counter('yukihide')
>>> ct
Counter({'i': 2, 'y': 1, 'u': 1, 'k': 1, 'h': 1, 'd': 1, 'e': 1})
>>> ct.update('yuki')
>>> ct
Counter({'i': 3, 'y': 2, 'u': 2, 'k': 2, 'h': 1, 'd': 1, 'e': 1})
>>> ct.most_common(3)
[('i', 3), ('y', 2), ('u', 2)]
Counter
can be used as a multi-set too - consider each key as an element in the set, and the count as the number of times it appears.
shelve.Shelf #
The shelve
module provides persistent storage for a mapping of string keys to Python objects serialised in the pickle
binary format.
(Why shelve
? Because pickle jars are stored on shelves, says the author.)
See more on the Python documentation.
Also see Ned Batchelder’s “Pickle’s nine flaws” for a discussion about the caveats of using pickle
Subclassing UserDict instead of Dict #
The author suggests the it’s ‘better to subclass UserDict
rather than dict
’.
“Subclassing Built-in Types is Tricky’ on page 492 describes the precise problem with subclassing dict
(and other built-in types), but essentially dict
has some implemention shortcuts that forces the user to override methods that can be inherited from UserDict
.
See page 98 for a comparision of two versions of a custom StrKeyDict
class - one that extends dict
and another extending colletions.UserDict
. The latter has a more succint implementation.
Immutable Mappings #
All mapping types provided by the standard library are mutable, but it may be important to prevent users from changing a mapping by accident.
The MappingProxyType
from the types
module can help to achieve this. Given a mapping, MappingProxyType
returns a mappingproxy
instance that is a read-only but dynamic view of the original mapping - this means that updates to the original mapping are reflected to the mappingproxy
, but updates cannot be made directly to the mappingproxy
.
See page 100 for an example.
Dictionary Views #
The dict
instance methods .keys()
, .values()
, .items()
return instances of read-only but dynamic views called dict_keys
, dict_values
, and dict_items
respectively.
They have less memory overhead compared to their counterparts in Python 2, which returned lists that duplicated data in the target dict.
An example usage:
>>> d = dict(a=1, b=2, c=3)
>>> values = d.values()
>>> values
dict_values([1, 2, 3])
>>> len(values)
3
>>> list(values)
[1, 2, 3]
>>> reversed(values)
<dict_reversevalueiterator object at 0x103362e30>
>>> values[0]
Traceback (most recent call last):
File "<python-input-48>", line 1, in <module>
values[0]
~~~~~~^^^
TypeError: 'dict_values' object is not subscriptable
>>> d['z'] = 99
>>> d
{'a': 1, 'b': 2, 'c': 3, 'z': 99}
>>> values
dict_values([1, 2, 3, 99])
Lastly, note that we cannot create a view object from scratch.
>>> values_class = type({}.values())
>>> v = values_class()
Traceback (most recent call last):
File "<python-input-53>", line 1, in <module>
v = values_class()
TypeError: cannot create 'dict_values' instances
See page 102 for some practical implications of how dict works. Some pointers are that
- dicts have a significant memory overhead compared to an array of pointers.
- hash tables (the underlying data structure of dicts) have a third of their space empty for efficiency.
Also see this for a deep dive into the internals of sets and dicts
Set Theory #
A set is a collection of unique objects. Set elements need to be hashable.
While the set
type is not hashable, its immutable counterpart, frozenset
, is.
The set types implements several infix set operations, such as union (|
), intersection(&
), and symmetric difference (^
).
Set Literals #
In Python 3, the syntax of set literals always uses the {...}
notation, except for the empty set which uses set()
.
>>> s = {'a'}
>>> type(s)
<class 'set'>
>>> s
{'a'}
>>> s.pop()
'a'
>>> s
set()
Set literals like {1,2,3}
is faster than calling the constructor like set([1,2,3])
because in the latter case, Python will look up the set
name to fetch the constructor, then build a list, and finally pass the list to the constructor.
On the other hand, in processing a literal like {1,2,3}
, Python executes a specialized BUILD_SET
bytecode.
Also try to import the dis
function and compare the disassembled bytecodes:
dis('{1}])
versusdis('set([1])')
Practical Consequences of How Sets Work #
Like dict
, the set
and frozenset
types are implemented with a hash table.
Thus, all set elements must be hashable.
Element ordering is not very useful, since for two elements with the same hash code, their position depends on which element was added first. Also adding elements to a set may change the order of existing elements due to table resizing.
See page 107 for more details, as well as this
Set Operations #
Page 108 ~ 109 lists the various operators supported by mutable and immutable sets derived from mathematical set theory, e.g. Intersection, Union, Difference, Symmetric Difference, Set Comparision, and Membership Testing.
Note that the infix operators require that both operans be sets, but all other methods only require the object its called on to be a set, and take in one or more iterable arguments.
For example,
>>> s = {1,2,3}
>>> l = [2,3]
>>> s.intersection(l)
{2, 3}
In addition, Python offers other pratical set methods, like .add()
, .clear()
, and .pop()
See page 110 for more information.
Set Operators on Dict Views #
Dict view objects returned by the .keys()
and .items()
methods share common APIs with frozenset
.
In particular dict_keys
and dict_items
support &
, |
, -
, and ^
.
For example,
>>> d1 = dict(a=1, b=2, c=3)
>>> d2 = dict(b=20, c=30, d=40)
>>> d1.keys() & d2.keys()
{'c', 'b'}
We get a set
as a return value. Also, the set operators in dict views work with set
instances.
>>> s = {'a', 'd', 'e'}
>>> d1.keys() & s
{'a'}
>>> d1.keys() | s
{'a', 'e', 'c', 'd', 'b'}
Finally, note that for a dict_values
view to support set operators, all values in the dict
need to be hashable.
>>> d3 = dict(a=1, b=[])
>>> d3.values() & {1}
Traceback (most recent call last):
File "<python-input-19>", line 1, in <module>
d3.values() & {1}
~~~~~~~~~~~~^~~~~
TypeError: unsupported operand type(s) for &: 'dict_values' and 'set'
In contrast, a dict_keys
view can always be used as a set since every key is hashable by definition.