Dictionary Merge and Update Operators

Python 3.9, released on Oct. 5, 2020, introduces several neat features and optimizations, including PEP 584, which adds union operators to the built-in dict class: the so-called dictionary merge and update operators. In this blog post we will go over the new operators and see whether they have any advantages or disadvantages compared with the earlier ways of merging and updating dictionaries.

Different Ways to Merge Dictionaries

1. dict.update()

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d1.update(d2)
print(d1)
# Output: {'a': 1, 'b': 9999, 'c': 3}

d1.update(d2) updates the dictionary d1 with the key/value pairs from d2, overwriting existing keys, and returns None. - Python docs

The problem with the update() method is that it modifies one of the dictionaries in place. If we want to create a third dictionary without modifying either of the originals, we cannot use update() directly; we first have to make a copy of one of the existing dictionaries.

from copy import copy

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = copy(d1)
d3.update(d2)
print(d3)
# Output: {'a': 1, 'b': 9999, 'c': 3}
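One detail worth keeping in mind: copy() makes a shallow copy, as do the built-in d1.copy() method and dict(d1). The following is a minimal sketch of what that means for the copy-then-update pattern. The nested 'tags' list is a hypothetical value added only for illustration.

```python
d1 = {'a': 1, 'tags': ['x']}   # 'tags' is a hypothetical nested value
d2 = {'b': 2}

d3 = d1.copy()      # shallow copy, equivalent to copy(d1) or dict(d1)
d3.update(d2)       # d1 is left untouched by the update

d3['a'] = 100             # rebinding a key in d3 does not affect d1
d3['tags'].append('y')    # but the nested list object is shared with d1

print(d1)
# Output: {'a': 1, 'tags': ['x', 'y']}
print(d3)
# Output: {'a': 100, 'tags': ['x', 'y'], 'b': 2}
```

If the values are mutable and must be independent, copy.deepcopy() is the usual alternative.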

Also, each update() call merges only one other dictionary. If you wish to merge three dictionaries, you first need to merge the first two, and then merge the third one into the modified dictionary.

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = {'e': 4, 'f': [1, 3]}
d1.update(d2)
d1.update(d3)
print(d1)
# Output: {'a': 1, 'b': 9999, 'c': 3, 'e': 4, 'f': [1, 3]}
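When many dictionaries need to be merged this way, the repeated update() calls can be wrapped in a small function. This is only a sketch: merge_all is a hypothetical helper name, not something provided by Python.

```python
def merge_all(*dicts):
    """Return a new dict; later dictionaries win on duplicate keys."""
    merged = {}
    for d in dicts:
        merged.update(d)   # only the new dict is modified
    return merged

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = {'e': 4, 'f': [1, 3]}

result = merge_all(d1, d2, d3)
print(result)
# Output: {'a': 1, 'b': 9999, 'c': 3, 'e': 4, 'f': [1, 3]}
print(d1)
# Output: {'a': 1, 'b': 2}
```

Starting from an empty dictionary also sidesteps the problem of modifying one of the inputs.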

2. Dictionary unpacking

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = {**d1, **d2}
print(d3)
# Output: {'a': 1, 'b': 9999, 'c': 3}

A double asterisk ** denotes dictionary unpacking.

It will expand the contents of dictionaries d1 and d2 as a collection of key-value pairs and update the dictionary d3. - python docs

However, {**d1, **d2} ignores the types of the mappings and always returns a plain dict. Converting the result back with type(d1)({**d1, **d2}) fails for dict subclasses such as defaultdict that have an incompatible __init__ method:

from collections import defaultdict
d1 = defaultdict(None, {0: 'a'})
d2 = defaultdict(None, {1: 'b'})
{**d1, **d2}
#Output: {0: 'a', 1: 'b'}
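To make the failure concrete, here is a small sketch that tries to rebuild a defaultdict from the unpacked result. The exact error message is printed rather than hard-coded here, since its wording may differ between Python versions.

```python
from collections import defaultdict

d1 = defaultdict(None, {0: 'a'})
d2 = defaultdict(None, {1: 'b'})

merged = {**d1, **d2}
print(type(merged).__name__)
# Output: dict

try:
    # defaultdict expects a default_factory (a callable or None) as its
    # first argument, so passing the merged dict there is rejected.
    type(d1)(merged)
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```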

This way of merging two dictionaries feels unnatural and hardly obvious.

As Guido van Rossum said:

I’m sorry for PEP 448, but even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started!

3. collections.ChainMap

from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = ChainMap(d1, d2)
print(d3)
# Output: ChainMap({'a': 1, 'b': 2}, {'c': 3, 'b': 9999})
print(dict(d3))
# Output: {'c': 3, 'b': 2, 'a': 1}

ChainMap: A ChainMap groups multiple dictionaries or other mappings together to create a single, updateable view. - Python docs

collections.ChainMap(*maps) returns a collections.ChainMap object, which we can convert to a dict using the dict() constructor. ChainMap is unfortunately poorly known and doesn't qualify as "obvious". It also resolves duplicate keys in the opposite order to that expected ("first seen wins" instead of "last seen wins").
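The lookup order can be checked with a short sketch. Passing the dictionaries in reverse order is one way to get the "last seen wins" result that the other methods produce:

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}

first_wins = dict(ChainMap(d1, d2))   # d1 is searched first
last_wins = dict(ChainMap(d2, d1))    # d2 is searched first

print(first_wins['b'])
# Output: 2
print(last_wins['b'])
# Output: 9999
print(last_wins == {**d1, **d2})
# Output: True
```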

Like dictionary unpacking ({**d1, **d2}), it ignores the types of the mappings: converting a ChainMap with dict() always gives a plain dict. For the same reason, type(d1)(ChainMap(d2, d1)) fails for some subclasses of dict.

It is probably even less straightforward than the previous two methods, and unfortunately it modifies the underlying dictionaries if you update the ChainMap object. Writes go to the first mapping in the chain:

>>> d1 = {'a': 1, 'b': 2}
>>> d2 = {'b': 3}
>>> from collections import ChainMap
>>> d3 = ChainMap(d1, d2)
>>> d3
ChainMap({'a': 1, 'b': 2}, {'b': 3})
>>> d3['b'] = 4
>>> d3
ChainMap({'a': 1, 'b': 4}, {'b': 3})
>>> d1
{'a': 1, 'b': 4}
>>> d2
{'b': 3}
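Because writes always land in the first mapping, one possible way to protect the original dictionaries is to put an empty dict at the front of the chain. This is a sketch of that workaround, not something the original example does:

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3}

d3 = ChainMap({}, d1, d2)   # writes land in the leading empty dict
d3['b'] = 4

print(d3['b'])
# Output: 4
print(d1)
# Output: {'a': 1, 'b': 2}
print(d2)
# Output: {'b': 3}
```

ChainMap(d1, d2).new_child() has the same effect: it returns a new ChainMap with a fresh empty mapping in front.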

4. dict(d1, **d2)

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = dict(d1, **d2)
print(d3)
# Output: {'a': 1, 'b': 9999, 'c': 3}

d3 will contain the key-value pairs from d1 and d2. Keys that are common to d1 and d2 will take their values from d2. However, because d2 is unpacked into keyword arguments, this only works when all of its keys are strings:

d1 = {'a': 1, 'b': 2}
d2 = {'a': 99, 1: 3}
d3 = dict(d1, **d2)
print(d3)
# Output: TypeError: keywords must be strings
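Strictly speaking, the restriction applies to the second dictionary, whose keys become keyword arguments. The first dictionary is passed as an ordinary positional argument and may use any hashable keys, as this small sketch with made-up keys shows:

```python
d1 = {1: 'one', ('x', 'y'): 'tuple key'}   # non-string keys are fine here
d2 = {'a': 99}                             # keyword arguments: strings only

d3 = dict(d1, **d2)
print(d3)
# Output: {1: 'one', ('x', 'y'): 'tuple key', 'a': 99}
```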

New Ways Introduced in Python 3.9

Two union operators, merge | and update |=, have been introduced for dict.

The Dictionary Merge Operator

If you want to create a new dict based on two dictionaries you already have, you can do the following:

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = d1 | d2
print(d3)
# Output: {'a': 1, 'b': 9999, 'c': 3}

“Dict union will return a new dict consisting of the left operand merged with the right operand, each of which must be a dict (or an instance of a dict subclass). If a key appears in both operands, the last-seen value (i.e. that from the right-hand operand) wins.”
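Since each merge returns a new dict, the operator can be chained to combine more than two dictionaries in one expression. A short sketch (the third dictionary is made up for illustration):

```python
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = {'e': 4, 'b': 0}

merged = d1 | d2 | d3   # evaluated left to right; requires Python 3.9+
print(merged)
# Output: {'a': 1, 'b': 0, 'c': 3, 'e': 4}
print(d1)
# Output: {'a': 1, 'b': 2}
```

The rightmost value for 'b' wins, and none of the operands is modified.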

To demonstrate the usefulness of the merge operator, |, let's take a look at the following example using defaultdict:

from collections import defaultdict

user_not_found_message = 'Could not find any user matching the specified user id.'

ceo = defaultdict(
    lambda: user_not_found_message,
    {'id': 1, 'name': 'Jose', 'title': 'Instructor'}
)

author = defaultdict(
    lambda: user_not_found_message,
    {'id': 2, 'name': 'Vlad', 'title': 'Teaching Assistant'}
)

Merging the two dictionaries with the double asterisk, **, works, but unpacking is not aware of the class of its operands, so we end up with a plain dictionary instead:

print({**author, **ceo})
# {'id': 1, 'name': 'Jose', 'title': 'Instructor'}

The power of the merge operator | is that it is aware of the class of its operands. As such, a defaultdict will be returned (the function address in the output will differ from run to run):

print(author | ceo)
# defaultdict(<function <lambda> at 0x000002212125DE50>, {'id': 1, 'name': 'Jose', 'title': 'Instructor'})
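The practical difference shows up when a missing key is looked up. The sketch below repeats the two definitions so that it runs on its own. 'email' is a hypothetical key that neither dictionary contains.

```python
from collections import defaultdict

user_not_found_message = 'Could not find any user matching the specified user id.'

ceo = defaultdict(lambda: user_not_found_message,
                  {'id': 1, 'name': 'Jose', 'title': 'Instructor'})
author = defaultdict(lambda: user_not_found_message,
                     {'id': 2, 'name': 'Vlad', 'title': 'Teaching Assistant'})

merged = author | ceo          # requires Python 3.9+
plain = {**author, **ceo}

print(type(merged).__name__, type(plain).__name__)
# Output: defaultdict dict
print(merged['email'])         # 'email' is a hypothetical missing key
# Output: Could not find any user matching the specified user id.

try:
    plain['email']
except KeyError:
    print('plain dict raises KeyError')
# Output: plain dict raises KeyError
```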

The Dictionary Update Operator

d1 |= d2 modifies d1 in place. Unlike the merge operator, it also accepts anything implementing the Mapping protocol (more specifically, anything with keys and __getitem__ methods) or an iterable of key-value pairs. It offers the same functionality as dict.update with a cleaner syntax:

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d1 |= d2
print(d1)
# Output: {'a': 1, 'b': 9999, 'c': 3}
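The difference in what the two operators accept can be seen in a short sketch. Here |= takes a list of key-value pairs, while the plain merge operator rejects the same list:

```python
d1 = {'a': 1, 'b': 2}
pairs = [('c', 3), ('b', 9999)]   # an iterable of key-value pairs

d1 |= pairs                        # accepted, like d1.update(pairs)
print(d1)
# Output: {'a': 1, 'b': 9999, 'c': 3}

try:
    d1 | pairs                     # the merge operator is stricter
    rejected = False
except TypeError:
    rejected = True
print(rejected)
# Output: True
```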

The three approaches differ only slightly in execution time (the figures below come from a single machine and will vary between runs):

import timeit

timeit.timeit("d1={'a':1,'b':2}; d2={'c':3,'b':9999}; res = dict(d1); res.update(d2)", number=10000)
# Output: 0.0034674000926315784

timeit.timeit("d1={'a':1,'b':2}; d2={'c':3,'b':9999}; res = {**d1,**d2}", number=10000)
# Output: 0.0028335999231785536

timeit.timeit("d1={'a':1,'b':2}; d2={'c':3,'b':9999}; res = d1 | d2", number=10000)
# Output: 0.0027796999784186482
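The timed statements above also include the cost of building d1 and d2 on every iteration. A slightly more careful measurement, sketched here as one possible setup, moves the construction into timeit's setup argument and keeps the best of several repeats. No expected numbers are shown because they depend on the machine.

```python
import timeit

# Build the input dictionaries once, outside the timed statement
setup = "d1 = {'a': 1, 'b': 2}; d2 = {'c': 3, 'b': 9999}"
statements = {
    'copy + update': "res = dict(d1); res.update(d2)",
    'unpacking': "res = {**d1, **d2}",
    'merge operator': "res = d1 | d2",   # requires Python 3.9+
}

results = {}
for label, stmt in statements.items():
    # Take the best of 5 runs to reduce noise from other processes
    timings = timeit.repeat(stmt, setup=setup, number=10000, repeat=5)
    results[label] = min(timings)
    print(f'{label}: {results[label]:.6f} s')
```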

Summary

The new operators are not here to replace the existing ways of merging and updating, but rather to complement them. Some of the major takeaways are:

  • the merge operator, |, is class aware, offers a cleaner syntax, and creates a new object.
  • the update operator, |=, operates in place, accepts any mapping or iterable of key-value pairs, and doesn't create a new object.
  • the operators are new in Python 3.9, so using them on dicts raises a TypeError on earlier versions.
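For code that still has to run on older interpreters, one option is to pick the approach at runtime. The following is a minimal sketch under that assumption. merge is a hypothetical helper name, and on versions before 3.9 the fallback always returns a plain dict.

```python
import sys

def merge(d1, d2):
    """Hypothetical helper: merge two dicts, with d2 winning on duplicates."""
    if sys.version_info >= (3, 9):
        return d1 | d2          # preserves dict subclasses such as defaultdict
    return {**d1, **d2}         # always a plain dict on older versions

print(merge({'a': 1, 'b': 2}, {'c': 3, 'b': 9999}))
# Output: {'a': 1, 'b': 9999, 'c': 3}
```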

All rights reserved