Looping through Dict with SetDefault

There’s this great Pythonista, Jeff Knupp, (who offers mentorship, but ignores my emails), that actually lives in Hoboken, where I grew up.

But all of that is beside the point. Like a lot of programmers, he seems to enjoy making newbies like myself contort our brows.

He’s got a great tutorial that explains referencing in Python. And typically, his examples include challenging side details. In the following example he populates a dictionary’s keys and default values with a loop using Python’s .setdefault method.


def add_to_tree(root, value_string):
    """Given a string of characters `value_string`, create or update a
    series of dictionaries where the value at each level is a dictionary of
    the characters that have been seen following the current character.

    Example:
    >>> my_string = 'abc'
    >>> tree = {}
    >>> add_to_tree(tree, my_string)
    >>> print(tree['a']['b'])
    {'c': {}}
    >>> add_to_tree(tree, 'abd')
    >>> print(tree['a']['b'])
    {'c': {}, 'd': {}} 
    >>> print(tree['a']['d'])
    KeyError 'd'
    """
    print("Reference id of tree is: {}".format(id(tree)))   
    print("Root before .setdefault call matches tree: {}".format(root))
    for character in value_string:
        print("Root is assigned a new ref id for each round: {}".format(id(root)))
        root = root.setdefault(character, {})
        print("Now root is: {}".format(root))
        print("Now tree is: {}".format(tree))

The above version has a bunch of added print statements, ’cause I’m a neanderthal and need lots of concrete examples to be able to see what’s going on.

We have to remember (or look up) exactly what .setdefault does:

setdefault(key[, default])

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

So when we look at the results of our printout:


In [6]: add_to_tree(tree, my_string)
Reference id of tree is: 4327475192
Root before .setdefault call matches tree: {}
Root is assigned a new ref id for each round: 4327475192
Now root is: {}
Now tree is: {'a': {}}
Root is assigned a new ref id for each round: 4327544544
Now root is: {}
Now tree is: {'a': {'b': {}}}
Root is assigned a new ref id for each round: 4327474352
Now root is: {}
Now tree is: {'a': {'b': {'c': {}}}}

In [7]: add_to_tree(tree, 'abd')
Reference id of tree is: 4327475192
Root before .setdefault call matches tree: {'a': {'b': {'c': {}}}}
Root is assigned a new ref id for each round: 4327475192
Now root is: {'b': {'c': {}}}
Now tree is: {'a': {'b': {'c': {}}}}
Root is assigned a new ref id for each round: 4327544544
Now root is: {'c': {}}
Now tree is: {'a': {'b': {'c': {}}}}
Root is assigned a new ref id for each round: 4327474352
Now root is: {}
Now tree is: {'a': {'b': {'c': {}, 'd': {}}}}

So the first time through, .setdefault{}) each time because the key (‘a’, ‘b’, and ‘c’) is not found, and inserts that value ({}), assigned to the key, submitted (argument 1).

But wait! Why does the function produce {'a': {'b': {'c': {}, 'd': {}}}} and not {'a': {}, 'b': {}, 'c': {}}?

Because we’re looping within a function and reassigning the the result to root each time, it’s kind of like in the TV infomercials where they keep saying, “but wait! there’s more!”. So when .setdefault gets called on 'a', before that gets returned, it’s result, {'a': {}} is held {inside the loop} while it’s run on 'b', which yields {'b': {}}, within {'a': {}} and that is held to the side and {'a': {}} is run, then the whole thing is returned from the loop and applied to tree. Note that each time, what is actually returned by .setdefault IS the default, which in this case is {}. Check this out from the Python Visualizer:

What’s returned by .setdefault the second time through (when ‘abd’ is argued) is a little different because the first two times the key is in the dictionary, so .setdefault does nothing to the dictionary, but returns it’s value.

Also note the use of the id() function showing that that root is rebound to a new reference number each time through the loop.