Make More of Python Dictionary Keys with a Bidictionary

    By Dan Childs, Principal Technologist at BJSS

    Dan Childs

    The Python dictionary data structure is powerful. But to get the most out of it, you need to think about the data a dictionary stores and how you want to use it. In other words, as with all programming, you’ve got to think from the outside in, not the inside out. You’ve got to know what the point of all the data in the dictionary is, what it’s going to be used for, and how you want to structure and query it.

    With that in mind, let’s take a look at a data structure problem using Python dictionary keys and values, with a really practical example.

     

    what is a dictionary for?

    books-RX2JT8T

    First, a digression. Let’s think about the word ’‘literally'. That guy on the Zoom call who said he is ‘literally melting in this heat'. Is he in need of urgent medical attention, or is there a different meaning of this word?

    In the old days, to check the meaning of ‘literally’ you would pull out a dictionary, a meaty doorstep of knowledge. How would you find a single word in something that has 3,000 pages?

    One way would be to start at the first page, check all the words on that page, and then try the next page until you find the page with your word on it. But this would be crazy! There is a more efficient algorithm for using a paper dictionary. All the words are listed alphabetically, so you’d open the book roughly halfway, where you guess the ‘L’ section might be, flip back and forth a little until you get to words starting with ’‘Li', then look more closely for ‘literally’.

    The time difference between starting at page one and carefully scanning every word and flicking to where you know ‘literally’ will be in the dictionary is huge. Seconds versus many hours.

    Parallels in python

     

    This has parallels in Python data structures.

    If you try to find something in a List or an Array, you will progress through items in a linear way, looking for it.

    But a dictionary in computer science works in a similar way to a paper dictionary. It allows hopping back and forth within a small range rather than checking the whole data set because the program has a good idea of how data is organised.

    The dictionary-based approach provides incredibly fast lookups because data is keyed on a particular item, and the keys are organised so that data can be looked up in a few hops.

    Here’s the thing, though. In all data structures, there are trade-offs. Some might increase memory usage, some might reduce the speed with which new data can be added, and some will allow one way of looking at things but exclude other ways. Let’s look at a real-world example.

    The dictionary flip is a flop

    Let’s say you’re writing a Python program to help you remember the periodic table. You start by setting out three elements and their symbols. This seems like a perfect fit for a dictionary:

    >>> elements = {"O": "oxygen", "H": "hydrogen", "C": "carbon"}

    >>> elements["O"]

    ’‘oxygen'

    This setup will work if you want to find out what ‘O’ stands for. But it won’t let you find out what symbol is used for ‘oxygen’. So, you might consider flipping the dictionary. This swaps the data around so that Keys become Values and Values become Keys. This isn’t a built-in Python feature, but you can always try Stack Overflow for advice. Try this:

    flipped_dictionary = {v: k for k, v in original_dictionary.items()}

    In essence, this creates a new dictionary by iterating through every item in the dictionary and adding a new record, where the Value becomes the Key, and the Key becomes the Value.

    >>> flipped_elements = {v: k for k, v in elements.items()}          

    >>> flipped_elements["oxygen"]

    'O'

    This will allow you to find the symbol for ‘Oxygen’.

    But if we go back to the physical dictionary analogy, this is the equivalent of going through page by page and copying out each item, swapping the definition for the word as you go. It is cumbersome and messy. In computing, if something is cumbersome and messy, it feels wrong. If something feels wrong, it probably is wrong, and there is a better solution.

    Let’s look at the problem again.

    We need a bidictionary

    The problem before us in this example is to be able to look up what ‘O’ stands for in the Periodic table and look up what the symbol is for ‘Oxygen’. Two lookups. In other words, we want to look items up by both Key and Value.

    The neatest, and most effective way to achieve this is to use a data structure called a bidictionary, or a bimap.

    In Python, somebody has already built one for us. Here’s how to use it:

    >>> From bidict import bidict

    >>> element_by_symbol = bidict({"O": "oxygen", "H": "hydrogen", "C": "carbon"})

    >>> element_by_symbol['O']

    'oxygen'

    >>> element_by_symbol.inverse['oxygen']

    'O'

    Now we have what we wanted, without the extra overhead of looping through a dictionary, copying everything out again and trying to keep it all synchronised.

    It is literally awesome.

    Lead-magnet-recruitment