Floating Point Keys in Python: Hashing using Modular Division

Like most other scripting languages Python comes with a built-in dictionary type based on hash tables. Unlike other languages the rules for hashing different types are not baked into the language definition or the dictionary implementation. Instead, the object-oriented nature of Python is leveraged to allow types, including built-in types, to define their own hash functions. Objects are said to be hashable if they define a hash function and a matching comparison operator.

As we discussed in the introductory post of this series, the hash function and comparison operator cannot be defined independently: hash tables only work correctly if any two objects that are “equal” also have the same hash key. This poses a special challenge for numeric types: Python can represent numbers in a variety of ways, including arbitrary-precision integers, rationals, binary and decimal floating point numbers, or complex numbers. Since all of these number types can be compared to each other, each type must have a hash function that ensures that numbers that are numerically equal also map to the same hash key. For example, the number “two” should have the same hash key, regardless of whether it’s stored as an integer “2”, as a floating point number “2.0”, as a fraction “4 / 2”, or as a complex number “2.0 + 0.0i”.

In the previous article we saw how Lua solves this problem by switching between different hash function on the fly: integral floating point numbers and integers use one hash function and fractional numbers another. Python takes a different approach: it defines a unified hash function for all numeric types. In this article I will explain how this hash function works and how it can be computed efficiently for the special case of binary floating point numbers. The resulting algorithm is an ingenious demonstration of how to combine floating point arithmetic, modular arithmetic, and bit operations. Let’s find out how it works!

Read More »