Map which doesn’t convert all keys to strings first. But
Map still treats floating point keys differently than all the other hash tables we have seen so far: it defines a custom comparison operator for floating point numbers that differs from
===. One important property of this comparison operator is that it avoids the problems with NaN keys we saw with other programming languages, but at the cost of introducing yet another notion of equality to the language.
Let’s take a look at
Map hashing in turn.
Read More »
The Go Programming Language has a built-in map datatype that is implemented using hash tables. The only types that can be used as map keys are those for which the language defines an equality operator
==; this includes primitive types like Booleans, strings, numbers, and pointers, but also composite types like arrays and structures. For all these types Go automatically generates a suitable hash function. Maps are not extensible, however: if you need a hash table that supports custom comparison or hash algorithms you have to write your own.
Floating point numbers can be used as map keys, and since Go follows the normal conventions for comparison, so maps treat all NaN keys as distinct since NaN ≠ NaN. As a consequence NaN keys can be inserted into maps but are unretrievable; in fact, any number of NaN keys can be inserted, and because they are all distinct it’s possible to end up with a map in which every entry is unretrievable. Normally this would be merely a curiosity, an interesting side effect of the quirky semantics of floating point numbers. But this curiosity can lead to an exploitable performance issue if the hash function being used maps every NaN to a single hash code. In this case all the NaN keys collide in the hash table and the runtime cost of inserting n of them drops to O(n2), down from the O(n) we would expect for uniformly distributed hash codes.
Overloading servers by causing hash table collisions has become a a real security threat in recent years. Many languages and hashing algorithms were found to be susceptible to these so-called hash-flooding attacks and had to be updated. In the following I will discuss how Go prevents hash-flooding attacks, in particular for the special of case floating point keys.
Read More »
Like most other scripting languages Python comes with a built-in dictionary type based on hash tables. Unlike other languages the rules for hashing different types are not baked into the language definition or the dictionary implementation. Instead, the object-oriented nature of Python is leveraged to allow types, including built-in types, to define their own hash functions. Objects are said to be hashable if they define a hash function and a matching comparison operator.
As we discussed in the introductory post of this series, the hash function and comparison operator cannot be defined independently: hash tables only work correctly if any two objects that are “equal” also have the same hash key. This poses a special challenge for numeric types: Python can represent numbers in a variety of ways, including arbitrary-precision integers, rationals, binary and decimal floating point numbers, or complex numbers. Since all of these number types can be compared to each other, each type must have a hash function that ensures that numbers that are numerically equal also map to the same hash key. For example, the number “two” should have the same hash key, regardless of whether it’s stored as an integer “2”, as a floating point number “2.0”, as a fraction “4 / 2”, or as a complex number “2.0 + 0.0i”.
In the previous article we saw how Lua solves this problem by switching between different hash function on the fly: integral floating point numbers and integers use one hash function and fractional numbers another. Python takes a different approach: it defines a unified hash function for all numeric types. In this article I will explain how this hash function works and how it can be computed efficiently for the special case of binary floating point numbers. The resulting algorithm is an ingenious demonstration of how to combine floating point arithmetic, modular arithmetic, and bit operations. Let’s find out how it works!
Read More »
Lua is an economical programming language: it’s fast, compact, and both the language and its standard library are small but surprisingly powerful. There are only a handful of built-in data types and just one language feature for structuring data: tables can hold arbitrary key/value pairs, and represent, depending on the choice of keys and values, everything from arrays, sets, dictionaries, and structures to classes and objects. An array, for example, is a table indexed with integer keys, and a structure is a table indexed with strings that represent the field names.
All built-in types can be used as table keys and values. In this article I will explain how Lua handles numeric keys, and in particular floating point keys. In keeping with Lua’s tradition, the solution is practical and efficient. But, also in keeping with Lua’s tradition, the implementation isn’t trivial and narrowly skirts undefined behavior. Let’s take a look.
Read More »
This article addresses a few general questions: what is the difference between floating point numbers and other types of hash keys? What are common pitfalls when defining hash functions for floats? How do we handle special values such as minus zero, infinity, and NaN?
Read More »