Floating Point Keys in Go: How to Make Hash Functions Unpredictable

The Go Programming Language has a built-in map datatype that is implemented using hash tables. The only types that can be used as map keys are those for which the language defines an equality operator ==; this includes primitive types like Booleans, strings, numbers, and pointers, but also composite types like arrays and structures. For all these types Go automatically generates a suitable hash function. Maps are not extensible, however: if you need a hash table that supports custom comparison or hash algorithms you have to write your own.

Floating point numbers can be used as map keys, and since Go follows the normal conventions for comparison, so maps treat all NaN keys as distinct since NaN ≠ NaN. As a consequence NaN keys can be inserted into maps but are unretrievable; in fact, any number of NaN keys can be inserted, and because they are all distinct it’s possible to end up with a map in which every entry is unretrievable. Normally this would be merely a curiosity, an interesting side effect of the quirky semantics of floating point numbers. But this curiosity can lead to an exploitable performance issue if the hash function being used maps every NaN to a single hash code. In this case all the NaN keys collide in the hash table and the runtime cost of inserting n of them drops to O(n2), down from the O(n) we would expect for uniformly distributed hash codes.

Overloading servers by causing hash table collisions has become a a real security threat in recent years. Many languages and hashing algorithms were found to be susceptible to these so-called hash-flooding attacks and had to be updated. In the following I will discuss how Go prevents hash-flooding attacks, in particular for the special of case floating point keys.

Read More »

Floating Point Keys in Lua: a Simple Ad-Hoc Solution

Lua is an economical programming language: it’s fast, compact, and both the language and its standard library are small but surprisingly powerful. There are only a handful of built-in data types and just one language feature for structuring data: tables can hold arbitrary key/value pairs, and represent, depending on the choice of keys and values, everything from arrays, sets, dictionaries, and structures to classes and objects. An array, for example, is a table indexed with integer keys, and a structure is a table indexed with strings that represent the field names.

All built-in types can be used as table keys and values. In this article I will explain how Lua handles numeric keys, and in particular floating point keys. In keeping with Lua’s tradition, the solution is practical and efficient. But, also in keeping with Lua’s tradition, the implementation isn’t trivial and narrowly skirts undefined behavior. Let’s take a look.

Read More »

Using Floating Point Numbers as Hash Keys

Strings and integers are the most common keys for hash tables, but in principle every type can be used as a hash key, provided we define a suitable comparison operator and hash function. Most languages that support hash tables also provide hash functions for all built-in types, including floating point numbers. In this series of articles I will take a look at how languages like Lua, Python, JavaScript, and C++ handle floating point keys. It turns out that there are significant differences between these languages, both in the way they handle special values like NaN and in the way they define their hash functions.

This article addresses a few general questions: what is the difference between floating point numbers and other types of hash keys? What are common pitfalls when defining hash functions for floats? How do we handle special values such as minus zero, infinity, and NaN?

Read More »