Floating Point Keys in JavaScript:

In this article we take a look at how JavaScript handles hash keys that are floating point numbers. JavaScript differs from other languages in that it provides two associative data structures that handle floating point keys differently. Normal objects are the most common associative data structure in JavaScript: internally objects convert all keys to character strings. In the case of floating point numbers this seems to be a convenient solution because it doesn’t require a special hash function for floats, but it means that the float-to-string conversion must be defined very carefully.

In addition, modern versions of JavaScript have a dedicated associative data structure called Map which doesn’t convert all keys to strings first. But Map still treats floating point keys differently than all the other hash tables we have seen so far: it defines a custom comparison operator for floating point numbers that differs from == and ===. One important property of this comparison operator is that it avoids the problems with NaN keys we saw with other programming languages, but at the cost of introducing yet another notion of equality to the language.

Let’s take a look at Object and Map hashing in turn.

In principle, JavaScripts handling of hash keys is quite simple and uniform. Objects are really just glorified hash tables with some extra magic to make prototypical inheritance work. They can have an arbitrary number of properties, and every data type can be used to access these properties:

o = {}
o[1.0] = "Hello"
o[true] = "World"
o["2.1"] = "!"

Internally all keys are converted to strings using the Object.toString method. We can see this by printing the value of the object o defined above:

{ '1': 'Hello', true: 'World', '2.1': '!' }

In effect, the character strings returned by toString have two different purposes in JavaScript: they represent objects in human-readable form, but they also serve as hash keys when accessing Object properties. This second purpose must be kept in mind when defining toString: similar to hash functions, toString must map equal values to equal strings and should map distinct values to distinct strings.

The JavaScript specification contains fairly strict requirements for the implementation of toString for numbers. The following details are interesting:

  • Both +0 and -0 are converted to "0" and therefore represent the same hash key.
  • All NaN values are converted to "NaN", so JavaScript objects can have (exactly one) “NaN” entry. This way, JavaScript avoids the usual complications that come with NaN keys.
  • Other floating point numbers are encoded in such a way that the printed representation is as short as possible and that the original number can be recovered exactly from the string.

Note that toString has to handle the same special cases (signed zero and NaN) that are also handled by hash functions for floating point numbers. In fact, toString really is a kind hash function in this case, except that it doesn’t return an integer as usual but a string. It’s also worth noting that implementing toString correctly and efficiently is challenging, mainly because converting floating point numbers to decimal is a nontrivial problem; if you are interested in the details, you can take a look at V8’s implementation in fast-dtoa.cc and bignum-dtoa.cc.

As of ECMAScript 2015, JavaScript programmers have access to a second kind of hash table called Map which solves a few issues that make object error-prone. A few important differences between Map and Object are:

  • Map is a deterministic hash table, i.e., iteration visits entries in insertion order. (In practice, all JavaScript implementations also iterate over Object properties in insertion order, but this isn’t mandated by the standard.)
  • Unlike Object, Map is a plain data structure; in particular, it doesn’t interact with the language’s implementation of inheritance. Code that uses Object as an associative data structure often has to use Object.hasOwnProperty to filter out inherited properties; this isn’t necessary when using Map.
  • Map keys aren’t implicitly converted to strings. Only keys that are strictly equal (according to the === operator) map to the same entry. For example, true and "true" refer to different Map entries.
  • Numeric keys don’t use === for comparison. Instead, JavaScript defines a special SameValueZero predicate that is only used by Map.

The last item in this list is particularly interesting: Map is the first hash table we’ve seen in this series that defines a special comparison operator for numeric keys. SameValueZero is almost identical to the normal strict equality comparison via ===, except that it treats all NaN values as equal:

SameValueZero(NaN, NaN) = true
SameValueZero(x, y) = (x === y)

Using this comparison predicate instead of === has two benefits: it avoids the usual problems that result from NaN !== NaN like unretrievable hash table entries, and it ensures that Map handles NaN keys in the same way Object handles NaN properties.

The main disadvantage of this solution is that it adds yet another equality predicate to the language. JavaScript already has ==, ===, and Object.is, which all subtly differ in the way they treat special values like null, undefined, ±0 and NaN, and adding SameValueZero to the list doesn’t make things easier. But overall I think this is a reasonable price to pay for making the semantics of Map simpler and more predictable.

I won’t go into the implementation of hashing in Map in detail. If you are interested, take a look at V8’s implementation. Like Lua, V8 uses different hash functions for small integral numbers and fractional numbers. For fractional numbers, the 8-byte IEEE representation of the number is treated as a 64-bit integer and is hashed using ComputeLongHash, which is defined as follows:

inline uint32_t ComputeLongHash(uint64_t key) {
  uint64_t hash = key;
  hash = ~hash + (hash  31);
  hash = hash * 21;
  hash = hash ^ (hash >> 11);
  hash = hash + (hash  22);
  return static_cast(hash);

This function belongs to a set of integer hash functions designed by Thomas Wang; his original article seems to have disappeared from the Internet, but you can still read it the archived version. The operations in this functions are carefully chosen so that collisions are rare and changing one bit in the input affects many bits in the output, a property known as avalanche.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s