At Factual, we have a lot of geospatial data that we want to send to a browser. It’s often possible to use polygons or images to transmit this data efficiently, but a couple of weeks ago we ran into a case where we needed a floating-point quantity for each map tile pixel. We started by sending the data to the client as JSON, but soon we found that the browser was CPU-bound and spending almost half of the time in the native JSON decoder.

After spending some time looking for an existing solution, we decided to write our own because the problem was just too much fun to ignore. So we created Flotsam, a library whose goal is to provide insanely fast floating-point number serialization for Java and Javascript (and a common format between them). It provides two functions:

// Javascript:
var xs = [...];         // Float64Arrays also work
var encoded = flotsam.encode(xs);
var ys = flotsam.decode(encoded);
// Java:
double[] xs = ...;
String encoded = org.flotsam.Flotsam.encode(xs);
double[] ys = org.flotsam.Flotsam.decode(encoded);

We benchmarked it on V8 and found that Flotsam was between 5x and 10x faster than JSON for decoding, and slightly faster than JSON for encoding. In addition, the serialized format was about 55% smaller in Flotsam (albeit totally unreadable by humans).

How It Works

Flotsam works by reducing each floating-point number to a binary quantity and encoding it as a ten-digit base-94 number. This keeps the representation printable and avoids UTF-8 encoding (which would increase the byte length in transit), but gives only 20% overhead above a straight bitwise format.

The JVM provides methods that convert floating-point numbers to and from binary, so the Java encoder was easy to write. Javascript, on the other hand, doesn’t have a good way to do this, at least not one that works reliably across browsers. This meant that we’d have to derive the bits using floating-point math, ideally not losing any to rounding in the process.

Doing this turned out to be faster and easier than we expected. Here’s how 64-bit floats are represented:

63      62........52  51.........0
sign(1) exponent(11)  mantissa(52)

Depending on the exponent, there are three cases for how the value is interpreted:

  • If exponent = 2047, then the number is Infinity or NaN (Flotsam doesn’t support these values)
  • If 1 <= exponent <= 2046, then the number is normal: (-1)sign * 2exponent – 1023 * (1 + 2-1 m51 + 2-2 m50 + … + 2-52 m0).
  • If exponent = 0, then the number is subnormal: (-1)sign * 2-1022 * (0 + 2-1 m51 + 2-2 m50 + … + 2-52 m0).

To encode a number, Flotsam first determines the sign, then the exponent, and finally the mantissa:

var sign = x < 0;
if (sign) x = -x;
// Binary-split for the observed exponent. We converge on the exact value in
// 11 iterations.
for (var l = 0, u = 2048, m, i = 0; i < 11; ++i)
  if (x < float_powers[m = l + u >> 1]) u = m; else l = m;
var exponent = l;

float_powers is a static lookup table that we generate up front.

float_powers[1023] = 1.0;
for (var i = 1024, base = 1.0; i <  2047; ++i) float_powers[i] = base *= 2.0;
for (var i = 1022, base = 1.0; i >= 1;    --i) float_powers[i] = base *= 0.5;
float_powers[0]    = float_powers[1];
float_powers[2047] = float_powers[2046];
var mantissa_norm = float_powers[1023 + 52];

It’s possible to use Math.log to find the exponent, but this was slow and subject to rounding error. We had better luck generating a table of powers of two, then using 11 iterations to binary-split the exponent space. Once the exponent is known, we can divide by that power of two, subtract the implied 1 for normal numbers, and multiply by 252 to get the mantissa bits to behave like an integer. We also needed to save the sign and exponent into two base-64 digits:

var mantissa = exponent !== 0
             ? (x * float_powers[2046 - exponent] - 1.0) * mantissa_norm
             :  x * float_powers[2045]                   * mantissa_norm;
var result   = digits[sign << 5 | exponent >> 6 & 0x1f]
             + digits[            exponent      & 0x3f];

Base-94 encoding the result was just a radix-coding problem, but we couldn’t outperform the JSON encoder until we used a lookup table to encode two digits at a time (the magic numbers here are successive powers of 942 = 8836):

var d78 =  mantissa                        * (1.0 / 689869781056) | 0;
var d56 = (mantissa -= d78 * 689869781056) * (1.0 / 78074896)     | 0;
var d34 = (mantissa -= d56 * 78074896)     * (1.0 / 8836)         | 0;
var d12 = (mantissa -= d34 * 8836)                                | 0;
return result + (digit_pairs[d12] + digit_pairs[d34]
              +  digit_pairs[d56] + digit_pairs[d78]);

These division and multiplication operations introduce rounding errors, but because we’re integer-flooring each intermediate quantity they don’t end up impacting the result.

Decoding Flotsam-encoded floats is fast because we use a lookup table to grab the value for each base-94 digit:

var base = 1;
for (var i = 0; i < 8; ++i) 
   for (var j = 0; j < 94; ++j)
      base94_decode[i << 7 | j + 32] = base * j / mantissa_norm;
   base *= 94;
}

Then the decoder logic is a matter of summing up the table entries and reintroducing the exponent and sign:

var result = base94_decode[0 << 7 | s.charCodeAt(i + 2)]
           + base94_decode[1 << 7 | s.charCodeAt(i + 3)]
           + base94_decode[2 << 7 | s.charCodeAt(i + 4)]
           + base94_decode[3 << 7 | s.charCodeAt(i + 5)]
           + base94_decode[4 << 7 | s.charCodeAt(i + 6)]
           + base94_decode[5 << 7 | s.charCodeAt(i + 7)]
           + base94_decode[6 << 7 | s.charCodeAt(i + 8)]
           + base94_decode[7 << 7 | s.charCodeAt(i + 9)]
           + (exponent ? 1 : 0);

if (sign_exponent & 0x800) result = -result;
result *= float_powers[exponent];

Still reading? If so, you should come work with us!

 

- Spencer Tipping
(Tilting at windmills so you don’t have to)