Announcing Tailored Location Segments, a New Geopulse Audience Feature Enabling Marketers to Create Custom Location-Based Audience Segments

Today we launched the latest enhancement to Geopulse Audience: Tailored Location Segments. Tailored Location Segments let marketers create completely custom audience segments from scratch, based on any definition of place and time. These user-created segments can then be combined with any pre-defined segment in Geopulse Audience, giving marketers unprecedented ability to build precisely the audience that best meets their needs. Traditionally, marketers have been forced to choose pre-defined audiences from a picklist, limiting their campaigns to whatever targeting happens to be available. We are reversing that model, enabling marketers to craft their targeting to best fit their ideal campaign.

For example, two brands’ target audiences may both be fitness enthusiasts. However, one brand’s ideal fitness enthusiasts may be people who go to the gym or other fitness centers, while another’s may be those who primarily engage in outdoor activities. In the traditional audience model, both would have to settle for a pre-defined “fitness enthusiast” segment that mixes their ideal audience with others they are less interested in reaching. With Tailored Location Segments, each brand can create their ideal segment. The first brand could create their own segment of “fitness enthusiasts,” reaching people who visit gyms, yoga studios, spinning centers, and so on. The second brand can tailor their campaign to reach people in parks and recreation areas, cycling stores, running stores, outdoor tracks, and the like. And if one of the brands wanted to narrow it down to “fitness moms,” they could add Factual’s “moms” segment and further refine their audience.

Location-based ad targeting is only as good as the data underpinning it. As part of Geopulse Audience, Tailored Location Segments use only accurate and precise user location data, mapped to our Global Places data, the highest quality places data in mobile marketing, covering over 65 million places in 50 countries.

With our easy-to-use self-serve tools, we’re making this capability broadly available in both programmatic and direct sales environments, removing the need for ad ops or engineers to create complex queries on backend databases. Tailored Location Segments is extremely accessible and is being made available to marketers through all of our media partners, including InMobi and StrikeAd.

For a guided tour of Tailored Location Segments, check out this video:

Factual Tailored Location Segments from Factual on Vimeo.

We caught up with our partners StrikeAd and InMobi, who have used Tailored Location Segments in campaigns. Here is what they had to say:

“The level of customization available through Factual’s Tailored Location Segments is what really stands out from the other location-based audience offerings in the marketplace,” said Alex Rahaman, CEO of StrikeAd, a leading cross-device, programmatic buyer. “Marketers are now truly able to build the audiences they want to reach, based on physical, real-world consumer behaviors.”

“InMobi has been able to enrich its mobile consumer data of over one billion monthly active users with Factual’s geo-data and geo-expertise,” stated Anne Frisbie, InMobi’s VP/GM for Global Alliances, speaking to GeoMarketing. “By doing this, InMobi — in partnership with Factual — is able to offer brands the ability to buy enhanced mobile audiences that are informed by geo-intelligence. InMobi believes that mobile consumer geo-intelligence is an important consumer signal for improving overall mobile audience targeting.”

If you’re interested in learning more about Tailored Location Segments, Geopulse Audience, or any of our ad targeting tools, please contact us.

How Geohashes Work

We use geohashes all the time at Factual, so one day I got curious and read through the canonical Java implementation to figure out exactly how they work. It was an enlightening read, but along the way I encountered some unfortunate bits of coding horror that got me wondering about the fastest way to encode and decode geohashes.

The geohash format

A geohash is a series of bits that repeatedly bisects a search space of latitudes and longitudes. The first bit bisects the longitude, the next the latitude, the next the longitude again, and so on. As a result, some geohashes specify square-ish (in spherical coordinates) regions, while others specify 2:1 rectangles. Because geohashes are fundamentally binary data, they’re often written in base-32. The alphabet is a little unusual in that it omits some letters that might be confused with numbers (it skips a, i, l, and o).

Like floating-point numbers, geohashes contain some error depending on how many bits they have. A geohash library often returns the range of latitude and longitude values you get when decoding, since geohash error is typically much larger than the underlying floating-point error. You can’t technically decode a geohash to a specific point, but you can usually approximate by taking the midpoint of the error box. Wikipedia has a great example that illustrates how this works.

Typical encoding strategy

Geohashing is basically storing the intermediate results of two binary searches. The basic idea looks something like this in code:

long encodeGeohash(double lat, double lng, int bits) {
  double minLat = -90,  maxLat = 90;
  double minLng = -180, maxLng = 180;
  long result = 0;
  for (int i = 0; i < bits; ++i)
    if (i % 2 == 0) {                 // even bit: bisect longitude
      double midpoint = (minLng + maxLng) / 2;
      if (lng < midpoint) {
        result <<= 1;                 // push a zero bit
        maxLng = midpoint;            // shrink range downwards
      } else {
        result = result << 1 | 1;     // push a one bit
        minLng = midpoint;            // shrink range upwards
      }
    } else {                          // odd bit: bisect latitude
      double midpoint = (minLat + maxLat) / 2;
      if (lat < midpoint) {
        result <<= 1;                 // push a zero bit
        maxLat = midpoint;            // shrink range downwards
      } else {
        result = result << 1 | 1;     // push a one bit
        minLat = midpoint;            // shrink range upwards
      }
    }
  return result;
}

Decoding is similar; instead of bisecting based on midpoint comparisons, you just go backwards by setting either the min or max value to the midpoint based on the bits from the geohash.
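
Here’s a minimal sketch of that mirrored loop (the decodeGeohash name and the decision to return the box’s midpoint are mine, not the reference implementation’s):

double[] decodeGeohash(long geohash, int bits) {
  double minLat = -90,  maxLat = 90;
  double minLng = -180, maxLng = 180;
  for (int i = 0; i < bits; ++i) {
    // Read bits from most significant to least significant.
    boolean bit = ((geohash >>> (bits - 1 - i)) & 1) != 0;
    if (i % 2 == 0) {                 // even bit: longitude
      double midpoint = (minLng + maxLng) / 2;
      if (bit) minLng = midpoint; else maxLng = midpoint;
    } else {                          // odd bit: latitude
      double midpoint = (minLat + maxLat) / 2;
      if (bit) minLat = midpoint; else maxLat = midpoint;
    }
  }
  // Approximate a point by taking the midpoint of the error box.
  return new double[] { (minLat + maxLat) / 2, (minLng + maxLng) / 2 };
}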

Optimizing the algorithm

The code above is already pretty fast. We aren’t allocating any memory or making any method calls, so it’s a handful of primitive operations and conditions per bit we want to encode. The only way to make it substantially faster is to get rid of the loop; this involves finding a sublinear way to do both the bisections and the interleaving.

The latitude and longitude are in some sense bisected already since they’re specified as doubles. The only problem is that the mantissa bisects the range between two powers of two, not the limits of latitude or longitude. But we can change that easily enough with an add and a divide:

double adjustedLat = (lat + 90) / 180.0;
double adjustedLng = (lng + 180) / 360.0;

Now our numbers are both in the range [0, 1]. We can convert them to bits either by using Double.doubleToLongBits, or by multiplying and doing a cast:

long latBits = (long) (adjustedLat * 0x80000000L) & 0x7fffffffL;
long lngBits = (long) (adjustedLng * 0x80000000L) & 0x7fffffffL;

At this point we have the two binary search results, each stored in the low 31 bits of the respective longs. (Technically 30 is all we need since most geohashes end up written in base-32.) Now we need to interleave them, also without looping.

It turns out that this is a well-studied problem with some very cool solutions. My favorite one is from the indispensable Stanford bithacks website:

static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};
unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.  
                // x and y must initially be less than 65536.
x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);

We need a 64-bit Morton number, so we’ll need one more stage. Here’s the code I ended up with to gap the bits:

// Spreads the low 32 bits of the input so that bit i lands at even position 2i,
// leaving zeros in the odd positions (one half of a Morton interleave).
static long widen(long low32) {
  low32 |= low32 << 16; low32 &= 0x0000ffff0000ffffL;
  low32 |= low32 << 8;  low32 &= 0x00ff00ff00ff00ffL;
  low32 |= low32 << 4;  low32 &= 0x0f0f0f0f0f0f0f0fL;
  low32 |= low32 << 2;  low32 &= 0x3333333333333333L;
  low32 |= low32 << 1;  low32 &= 0x5555555555555555L;
  return low32;
}

Then, to produce the interleaved result:

long interleaved = widen(lngBits) | widen(latBits) >>> 1;

This result can then be right-shifted to get the desired precision.
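
Putting the pieces together, the whole loopless encoder ends up as something like the sketch below. The name encodeGeohashFast is mine, and the shift amount of 61 - bits follows from the layout above, where the first (longitude) bit lands at bit 60:

static long encodeGeohashFast(double lat, double lng, int bits) {
  // Scale to [0, 1], convert to 31-bit fixed point, interleave, truncate.
  long latBits = (long) ((lat + 90)  / 180.0 * 0x80000000L) & 0x7fffffffL;
  long lngBits = (long) ((lng + 180) / 360.0 * 0x80000000L) & 0x7fffffffL;
  long interleaved = widen(lngBits) | widen(latBits) >>> 1;
  return interleaved >>> (61 - bits);   // keep the top `bits` bits
}

For a 12-character geohash (60 bits), for example, this leaves the full 60-bit code in the low bits of the result.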

Calculating error bounds (decoding)

I mentioned earlier that geohash libraries typically return bounding boxes when decoding. Normally we’d have the required information available after the decoding loop, but the loopless version returns results without actually constructing the lat/lng ranges. We could trivially store these ranges in a lookup table since there are only 64 levels of precision, but there’s a fun algorithmic solution that’s almost as fast.

The latitude and longitude errors are related: if #bits is even, the longitude error is 2x the latitude error; otherwise they’re equal. Given that, it suffices to calculate the latitude error first and then derive the longitude error. Only every other bit impacts the latitude error, so there are 32 distinct possibilities. Specifically, a geohash cell spans 180 * 2^-floor(#bits/2) degrees of latitude, so the error around the midpoint is half that: 90 * 2^-floor(#bits/2), which is what the code below computes.

Math.pow() is the obvious solution, but it’s boring and is generally considered slow. Given that our exponent is always a 5-bit integer, we can use repeated squaring to very quickly come up with an exact answer:

double latError = 1;
for (int b = 0; b < 5; ++b) {     // int bits is the geohash precision (0 to 63, inclusive)
  latError *= latError;
  if ((bits & 32 >> b) != 0)      // walk the bits of floor(bits/2), most significant first
    latError *= 0.5;
}
latError *= 90;

This can be optimized a bit further by unrolling and saving a couple of multiplies:

// Note: each trailing "error *= error" runs unconditionally; it's the squaring
// step hoisted out of the conditionals. The 0.25 on the top bit folds one
// squaring into the multiply.
double error = 1.0;
if ((bits & 32) != 0) error *= 0.25;
if ((bits & 16) != 0) error *= 0.5; error *= error;
if ((bits & 8)  != 0) error *= 0.5; error *= error;
if ((bits & 4)  != 0) error *= 0.5; error *= error;
if ((bits & 2)  != 0) error *= 0.5;

double latError = error * 90;
double lngError = error * ((bits & 1) != 0 ? 90 : 180);
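
For symmetry, here is a rough sketch of a loopless decode under the same 61-bit layout as the encoder above. Both squash (the inverse of widen) and decodeGeohashFast are illustrative names of my own, not the library's API:

// Inverse of widen(): keep the even-position bits and pack them back together.
static long squash(long wide) {
  wide &= 0x5555555555555555L;
  wide = (wide | wide >>> 1)  & 0x3333333333333333L;
  wide = (wide | wide >>> 2)  & 0x0f0f0f0f0f0f0f0fL;
  wide = (wide | wide >>> 4)  & 0x00ff00ff00ff00ffL;
  wide = (wide | wide >>> 8)  & 0x0000ffff0000ffffL;
  wide = (wide | wide >>> 16) & 0x00000000ffffffffL;
  return wide;
}

// Recover the fixed-point lat/lng (this yields the box's minimum corner),
// rescale to degrees, then add the half-cell error to land on the midpoint.
static double[] decodeGeohashFast(long geohash, int bits) {
  long wide = geohash << (61 - bits);                 // restore the 61-bit layout
  double minLng = (double) squash(wide)       / 0x80000000L * 360 - 180;
  double minLat = (double) squash(wide >>> 1) / 0x40000000L * 180 - 90;
  double latError = Math.scalb(90.0, -(bits / 2));    // 90 * 2^-floor(bits/2); the unrolled trick above works too
  double lngError = Math.scalb((bits & 1) != 0 ? 90.0 : 180.0, -(bits / 2));
  return new double[] { minLat + latError, minLng + lngError };
}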

Final notes: decoding base-32

A common strategy is to use a HashMap or similar to do reverse lookups on characters. That’s how the reference implementation does it, for example. The problem is that Java’s generic data structures work on boxed values, so every lookup requires boxing the key, calculating its hashcode, doing a .equals() check, and then unboxing the result. Not only is this a lot of work at runtime, but the lookup structure itself is quite large due to the nontrivial per-object overhead imposed by the JVM.

Even though it seems wasteful, a primitive array is a much more efficient way to represent the reverse lookup table:

static final char[] BASE32 =
  { '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'b', 'c', 'd', 'e', 'f', 'g',
    'h', 'j', 'k', 'm', 'n', 'p', 'q', 'r',
    's', 't', 'u', 'v', 'w', 'x', 'y', 'z' };

static final byte[] BASE32_INV = new byte[(int) 'z' + 1];

static {
  for (int i = 0; i < BASE32.length; ++i)
    BASE32_INV[(int) BASE32[i]] = (byte) i;
}
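
With the table in place, turning a base-32 geohash string back into its raw bits is just one array index per character. A minimal sketch (fromBase32 is an illustrative name; it assumes lowercase input, matching the table above):

static long fromBase32(String geohash) {
  long bits = 0;
  for (int i = 0; i < geohash.length(); ++i)
    bits = bits << 5 | BASE32_INV[geohash.charAt(i)];   // five bits per character
  return bits;
}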

Check out the code on GitHub

This code is available in the Factual beercode-open repository, which is a collection of libraries whose correctness is informally guaranteed by beer. And if Morton codes and repeated squaring are your thing, you might be interested in working with us.

– Spencer “keeping it primitive” Tipping, Software Engineer

Factual and Metamarkets Partner to Combine Geo-Behavioral Data with Real-Time Auction Analytics

We are excited to announce a new partnership with Metamarkets, the leader in analytics for programmatic marketing. We are integrating Factual’s rich, global datasets into the Metamarkets platform to give mobile buyers and exchanges instant visibility into available inventory and campaign performance, enriched with Factual’s audiences and geographies.

Metamarkets will be utilizing Factual’s Geopulse Platform to deliver location-based audiences to media buyers and sellers. Exchanges such as Twitter’s MoPub will be able to provide this audience-level insight into campaign performance, allowing buyers to see how an ad interacts with various geographic, behavioral, and retail segments.

This partnership between Factual and Metamarkets gives marketers and buyers a focused lens into campaign reach, as well as the makeup of the users found across multiple major supply sources. By adding Factual’s audience insights, which are based on interactions with the physical world, to Metamarkets’ platform, we can provide a novel and powerful mobile ad solution.

Here is what Metamarkets had to say about this new partnership:

“Mobile is the fastest-growing segment of programmatic advertising,” said Mike Driscoll, Co-founder & CEO, Metamarkets. “A big part of that growth is from the revolution in location-based marketing pioneered by companies like Factual. By pairing Factual’s rich location and audience data with real-time market intelligence, we hope to give our clients unprecedented visibility into mobile audiences and to help them find the most productive ways to reach them.”

We also heard from StrikeAd, a partner that is utilizing this offering:

“For those of us on the forefront of mobile marketing, identifying and reaching audiences based on location is the promised land,” said Alex Rahaman, CEO of StrikeAd, a leading cross-device, programmatic buyer. “The Metamarkets-Factual partnership is building the analytics foundation to drive our mobile campaigns to even higher levels of success for our clients.”

Metamarkets helps the world’s top ad marketplaces operate at peak performance and share insights with their clients by providing real-time visibility into marketplace activity. With patented innovations in streaming data ingestion, storage, and visualization, the Metamarkets platform is built to handle the extreme scale and complexity of programmatic ad data. The company’s full-stack approach ensures the data is fresh, instantly available, and easily explored by business users and data scientists alike.

Contact sales to learn more about Factual’s mobile ad targeting products.

Using Data to Improve User Experience with Factual – Tyler Bell [Video Interview]

Factual VP of Product Tyler Bell sat down with the Application Developers Alliance during CES to discuss the role data plays in creating an engaging app. Tyler explains how app developers can use data to efficiently put consumers first while also increasing app monetization.

Highlights Include:

  • How data plays into apps to create an engaging experience for consumers
  • How publishers can utilize data to create contextual relevance
  • How app developers can get started with Factual to personalize an app, create targeted content, and enhance the participatory value of an app

Drake after Two Years: “Barely Famous”

We released Drake (“Data workflow tool, like a ‘Make for data’”) two years ago. The impetus behind Drake was the classic “scratch your own itch”. We craved a text-based data workflow tool that would make it easier to build and manage our data processing tasks.

We open sourced Drake in the hope that others would find it useful and maybe even be willing to help. We were delighted when the original blog post attracted attention. A fair number of relevant bug reports and feature requests flowed in. “Are you awash in fame and fortune?” my boss, Boris, asked me. “No,” I replied, “we’re awash in github tickets.” Most exciting, over the past two years we’ve seen substantial bug fixes and contributions to the project from outside Factual. This includes S3 support and a Clojure front end (thanks, all!).

Early on we applied Drake internally with an almost reckless abandon, trying it out on just about any project that looked even remotely like a data workflow. Over time, however, we developed a better feel for the strengths and weaknesses of Drake’s design choices.

Drake makes it refreshingly easy to spin up proofs of concept. We like to use Drake to express evolving data workflows that will be shared across a wide team. Drake is especially useful for tying together heterogeneous scripts and command-line calls, taming what would otherwise be a hodgepodge of tasks. We also like Drake for helping us manage chains of related Hadoop jobs.

On the other hand: we’ve found Drake is not great if you just need to glue together some existing same-language code. That can usually be done more simply by staying within the borders of that language. Doubly so if you plan to share your work with others who don’t already know and understand Drake workflows.

Also, Drake does not answer every problem in the data workflow space. A leading example: how do you manage long-running tasks? For tasks that take a long time and do a large amount of work, you often want features like resumability and trackability. At Factual, we have a custom-built task management service called Vineyard that addresses these and similar issues for us. We have glue that allows Vineyard and Drake to work together in various ways, but out of the box Drake doesn’t offer these kinds of features for long-running tasks.

Earlier this year Factual welcomed Clojure aficionado Alan Malloy to our engineering ranks. Alan showed interest in Drake and invested time and expertise in maintaining the codebase and responding to community requests. This was no surprise given Alan’s Clojure chops and generous willingness to help people. We invited Alan to become the primary owner of Drake’s codebase and it was super great that he accepted.

We hope that Drake’s future is bright and that the project continues to evolve to better serve its users. We’re encouraged by the bit of traction the project has seen so far; I like to think of Drake as “barely famous”. Drake was given its very own chapter in the recently published “Data Science at the Command Line”, and Artem’s YouTube tutorial for Drake has been viewed over 5,000 times. Or, as Artem puts it: “Since launch, people spent cumulative 639.8 hours watching Drake tutorial on Youtube, which is not Apache Hadoop, of course, but still pretty neat. :)”.

If you’re a current user of Drake, we hope you’ll let us know what you think and tell us what’s missing. If you’ve never used Drake but have always wanted a ‘Make for data’, we hope you’ll give it a go.

And if you’ve ever filed a ticket or, even better, sent us a pull request… Thank you!

Yours,

Aaron Crow
Factual engineer and Drake contributor