Factual On Location Data, Privacy And Trust In The Mobile Space – Tyler Bell [Podcast]

Factual VP of Product Tyler Bell joined Jacob Goldstein of the Application Developers Alliance for a Voice of the Industry podcast to discuss the present and future of location data, the increasing amounts of personal information available to developers, and how to maintain trust between developer and user while making use of that data.

Highlights include:

  • The role of location data in establishing context: the more granular the data, the more valuable.
  • Understanding, in broad strokes, the general patterns of where a user tends to operate.
  • Privacy and the importance of respecting the bond between a user and a brand, publisher, app, etc.

Emergent Behaviors in Factual’s Geopulse Audience Profiles

Factual’s Geopulse Audience product assembles real-world profiles for millions of smartphone users around the world. A suite of sophisticated geo-fencing, machine-learning, and heuristic methods is used to convert the user input, a set of lat/long records for a particular device, into a colorful description of the user. This description includes demographic, behavioral, and geographic information, such as a user’s age, income, and ethnicity; whether they are a likely golfer, mattress shopper, or electronics buyer; and which places they have visited over the past year.

As part of our ongoing QA of Geopulse Audience profiles, we had the opportunity to delve into this rich dataset of users in search of emergent consumer patterns. We asked questions such as: Who is likely to visit which places? Are there places that are often visited in concert? Which people exhibit which consumer behaviors?

We calculated the pairwise correlations among 85 places, 9 demographic descriptors, and 25 behavioral segments for a set of more than 30 million user records, making use of a Clojure library for distributed statistics.
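To make the calculation concrete, here is a minimal single-machine sketch in Python of a pairwise Pearson correlation matrix. It is not the distributed Clojure library mentioned above, and the type of correlation coefficient is an assumption; the field names and values are hypothetical toy data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant field
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """columns maps a field name to a list of per-user values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy data: five users, two visit counts and one 0/1 segment flag.
toy = {
    "starbucks_visits": [4, 0, 2, 7, 1],
    "atm_visits": [3, 1, 2, 6, 0],
    "college_student": [0, 1, 0, 0, 1],
}
corr = correlation_matrix(toy)
```

Each field correlates perfectly with itself, which is the diagonal that any plot of such a matrix would show.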

To begin, we looked at a bird’s-eye view of the space: the correlations for a Cartesian join on the full set of places, behavioral segments, and demographics. While at first glance this plot looks inscrutable, there are a couple of noteworthy observations. First, as a sanity check, records are perfectly correlated with themselves. Second, there are a lot more purple squares (positive correlations) than brown squares (negative correlations). This reflects a positive bias in the way that we gather information about a particular device. An application that streams geotags more often has more geotags for ubiquitous places like Starbucks, ATMs, and movie theaters. Because the number of visits to common places is a function of the number of geotags, visits to Starbucks, ATMs, etc. are positively correlated with one another. These positive correlations are more likely to surface in Audience profiles than are true negative correlations, such as a person who often visits McDonald’s being that much less likely to visit Burger King. This skew toward positive correlations can also apply to behavioral segments that are learned in part based on place visits.

Also, there are several fields that look vaguely like white stripes, i.e. they appear to be evenly distributed across most other fields. These white stripes include affluent consumer, age, college student, commuter, entertainment enthusiast, financial customer, female, income, leisure seeker, and NFL enthusiast. We expect income, age, and gender to be more or less equally distributed across places because most of the places present in these records are visited by a wide range of people (McDonald’s, CVS, etc.). It is also plausible that behavioral segments like affluent consumer (one who frequents non-chain stores), financial customer (one who uses banks or ATMs), entertainment enthusiast (one who frequents movie theaters, dance clubs, etc.), date nighter (one who frequents restaurants and bars), commuter (one who travels more than five miles to get to work), leisure seeker (one who goes to playgrounds, parks, and pools), and NFL enthusiast are somewhat equally distributed across demographics, other segments, and visits to various places.

Next, we zoom in on some interesting features. We grab the 15 fields with the highest mean absolute correlation (most correlated across the board) and the 15 fields with the highest standard deviation in correlation (a gross proxy for several highly correlated fields). Birds of a feather flock together, it would appear. Different car dealerships feature as a block of highly correlated fields, as car dealerships tend to be located near one another, and if you’re shopping for a car you are likely to visit more than one lot. Clothing stores such as Old Navy, JC Penney, Abercrombie and Fitch, Gap, Banana Republic, H&M, American Eagle Outfitters, and Victoria’s Secret cluster together as well, likely for the same reason (what we see in this block of highly correlated retail stores is a topology of the typical American mall!).
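The field selection described above can be sketched as follows. This is an illustrative Python version under stated assumptions: the correlation matrix is a nested dict, each field's self-correlation is excluded from its statistics, and the function name and toy values are hypothetical.

```python
import statistics

def rank_fields(corr, top_n=15):
    """Rank fields by mean absolute correlation and by spread of correlation.

    corr is a nested dict {field: {other_field: correlation}}.
    Returns two lists of field names: (top by mean |r|, top by std of r).
    """
    mean_abs, spread = {}, {}
    for field, row in corr.items():
        # Leave out the trivial self-correlation of 1.0.
        others = [r for other, r in row.items() if other != field]
        mean_abs[field] = statistics.fmean([abs(r) for r in others])
        spread[field] = statistics.pstdev(others)
    by_mean = sorted(mean_abs, key=mean_abs.get, reverse=True)[:top_n]
    by_spread = sorted(spread, key=spread.get, reverse=True)[:top_n]
    return by_mean, by_spread

# Hypothetical toy correlations, not values from the analysis.
toy_corr = {
    "jeep_dealer": {"jeep_dealer": 1.0, "dodge_dealer": 0.8, "gap": 0.1, "age": 0.0},
    "dodge_dealer": {"jeep_dealer": 0.8, "dodge_dealer": 1.0, "gap": 0.2, "age": 0.0},
    "gap": {"jeep_dealer": 0.1, "dodge_dealer": 0.2, "gap": 1.0, "age": -0.1},
    "age": {"jeep_dealer": 0.0, "dodge_dealer": 0.0, "gap": -0.1, "age": 1.0},
}
top_mean, top_spread = rank_fields(toy_corr, top_n=2)
```

A field such as "age" in the toy data, which is nearly uncorrelated with everything else, drops out of both lists, much like the white stripes described earlier.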

Rather than ask the obvious question of whether a user is more likely to shop at two stores that are in close proximity, we can look for stores that we do not expect to be near one another but that are still significantly correlated. In this case, we are interested in the elements off the block diagonal in the above image, such as the correlation between Dairy Queen and Jeep and Dodge dealers, between JC Penney visitors and Mitsubishi dealers, and the curious possibility that consumers are not using cash to purchase their Hyundais.

We partition places into a few sets (retail stores, car dealerships, ATMs/Starbucks/movie theaters) and identify correlations between places from different groups. We plot correlations between some of these places below. It looks like if you’re going to NAPA Auto Parts, you’re likely to eat burgers at Sonic; if you’re shopping at Nordstrom, you’re probably picking up pet supplies at PetSmart; and if you’re frequenting ATM Banks, it may be to fuel your Starbucks addiction. The last of these is likely a function of the fact that ATM Banks and Starbucks co-occur quite frequently (80,000 times), and is just another manifestation of the bias by which more data yields more visits and segments.
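A small Python sketch of this cross-group filtering is shown here. The group names, the correlation values, and the `min_abs_r` cutoff are all hypothetical and chosen only for illustration.

```python
def cross_group_pairs(corr, groups, min_abs_r=0.1):
    """Return (r, place_a, place_b) for pairs whose places sit in different groups.

    corr is a nested dict of correlations; groups maps a group name to its places.
    Pairs are sorted from strongest to weakest absolute correlation.
    """
    group_of = {place: name for name, places in groups.items() for place in places}
    names = sorted(group_of)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if group_of[a] == group_of[b]:
                continue  # same group: likely explained by proximity
            r = corr[a][b]
            if abs(r) >= min_abs_r:
                pairs.append((r, a, b))
    return sorted(pairs, key=lambda p: abs(p[0]), reverse=True)

# Hypothetical groups and correlations.
groups = {
    "retail": ["nordstrom", "petsmart"],
    "auto": ["napa_auto_parts"],
    "food": ["sonic"],
}
toy_corr = {
    "nordstrom": {"petsmart": 0.40, "napa_auto_parts": 0.02, "sonic": 0.05},
    "petsmart": {"nordstrom": 0.40, "napa_auto_parts": 0.03, "sonic": 0.12},
    "napa_auto_parts": {"nordstrom": 0.02, "petsmart": 0.03, "sonic": 0.30},
    "sonic": {"nordstrom": 0.05, "petsmart": 0.12, "napa_auto_parts": 0.30},
}
strongest = cross_group_pairs(toy_corr, groups)
```

In the toy data, the Nordstrom and PetSmart pair is dropped despite its high correlation because both places were put in the same group.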

We can double-check that we’ve identified the major clusters of places by running a metric multidimensional scaling (MDS) algorithm on our various places (temporarily removing segments for the sake of clarity). MDS represents the distances between our various places in a 2-D space. We see that the majority of car dealerships are clustered in the top left corner, though notably Mercedes and Land Rover dealers are in their own space. In the top right-hand corner, we see the Nordstrom, American Eagle, GameStop cluster we identified earlier. We may also laugh to see the male demographic sit squarely at the center of Best Buy, Subway, and McDonald’s.
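For readers who want to try a similar embedding, here is a hedged sketch of classical (Torgerson) MDS in plain Python. Classical MDS is one variant of metric MDS and may not be the exact algorithm used for the plot. The conversion from correlation to distance, d = sqrt(2(1 - r)), is also an assumption, and the place names and correlations are toy values.

```python
import math

def correlation_to_distance(r):
    # Assumed conversion: perfectly correlated places get distance 0.
    return math.sqrt(max(0.0, 2.0 * (1.0 - r)))

def classical_mds(dist, dims=2, iters=500):
    """Embed n points in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector. This assumes the
        # largest-magnitude eigenvalues are positive, as for Euclidean distances.
        v = [math.sin(i + k + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigenvalue <= 1e-12:
            break  # no positive variance left to embed
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(eigenvalue)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords

# Hypothetical correlations between four places.
places = ["jeep_dealer", "dodge_dealer", "gap", "h_and_m"]
toy_corr = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
dist = [[correlation_to_distance(r) for r in row] for row in toy_corr]
layout = dict(zip(places, classical_mds(dist)))
```

In the resulting layout the two dealerships land close together and far from the two clothing stores, which is the kind of clustering the plot is used to confirm.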

As a final investigation (though with such a rich dataset there are myriad cool questions one could ask), we ask whether particular behavioral segments are likely to co-occur and what demographic biases, if any, exist for particular consumer behaviors.

A plot like the one above provides powerful insight into consumer segments. It also allows us to see whether various segments are behaving as we expect them to. Each segment was derived from an independent model, so this type of cross-examination based on external ‘truthiness’ is very powerful. For example, it seems reasonable that a user is less likely to be a college student the older they are or the more they appear to be a frequent traveller. Affluent consumers have higher incomes, date nighters overlap a fair bit with entertainment enthusiasts, and males are more likely than females to be golfers.

This exploration is the first of a series in which we QA Geopulse Audience profiles by comparing what they encode about user segments, demographics, and place visits to what we know about those fields based on external sources of information, such as consumer surveys and census data. These analyses enable us to refine our models and maximize the quality of our Geopulse Audience product.

Please email me at natasha@factual.com if you have any questions or feedback about these results or if you would like to learn more about being a Data Engineer at Factual!

- Natasha Whitney, Data Engineering Intern

Notes:
1. In order to ensure that our results will hold for samples of varying sizes, we only included correlations that had a 5% or smaller probability of occurring by mere chance given the number of records that were used to calculate the correlation (see this discussion of statistical significance for correlations).
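As an illustration of this kind of filter, the sketch below computes an approximate two-sided p-value for a correlation using the Fisher z-transform. The linked discussion may describe a different test, so the choice of method is an assumption, as are the function names.

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for the null hypothesis of zero correlation.

    Uses the Fisher z-transform, a large-sample normal approximation.
    Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def is_significant(r, n, alpha=0.05):
    """True if a correlation this large has at most an alpha chance of arising by chance."""
    return correlation_p_value(r, n) <= alpha

small_sample = is_significant(0.3, 20)            # same r, few records
large_sample = is_significant(0.3, 100)           # same r, more records
huge_sample = is_significant(0.001, 30_000_000)   # tiny r, very many records
```

The same correlation of 0.3 fails the test with 20 records and passes it with 100, and with tens of millions of records even a very small correlation clears the 5% threshold, so statistical significance alone says little about the strength of an effect.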

InMobi Inks Global Partnership with Factual to Increase Mobile Ad Efficacy for Brands

Today we announced a partnership with InMobi, the world’s largest independent mobile advertising platform, to create a rich set of audience targeting solutions for brand advertisers. InMobi’s anonymous data set of global mobile consumer activity will be combined with Factual’s location-based data to build geo-derived audience segments worldwide. This partnership enables marketers to globally leverage a richer set of geo-targeting features that incorporate geo-based consumer intelligence at a large scale.

Brands are often challenged by ‘ad spillage,’ or dollars wasted when an advertisement reaches audiences outside their geo-spatial target segment. Providing advertisers with rich contextual data on consumers enables delivery of relevant ads with higher campaign efficacy, or lower ad spillage.

InMobi gathers anonymous user data from the 759 million monthly active unique users on its network. To enhance an advertiser’s targeting efficacy, InMobi builds on this understanding of users by partnering with multiple data specialists across the ecosystem, such as Factual.

The partnership between InMobi and Factual will help advertisers understand how consumers move through the physical world by using aggregated anonymous user location data. Factual’s Global Places data covers over 65 million businesses and points of interest across 50 countries. Geopulse Audience uses this understanding of geography to build location-based profiles that contain hundreds of non-private behavioral attributes describing a user. For example, brands can now run campaigns targeting college students at Starbucks coffee shops or live sports fans at a stadium. In addition, InMobi will provide brands with geographically differentiated insights that will enable marketers to build targeted campaigns. For instance, brands will be able to market to movie goers in New York differently than to movie goers in London.

“As consumers spend more time on their mobile phones, our investments in building the infrastructure that gleans powerful data signals and insights from these interactions will provide increasing value to marketers,” said Anne Frisbie, Vice President & General Manager – Global Alliances, InMobi. “We are committed to enabling consumers and businesses to make smarter decisions. By partnering with Factual, we are able to improve our geo-targeting capabilities, thereby offering more relevant and engaging mobile marketing. Of equal importance, we are able to offer targeted audience buying to brands not just in the US, but around the world.”

Every market in the world is being transformed by mobile: the valuable signals generated by mobile devices enable marketers to deliver contextually relevant user experiences that were previously impossible. We believe that advertisers and developers everywhere are going to benefit from the combination of InMobi’s global reach and the scale of its mobile consumer data with our global location data and understanding.

-Bill Michels, SVP Product and Partnerships

Expanding Restaurants Extended Attributes Data Coverage to France, Germany, and Australia

Today we are excited to announce the expansion of our Restaurants Extended Attributes data to include France, Germany, and Australia. While our Global Places data has always covered restaurants around the world, the 43 additional restaurant-specific attributes in our Restaurants Extended Attributes data were previously only available in the United States and the United Kingdom. Highlights of the new France, Germany, and Australia data include:

  • Data on 300,000+ restaurants in France, 170,000+ in Germany, and 100,000+ in Australia.
  • 43 restaurant-specific attributes (see note 1), covering everything you need to know about a restaurant, including:
    • Cuisine types and meal types.
    • Ratings and price ranges.
    • Alcohol policies.
    • Whether it is good for kids and has a kids menu.
    • Parking information.
    • Smoking policy.
    • Payment options.
    • See the schema for the complete attribute list.
  • Synced with Global Places: core attributes and Factual IDs match those in Global Places.

So whether you’re in Berlin with your family and looking for a restaurant that’s “Good for Kids”, in Marseille and interested in Provençal cuisine, or in Melbourne for the holidays and in need of some ice cream, our restaurants data has you covered.

You can explore all of our Restaurants Extended Attributes data (US, UK, France, Germany, Australia) on our site, you can get an API key to access it programmatically, or you can request a download to host the data yourself.

– Vikas Gupta, Director of Marketing and Operations

Notes:

1. Coverage across attributes varies, but will improve over time.

“Do you really know your consumer? The art and science of location data.” – Tyler Bell [Video]

Last week Factual VP of Product, Tyler Bell, spoke with VentureBeat about the vast increases in location data points brought about by mobile devices and the wealth of insight we can gain from them. Watch the full interview to hear Tyler talk about “fundamentally knowing your user” by understanding their location data, providing timely and relevant content, and his thoughts about the exciting possibilities in the future of mobile.