We have a number of primary data directives at Factual: one is to build global coverage of the world’s POIs and businesses. At the time of writing we manage over thirty-one million entities in thirty-four countries; many more are in the oven.

We produce and manage this content on a country-by-country basis — this continues to make sense, as each country requires its own validation and canonicalization configurations. However, we are crafting a number of initiatives that abstract international boundaries to help create truly global applications. The first, and perhaps most important, step is standardizing Local and POI data on a single schema, employed across all countries.

Anyone who has worked with Local Data is a member of a small fraternity, bound by a deep and shared frustration that this stuff is, without doubt, a pig to work with. The vagaries of postal formatting and political administrative hierarchies will always ensure that the World is never easily typed.

In capturing an un-normalized world, our choice was therefore either to capture every attribute for each country and retain semantic clarity, or to compromise in return for ease-of-use across countries. As the mobile use case is increasingly international, we opted for the latter. This will undoubtedly entail some semantic shoehorning, but the big win is that creating and growing truly global location-based applications becomes easier, as you can now manage all of your Factual place data centrally.

The Schema

The schema presented here is a superset; we will formally support only a subset of these attributes in any given country.

factual_id Factual ID
name Business/POI name
po_box PO Box. As they do not represent the physical location of a brick-and-mortar store, PO Boxes are often excluded from mobile use cases. We’ve isolated these for only a limited number of countries, but more will follow
address Street address
address_extended Additional address incl. suite numbers
locality City, town or equivalent
region State, province, territory, or equivalent
admin_region Additional sub-division, usually but not always a country sub-division
post_town Town employed in postal addressing
postcode Postcode or equivalent (zipcode in US)
country The ISO 3166-1 alpha-2 country code
tel Telephone number with local formatting
fax Fax number formatted as above
website Authority page (official website)
latitude Latitude in decimal degrees (WGS84 datum). Value will not exceed 6 decimal places (0.111m)
longitude as above, but sideways
category String name of category tree and category branch
status Boolean representing business as going concern: closed (0) or open (1). We are aware that this will prove confusing to electrical engineers
email Contact email address of organization
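
To make the attribute list above a little more concrete, here is a minimal Python sketch of a single record. It is not Factual's implementation: the Place class, the validate and normalize_coordinate helpers, the choice of which fields are required, and the 111 km-per-degree figure are all assumptions made for illustration. It also shows where the 0.111m figure for six decimal places comes from.

```python
from dataclasses import dataclass
from typing import List, Optional
import re


@dataclass
class Place:
    """One record in the shared schema.

    Treating factual_id, name and country as required is an assumption
    of this sketch; the schema itself is a superset and any attribute
    may be unsupported in a given country.
    """
    factual_id: str
    name: str
    country: str                      # ISO 3166-1 alpha-2 code
    po_box: Optional[str] = None
    address: Optional[str] = None
    address_extended: Optional[str] = None
    locality: Optional[str] = None
    region: Optional[str] = None
    admin_region: Optional[str] = None
    post_town: Optional[str] = None
    postcode: Optional[str] = None
    tel: Optional[str] = None
    fax: Optional[str] = None
    website: Optional[str] = None
    latitude: Optional[float] = None  # decimal degrees, WGS84
    longitude: Optional[float] = None
    category: Optional[str] = None
    status: Optional[int] = None      # 0 = closed, 1 = open
    email: Optional[str] = None


def normalize_coordinate(value: float) -> float:
    """Round a coordinate to the six decimal places the schema allows."""
    return round(value, 6)


def validate(place: Place) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    # Checks the shape of the code only, not membership in ISO 3166-1.
    if not re.fullmatch(r"[A-Za-z]{2}", place.country):
        problems.append("country must be a two-letter code")
    if place.latitude is not None and not -90 <= place.latitude <= 90:
        problems.append("latitude out of range")
    if place.longitude is not None and not -180 <= place.longitude <= 180:
        problems.append("longitude out of range")
    if place.status is not None and place.status not in (0, 1):
        problems.append("status must be 0 (closed) or 1 (open)")
    return problems


# One degree of latitude is roughly 111 km (an approximation), so the
# sixth decimal place resolves to about 0.111 m.
METERS_PER_DEGREE = 111_000
resolution_m = METERS_PER_DEGREE * 1e-6
```

A record built this way can be checked before it is stored: validate returns an empty list for a well-formed place and a list of messages otherwise.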

 

If this looks familiar, it should: it’s based on vcard/hcard attribute names, and you’ll see similar attribute names in Facebook’s OpenGraph markup, and in the return values from the Bing and Google Maps APIs (Ovi currently adheres to the city/district/state model, and Yahoo punts entirely). It could be argued that we should employ something more semantically rich, like Google’s multi-typed attributes in their Places API, but we lean towards ease-of-use, where we lean at all.

I’ll shortly post a link to a JSON file on GitHub where we enumerate the supported attributes for each country, and note where the ‘proper’ label differs from the default attribute. This schema harmonization is the first step in our data-centric approach to Geo and Local; we hope it helps engender more global location-based applications and, at the very least, makes ingesting and managing local data that much easier.
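
Until that file is published, here is a rough Python sketch of how an application might use a per-country attribute map. Everything in it is hypothetical: the JSON layout, the "attributes" and "labels" keys, the attribute sets listed for each country, and the supported, label_for and project helpers are invented for illustration and do not describe the actual file or Factual's real per-country support.

```python
import json

# Hypothetical layout and contents, invented for this sketch only.
SUPPORT_JSON = """
{
  "US": {
    "attributes": ["name", "address", "locality", "region", "postcode", "country"],
    "labels": {"region": "State", "postcode": "ZIP Code"}
  },
  "GB": {
    "attributes": ["name", "address", "locality", "post_town", "postcode", "country"],
    "labels": {}
  }
}
"""

SUPPORT = json.loads(SUPPORT_JSON)


def supported(country):
    """Return the attributes this sketch treats as supported for a country."""
    return SUPPORT[country]["attributes"]


def label_for(country, attribute):
    """Return the local label if one is listed, else the default attribute name."""
    return SUPPORT[country]["labels"].get(attribute, attribute)


def project(record, country):
    """Keep only the attributes supported for the given country."""
    allowed = set(supported(country))
    return {key: value for key, value in record.items() if key in allowed}
```

The point of the design is that application code keeps using the default attribute names everywhere, and only consults the map when it needs to drop unsupported attributes or show a country-specific label to a user.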

- Tyler Bell, Factual Product Bod