We have a number of primary data directives at Factual; one is to build global coverage of the world’s POIs and businesses. At the time of writing we manage over thirty-one million entities in thirty-four countries, and many more are in the oven.
We produce and manage this content on a country-by-country basis. This continues to make sense, as each country requires its own validation and canonicalization configurations. However, we are crafting a number of initiatives that abstract away international boundaries to help create truly global applications. The first, and perhaps most important, step is standardizing Local and POI data on a single schema used across all countries.
Anyone who has worked with Local Data is a member of a small fraternity, bound by a deep and shared frustration that this stuff is, without doubt, a pig to work with. The vagaries of postal formatting and political administrative hierarchies will always ensure that the World is never easily typed.
In capturing an un-normalized world, we could either capture every attribute for each country and retain semantic clarity, or compromise in return for ease of use across countries. As the mobile use case is increasingly international, we opted for the latter. This will undoubtedly entail some semantic shoehorning, but the big win is that creating and growing truly global location-based applications becomes easier, as you can now manage all of your Factual place data centrally.
The schema presented here is a superset: we formally support only a subset of these attributes in any given country.
| Attribute | Description |
|---|---|
| po_box | PO Box. As they do not represent the physical location of a brick-and-mortar store, PO Boxes are often excluded from mobile use cases. We’ve isolated these for only a limited number of countries, but more will follow. |
| address_extended | Additional address information, including suite numbers |
| locality | City, town, or equivalent |
| region | State, province, territory, or equivalent |
| admin_region | Additional sub-division, usually but not always a country sub-division |
| post_town | Town employed in postal addressing |
| postcode | Postcode or equivalent (zipcode in US) |
| country | The ISO 3166-1 alpha-2 country code |
| tel | Telephone number with local formatting |
| fax | Fax number, formatted as above |
| website | Authority page (official website) |
| latitude | Latitude in decimal degrees (WGS84 datum). Value will not exceed 6 decimal places (about 0.111 m) |
| longitude | As above, but sideways |
| category | String name of category tree and category branch |
| status | Boolean representing the business as a going concern: closed (0) or open (1). We are aware that this will prove confusing to electrical engineers |
| email | Contact email address of organization |
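To make the table a little more concrete, here is a minimal Python sketch of a single place record using the attribute names listed above. The `Place` class, its field types, and its validation rules are our own illustrative assumptions, not Factual’s API, and attributes not shown in this table are left out. The `metres_per_unit` helper (also hypothetical) checks the stated precision: assuming a mean Earth radius of about 6,371 km, one degree of latitude spans roughly 111 km, so the sixth decimal place corresponds to about 0.111 m.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

# Shape check only: exactly two upper-case letters. This does not verify
# that the code is actually assigned in ISO 3166-1.
ISO_ALPHA2 = re.compile(r"[A-Z]{2}")

MEAN_EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


@dataclass
class Place:
    """Hypothetical record using attribute names from the schema table."""
    country: str                 # ISO 3166-1 alpha-2, e.g. "US"
    latitude: float              # decimal degrees, WGS84
    longitude: float             # decimal degrees, WGS84
    status: int = 1              # 1 = open (going concern), 0 = closed
    po_box: Optional[str] = None
    address_extended: Optional[str] = None
    locality: Optional[str] = None
    region: Optional[str] = None
    admin_region: Optional[str] = None
    post_town: Optional[str] = None
    postcode: Optional[str] = None
    tel: Optional[str] = None
    fax: Optional[str] = None
    website: Optional[str] = None
    category: Optional[str] = None
    email: Optional[str] = None

    def __post_init__(self) -> None:
        if not ISO_ALPHA2.fullmatch(self.country):
            raise ValueError(f"country must look like ISO 3166-1 alpha-2, got {self.country!r}")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.status not in (0, 1):
            raise ValueError("status must be 0 (closed) or 1 (open)")
        # The schema caps coordinates at 6 decimal places.
        self.latitude = round(self.latitude, 6)
        self.longitude = round(self.longitude, 6)


def metres_per_unit(decimal_places: int) -> float:
    """Approximate north-south distance of one unit in the last latitude digit."""
    metres_per_degree = 2 * math.pi * MEAN_EARTH_RADIUS_M / 360  # about 111,195 m
    return metres_per_degree * 10 ** -decimal_places
```

For longitude, the east-west distance covered by one degree shrinks with the cosine of the latitude, so 0.111 m is an upper bound there, reached only at the equator.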
If this looks familiar, it should: it’s based on vCard/hCard attribute names, and you’ll see similar attribute names in Facebook’s OpenGraph markup and in the return values from the Bing and Google Maps APIs (Ovi currently adheres to the city/district/state model, and Yahoo punts entirely). It could be argued that we should employ something more semantically rich, like the multi-typed attributes in Google’s Places API, but we lean towards ease of use, where we lean at all.
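As a rough illustration of that lineage, the dictionary below pairs attributes from the table with the class names used by the hCard microformat, which mirrors vCard’s properties. The pairing is our own approximate reading, not an official crosswalk, and `to_hcard_properties` is a hypothetical helper. One caveat: hCard’s `country-name` expects a country name, whereas the schema stores an ISO code, so a real conversion would need a lookup step.

```python
from typing import Any, Dict

# Approximate, unofficial pairing of schema attributes with hCard class names.
FACTUAL_TO_HCARD: Dict[str, str] = {
    "po_box": "post-office-box",
    "address_extended": "extended-address",
    "locality": "locality",
    "region": "region",
    "postcode": "postal-code",
    "country": "country-name",  # hCard wants a name; the schema holds an ISO code
    "tel": "tel",
    "email": "email",
    "website": "url",
    "latitude": "latitude",     # nested under hCard's "geo" property
    "longitude": "longitude",   # nested under hCard's "geo" property
}


def to_hcard_properties(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename schema attributes to hCard class names.

    Attributes with no direct hCard counterpart here (admin_region, post_town,
    fax, category, status) and empty values are dropped.
    """
    return {
        FACTUAL_TO_HCARD[key]: value
        for key, value in record.items()
        if key in FACTUAL_TO_HCARD and value is not None
    }
```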
I’ll shortly post a link to a JSON file on GitHub where we enumerate the supported attributes for each country, and note where the ‘proper’ label differs from the default attribute. This schema harmonization is the first step in our data-centric approach to Geo and Local; we hope it helps engender more global location-based applications and, at the very least, makes ingesting and managing local data that much easier.
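The layout of that file isn’t described beyond this paragraph, so the sketch below assumes one: each country entry lists its supported attributes and maps any attribute whose local label differs from the default name. The `SAMPLE` contents and helper names are hypothetical, and the country entries are illustrative rather than Factual’s actual coverage; only the US `zipcode` label is taken from the table above.

```python
import json
from typing import Any, Dict

# Hypothetical per-country file; the real file's structure may differ.
SAMPLE = """
{
  "US": {
    "attributes": ["address_extended", "locality", "region", "postcode", "country", "tel", "website"],
    "labels": {"region": "state", "postcode": "zipcode"}
  },
  "GB": {
    "attributes": ["address_extended", "locality", "post_town", "postcode", "country", "tel"],
    "labels": {}
  }
}
"""


def load_country_config(text: str) -> Dict[str, Any]:
    """Parse the per-country attribute file."""
    return json.loads(text)


def is_supported(config: Dict[str, Any], country: str, attribute: str) -> bool:
    """True if the attribute is formally supported for the given country."""
    return attribute in config.get(country, {}).get("attributes", [])


def display_label(config: Dict[str, Any], country: str, attribute: str) -> str:
    """Return the local ('proper') label, falling back to the default attribute name."""
    return config.get(country, {}).get("labels", {}).get(attribute, attribute)
```

Keeping the default attribute name as the fallback means application code can always read the same keys across countries and consult the label map only when rendering.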
- Tyler Bell, Factual Product Bod