Data Markets: The Emerging Data Economy

Note: This was originally posted on TechCrunch on 9/30/2012, available here

The term data market brings to mind a traditional structure in which vendors sell data for money. Indeed, this form of market is on the rise with companies large and small jumping in. Think of Azure Data Marketplace (Microsoft), data.com (Salesforce.com), InfoChimps.com, and DataMarket.com.

While this model allows organizations to acquire valuable data, the term is evolving to include a variety of forms, each with varying degrees of adoption success. At the heart of it, data markets enable organizations to access data in new ways, where the currency need not be money alone but can also take the form of data or insight.

There is also a trend where companies can outsource certain aspects of data management, especially around reference or canonical datasets, to a third party that specializes in assembling and curating datasets or creating value from data in other ways. As a result, new data economies are being formed where data can be created, accessed, rented, and perpetually maintained in a simpler and more affordable way.

The new forms of data markets are powered by the Internet’s ability to allow rapid collection and exchange of data as well as by APIs that can search for and deliver data exactly where it is needed.

Consider the following examples:

  • Jigsaw has created a data market in which individuals and organizations contribute contact information to a central repository. Jigsaw curates that data and distributes it, in part or en masse, in exchange for both data and money.
  • Kaggle allows companies to provide data to a community of data scientists who analyze the data to discover predictive, actionable insight and win incentive awards. Data and rewards are traded for innovation.

The emergence of data markets has led companies to question the common “not-invented-here” attitude about data. If third parties can create and assemble valuable data, why not rent it rather than own it? If you don’t own the data, you also don’t have to maintain the data.

Data markets are also changing attitudes about data as an asset that must be kept private. While some data will clearly always be proprietary, in many cases the largest amount of value will come from sharing data and getting some new type of value in return.

Key questions for new participants to data markets include:

  • What is the value of your data inside your organization?
  • What is the risk in sharing it?
  • What control do you have over the data?
  • What can you get in exchange for it?
  • What role should you play in data markets?

Modern data markets will employ a whole new generation of technology, processes, and data science that supersedes the previous generation of data management systems. These include:

  • Cloud computing: First of all, cloud computing is becoming widely adopted. Clusters can be spun up instantly with no lead time and expanded as needed to address unexpected ramps in demand. While the largest datasets can be expensive to manage within public clouds, the same core technology can be used to manage private clouds – offering a host of management and cost benefits.
  • Big data software: The Hadoop open source project has gained incredible steam, becoming the centerpiece of many new large scale efforts for distilling value from huge amounts of data. Established software companies like Microsoft and EMC/Greenplum as well as newer companies on the scene like Cloudera and Hortonworks are all working overtime to add value to the Hadoop stack with advanced management, cloud, and support offerings.
  • Data science and machine learning: Predictive modeling and machine learning are becoming part of the standard toolset for sorting through vast quantities of data to find patterns and relationships. Natural language processing and statistical techniques can be used to find relationships in unstructured data. All of these techniques are crucial as data volumes have grown dramatically.
  • APIs: The API is the glue that enables an application to integrate the appropriate slice of a large database or a sophisticated third-party data crunching capability, all in real time. APIs also enable data to be collected in small or large chunks so that central curation workflows can be maintained with the utmost data freshness.
  • Crowdsourcing and social processes: Just as Twitter has enabled people to connect and communicate in new ways, data markets can use crowdsourcing and other social media-inspired methods to create new forms of sharing.

The new data market model is still evolving and gaining acceptance in the business community, but I predict that over the next few years it will become the de facto standard for accessing and managing data.

-Gil Elbaz

Two Additions to Factual’s Advisory Board

We are excited to announce that Clare Hart and Jason Rosenthal have joined our Advisory Board.

Clare Hart is an incredibly accomplished executive with an extensive history of leading data and information-intensive companies.  Most recently, she was the President and CEO of Infogroup where she led the company in development and execution of its first strategic plan to stabilize, turnaround, and transform the company.  Prior to that, she served as the Executive Vice President, Dow Jones & Company and President Dow Jones Enterprise Media Group, and before that she was President and CEO of Factiva.

“I am very excited about working with the Factual team.  The company represents the next generation of data aggregation and distribution. Using modern technology to compile massive amounts of content and providing simple APIs for easy access to data, Factual will change the game in global content aggregation.”

-Clare Hart

Jason Rosenthal is a seasoned technology executive with an admirable track record at a series of highly successful companies.  He’s held senior operating roles at AOL, Netscape, Loudcloud, Opsware, and HP.  Most recently he was EVP of Products at Glam Media, which he joined through the company’s acquisition of Ning where he served as CEO.

“Factual has become the center of an incredibly impressive network of companies consuming and providing data. They have unparalleled opportunities ahead of them to disrupt traditional data businesses and enable a tremendous amount of innovation by making data available to developers and enterprises. I am honored to have the opportunity to work with such a talented team in this exciting market.”

-Jason Rosenthal

The expertise and experience that Clare and Jason bring will be invaluable in helping us shape our strategy and tackle the incredible opportunities we have in front of us.  We have been working with them in a more informal capacity for the past few months and they have already proven to be incredible resources.

 

-Gil Elbaz

Updates to the Factual iOS SDK

Factual would like to announce the updated release of our officially supported SDK for iOS!  This update includes support for the many new API features Factual has to offer. Here are some of the main updates:

  1. New API support:
    • Resolve: starts with incomplete data for an entity, and returns potential entity matches
    • Facets: returns row counts for Factual tables, grouped by facets of data
    • Geopulse: provides point-based access to geographic attributes
    • Reverse Geocode: converts coordinates into addresses
    • Monetize: finds affiliate deals for places in Factual’s Global Places database
    • Match: maps your data to Factual places
    • Submit: submits edits to existing rows, or submits new rows of data to Factual tables
    • Flag: flags problematic rows in Factual tables
  2. Opened up the source code (accessible via the main project on github).
  3. Updated the sample iOS project to use this new SDK and illustrate new features.

Under the hood, we’ve also made the following improvements:

  1. Removed dependency on OAConsumer Library.
  2. Switched to Automatic Reference Counting.

Setup

Simply follow the directions on our github page to get the latest version and start using it in your project.

You can also reference the demo project as a working example for incorporating Factual data in your own application, or use it as a starter to immediately begin work on your new application.

Example

Here’s a quick example of how to use the driver. First, create an authenticated handle to Factual:

    FactualAPI* _apiObject = [[FactualAPI alloc] initWithAPIKey:@"yourkey" secret:@"yoursecret"];

Next, create the request of your choice, and fetch the results. For example, let’s say you want to find all Apple stores within 1000 meters of the Factual office:

    FactualQuery* queryObject = [FactualQuery query];
    [queryObject addRowFilter:[FactualRowFilter fieldName:@"name"
                                                  equalTo:@"Apple Store"]];
    CLLocationCoordinate2D coordinate = {34.06018, -118.41835};
    [queryObject setGeoFilter:coordinate radiusInMeters:1000];
 
    [_apiObject queryTable:@"places" optionalQueryParams:queryObject withDelegate:myDelegate];

Results from API calls are provided asynchronously via the passed-in delegate. Retrieve and print the results in the delegate implementation:

    -(void) requestComplete:(FactualAPIRequest *)request receivedQueryResult:(FactualQueryResult *)queryResult {
        for (id row in queryResult.rows) {
            NSLog(@"Row: %@", row);
        }
    }

For more examples on using the SDK with the various Factual APIs, please take a look at the tests in our github project.

We hope you find these improvements useful, and feel free to let us know if you end up using the driver for any new iOS applications, as we’re always excited to see ways that people are using our data.  Let us know if you have any questions or suggestions, and we hope you’ll enjoy using Factual in your mobile applications!

New to Factual?  Explore our data and APIs and sign up to get started.

Best wishes,
Brandon Yoshimoto
Software Engineer

Eva Ho Speaking at DataWeek 2012

Eva Ho, our VP of Marketing and Business Operations, will be speaking on a panel tomorrow at DataWeek titled “Challenges of Building on Geo Data”.  She will be joined by Ben Standefer (Urban Airship), Ankit Agarwal (Micello), and Peter Davie (TomTom).  It will be from 12:00 pm to 12:40 pm in the SPUR Urban Center, 4th floor, Room 2.  The session description is:

Why does there need to be dozens of sources of unique types of geo data? If so much geo data is free and public, why are we witnessing the emergence of technologies that do nothing but make it easier for us to find and integrate geo data? This panel will seek to understand why IP data, place data, venue data, or check-in data all have unique geo data challenges that require a diverse array of geo data providers.

 

New US Release & Enhancements to Global Places

We are very pleased to announce a number of significant enhancements to our US Places dataset soon to be followed by similar improvements to the rest of the world.

Enrichments to US Data

We’ve added a boatload of new entities to the US including 80K landmarks (parks, memorials, historic buildings, and other monuments), 25K transport hubs (airports, rail stations and a handful of ports), and 190K new ATM locations. We’ve also included over 50 million additional references and edits from our partners to improve both coverage and accuracy. This brings us up to just over 23 million entities in the US alone, and over 63 million places in 50 countries worldwide.

Category Enhancements Globally

Our categories have taken an increasingly central role in the distribution and management of our data, so we’ve made our categorization framework more friendly to humans and more efficient for machines. These improvements include:

  • 50 new categories for better, more granular classification
  • Numeric category IDs for more structured search and data management
  • Category translations available in Italian, German, French, Spanish, Korean, Japanese, and Chinese

We’ve made the entire category hierarchy available as a Factual table so you can query it in all languages, and also made it available as a JSON file on Github to facilitate baking in client-side category logic. See more information on categories here.
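As a rough illustration of the kind of client-side category logic this enables, here is a minimal Python sketch that walks a category hierarchy from a leaf up to its root. The record layout, the `parent` field, and the sample IDs and labels are all invented for this example; they are not the actual schema of the Factual table or the JSON file.

```python
# Hypothetical category records keyed by numeric ID. The IDs, labels,
# and "parent" field are invented for illustration only and do not
# reflect the real layout of Factual's category data.
CATEGORIES = {
    1: {"label": "Food and Dining", "parent": None},
    2: {"label": "Restaurants", "parent": 1},
    3: {"label": "Pizza", "parent": 2},
}


def category_path(category_id, categories=CATEGORIES):
    """Return the labels from the root category down to category_id."""
    path = []
    current = category_id
    # Follow parent links until we reach a record with no parent.
    while current is not None:
        record = categories[current]
        path.append(record["label"])
        current = record["parent"]
    # We collected leaf-to-root, so reverse for root-to-leaf order.
    return list(reversed(path))
```

With numeric IDs as keys, an application could display a breadcrumb such as the result of `category_path(3)` without another round trip to the server.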

150 Chains

Chains — stores representing both local and national brands — are often included in Places data sets but can rarely be managed as distinct entities. Factual now manages a table of chains which connects directly to our Places: developers can query by explicit chain ID to get the complete list of our first 150 authoritative chains from our partners Location3 and Universal Business Listings (many more coming) that connect to over 333K places. We also have an additional 775 auto-generated chains produced by machine clustering — these are experimental and won’t have the same coverage or precision, so experiment with care. We’re testing these features out in the US before expanding globally — see more on chains here.
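To make the idea of managing chains as distinct entities concrete, here is a small Python sketch that groups place records by a chain identifier. The `chain_id` and `name` field names and the sample records are assumptions made for this example, not a description of the actual Places schema or API.

```python
from collections import defaultdict


def group_by_chain(places):
    """Map each chain ID to the names of the places that carry it.

    Each place is a dict. The "chain_id" and "name" keys are
    hypothetical field names used only for this illustration.
    Places without a chain ID are skipped.
    """
    chains = defaultdict(list)
    for place in places:
        chain_id = place.get("chain_id")
        if chain_id is not None:
            chains[chain_id].append(place["name"])
    return dict(chains)
```

Once places share an explicit chain ID, questions like "how many locations does this brand have nearby?" reduce to a simple lookup rather than fuzzy matching on names.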

Factual Place Rank

With over 23MM Places in the US, developers of Local applications often find that there are too many records to present to the user, and that it is difficult to filter for those most meaningful to their app. Factual Place Rank aims to provide a relative metric by which developers can sort places by their informatic and social footprint, to ensure the most prominent places rise to the top of the stack. We’re using Factual Place Rank as the default ranking for searches. The feature is in beta, so we’re testing it in the US only. See more on Factual Place Rank and all Global Places Attributes here.
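On the client side, using a relative rank like this might look like the Python sketch below, which keeps only the highest-ranked places from a result set. The `placerank` field name, the assumption that higher values mean more prominent places, and the default of treating a missing rank as 0 are choices made for this illustration, not documented behavior.

```python
def top_places(places, limit=10, rank_field="placerank"):
    """Return up to `limit` places, most prominent first.

    `rank_field` is a hypothetical attribute name. Places that lack
    it are treated as rank 0 so they sort to the end.
    """
    ranked = sorted(
        places,
        key=lambda place: place.get(rank_field, 0),
        reverse=True,
    )
    return ranked[:limit]
```

Because Python's sort is stable, places with equal rank keep the order in which the search returned them.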

Going Global

Taken together, these changes are not insignificant and could bork existing code. We’re therefore releasing this US dataset as a new, versioned resource. We’ll follow with new revs of our US Restaurants and US Hotels data. All other countries will follow shortly, and this will become the production Global Places dataset. We’ve posted a migration overview online that describes the changes in more detail and helps you minimize disruption.

We’ve been working on these features for some time and it’s great to be getting them out the door. We’ll have a second, follow-on announcement on further features in a few weeks, so stay tuned.

-Tyler Bell
Our Data is Singular