Two Additions to Factual’s Advisory Board

We are excited to announce that Clare Hart and Jason Rosenthal have joined our Advisory Board.

Clare Hart is an incredibly accomplished executive with an extensive history of leading data- and information-intensive companies. Most recently, she was the President and CEO of Infogroup, where she led the company in the development and execution of its first strategic plan to stabilize, turn around, and transform the company. Prior to that, she served as Executive Vice President of Dow Jones & Company and President of the Dow Jones Enterprise Media Group, and before that she was President and CEO of Factiva.

“I am very excited about working with the Factual team.  The company represents the next generation of data aggregation and distribution. Using modern technology to compile massive amounts of content and providing simple APIs for easy access to data, Factual will change the game in global content aggregation.”

-Clare Hart

Jason Rosenthal is a seasoned technology executive with an admirable track record at a series of highly successful companies. He’s held senior operating roles at AOL, Netscape, Loudcloud, Opsware, and HP. Most recently he was EVP of Products at Glam Media, which he joined through the company’s acquisition of Ning, where he served as CEO.

“Factual has become the center of an incredibly impressive network of companies consuming and providing data. They have unparalleled opportunities ahead of them to disrupt traditional data businesses and enable a tremendous amount of innovation by making data available to developers and enterprises. I am honored to have the opportunity to work with such a talented team in this exciting market.”

-Jason Rosenthal

The expertise and experience that Clare and Jason bring will be invaluable in helping us shape our strategy and tackle the incredible opportunities we have in front of us. We have been working with them in a more informal capacity for the past few months, and they have already proven to be incredible resources.

 

-Gil Elbaz

Updates to the Factual iOS SDK

Factual would like to announce the updated release of our officially supported SDK for iOS!  This update includes support for the many new API features Factual has to offer. Here are some of the main updates:

  1. New API support:
    • Resolve: starts with incomplete data for an entity, and returns potential entity matches
    • Facets: returns row counts for Factual tables, grouped by facets of data
    • Geopulse: provides point-based access to geographic attributes
    • Reverse Geocode: converts coordinates into addresses
    • Monetize: finds affiliate deals for places in Factual’s Global Places database
    • Match: maps your data to Factual places
    • Submit: submits edits to existing rows, or submits new rows of data to Factual tables
    • Flag: flags problematic rows in Factual tables
  2. Opened up the source code (accessible via the main project on GitHub).
  3. Updated the sample iOS project to use this new SDK and illustrate new features.

Under the hood, we’ve also made the following improvements:

  1. Removed dependency on OAConsumer Library.
  2. Switched to Automatic Reference Counting.

Setup

Simply follow the directions on our GitHub page to get the latest version and start using it in your project.

You can also reference the demo project as a working example for incorporating Factual data in your own application, or use it as a starter to immediately begin work on your new application.

Example

Here’s a quick example of how to use the driver. First, create an authenticated handle to Factual:

    FactualAPI* _apiObject = [[FactualAPI alloc] initWithAPIKey:@"yourkey" secret:@"yoursecret"];

Next, create the request of your choice, and fetch the results. For example, let’s say you want to find all Apple stores within 1000 meters of the Factual office:

    FactualQuery* queryObject = [FactualQuery query];
    [queryObject addRowFilter:[FactualRowFilter fieldName:@"name"
                                                  equalTo:@"Apple Store"]];
    CLLocationCoordinate2D coordinate = {34.06018, -118.41835};
    [queryObject setGeoFilter:coordinate radiusInMeters:1000];
 
    [_apiObject queryTable:@"places" optionalQueryParams:queryObject withDelegate:myDelegate];

Results from API calls are provided asynchronously via the passed-in delegate. Retrieve and print the results in the delegate implementation:

    -(void) requestComplete:(FactualAPIRequest *)request receivedQueryResult:(FactualQueryResult *)queryResult {
        for (id row in queryResult.rows) {
            NSLog(@"Row: %@", row);
        }
    }

For more examples of using the SDK with the various Factual APIs, please take a look at the tests in our GitHub project.

We hope you find these improvements useful. If you end up using the driver in a new iOS application, let us know: we’re always excited to see how people are using our data. Questions and suggestions are welcome too, and we hope you’ll enjoy using Factual in your mobile applications!

New to Factual?  Explore our data and APIs and sign up to get started.

Best wishes,
Brandon Yoshimoto
Software Engineer

Eva Ho Speaking at DataWeek 2012

Eva Ho, our VP of Marketing and Business Operations, will be speaking on a panel tomorrow at DataWeek titled “Challenges of Building on Geo Data”.  She will be joined by Ben Standefer (Urban Airship), Ankit Agarwal (Micello), and Peter Davie (TomTom).  It will be from 12:00 pm to 12:40 pm in the SPUR Urban Center, 4th floor, Room 2.  The session description is:

Why does there need to be dozens of sources of unique types of geo data? If so much geo data is free and public, why are we witnessing the emergence of technologies that do nothing but make it easier for us to find and integrate geo data? This panel will seek to understand why IP data, place data, venue data, or check-in data all have unique geo data challenges that require a diverse array of geo data providers.

 

New US Release & Enhancements to Global Places

We are very pleased to announce a number of significant enhancements to our US Places dataset, soon to be followed by similar improvements to the rest of the world.

Enrichments to US Data

We’ve added a boatload of new entities to the US including 80K landmarks (parks, memorials, historic buildings, and other monuments), 25K transport hubs (airports, rail stations and a handful of ports), and 190K new ATM locations. We’ve also included over 50 million additional references and edits from our partners to improve both coverage and accuracy. This brings us up to just over 23 million entities in the US alone, and over 63 million places in 50 countries worldwide.

Category Enhancements Globally

Our categories have taken an increasingly central role in the distribution and management of our data, so we’ve made our categorization framework more friendly to humans and more efficient for machines. These improvements include:

  • 50 new categories for better, more granular classification
  • Numeric category IDs for more structured search and data management
  • Category translations available in Italian, German, French, Spanish, Korean, Japanese, and Chinese

We’ve made the entire category hierarchy available as a Factual table so you can query it in all languages, and we’ve also published it as a JSON file on GitHub to make it easier to bake category logic into your client. See more information on categories here.
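
As a rough illustration of what client-side category logic could look like, here is a minimal Python sketch that resolves a numeric category ID to its full, localized path. The JSON shape, the IDs, the field names (`parent`, `labels`), and the function `category_path` are all assumptions made up for this example; check the published file for the real layout before relying on any of it.

```python
import json

# Hypothetical category data: the IDs, field names, and nesting are
# illustrative assumptions, not the layout of the published file.
CATEGORIES_JSON = """
{
  "1": {"parent": null, "labels": {"en": "Food and Dining", "fr": "Restauration"}},
  "2": {"parent": 1,    "labels": {"en": "Restaurants",     "fr": "Restaurants"}},
  "3": {"parent": 2,    "labels": {"en": "Pizza",           "fr": "Pizza"}}
}
"""

CATEGORIES = json.loads(CATEGORIES_JSON)


def category_path(categories, category_id, lang="en"):
    """Return labels from the root category down to category_id."""
    path = []
    current = category_id
    while current is not None:
        node = categories[str(current)]  # JSON object keys are strings
        labels = node["labels"]
        # Fall back to English when no translation exists for lang.
        path.append(labels.get(lang, labels["en"]))
        current = node["parent"]
    path.reverse()
    return path


print(" > ".join(category_path(CATEGORIES, 3)))
print(" > ".join(category_path(CATEGORIES, 3, lang="fr")))
```

Keying everything on the numeric ID means the same lookup works regardless of which language the labels are shown in.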

150 Chains

Chains (stores representing both local and national brands) are often included in Places data sets but can rarely be managed as distinct entities. Factual now manages a table of chains that connects directly to our Places. Developers can query by explicit chain ID to get the complete list of our first 150 authoritative chains, which come from our partners Location3 and Universal Business Listings (many more coming) and connect to over 333K places. We also have an additional 775 auto-generated chains produced by machine clustering. These are experimental and won’t have the same coverage or precision, so experiment with care. We’re testing these features out in the US before expanding globally; see more on chains here.

Factual Place Rank

With over 23MM Places in the US, developers of Local applications often find that there are too many records to present to the user, and that it is difficult to filter down to those most meaningful for their app. Factual Place Rank aims to provide a relative metric by which developers can sort places by their informatic and social footprint, so that the most prominent places rise to the top of the stack. We’re using Factual Place Rank as the default ranking for searches. The feature is in beta, so we’re testing it in the US only. See more on Factual Place Rank and all Global Places Attributes here.
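
If you want to re-sort results yourself on the client, the idea can be sketched in a few lines of Python. This is only an illustration: the field name `placerank`, the assumption that a higher value means a more prominent place, and the sample rows are all hypothetical, and since the feature is in beta some rows may carry no rank at all.

```python
def sort_by_rank(places, rank_field="placerank"):
    """Return places ordered most prominent first; unranked rows go last.

    The field name and the "higher is more prominent" direction are
    assumptions for this sketch.
    """
    def sort_key(place):
        rank = place.get(rank_field)
        # Tuples sort element by element: ranked rows (False) come
        # before unranked rows (True), then by descending rank.
        return (rank is None, -(rank or 0))

    return sorted(places, key=sort_key)


# Made-up sample rows, not real Factual data.
places = [
    {"name": "Corner Cafe", "placerank": 42},
    {"name": "Unranked Kiosk"},
    {"name": "City Museum", "placerank": 91},
]

for place in sort_by_rank(places):
    print(place["name"], place.get("placerank"))
```

Pushing unranked rows to the bottom, rather than treating a missing rank as zero, keeps them clearly separated from places that were scored and happened to rank low.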

Going Global

Taken together, these changes are not insignificant and could bork existing code. We’re therefore releasing this US dataset as a new, versioned resource. We’ll follow with new revs of our US Restaurants and US Hotels data. All other countries will follow shortly, and this will become the production Global Places dataset. We’ve posted a migration overview online that describes the changes in more detail and helps you minimize disruption.

We’ve been working on these features for some time and it’s great to be getting them out the door. We’ll have a second, follow-on announcement on further features in a few weeks, so stay tuned.

-Tyler Bell
Our Data is Singular

Advice I’d Send Myself Before Starting My Machine Learning Internship at Factual

I spent this summer as a Data Specialist Intern at Factual, and was tasked with improving our Global Places categorization. Factual employs a wide variety of strategies at every stage of its data pipeline, and categorization is just one part of that. To clarify, every Factual Place belongs to one category from our 400+ node taxonomy. My job was to ensure that the existing process was producing data of high quality, and explore alternate means of improving category accuracy and coverage. Here are some things I wish I’d known before I started out.

  1. Ask for help first. If you’re not the sole proprietor of every piece of code at your company, then there are things you don’t know how to do that someone else can do faster (read: cheaper). So if there’s something you need, like data faceted in a special way, or a Ruby script that interfaces nicely with a step in the workflow, it’s a good bet that a coworker has that exact piece of code sitting somewhere in their personal repository.
  2. The documentation is your best friend. I spent a full day trying to find decent documentation for the open source Apache Mahout project. It’s simply not out there. By documentation, I mean walkthroughs and explanations that take place in a context, as well as the standard Javadoc list of method headers. All the meticulously optimized algorithmic libraries in the world are useless to someone who can’t figure out how they should format their input data. (Disclaimer: I take full responsibility for my own Hadoop ineptitude; this bullet is meant as a journal entry for my personal experience, not as a polemic against Mahout.) On the other side of this comparison is Python’s scikit-learn. It’s built on top of the meticulously documented NumPy and SciPy libraries for scientific computing in Python, so if you can figure out how to get your data into a NumPy array, you’re good to go.
  3. Know the ecosystem. Working in Python instead of Java/Hadoop was a great decision, both in terms of the rapid prototyping virtues of the former, and my familiarity with the underlying data structures. If my work gets ported to our Hadoop cluster, it’ll be by someone who knows what they’re doing on that platform.
  4. Machine Learning applications are mostly the boring stuff. The most “machine-learny” code I wrote all summer was two lines of Python, initializing and then fitting a model. The majority of the effort is in pre-processing: getting from the raw data to a standardized, plain-text dataset, and from there to an array of 1s and 0s. There are still interesting problems to be solved here, of course. For example, I employed the hashing trick, which relies on some high-level probabilistic linear algebra to guarantee solutions that are “close enough.”
  5. Save the ML for the problems you can’t think to solve in any other way. Through a combination of hand curation and careful translation into our taxonomy, Factual gets incredibly far before applying any machine learning at all. A statistical approach is only as good as the data it’s given, which in this case is produced by a sophisticated and deterministic workflow. Before plugging into a fancy stochastic optimizer, make sure you’ve done everything possible to improve your inputs.
  6. Coding in R makes you feel like a ninja. The R core library is full of awesome one-liners waiting to happen, and is built around a set of vectorized operations, including the greatest thing since the static compiler: filtering by a vector of booleans. There’s basically nothing you can’t do in R with liberal application of implicit looping functions, and it makes even the boring stuff feel pretty cool.
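
For anyone curious about the hashing trick mentioned in point 4, here is a minimal sketch of the idea in plain Python. It is not the code I used at Factual: the function name `hash_features`, the bucket count, and the choice of MD5 as the hash are assumptions picked to keep the example short and deterministic.

```python
import hashlib


def hash_features(tokens, n_buckets=32):
    """Map string tokens to a fixed-length vector of 1s and 0s.

    Each token is hashed to a bucket index, so no vocabulary has to be
    built or stored. Two different tokens can land in the same bucket;
    that collision is the approximation the technique accepts.
    """
    vector = [0] * n_buckets
    for token in tokens:
        # hashlib gives the same result on every run, unlike the
        # built-in hash(), which is salted per process for strings.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % n_buckets
        vector[index] = 1
    return vector


# Made-up tokens standing in for a pre-processed place record.
row = hash_features(["apple", "store", "electronics"], n_buckets=32)
print(row)
```

The vector length is fixed up front, so every record ends up the same width no matter how many distinct tokens show up in the data. More buckets mean fewer collisions at the cost of wider rows.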


I’m sure if I’d known all these things before I started, I’d be able to come up with a totally different list. Working at a startup like Factual was a great way to get quickly up to speed with a bunch of different tools, and also to be consistently challenged to expand and apply my repertoire in new ways. That holds for every single person at Factual. They work hard in unfamiliar territories because the task demands it. If there’s a driving principle behind Factual, it’s this:

 

  1. You can never have enough good data.