Announcing Two New Attributes for Global Products: Manufacturer and Average Price

Today we’re excited to announce the latest update to our Global Products dataset. This release adds two new attributes that many of our partners have been asking for: manufacturer and average price.

Manufacturer

The connections between brands and manufacturers may sometimes seem like a complicated web.  By tying together the many brands and products that fall under a single manufacturer or parent company, we’ve made it easier to generate insight about these connections, and about the consumer products marketplace as a whole.

For instance, getting a list of the brands owned by Unilever is now as simple as making a facet call to our API:

http://api.v3.factual.com/t/products-cpg/facets?select=brand&filters={"manufacturer":"unilever"}
(Preview the data and view the faceted list of brands on the right.)

Getting a list of products that fall under Unilever brands is just as simple:

http://api.v3.factual.com/t/products-cpg?filters={"manufacturer":"unilever"}
(Preview the data.)
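Note that the value of the filters parameter is JSON, so outside of a browser it needs to be URL-encoded. Here’s a minimal sketch of the facet call using curl (the KEY parameter is a hypothetical placeholder – check the API documentation for the actual authentication details):

$ curl -G "http://api.v3.factual.com/t/products-cpg/facets" \
    --data-urlencode 'select=brand' \
    --data-urlencode 'filters={"manufacturer":"unilever"}' \
    --data-urlencode 'KEY=YOUR_KEY'

With -G, curl appends each --data-urlencode pair to the URL as a properly escaped query string, so you never have to hand-encode the braces and quotes in the filters JSON. The row query above works the same way, minus the select parameter and the /facets path segment.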

We’ve started by providing manufacturer data for more than 20 of the top CPG manufacturers, and we’ll be rolling out data for additional manufacturers in the near future.

Average Price

When shopping for products online or in a store, it can be difficult to know whether you’re getting a good deal. We’ve eliminated some of the guesswork by providing the average price for the most commonly searched-for products.

In addition to getting the average price for a particular product, you can also filter API results by price.  For instance, to find shampoo with an average price under $5, you can use the following query:

http://api.v3.factual.com/t/products-cpg?q=shampoo&filters={"avg_price":{"$lte":5}}
(Preview the data.)

If you’re the kind of person who likes to spend a little more on your hair care, you can easily get a list of premium hair care brands by combining a facet call with average price and category filters:

http://api.v3.factual.com/t/products-cpg/facets?select=brand&filters={"$and":[{"avg_price":{"$gte":30}},{"category":{"$bw":"hair"}}]}
(Preview the data and view the faceted list of brands on the right.)
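Because this filter combines $and with nested operators, hand-encoding the URL gets error-prone quickly. The same hedged curl sketch applies (authentication again elided):

$ curl -G "http://api.v3.factual.com/t/products-cpg/facets" \
    --data-urlencode 'select=brand' \
    --data-urlencode 'filters={"$and":[{"avg_price":{"$gte":30}},{"category":{"$bw":"hair"}}]}'

The single quotes around the filters value matter here: they keep the shell from trying to expand $and, $gte, and $bw as shell variables.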

Summary

We hope you find these new attributes useful, and we can’t wait to see what you build with them. You can explore all our product data, including the new attributes, by visiting our shiny new Data Preview. Visit our Global Products page to learn more, sign up for an API key, and find links to documentation.

As always, retailers and manufacturers interested in including their product data and links in our Global Products dataset can visit our Merchant Partners page for more information.

John Delacruz
Product Manager, Factual Global Products

Increasing Transparency with Data Preview 2.0

Here at Factual, we have always tried to make our data as accessible as possible. We believe that a key part of accessibility is transparency. After all, data is not really accessible if prospective consumers can’t easily evaluate it. To this end, our data has always been publicly available to explore on our site through our data preview tool. Today we announce Data Preview 2.0 – a set of usability improvements and new features that make it significantly easier for anyone to explore our data and understand what exactly we offer prior to consuming it. Here is a quick tour:

Data Preview has 5 main areas:

  1. Full-Text Search – runs a full-text search across every attribute in the data
  2. Filters – allow you to narrow the data down to the exact rows you wish to see
  3. Searchable Map – allows you to both visualize geodata and search data via the map
  4. Data Table – shows you the actual data, row by row, that meets your filter criteria
  5. Facets – displays row counts faceted by major attribute values and provides a shortcut to filter by specific facet values

Let’s say I was interested in building an app focused on coffee shops, so I needed Places data on them. I ran a full-text search for “coffee” in the Global Places data preview. I don’t have a great sense of how many coffee shops there are globally, but it’s good to know my app will be useful around the world. I live in West LA, so I filtered the data down to the cities of Los Angeles, Santa Monica, and Culver City in California – an area I’m much more familiar with.

I used the map to zero in on Century City so I could QA the data around my office and look at specific shops that I know exist. Since my app is much more about smaller coffee shops than big chains, I also filtered out all Starbucks locations.

Now I can see how every coffee shop around my office is represented in Factual data, and I can develop a level of trust prior to using it.
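Everything Data Preview does maps onto the same row filters the API exposes, so once the preview looks right you can reproduce it in code. Here’s a rough sketch of my coffee-shop view as an API query – the table path, operator spellings, and omitted authentication should all be checked against the current docs before relying on them:

$ curl -G "http://api.v3.factual.com/t/global" \
    --data-urlencode 'q=coffee' \
    --data-urlencode 'filters={"$and":[{"region":"CA"},{"locality":{"$in":["los angeles","santa monica","culver city"]}},{"name":{"$neq":"starbucks"}}]}'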

I hope these improvements to Data Preview make it easier for you to examine our data and quickly decide if Factual data meets your needs. So please, dig into our data! Our Global Places data covers 64 million local businesses and other points of interest in 50 countries, with deep attributes on Restaurants, Hotels, and Healthcare Providers.  Our Global Products data covers over 650,000 consumer packaged goods with food nutrition info and ingredients data.  If you like what you see, sign up for an API key or request a download.

-Vikas Gupta, Marketing / Operations

PHP Driver 5.3 Released

We’ve released the latest version of the Factual PHP Driver. This release adds enhanced support for our new Submit and Flag APIs, including clear and strict mode, plus a number of new integration tests to make our collective lives easier during installation and testing. Diffs are now supported, and we’ve added a cheeky header to facilitate using the driver in a WordPress plugin. The changelog is here.

Don’t feel pressured to upgrade if you are using a previous version of the API. However, if you are writing data to Factual — it’s both fun and rewarding! — this driver gives you finer control over how you engage with the service. The same goes if you are using Diffs or other advanced features.

Of course, the PHP Driver is just one of the (outrageous) ten language drivers we support.

We’re keen to hear from all PHP developers using the service. Please submit requests and bugs to our PHP Driver Issue List or bang us a missive at http://support.factual.com.

-Tyler Bell, Factual PHP Guy

Introducing Drake, a kind of ‘make for data’

Processing data can be a real mess!

Here at Factual we’ve felt the pain of managing data workflows for a very long time. Here are just a few of the issues:

  • a multitude of steps, with complicated dependencies
  • code and input can change frequently – it’s tiring and error-prone to figure out what needs to be rebuilt
  • inputs scattered all over (home directories, NFS, HDFS, etc.) make workflows tough to maintain and repeatability tough to sustain

Paul Butler, a self-described Data Hacker, recently published an article called “Make for Data Scientists”, which explored the challenges of managing data processing work. Paul went on to explain why GNU Make could be a viable tool for easing this pain. He also pointed out some limitations with Make, for example, the assumption that all data is local.

We were gladdened to read Paul’s article, because we’d been hard at work building an internal tool to help manage our data workflows. A defining goal was to end up with a kind of “Make for data”, but targeted squarely at the problems of managing data workflow.

Introducing ‘Drake’, a “Make for Data”

We call this tool Drake, and today we’re excited to share it with the world as an open source project. It is written in Clojure.

Drake is a text-based command line data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs, and Drake automatically resolves dependencies and provides a rich set of options for controlling the workflow. It supports multiple inputs and outputs and has HDFS support built in.

We use Drake at Factual on various internal projects. It serves as a primary way to define, run, and manage data workflow. Some core benefits we’ve seen:
    • Non-programmers can run Drake and fully manage a workflow
    • Encourages repeatability of the overall data building process
    • Encourages consistent organization (e.g., where supporting scripts live, and how they’re run)
    • Precise control over steps (for more effective testing, debugging, etc.)
    • Unifies different tools in a single workflow (shell commands, Ruby, Python, Clojure, pushing data to production, etc.)

Examples

Here’s a simple example of a Drake workflow file with three steps:

;
; Grabs us some data from the Internets
;
contracts.csv <-
  curl http://www.ferc.gov/docs-filing/eqr/soft-tools/sample-csv/contract.txt > $OUTPUT

;
; Filters out all but the evergreen contracts
;
evergreens.csv <- contracts.csv
  grep Evergreen $INPUT > $OUTPUT

;
; Saves a super fancy report
;
report.txt <- evergreens.csv [python]
  linecount = len(file("$[INPUT]").readlines())
  with open("$[OUTPUT]", "w") as f:
    f.write("File $[INPUT] has {0} lines.\n".format(linecount))

Items to the left of the arrow (<-) are output files; items to the right are input files. Under the line specifying inputs and outputs is the body of the step, holding one or more commands. The commands of a step are expected to consume the inputs and produce the expected outputs. By default, Drake steps are written as bash commands.
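The [python] tag on the report step above is what switches that step’s body from the default bash to inline Python. Steps also aren’t limited to a single file on each side of the arrow: comma-separated lists of inputs and outputs are supported, as in step 4 of the human-resources demo discussed below. Here’s a small sketch (the numbered $INPUT0/$OUTPUT0 variables follow Drake’s specification; the file names are made up):

;
; Merges two regional extracts and records a line count
;
merged.csv, merged.count <- east.csv, west.csv
  sort -u $INPUT0 $INPUT1 > $OUTPUT0
  wc -l < $OUTPUT0 > $OUTPUT1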

Assuming we called this file workflow.d (what Drake expects by default), we’d kick off the entire workflow by simply running Drake in that directory:

$ drake

Drake will give us a preview and ask us to confirm we know what’s going on:

The following steps will be run, in order:
  1: contracts.csv <-  [missing output]
  2: evergreens.csv <- contracts.csv [projected timestamped]
  3: report.txt <- evergreens.csv [projected timestamped]
Confirm? [y/n]

By default, Drake will run all steps required to build all output files that are not up to date. But imagine we wanted to run our workflow only up to producing evergreens.csv and no further. Easy:

$ drake evergreens.csv

The preview:

The following steps will be run, in order:
  1: contracts.csv <-  [missing output]
  2: evergreens.csv <- contracts.csv [projected timestamped]
Confirm? [y/n]

That’s a very simple example. To see a workflow that’s a bit more interesting, take a look at the “human-resources” workflow in Drake’s demos. There you’ll see a workflow that uses HDFS, contains inline Ruby, Python, and Clojure code, and deals with steps that have multiple inputs and produce multiple outputs. Diagrammed, it looks like this:

As our workflows grow more complicated, Drake’s value becomes more apparent. Take target selection, for example. Imagine we’ve run the full workflow shown above and everything’s up to date. Then we hear that the skills database has been updated. We’d like to force a rebuild of skills and all affected dependent outputs. Drake knows how to force a build (+), and it knows about the concept of downtree (^). So we can just do this:

$ drake +^skills

Drake will prompt us with a preview…

The following steps will be run, in order:
  1: skills <- [forced]
  2: people.skills <- skills, people [forced]
  3: people.json <- people.skills [forced]
  4: last_gt_first.txt, first_gt_last.txt <- people.json [forced]
  5: for_HR.csv <- people.json [forced]
Confirm? [y/n]

… and we’re off and running.

But wait, there’s more!

Drake offers a ton more stuff to help you bring sanity to your otherwise chaotic data workflow, including:

  • rich target selection options
  • support for inline Ruby, Python, and Clojure code
  • tags
  • ability to “branch” your input and output files
  • HDFS integration
  • variables (sketched below)
  • includes
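
As a taste of variables, here’s a hedged sketch: a workflow that defines its own YEAR variable and sets Drake’s special BASE variable, which gets prepended to relative file names (all paths here are invented for illustration):

BASE=/tmp/drake-demo
YEAR=2012

report-$[YEAR].txt <- transactions-$[YEAR].csv
  grep $[YEAR] $INPUT > $OUTPUT

Run drake in the same directory and it will look for /tmp/drake-demo/transactions-2012.csv and build /tmp/drake-demo/report-2012.txt from it.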

Drake’s designer gives you a screencast

Here’s a video of Artem Boytsov, primary designer of Drake, giving a detailed walkthrough:

Drake integrates with Factual

Just in case you were wondering! Drake includes convenient support for Factual’s public API, so you can easily integrate your workflows with Factual data. If that interests you, and you’re not afraid to sling a bit of Clojure, please see the wiki docs for the Clojure-based protocol called c4.

Drake has a full specification and user manual

A lot of work went into designing and specifying Drake. To prove it, here’s the 60-page specification document. The specification can be downloaded as a PDF and treated like a user manual.

We’ve also started wiki-based documentation for Drake.

Build Drake for yourself

To get your hands on Drake, you can build it from the GitHub repo.

All feedback welcome!

If you’re a wrangler of data workflows, we hope Drake might be of some use to you. Bug reports and contributions can be submitted via the GitHub repo. Any comments or questions can be submitted to the Google Group for Drake.

Go make some great workflows!

Sincerely,
Aaron Crow
Software Engineer at Factual

How to build a .NET Web API app that uses Factual’s C# Driver

The ASP.NET Web API provides a nice platform for building RESTful applications on the .NET Framework. Factual’s C# driver makes it easy to use Factual’s public API in the .NET world.

We recently asked Sergey Maskalik, the author of Factual’s C# driver, to bring these two things together and create a tutorial showing .NET developers how to build a Web API application that makes use of Factual.

The result is the aptly titled “ASP.NET Web API with Factual Driver Example” wiki page.

If you’re a .NET developer, we hope you find this tutorial useful.

Have fun!
Sincerely,
Aaron
Software Engineer