I was just reading "Setting Government Data Free with ScraperWiki" on ProgrammableWeb. It led me to start playing with ScraperWiki:

  • Scraper: a computer program that copies structured information from webpages into a database
  • ScraperWiki: a website where people can write and repair public web scrapers and invent uses for the data

With 20 years of experience with databases, I have a love of data and of tools that make data more accessible. Scraping is an important tool in the liberation of data from web-based sources.

Data is often locked up because the owning party lacks the skills or resources to release it. They just don’t have the time or the understanding to publish the data so it can be easily accessed and consumed. Other times they may purposely make it difficult to access.

I have a pretty mature set of scraping scripts that let me pull web pages, then iterate over and parse the content. I then store the results as XML files on Amazon S3, relational tables in Amazon RDS, or key-value pairs in Amazon SimpleDB.
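
As a rough illustration of that pull, parse, and store flow, here is a minimal Python sketch using only the standard library. It is not the scripts described above: the names `TableRowParser`, `fetch`, `parse_rows`, and `rows_to_xml` are hypothetical, and it assumes the data of interest sits in a plain HTML table whose first row holds the column names. Uploading the resulting XML to a store such as Amazon S3 is left out.

```python
from html.parser import HTMLParser
from urllib.request import urlopen
import xml.etree.ElementTree as ET


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def fetch(url):
    """Pull a page and decode it, falling back to UTF-8 (an assumption)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_rows(html):
    """Parse HTML text and return the table rows it contains."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_xml(rows):
    """Serialize rows as XML, treating the first row as field names."""
    root = ET.Element("records")
    if rows:
        header, *body = rows
        for values in body:
            record = ET.SubElement(root, "record")
            for name, value in zip(header, values):
                field = ET.SubElement(record, "field", name=name)
                field.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `rows_to_xml(parse_rows(fetch(url)))` on a page with a simple table would yield an XML string that could then be written to a file or pushed to whichever store is in use. Real pages usually need more care (nested tables, pagination, rate limits), which this sketch ignores.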

I like what ScraperWiki is doing with not only democratizing data, but democratizing the tools and places to store the data. There is a lot of work to be done in liberating data for government, corporate, and non-profit groups. We need all the people, tools, and standard processes we can get.
