I recently read a couple of articles that focused on the data collection practices of businesses, where the moral of the story was that you should be collecting all the data you possibly can, even if you don’t need it, because you never know what you’ll need in the future. This is the popular perspective of a significant portion of the data community, and it has naturally carried over to the world of APIs.
While this might be tempting, and even seem logical at times, I recommend you stop and think about it deeply. The NSA is employing this approach, and leading tech companies like Google, Facebook, and others are thinking in similar ways. They are pretty much saying that if you have all the data, you will have all the knowledge, something that hasn’t ever been proven and remains a constant fantasy of technologists.
Imagine the person who obsessively collects everything, thinking that some day it will be valuable. Often this is harmless, but what if some of it contains hazardous material (e.g. mercury, lead) that was considered safe at one point? Now you have large quantities of it, which is not good and carries costly implications. Imagine if, at some point, you cross some public zoning, safety, or other regulatory line without knowing it. Consider how the world has shifted and changed in the last 50 years, and how rapidly things have “seemingly” changed in the last 20 years when it comes to public opinion. What if opinions on data gathering practices change drastically in the near future?
With the NSA and leading tech companies behaving pretty badly with their data collection strategies, pushback from other countries, companies, institutions, and the average citizen has already begun. Do you really want to have EVERYTHING stored in your data warehouses when this happens? Data you can’t verify that you actually need to operate your business? What will your customers, partners, and shareholders think? What will public opinion be of your brands?
I haven’t even touched on the security concerns of storing all of this data. There are numerous very serious considerations on the table that should always be included in decisions around exactly what data we gather and store, and what we should just let be lost in the layers of time.