That Personal Big Data is Mine



Like Robert X. Cringely, I have partial email archives going back as far as the late 1980s.  And while that may seem amazing to some, the more amazing thing is that I can bring those data files up in almost any email client without even the hint of a hiccup.  I have even, on rare occasions, replied to a handful of these ancient messages and received replies.  And of course some of these old messages, like those to and from loved ones who have passed away, are priceless.  For the better part of its existence, though, my email has lived in “the cloud,” which is just the trendy way of saying that the master copy of the data is stored and maintained by some company on its servers.  That arrangement is immensely useful, allowing cross-device access and syncing.  The funny thing is that I still periodically download a copy to my personal computer and back it up.  I’m sure I’m not the only person who does this, and there are almost certainly even more people who wish they had when they switched email providers.  It reflects the fact that I consider this data “mine” and my provider merely a hosting service.  And because email is a well-defined standard, my data is to a large extent portable, at least in the sense that I can take a copy and switch to whatever service I want, including hosting one or writing one myself.

Email is one of the first, and simplest, protocols on the Internet.  It is durable precisely because it is simple: plain text messages are sent and received in an easy-to-understand, text-based format.  Of course, the email protocol is limited to a point-to-point communications paradigm, so it doesn’t cover the full range of media types and actions necessary for storing and manipulating general-purpose digital artifacts.  As the world went online in the 1990s, the need for such protocols gave rise to standards such as HTML, HTTP, XML, and XSL.  But these standards and technologies were adopted primarily by application service providers and were, for the most part, hidden from the typical end user.

As an example, I’m writing this post using an app called Evernote.  It lets me do something similar to what I do with email, namely store and manipulate my data (in this case, notes) in the cloud from any device.  I can write one sentence on my tablet, review it on my phone, and continue editing later on a laptop.  Under the covers it might be storing this document as HTML, XML, or who knows what.  I hope that someday Evernote will at least let me export to a common format, but I have no guarantee of that.  To be fair, the technical community has largely adopted a common set of data formats that make porting data between operating systems and applications possible, if painful.  (Some time ago I read an article about the National Archives and how it must constantly convert data from old media to new, and from old formats to new, to ensure that it can still access the knowledge locked up in those digital bits.  This is an expensive, time-consuming process.)

Yet as cloud computing becomes more commonplace, we seem to be moving in the opposite direction, with more and more of our personal data living behind APIs in non-extractable, non-exportable, and fundamentally inaccessible formats.  The most personal of that data is what we share on social networks: photos, comments, and links that together form a virtual diary of our lives (which Facebook has so aptly capitalized on by turning it into an actual Timeline).  Medical records are another very important source of personal data, and the list goes on.  Few will deny the importance and inevitability of the cloud as a storage medium.  But we need to make sure that we own our own data.  As long as it is locked up behind APIs, or exportable only in byzantine, non-standard formats, it’s not really ours.

That’s why we need an open protocol for social networking, similar to the one the Diaspora team has been working on for the last couple of years.  But to matter, it needs to be adopted by all the major social networking players.  As I was writing this post, Chris Dixon summed it up best in a tweet: “There would be vastly more innovation and valuable companies created if micro-messaging and the social graph were open protocols.”  I would go one step further and say that we need to actively seek out other areas of personal data that live in the cloud and make them portable by adopting standards and protocols for the most common use cases.  For example, there is no good reason I shouldn’t be able to take my Facebook photos and host them on Flickr, on Picasa, or on a home-brewed photo-sharing system that I run myself, and then, if I choose, move them back to Facebook.

As we help ASPs build up their walled gardens, there has to come a point where they recognize the value we have created for them and let us migrate elsewhere if we find another garden that suits us better, or even set up shop in our own backyard.  This would have the added benefit of giving them an incentive to keep their gardens in order, and it would move us back toward the open ecosystem first envisioned by the pioneers of the Internet.