“reaper (rē’pər) n. 1. One that reaps, especially a machine for harvesting grain”
“grain (grān) n. 13b. An essential quality or characteristic.” – answers.com
There’s been a lot of FUD spreading around the net about a “filter bubble.” In my year-end review of TED videos, I even noted that Eli Pariser gave a talk on the topic in that highly regarded forum. As someone deeply involved with designing and implementing one instance of such a filter, I’d like to send a simple message – don’t fear the filter. After all, you’ve been living in a filtered media world since you were born – it was just filtered by humans, not algorithms. Neither humans, nor the algorithms they design, are infallible, and the assumption that humans will necessarily provide better curation of content is not only an untested hypothesis but also one that disregards the impracticality of a humans-only approach to the information glut we now face. While there are legitimate reasons to be concerned, the important thing to realize is that the designers of the algorithms are already acutely aware of these potential problems and attempt to address them when solving the very hard problem of algorithmic content curation.
For example, Pariser expresses deep concern about the “invisible editing” done “without my permission” that results in “showing us what it thinks we want to see instead of what we need to see”. Of course, he’s talking about “the internet” (or, more precisely, the major players in content on the internet), but those same statements could equally apply to the editorial staff in any traditionally run media outlet. Yes, humans are typically a better judge of quality and it’s true that algorithms don’t yet have the embedded ethics that editors have. But imagine if we lived in a world where every time you needed to find something on the internet, you had to rely on a human to find it for you instead of a search engine. In reality, the internet has been in a “filter bubble” ever since the first web search engines came online over 15 years ago. And while it’s true that more and more content is being run through filters, that’s because the naive approach to using humans doesn’t scale, and the amount of information is certainly growing.
Unfortunately, the typical characterization of personalization technology – that the algorithms mainly look at what you click on – is a gross simplification that doesn’t do justice to the complex algorithms used in the field of content recommendation. Yes, clicks are a tremendously important input signal to the algorithms, but they are by no means the only one, and most importantly, they are not sufficient in and of themselves to build a system that could compete with the likes of a human editor. And yes, it is often the case that what the algorithm is trying to optimize is clicks. But not always, and not only. More importantly, any decent algorithm has to take into account the “wisdom of the crowds” – in other words, it’s not just what you click on, but what others click on, especially others who are similar to you in some way (your friends, followers, or other cohorts, however that’s defined). So, in reality, there are humans in the mix.
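To make the “wisdom of the crowds” idea concrete, here is a minimal sketch of user-based collaborative filtering: score items a target user hasn’t seen by how similar the users who clicked them are to the target. This is an illustration, not a description of any production system – all user names, item names, and click sets below are invented, and real recommenders use many signals beyond clicks.

```python
# Toy user-based collaborative filtering over hypothetical click histories.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two users' sets of clicked items."""
    if not a or not b:
        return 0.0
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))

def recommend(target, clicks, k=2):
    """Score unseen items by the similarity-weighted clicks of other users."""
    scores = {}
    for user, items in clicks.items():
        if user == target:
            continue
        sim = cosine(clicks[target], items)
        if sim == 0.0:
            continue  # ignore users with no overlapping interests
        for item in items - clicks[target]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

# hypothetical click histories
clicks = {
    "alice": {"politics", "tech", "science"},
    "bob":   {"politics", "tech", "sports"},
    "carol": {"cooking", "travel"},
}
print(recommend("alice", clicks))  # → ['sports']
```

Because alice and bob share most of their clicks, alice gets bob’s “sports” click surfaced – what other, similar people click on shapes what she sees.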
At the core, the concern seems to be about what Pariser calls the “self-looping and fragmenting effects” that can result when an algorithm learns too well what a person likes. But any Machine Learning practitioner and filter builder is so acutely aware of this effect (variously known as “overfitting,” “local optima,” or “explore vs. exploit,” among a number of other technical terms in a myriad of flavors depending on what particular aspect of the problem you are looking at) that it hardly even bears mentioning. It’s kind of like asking a truck driver to make sure he doesn’t run out of gas on his cross-country trip. Those of us who work on this are always trying to avoid getting “stuck in a rut” by making sure we throw in enough variety and diversity to promote discovery.
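The simplest textbook expression of that explore-vs-exploit balancing act is an epsilon-greedy strategy: mostly serve what the model predicts the user likes best, but with some small probability serve something else so the system keeps learning and the user keeps discovering. The scores and items below are made up for illustration:

```python
# Toy epsilon-greedy serving policy over hypothetical preference scores.
import random

def pick_item(scores, epsilon=0.1, rng=random):
    """Exploit the top-scored item most of the time; explore otherwise."""
    best = max(scores, key=scores.get)
    if rng.random() < epsilon:
        others = [item for item in scores if item != best]
        return rng.choice(others)  # explore: inject diversity
    return best                    # exploit: serve the predicted favorite

scores = {"politics": 0.9, "tech": 0.6, "travel": 0.2}
counts = {item: 0 for item in scores}
random.seed(0)
for _ in range(1000):
    counts[pick_item(scores)] += 1
print(counts)  # roughly 90% "politics", the rest spread across the others
```

Even this crude policy never lets the feed collapse entirely onto the single predicted favorite; real systems use far more sophisticated diversity mechanisms, but the principle is the same.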
At Yahoo (where I currently work), we do take a hybrid approach, at many levels, of incorporating humans into the algorithm. But we are also always looking for ways to take humans out of the loop when that makes sense, which is often. We can’t possibly serve up relevant content to hundreds of millions of people across the globe without some big data science and heavy lifting done by machines.
To their credit, the filter-fear-mongers do make a few points that I particularly like – for example, the suggestion that the algorithms need to be transparent enough and that people need control over how they work. This part is particularly challenging for those of us doing Machine Learning for recommendation, primarily because the techniques we use are often not readily amenable to transparency and explicit control. Regardless, I certainly agree – although the vast majority of people will be more than happy with the default settings of the algorithm, we always need, like in Star Trek, a manual override. Presenting the user with the thousands of words, phrases, and other features we use to model their preferences just won’t work (especially after we’ve projected those features into a low-dimensional subspace in an effort to divine their latent semantics).
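As a hedged illustration of why such projections resist transparency, here is a sketch using truncated SVD – one common technique for this kind of dimensionality reduction, not necessarily the one any particular company uses – over a tiny, invented user-by-feature click matrix. The resulting latent coordinates capture real structure, yet they are exactly the kind of thing that is hard to present to a user as a “setting”:

```python
# Toy latent-semantics sketch: truncated SVD over an invented
# user x feature matrix (rows = users, columns = content features).
import numpy as np

X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 3, 4],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                # keep only 2 latent dimensions
users_latent = U[:, :k] * s[:k]      # each user as a point in 2-D space

# Users 0 and 1 have overlapping interests, so they land close together
# in the latent subspace, while user 2 lands far from user 0.
d01 = np.linalg.norm(users_latent[0] - users_latent[1])
d02 = np.linalg.norm(users_latent[0] - users_latent[2])
print(d01 < d02)  # True: similar users cluster in the subspace
```

The two latent axes separate the users cleanly, but neither axis corresponds to any human-readable word or phrase – which is precisely why exposing the model’s internals as user-facing controls is so hard.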
I’m mostly not worried about filters, because their designers will naturally be kept on their toes by the people using them. If your filter only suggests a narrow range of content, people will stop using it, or at least complain. Pariser used the example of having his conservative friends filtered from his Facebook stream. Well, he wasn’t the only one to complain, and Facebook reinstated the option to get an unfiltered, time-sorted news feed.
In the end, you are your own best filter. As information gets easier and easier to publish, and as it gets more central to all of our lives and careers, the volume becomes unmanageable, and it behooves us to become active in our quest for relevant information, and not just swallow whatever the “big guys” decide to publish. Filters will be one of the indispensable tools for helping us do just that.