A long time ago, in a galaxy far away, I became interested in collaborative filtering. Well, it was five years ago, and I was at UC Berkeley, but it seems like eons ago.
What is collaborative filtering? Technically, it’s an algorithm for matching people with similar interests for the purpose of making recommendations. In non-technical terms, it’s a system for helping people find relevant content. Unlike search, where you parse a query to find the most relevant content, with collaborative filtering you find some way of gauging an individual’s interest in content, and then recommend what other similar users liked.
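To make that concrete, here is a minimal sketch of the idea in Python: gauge interest from ratings, find similar users, and recommend what they liked. The user names, titles and ratings are made up for illustration, and cosine similarity with a similarity-weighted sum is just one common choice, not the method any particular system mentioned here is known to use.

```python
from math import sqrt

# Hypothetical explicit ratings on a 1-5 scale, invented for illustration.
RATINGS = {
    "ann": {"Alien": 5, "Brazil": 4, "Clue": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Dune": 4},
    "cat": {"Alien": 1, "Clue": 5, "Emma": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings=RATINGS):
    """Rank items the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_ratings)
        for item, rating in other_ratings.items():
            if item not in seen:
                # Ratings from more similar users count for more.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))  # ['Dune', 'Emma']
```

Dune comes out ahead of Emma because bob's ratings look much more like ann's than cat's do, so his opinion carries more weight.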
About 3-4 years ago there was a lot of excitement about collaborative filtering systems – it was going to be the next big thing. There were conferences, workshops and mailing lists. Papers were published regularly. And of course there were a bunch of startups. Collaborative filtering was going to help us with ecommerce (Firefly) and help us find music (Music Buddha, Media Unbound), movies (Netflix, MovieLens) and jokes (Jester at UC Berkeley). And it did yield some well-known successes – some prominent ones are Amazon’s recommender system, Netflix’s movie recommender system (the best example of a recommender system, IMO), and TiVo’s recommender system. But it did not become as ubiquitous and popular as its proponents (including me) hoped. Apart from other issues, it’s hard to get user input, and the interface for recommender systems is hard to get right (more on that later).
Lately, I have been getting a feeling of deja vu, encountering collaborative filtering in more and more conversations. And often in the context of tagging. Yesterday, I had dinner with two people from yet another startup that uses tagging and collaborative filtering in the same sentence. Sounds good – those are two concepts that I know a little about. And I wish I could say that I had immediately seen the connection between the two. No, as usual, it took seeing the two words in close proximity about a million times for me to have the “aha” experience.
But I think I do finally understand why collaborative filtering is being dragged out of closets, dusted and prettied up in this new world of tags, Web 2.0 and the Long Tail. One of the roadblocks to collaborative filtering is user input: some expression of interest by a user that you can hook into. Tags provide such a hook. On the other hand, tags desperately need good ways of supporting findability. As I argued before, you can go only so far with lists. Which is why we are seeing interest in clusters, facets and collaborative filtering. Additionally, both tags and collaborative filtering provide inroads into the Long Tail.
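As a rough illustration of tags acting as that hook, the sketch below treats the set of tags a user has applied as their expression of interest, compares users by tag overlap (Jaccard similarity), and suggests items that similar users have tagged. Everything here is hypothetical: the users, pages, tags and the choice of Jaccard are assumptions for the example, not a description of any real tagging service.

```python
# Hypothetical tagging data: user -> item -> tags that user applied.
TAGGED = {
    "ann": {"page1": {"python", "recsys"}, "page2": {"webdev"}},
    "bob": {"page1": {"python"}, "page3": {"recsys", "python"}},
    "cat": {"page2": {"webdev", "css"}, "page4": {"gardening"}},
}


def tag_profile(items):
    """All tags a user has applied, used as an implicit interest profile."""
    profile = set()
    for tags in items.values():
        profile |= tags
    return profile


def jaccard(a, b):
    """Overlap between two tag sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_tags(user, tagged=TAGGED):
    """Rank items the user has not tagged by the similarity of their taggers."""
    mine = tag_profile(tagged[user])
    scores = {}
    for other, items in tagged.items():
        if other == user:
            continue
        sim = jaccard(mine, tag_profile(items))
        for item in items:
            if item not in tagged[user]:
                scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)


print(recommend_from_tags("ann"))  # ['page3', 'page4']
```

Note that nobody was asked to rate anything: the act of tagging is the only input the sketch needs.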
So are tags and collaborative filtering a marriage made in heaven? It’s a promising approach, but there are challenges in making it work – some challenges inherent to tagging, and other challenges inherent to collaborative filtering. The question is – does the combination of tagging and collaborative filtering solve those challenges, or only make them worse? I don’t have any magic bullets (if you were looking for an easy answer, stop reading now). Instead I have a bunch of observations in no particular order.
To start, let’s try to understand collaborative filtering systems. Collaborative filtering algorithms generally form the backbone of recommender systems (e.g. Amazon’s “Other users who liked this also liked”). But recommender systems do not necessarily use collaborative filtering algorithms. For example, Pandora (a music recommender system) matches the profile of music you like to find other music you might also like (on the backend it’s using some type of sophisticated content filtering). (In the discussion below, I will use Recommender Systems to refer to systems that offer recommendations of stuff to users, and use Collaborative Filtering Systems if they use collaborative filtering algorithms for making those recommendations.)
Broadly speaking, there are three main components to a Recommender System.
First, the input: A user needs to provide some type of input to the system. The question is how: do you ask them to rate stuff (explicit input), or do you watch what they are doing in some way (implicit indications of interest)? Explicit indications are often a good way to start with recommendations, but you have to convince the user to make this effort.
The second component is the algorithm or what