Collaborative Filtering strikes back (this time with tags)

A long time ago, in a galaxy far away, I became interested in collaborative filtering. Well, it was five years ago, and I was at UC Berkeley, but it seems like eons ago.

What is collaborative filtering? Technically, it’s an algorithm for matching people with similar interests for the purpose of making recommendations. In non-technical terms, it’s a system for helping people find relevant content. Unlike search, where you parse a query to find the most relevant content, with collaborative filtering you find some way of gauging an individual’s interest in content, and then recommend what other similar users liked.
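To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in Python. The ratings, user names, and item names are made up for illustration, and cosine similarity is just one common choice of similarity measure, not the method used by any particular system mentioned in this post.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 3},
}

def cosine(u, v):
    """Cosine similarity: dot product over shared items, full-vector norms."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=2):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, rating in theirs.items():
            if item not in ratings[user]:
                # Similar users' opinions count for more.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann", ratings))
```

In this toy data, ann's tastes are much closer to bob's than to cat's, so bob's item D is ranked ahead of cat's item E.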

About 3-4 years ago there was a lot of excitement about collaborative filtering systems – it was going to be the next big thing. There were conferences, workshops and mailing lists. Papers were published regularly. And of course there were a bunch of startups. Collaborative filtering was going to help us with ecommerce (Firefly) and help us find music (Music Buddha, Media Unbound), movies (Netflix, MovieLens) and jokes (Jester at UC Berkeley). And it did yield some well-known successes – some prominent ones are Amazon’s recommender system, Netflix’s movie recommender system (the best example of a recommender system, IMO), and TiVo’s recommender system. But it did not become as ubiquitous and popular as its proponents (including me) hoped. Among other issues, it’s hard to get user input, and the interface for recommender systems is hard to get right (more on that later).

Lately, I have been getting a feeling of deja vu, encountering collaborative filtering in more and more conversations. And often in the context of tagging. Yesterday, I had dinner with two people from yet another startup that uses tagging and collaborative filtering in the same sentence. Sounds good – those are two concepts that I know a little about. And I wish I could say that I had immediately seen the connection between the two. No, as usual, it took seeing the two words in close proximity about a million times for me to have the “aha” experience.

But I think I do finally understand why collaborative filtering is being dragged out of closets, dusted off and prettied up in this new world of tags, Web 2.0 and the Long Tail. One of the roadblocks to collaborative filtering is user input: some expression of interest by a user that you can hook into. Tags provide such a hook. On the other hand, tags desperately need good ways of supporting findability. As I argued before, you can go only so far with lists. Which is why we are seeing interest in clusters, facets and collaborative filtering. Additionally, both tags and collaborative filtering provide inroads into the Long Tail.
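As a rough illustration of tags acting as that hook, the sketch below treats the set of tags a user has applied as their interest profile, compares profiles with Jaccard similarity, and suggests items bookmarked by users with overlapping tags. The data, the names, and the choice of Jaccard similarity are all assumptions made for the example; real tagging systems may do this quite differently.

```python
# Hypothetical data: user -> {bookmarked item: set of tags the user applied}
bookmarks = {
    "ann": {"url1": {"python", "ml"}, "url2": {"design"}},
    "bob": {"url1": {"ml"}, "url3": {"python", "ml"}},
    "cat": {"url4": {"cooking"}, "url2": {"design"}},
}

def tag_profile(user_bookmarks):
    """Collapse all of a user's tags into one set of expressed interests."""
    return set().union(*user_bookmarks.values())

def jaccard(a, b):
    """Size of the overlap divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_by_tags(user, bookmarks):
    """Rank items the user has not bookmarked by the tag similarity of their owners."""
    mine = tag_profile(bookmarks[user])
    scores = {}
    for other, theirs in bookmarks.items():
        if other == user:
            continue
        sim = jaccard(mine, tag_profile(theirs))
        for item in theirs:
            if item not in bookmarks[user]:
                scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_by_tags("ann", bookmarks))
```

Notice that nobody had to rate anything: the act of tagging is the input.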

So are tags and collaborative filtering a marriage made in heaven? It’s a promising approach, but there are challenges in making it work – some challenges inherent to tagging, and other challenges inherent to collaborative filtering. The question is – does the combination of tagging and collaborative filtering solve those challenges, or only make them worse? I don’t have any magic bullets (if you were looking for an easy answer, stop reading now). Instead I have a bunch of observations in no particular order.

To start, let’s try to understand collaborative filtering systems. Collaborative filtering algorithms generally form the backbone of recommender systems (e.g. Amazon’s “Other users who liked this also liked”). But recommender systems do not necessarily use collaborative filtering algorithms. For example, Pandora (a music recommender system) matches the profile of music you like to find other music you might also like (at the backend it’s using some type of sophisticated content filtering). (In the discussion below, I will use Recommender Systems to refer to systems that offer recommendations of stuff to users, and use Collaborative Filtering Systems if they use collaborative filtering algorithms for making those recommendations.)


Broadly speaking, there are three main components to a Recommender System.

First, the input: a user needs to provide some type of input to the system. The question is how: do you ask them to rate stuff (explicit input) or watch what they are doing in some way (implicit indications of interest)? Explicit indications are often a good way to start with recommendations, but you have to convince the user to make this effort.

The second component is the algorithm or what

6 thoughts on “Collaborative Filtering strikes back (this time with tags)”

  1. For example, on my blog page
    You will see that I bookmark a few urls.
    There’s a drop down which will filter for only a
    specific tag. (most of them imported from topic system)

    There is also my tag cloud in the drop down menu.
    Global tag cloud can be accessed from
    But instead of showing url, it will show users instead.

    StumbleUpon is actually social bookmarking + social
    networking. see my friend page
    and my interest page

    The name ‘StumbleUpon’ comes from a feature that
    it can randomly show you a page using your interest
    and your friend’s interests. That’s a kind of collaborative
    filtering, isn’t it? You need to sign-up to test this feature.

  2. Maybe you would be interested to check a related metadata schema that we have developed, intended to store reusable evaluations (e.g. ratings) in the tagging describing an e-commerce resource.

    Click to access ORI_SI_NMCC_draft.pdf

    A similar (and simpler) approach is also under development for learning resources.

    Best regards,


  3. I agree with the points that collaborative filtering and related recommender systems will make a comeback, and maybe even achieve the promise touted by Firefly back in the day. The massive increase in product selection (the Long Tail) necessitates product recommendations, while the growth in consumer-driven content (blogs, social networking, customer reviews) enables better peer-to-peer recommendations.

    Check out We’re trying to build the biggest, baddest database of customer preferences. We’re gathering explicit product ratings across channels (i.e. independent of any single merchant). What a customer liked or disliked across all media consumption is a better data set to model than what a customer clicked on or put in a shopping cart at a single merchant.

    And in the end, what matters is the data, not the recommender system.

  4. Rashmi,

    It looks like this post got cut off in the shuffle. It has me hooked, any way we can get the full post up? I will name my first girl after you. :)
