Clustering comes to Flickr

It was bound to happen sooner or later. Simple lists are great, weighted lists (or tag clouds) are even better. But as data accumulates, it’s no longer practical to navigate it through tag lists. Some sort of structure is needed. Back in February I wrote about tag-sorting, and how easy it should be to cluster tags.

Enter categories, or clusters, as they are referred to on Flickr.

How it works: When you click on a tag, you have the option of exploring different clusters related to the tag. For example, summer has several associated clusters:

Cluster 1: beach, sea, vacation, sand, ocean, june, trip, island, florida, august

Cluster 2: water, lake, boat, river, reflection, rain, window, bridge, fountain, boats

Cluster 3: sky, blue, sun, sunset, clouds, tree, park, trees, white, leaves

Cluster 4: flower, green, flowers, garden, nature, red, yellow, grass, pink, macro

Unlike Clusty (based on Vivisimo), there is no effort to name the cluster. Naming requires a much more sophisticated algorithm, and even then there is a possibility of getting it wrong. Instead, Flickr just shows you the tags comprising each cluster. You can do this with tags: the tags themselves give a succinct summary of each cluster. It is much harder to do this with URLs, which is probably one reason that Clusty has to resort to cluster names.

Another design decision was the display of clusters. From time immemorial (OK, maybe not time immemorial, but certainly for as long as people have worked on search interfaces), researchers and companies have experimented with visual displays of clusters. There is something visually compelling about clusters that arrange themselves (as in Grokker.com). And yet such interfaces have never achieved mass popularity à la Google or Yahoo.

Overall what I like about Clustering on Flickr:

-Clustering tags, not items. I had predicted this a little while back. It makes a lot of sense to cluster tags rather than the items. It’s probably much easier computationally, and clusters of tags are easier for a user to understand than clusters of items (or photos).

-Not trying to automatically generate a name for the clusters. This is once again a good idea. Err on the side of simplicity. A few example tags do a better job of describing the cluster than an automatically chosen name would. According to findings from cognitive psychology, a few exemplars from a category convey the category better than a prototype (or central tendency, represented by a name) can.

-No fancy visual display. It’s tempting to use some type of snazzy display to show clusters. It’s good that Flickr stayed away from that, at least for a first attempt.

-There is no “More” category: Their clustering algorithm seems to be good enough that there are only a small number of clusters. There is no “More” cluster with a whole set of sub-clusters, as with Grokker. (This might be because the database is still not that big; it might break down as more and more pictures get added to Flickr.)
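
To make the idea of clustering tags rather than items concrete, here is a minimal sketch in Python. It is not Flickr's algorithm, which is not public. It assumes a simple approach: take the photos carrying a seed tag, link two related tags when they appear on enough of the same photos (Jaccard similarity), and treat each connected group of linked tags as a cluster. The function name `cluster_related_tags`, the `min_jaccard` threshold, and the sample photos are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cluster_related_tags(photos, seed, min_jaccard=0.3):
    """Group the tags that co-occur with `seed` into clusters (illustrative only)."""
    # Keep only photos carrying the seed tag, with the seed itself removed.
    tag_sets = [set(tags) - {seed} for tags in photos if seed in tags]

    # Map each related tag to the indices of the photos that use it.
    photos_by_tag = defaultdict(set)
    for index, tags in enumerate(tag_sets):
        for tag in tags:
            photos_by_tag[tag].add(index)

    # Link two tags when their photo sets overlap enough (Jaccard similarity).
    neighbours = {tag: set() for tag in photos_by_tag}
    for a, b in combinations(sorted(photos_by_tag), 2):
        shared = len(photos_by_tag[a] & photos_by_tag[b])
        total = len(photos_by_tag[a] | photos_by_tag[b])
        if shared / total >= min_jaccard:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Each connected component of the link graph becomes one cluster.
    clusters, seen = [], set()
    for tag in sorted(neighbours):
        if tag in seen:
            continue
        component, stack = set(), [tag]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical sample data: each inner list is the tag set of one photo.
photos = [
    ["summer", "beach", "sea", "sand"],
    ["summer", "beach", "sea"],
    ["summer", "sea", "sand"],
    ["summer", "flower", "garden", "green"],
    ["summer", "flower", "garden"],
    ["summer", "garden", "green"],
    ["winter", "snow"],
]
clusters = cluster_related_tags(photos, "summer")
for number, tags in enumerate(clusters, start=1):
    print(f"Cluster {number}: {', '.join(tags)}")
```

Note that the work here scales with the number of distinct tags, not the number of photos being grouped, which is one way to read the claim that clustering tags is computationally easier than clustering items.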

8 thoughts on “Clustering comes to Flickr”

  1. I agree!
    Not naming the clusters was a good move. Staying away from slick visualizations is almost ALWAYS a good move (for some reason, visualizations are generally not as effective as designers think they are going to be; I am not sure why).

  2. I have been researching folksonomies. What I have not been able to figure out is on what basis this clustering is done. I mean, there seems to be no particular theme dominant in any one cluster (‘park’ and ‘tree’ lie in one cluster while ‘garden’ and ‘grass’ lie in another).
    I personally found the clusters very confusing, as there was no semantic closeness among the tags within a cluster, which I think would have been a more appropriate approach to clustering. If the user is not able to see the pattern in the cluster, it defeats the whole purpose of clustering, don’t you think?

  3. True – it’s hard to tell why individual tags lie in one group rather than the other. But a cluster taken as a gestalt does seem to work. So I do get that, overall, Cluster 3 seems to reflect blue skies in summer, while Cluster 4 is more about flowers and gardens. The difference is subtle, but it seems to work for me, especially when I look at the pictures for Clusters 3 and 4.

    I am not sure such clusters would help people predict where they might find a particular picture (it might lie in one or the other cluster). Very hard to predict. But it works well for exploration – for getting a sense of Cluster 1 as compared to Cluster 4.

    As such, Flickr’s clusters serve a different purpose than, say, Google Image Search: they are more about exploration than about predictably finding a particular picture.

  4. As regards how the clustering was done: mathematically, they probably used something like the k-means clustering algorithm. Also, they seem to have chosen to keep the number of clusters small; a larger number of clusters might have increased the cohesiveness of each cluster.
    (Note, I don’t know how this was done. This is all speculation on my part.)

  5. Some interesting examples that illustrate one problem with tags:

    In English
    http://flickr.com/photos/tags/cow/clusters/

    In French
    http://flickr.com/photos/tags/vache/clusters/

    We can find some “English” tagged cows because French users have added both tags, cow and vache. Let’s try to go a bit deeper.

    In Romaji (romanized Japanese)
    http://flickr.com/photos/tags/ushi/clusters/

    No more clusters, and no more association with cow or vache. Well, that might be normal, because it’s the romanized version of the kanji for cow.

    In Hiragana
    http://flickr.com/photos/tags/うし/

    Zero results.

    In Kanji
    http://flickr.com/photos/tags/牛/

    Zero results. Japanese people don’t have the right to have their images tagged and clustered?
    It should have given at least one result:
    http://flickr.com/photos/karl/1333234/

  6. karl,

    This is fascinating. Flickr is not doing any type of semantic linking of words that refer to the same thing (for example, the tag cats is different from cat), so it makes sense that they are not able to link the English word cow with the French word vache.

    So the two languages are kept separate, each in its own domain, except when a user makes a specific effort to make the linkage, such as a French user using both cow and vache as tags.

    Not showing the Kanji results when you have a photo tagged is simply inexplicable. I am looking around to see if there are any other Kanji clusters. It would be strange if they do not support clusters in Kanji.

  7. Yes, I like that. I wrote about it here. I think they could improve their algorithm though. I kept getting this one cluster with about 20 tags. I increased the number of clusters to try to break that cluster up, but it would only drop one or two tags and mostly remain constant. Good algorithms will give you very different views of the data as you adjust the number of clusters.

    It might have been something with my data rather than their algorithm though.

    But overall, clustering is very promising with tags.
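
As a rough illustration of the k-means idea floated in comment 4 (which the commenter flags as speculation), here is a small pure-Python sketch. It assumes each tag is represented by a 0/1 vector recording which photos carry it, and that the number of clusters `k` is fixed in advance. The helper names (`tag_vectors`, `sq_dist`, `kmeans`), the farthest-point initialisation, and the sample photos are hypothetical choices for this example and say nothing about how Flickr actually does it.

```python
def tag_vectors(photos):
    """Return (tags, vectors): one 0/1 vector per tag, one entry per photo."""
    tags = sorted({tag for photo in photos for tag in photo})
    vectors = [[1.0 if tag in photo else 0.0 for photo in photos] for tag in tags]
    return tags, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns a cluster index for each point."""
    # Deterministic start (an assumption of this sketch): the first point,
    # then repeatedly the point farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, new_labels) if label == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the result has converged
        labels = new_labels
    return labels


# Hypothetical photos that all carry the tag "summer" (already removed here).
photos = [
    {"beach", "sea", "sand"},
    {"beach", "sea"},
    {"sea", "sand"},
    {"flower", "garden", "green"},
    {"flower", "garden"},
    {"garden", "green"},
]
tags, vectors = tag_vectors(photos)
labels = kmeans(vectors, k=2)
groups = [[t for t, label in zip(tags, labels) if label == i] for i in range(2)]
print(groups)
```

Because `k` is chosen up front, a small `k` forces loosely related tags to share a cluster, which matches the comment's point that allowing more clusters might make each one more cohesive.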

Comments are closed.