A cognitive analysis of tagging

(or how the lower cognitive cost of tagging makes it popular)

At the start, let me confess that I struggled with this topic. From my first encounter with tagging (on systems such as del.icio.us & flickr), I could feel how easy it was to tag. But it took me a while to understand the cognitive processes at work. What follows is Rashmi’s theory of tagging – my hypothesis about the cognitive process that kicks into place when we tag an item, and how this differs from the process of categorizing. In doing so, my hope is to explain the increasing popularity of tagging, and offer some ideas regarding the design of tagging / categorization systems.

My ideas are mostly based on my observations about how people tag, and on relating those observations to academic research in cognitive psychology and anthropology. This is a first version, which I expect to revise as I learn more. Feedback is very welcome.

The rapid growth of tagging (1) in the last year is a testament to how easy and enjoyable people find the tagging process. The question is how to explain it at the cognitive level. In search of a cognitive explanation of tagging, I went back to my dusty cognitive psychology textbooks. This is what I learnt. (2)

[Diagram: cognitive analysis of tagging]

Categorization is a 2-stage process.

Stage 1: Related Category Activation. The first stage is the computation of similarity between the item and candidate concepts. For example, I come across the book “Snow Crash” in my library. Immediately a number of related semantic concepts get activated: “book”, “science fiction”, “Neal Stephenson”, “Zodiac”. Other concepts might be more personal, e.g., “favorite author”, “airplane trip”. Still other activated concepts might be about physical characteristics, e.g., “paperback”, “bad condition”.

How do we know this? Cognitive psychologists have explored this phenomenon by asking people to list semantic associations with an object, and mapping the type and frequency of associations. Another method is to use implicit memory measures to probe what concepts have been activated. With the advent of fMRI, it is possible to correlate such concept activation with changes in blood flow to different parts of the brain. The details of this are not relevant for the present discussion; what’s relevant is that there is broad agreement about such conceptual activation in cognitive psychology.

So far we have learnt how related concepts are activated. Writing down some of these concepts is easy enough. With tagging there is no filtering involved at this stage; you can note as many of those associations as you want. This is how tagging works, cognitively speaking. Yes, it’s that simple.

On the other hand, the work for categorization is just beginning.

[Diagram: cognitive analysis of categorization]

Stage 2: The decision. Now that we have candidate categories, we need to make the DECISION. What category is the right one? Cognitively, the process is fairly simple – you compute the similarity between the present item and the candidate categories. A function called Shepard-Luce (ppt) describes how people make this decision.
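To make the Stage 2 decision a little more concrete, here is a minimal sketch of a Shepard-Luce style choice rule. It assumes that similarity decays exponentially with psychological distance (the Shepard part) and that the probability of choosing a category is its similarity divided by the summed similarity of all candidates (the Luce part). The distances, the sensitivity parameter `c`, and the function names are invented for illustration; they are not measurements or anyone's published code.

```python
import math

def similarity(distance, c=1.0):
    """Shepard-style similarity: decays exponentially with distance."""
    return math.exp(-c * distance)

def choice_probabilities(distances, c=1.0):
    """Luce choice rule: each category's share of the total similarity."""
    sims = {cat: similarity(d, c) for cat, d in distances.items()}
    total = sum(sims.values())
    return {cat: s / total for cat, s in sims.items()}

# Hypothetical distances between the book and three candidate categories.
distances = {"science fiction": 0.5, "favorite author": 1.0, "paperback": 2.0}
probs = choice_probabilities(distances)

# Categorizing means committing to a single winner.
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))
```

In these terms, tagging stops before the `max` step: every activated candidate can simply be written down.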

(The process sounds intimidating, but generally it’s not. Choosing the best category is something we do all the time. We see an animal – it could be a dog or wolf. We make a quick judgment. This is a basic cognitive process – putting things into categories. Even birds can do it. When I was at graduate school, the Professor in the lab next door studied how pigeons make categorization decisions. There is evidence that babies can categorize.)

Cognitively, we are equipped to handle making category decisions. So why do we find this so difficult, especially in the digital realm – to put our email into folders, categorize our bookmarks, sort our documents? Here are some factors that lead to what I call the “post-activation analysis paralysis”.

First, there is less cultural consensus around items we categorize in the digital domain. Categorization is often based on cultural knowledge. For example, over the years we learn the cultural consensus regarding the boundary between wolf and dog, couch and chair, fruit and vegetable. With digital objects, there is less cultural knowledge about the categories – in fact, one purpose that tagging serves is transmitting cultural knowledge about our constantly evolving digital lives. (This is an interesting topic in itself and deserves a whole other essay).

In the digital world, we don’t just categorize an object, we also optimize its future findability. We need to consider not just the most likely category, but also where we are most likely to look for the item at the time of finding. These two questions might lead to conflicting answers, and complicate the categorization process.

Also, with digital objects, it’s not just ad hoc categorization – put an object into a category, any category that comes to mind. We need to consider the overall categorical scheme. Is my scheme becoming unbalanced? Do I have too many items in one category, and too few in another? If I put everything in one category, I will never be able to find anything. Do I need a new category for this item? Does it even fit into this scheme?

The need to consider the overall categorical scheme is much more important in the digital realm than in everyday categorization decisions. For example, I come across an animal; I categorize it as a dog. I don’t need to worry that my mental dog category has become too large, that I might need subcategories. Cognitively, it is sufficient to make local decisions about objects we encounter; the brain does the magic computation, and my animal taxonomy evolves. As I become an expert on dogs I evolve sub-categories for spaniels, dachshunds and terriers, without explicitly thinking about the structure. The next time I encounter a terrier, magically, the terrier subcategory gets activated. Think of how much work it would take for something like that to happen with our email folders.

Finally, there are no second chances in categorizing digital objects. Well there are – but those are fairly expensive. You need to go into the first category, retrieve the item, and put it into the second. This is where user interface for categorization comes into play – most systems assume that you are done with an item once you categorize it. It’s taken away from you. The brilliance of Gmail was to separate the tagging from the archiving.

Start thinking of all this and you land in “post-activation analysis paralysis”: a state of fear that you will make the wrong decision, and that the item will be lost forever – it will land in some deep well, some hard-to-access branch of the tree, and disappear from your view and attention.

We come back to the question that we started with – why is tagging simpler? In my opinion, tagging eliminates the decision (choosing the right category) and takes away the analysis-paralysis stage for most people. (Note that some people might still freeze up in deciding between different tags, or figuring out ways to optimize future findability. These are valid concerns that tagging systems can address better than they do now.)
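One way to picture the difference in data-structure terms (a sketch only, not a description of how any particular product is built): a folder scheme stores each item under exactly one category, so the filing decision fixes the only retrieval path, while a tag index files the same item under every association that came to mind. The class names and the example item are hypothetical.

```python
from collections import defaultdict

class FolderStore:
    """Each item lives in exactly one folder; refiling replaces the old choice."""
    def __init__(self):
        self.folder_of = {}

    def file(self, item, folder):
        self.folder_of[item] = folder  # overwrites any earlier decision

    def find(self, folder):
        return {i for i, f in self.folder_of.items() if f == folder}

class TagStore:
    """Each item can carry any number of tags (a simple inverted index)."""
    def __init__(self):
        self.items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        for t in tags:
            self.items_by_tag[t].add(item)

    def find(self, tag):
        return set(self.items_by_tag.get(tag, set()))

folders = FolderStore()
folders.file("Snow Crash", "science fiction")

tags = TagStore()
tags.tag("Snow Crash", "science fiction", "Neal Stephenson", "airplane trip")

print(folders.find("airplane trip"))  # nothing was filed under this folder
print(tags.find("airplane trip"))     # the tagged item is reachable this way
```

With the folder store, looking under "airplane trip" comes up empty unless that was the one category chosen at filing time; with the tag store, any of the noted associations leads back to the item.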

Another observation about tagging – it provides immediate self and social feedback. Each tag tells you a little about what you are interested in. And you find out the social context for that bit of self-knowledge. How do others view that item? Together this piecemeal feedback creates a cycle of positive reinforcement, so that you are motivated to tag even more. This might not make tagging easier, but it does make it more fun.

To conclude, the beauty of tagging is that it taps into an existing cognitive process without adding much cognitive cost. At the cognitive level, people already make local, conceptual observations. Tagging decouples these conceptual observations from concerns about the overall categorical scheme. The challenge for tagging systems is to then do what the brain does – intelligent computation to make sense of these local observations, and an efficient, predictable way to ensure findability.
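As one illustration of what such computation over local observations might look like, the sketch below counts how often pairs of tags are applied to the same item and then lists the tags most related to a given tag. Co-occurrence counting is used here purely as a stand-in; it is an assumption, not a description of what del.icio.us, flickr, or any other system actually runs, and the bookmarks are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(tagged_items):
    """Count how often each pair of tags appears on the same item."""
    counts = Counter()
    for tags in tagged_items.values():
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(tag, counts):
    """Tags that co-occur with `tag`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            related[b] += n
        elif b == tag:
            related[a] += n
    return [t for t, _ in related.most_common()]

# Invented bookmarks, each with the tags one person gave it.
bookmarks = {
    "url-1": ["design", "css", "web"],
    "url-2": ["css", "web"],
    "url-3": ["design", "typography"],
    "url-4": ["css", "web", "layout"],
}
counts = cooccurrence(bookmarks)
top = related_tags("css", counts)
print(top)
```

Nobody in this toy data set declared that "css" belongs with "web"; the relationship falls out of the individual, local tagging acts.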

———————————————————
1. I refer to tagging as on flickr and del.icio.us – associating an item with a bunch of words, writing down those associations. In this article I have focused on the input part of the tagging process – the individual assigning the tag, not the output part – or the finding of information through tags.
2. For any cognitive scientists reading this article – yes, I am well aware that the cognitive explanation is a simplified version, glossing over many of the details. I believe it captures the broad gist of how it works though.

108 thoughts on “A cognitive analysis of tagging”

  1. Didn’t fully understand the importance of tagging until reading your analysis.

    Tagging seems pretty chaotic though, so maybe a form of categorizing and tagging may be useful?

  2. Yes, tagging does lead to fairly chaotic data – which is why you need intelligent computation to make sense of it.

    When you say combination of “tagging and categorization”, do you mean at the input stage – that the user specifies? Or some way of computing that post-hoc?

    One approach to a combination might be to take the first tag. Chances are that’s a likely category.

  3. ‘flexible categories’ – that’s a nice way to put it.
    It needs to be simple, Benjamin, for people like myself to understand; if it’s not simple, people won’t use it.
    I found this page today which describes the way i feel..
    http://www.nettakeaway.com/tp/index.php?id=155

    One problem I see with tagging is that people can tag things with words that are totally irrelevant to an item just to try and get more people reading it.

    I think the way forward is some way of posting it ad hoc, possibly the way a search engine would by getting top “keywords” from an item.
    Better still let people choose the method that suits them best, tagging, categorizing or even both.

    http://www.web-feeds.com/comment.php?item_id=138&action=show

    I posted this page at my site so i can also bookmark it, but it would be nice to also tag it.
    Tagging basically is the way search engines used “keywords” but they stopped using them for searches as people “stuffed keywords” into the keyword meta tag to try and reach a larger audience, even though an item was not relevant to the keywords used.

  4. A few comments
    1. I think it is worth noting that tagging serves two purposes – one inwardly focused (what makes sense to me) and one outwardly focused (what makes sense to others, or as was pointed out in “Why I Still hate feeds”, what will get others’ attention). Note that many of the current examples (del.icio.us or flickr) operate in both domains and some, like the WIKI effort, are more outwardly focused.

    2. I think that Rashmi’s analysis resonates with anyone who’s tried to classify/categorize a large body of information, whether it’s internally focused or externally focused. I found that until I achieved a level of comfort with tagging and had developed a model in my own mind of a “good way” of generating tag names, I was still experiencing some “post-activation analysis paralysis”, asking myself, “will this item I’m tagging ever get looked at again if I tag it badly…” I also think the point about the cost of having to re-categorize is right on target. Lastly, I think that I also had to overcome prior inculcation to “classify (implicitly, hierarchically)” vs. “categorize”. Intellectually I understood that “folders” were failing in an environment where I had many thousands of things to categorize (email, my personal pics at home), but that didn’t stop me from continuing the madness, until, that is, I taught myself how to tag.

    3. I read the “Why I Still Hate Tagging” post with interest. I think that additional ways of looking at the problems help (see Clay Shirky’s thoughts on ontology and classification etc. http://www.shirky.com/writings/semantic_syllogism.html ) this is only one other viewpoint of the problem. Personally, I think you need to choose the right tool for the job. I have found that tagging is a great way to manage information assets (especially those that don’t lend themselves to keyword search). Sometimes a hybrid approach is what is called for (gmail).

    4. Lastly – I have found that looking for things that others have categorized via tags does require a bit of learning, but once you’ve caught on and built your own model of the tag space you are searching, using tags can be far more productive (you are leveraging others’ intellectual work).

  5. Interesting area – it has intrigued thinkers and philosophers for centuries and, like most good insights, generates many more intriguing questions. I have observed feedback loops in my own tagging both at Stage 1 – Stage 2 and Stage 2 – Categorise It, a sort of meta-monitoring.

  6. craig: what do you mean by feedback loops – that you observe how you are tagging, and use that information to adjust future tags?

    John: I believe that tagging always should have an internal focus. That’s the only way to get good metadata (for both personal findability and social discovery). I don’t know about the WIKI effort at tagging (unless you are referring to the categorization that goes on in Wikis generally speaking).

    I might be wrong here – but I think that good systems need to build in selfish motivation for tagging.

  7. This is one of the best posts describing tagging and the benefits of tagging that I have read in a long time! Very insightful.

    I wanted to ask you for permission whether I could use your diagrams on the help section of our site with a link to this full article of course.

    We just developed http://www.blinklist.com, and many users still need to learn about tagging so I thought your information would be very valuable.

    I also thought you might like to check out our site and the “Tag Manager” in particular since we are trying to solve the problem of how to easily find tags again.

    Mike

  8. A great explanation of tagging. I have speculated that the popularity of tagging can be explained cognitively. Your article explains the process elegantly.

    Some more issues that you could help explain.
    – How does the social aspect of tagging influence the input process? Your current explanation doesn’t seem to take this into account.
    – How is the tag-word chosen? Sometimes the concepts don’t have a direct mapping to a word (e.g. synonyms). Name, noun, verb, adjective, etc.
    – How do previous taggings influence current tagging? Sometimes you want a particular combination of tags to uniquely identify an item. So, you may need to apply more tags to make it unique, for example.

    Looking forward to your analysis of the finding phase.

  9. A nice description of tagging. It could well be a study in adding metadata to data – aka RDF. Starting with the Reiser 5 filesystem, Linux will provide a way to tag ALL data. That would be interesting – for what happens to metadata, when there is too much of it. Will there be meta-meta-data ?

  10. Great explanation of the process. As you have already mentioned in your post, the interface plays a great role in tagging and could be improved (I’m talking about del.icio.us right now).
    I’d like to have more knowledge in the world than knowledge in the head :-)

    My full comment is on http://www.japhy.at (it’s in English, don’t worry :-) ).

  11. My experience with tagging has been del.icio.us (boom boom!) and I was interested to read this article. The inclusive nature of tagging in del.icio.us, flickr etc. should be encouraged even more, even to the detriment of ease of use. I have noticed recently that I can tag items in my del.icio.us account without putting much thought into it at all. All I need to do is click on recommended tags and hit save. If I’m in a rush, which I usually am, this is what I do. This has a drawback as the personal involvement and “radical trust” elements of Web 2.0 are being squashed before they get started. What good is tagging if you’re lazy and constantly back referencing as most of us will do, given the option?

  12. Wow, I over-enjoyed your short-enough essay. Extremely clear!

    Might I suggest 3 additional topics you might want to consider and include in your struggle for understanding tagging? I would write about them myself but I’ll never reach your levels of understanding and clarity ;-)

    1) Visit
    http://cloudalicio.us/tagcloud.php?url=http://boingboing.net/
    The graph shows the evolution in time of the tags used to tag a specific URL (in this case http://boingboing.net). You may notice that in the beginning people were using more “blogs” and now people use more “blog”. This suggests people are moving from a category-like way of using del.icio.us (I put boingboing in the “blogs” folder that contains all the blogs) to a tag-like way of using del.icio.us (I name boingboing as a prototype of the class “blog”)
    Someone was making this point (surely more clearly) on some blog but I could not find it again. Anyway this is true also for other blogs and this is real, thriving evidence.

    2)
    At http://www.blumpy.org/tagwebs/ there is another “cognitive” approach to the tagweb (or tagspace or tagsphere).
    I wrote about it at http://moloko.itc.it/paoloblog/archives/2005/02/04/tag_the_tag_tag_and_metadadaism.html
    Jakob argues “a neuron in your brain is a lot like a tag in a tagweb”. A tagweb is a network of tags whose edges are the “this tag is tagged with this tag” relationship, for example he tags the tag “Victoria” with the tag “female”.

    Will it be possible/useful to let users tag the tags themselves?

    3) Of course it would be better to have people tagging stuff in a way that makes sense to them but, as soon as tags are public (you can see them), there is concern about tag spam. This is of course not a problem when tags are private, for example for the tags you use in your gmail account; no big deal in spamming yourself, no?
    I wrote about it at
    http://moloko.itc.it/paoloblog/archives/2005/01/29/what_is_tag_spam_or_better_tag_spam_exists.html (from where you can find links)
    Or check the image at http://www.micropersuasion.com/2005/07/yahoo_myweb_bec.html
    In order to make better tag systems (I think this is one of your goals), we must take into account this issue as well. Of course one simple solution would be to give you the possibility to see only resources tagged by friends (flickr and Y!MyWeb2.0 let you do this) or friends of friends, i.e. users deemed trustworthy by a simple and customizable trust metric.

  13. Thank you so much. Your article is interesting, and it makes a fresh and qualified perspective on the folksonomy debate. I’m tagging it for future reference. :-)

  14. Rashmi: the WIKI system has evolved, the tagging is explicitly directed at creating a replacement for a strict ontology/classification system. The WIKI “taggers” are tagging the material with the intent of making it “findable” by the readers, acting in a sense like publishers more than consumers, so while there is still some personal selfishness in the tagging process, the overall flavor there is one that is more “open source”, which has its own psychology about it. In this sense I think WIKI is a different beast than a freely open system like del.icio.us.

    However, I see both the WIKI and del.icio.us models as emergent systems producing a cognitive mapping in a social context. One question I have about this mapping that is emerging – is it equilibrating in any sense the way different people think/perceive of things, or will it produce a system that isn’t really usable across different individuals and/or (micro)cultures?

  15. I tried to leave this comment yesterday but it is not here, so I wrote it on my blog at http://moloko.itc.it/paoloblog/archives/2005/09/29/comments_to_a_cognitive_analysis_of_tagging_by_rashmi_sinha.html

    —-

    Wow, I overenjoyed your short-enough essay. Extremely clear!
    Might I suggest you 3 additional topics you might want to consider and include in your struggle for understanding? I would do it myself but I’ll never be able to write as clearly as you [winking face]

    1) Visit http://cloudalicio.us/tagcloud.php?url=http://boingboing.net/
    The graph shows the evolution in time of the tags used to tag a specific URL (in this case http://boingboing.net). You may notice that in the beginning people were using more “blogs” and now people use more “blog”. This suggests people are moving from a category-like way of using del.icio.us (I put boingboing in the “blogs” folder that contains all the blogs) to a tag-like way of using del.icio.us (I name boingboing as a prototype of the class “blog”).

    Someone was making this point (surely more clearly) on some blog but I could not find it again. Anyway this is true also for other blogs and this is real, thriving evidence.

    2) At http://www.blumpy.org/tagwebs/ there is another “cognitive” approach to the tagweb (or tagspace or tagsphere).
    I wrote about it at http://moloko.itc.it/paoloblog/archives/2005/02/04/tag_the_tag_tag_and_metadadaism.html
    Jakob argues “a neuron in your brain is a lot like a tag in a tagweb”. A tagweb is a network of tags whose edges are the “this tag is tagged with this tag” relationship, for example he tags the tag “Victoria” with the tag “female”.
    Will it be possible/useful to let users tag the tags themselves?

    3) Of course it would be better to have people tagging stuff in a way that makes sense to them but, as soon as tags are public (everyone can see them), there is concern about tag spam (I tag something with a certain tag so that other people will be exposed to it). This is not a problem when tags are private, for example for the tag you use in your gmail account: no big deal in spamming yourself, no?
    I wrote about it at
    http://moloko.itc.it/paoloblog/archives/2005/01/29/what_is_tag_spam_or_better_tag_spam_exists.html (from where you can find interesting links). Or check the image at http://www.micropersuasion.com/2005/07/yahoo_myweb_bec.html

    In order to make better tag systems (I think this is one of your goals), we must take into account this issue as well. Of course one simple solution would be to give you the possibility to see only resources tagged by friends (flickr and Y!MyWeb2.0 let you do this) or friends of friends, i.e. users deemed trustworthy by a simple and customizable trust metric. What do you think?

  16. Rashmi – nice exploration. One aspect of tagging that you just touch on is the social. Tagging is inherently social, as it is meant to make information more retrievable and sharable. Three observations:

    1. The individuals in a group may all make the categorization decision differently, but there will likely be overlap in the set of tags they apply. So tagging inherently makes the information more accessible.

    2. Also, when categorizing, it is critical that there be a shared definition of the category. Not so with tagging. Tagging produces a cloud where a group can tolerate a great deal of fuzziness about meaning, and still make productive use of the tag.

    3. A spoken or unspoken social process brings about a consensus of what tags to use: I have more than once globally replaced a tag I created with one I found more broadly used.

    Thanks for your post.

  17. I am really interested in the discussion about folksonomies and traditional taxonomies. This is a great article.

    What draws me to tagging is the idea that you can ascribe more than one category to an object. With traditional bookmarking or folder hierarchies, you can only put an object in one place (well, unless you are going to create and place multiple copies, but that is a lot of time and energy).

    Personally I have not got into the habit of using the social bookmarking tools – but I love the Firefox extension that allows you to tag your bookmarks on your own computer, instead of just creating folders, you can add as many tags to your bookmarks as you want.

  18. This is a very interesting post, specially as regards volition. However, from my experience, maintaining a good (coherent) tree requires a lot of gardening and finally adds an important cognitive cost.

    Tagging is probably the best tool for metacognition.

  19. Thanks everyone for the great feedback and the links.

    Mike, you are most welcome to reproduce the diagram. I would appreciate the appropriate attribution, but feel free to use it.

    Korakot, a social analysis of tagging is definitely on my list of articles to write. That’s an equally fascinating topic.

    John, I agree. I do see the Wiki system and del.icio.us as similar in supporting emergent classifications. As regards your question: is it equilibrating in any sense the way different people think/perceive of things, or will it produce a system that isn’t really usable across different individuals and/or (micro)cultures?

    I would think the answer depends on how different and cohesive the group doing the tagging is. One can imagine that if a few people of a subgroup shared tags in a social space, then the tags would reflect them and their mental models. To the degree that their mental models were different, the tags would also be different.

    This is a many-layered question, however. And there is some fascinating research in cognitive anthropology that can provide a guide for what happens in such situations. That’s another article on my to-do list.

  20. Email me if your comment does not show up. My overactive spam filters catch some comments.

    Michael, a response to your note about fuzziness in tagging. Yes, it does make tags more fluid and easier to share. But categorization is also something that groups of people are able to share. Think of the way that we slowly evolve a shared understanding of a new musical genre. It’s less fluid, but the sharing happens anyway. Our digital categorization systems just do not support the type of sharing that happens in offline social interaction. More on that later in a follow-up article.

    Paolo, I did read your comment on your site (sorry that your comments were being blocked out – I have fixed that – I think :-))

    As regards your points about cloudalicious, I did check that out. Tom Coates had made some interesting points in a similar vein: http://www.plasticbag.org/archives/2005/06/two_cultures_of_fauxonomies_collide.shtml
    I think your idea about tags shared by friends and trusted others makes a lot of sense (after all, that’s how the process works in real life – we are influenced by others around us, and our concepts are transmitted through these social circles). This is why I find Yahoo MyWeb2 so interesting, and have been meaning to check it out.

  21. I’ve been doing a lot of categorizing in recent weeks for one of my executive editing jobs and it’s been a monster. You’d think tagging / selecting categories is simple… but then you leap from site to site and category selections vary.

    This very sample shows how difficult information architecture can be since we all think differently and have different experiences that lead us to make the decisions we make. Thanks for a thought-provoking article.

  22. Tagging is sense-making for the tagger (hence the archiver).

    But it does not transfer to tags as sense-making to others (who didn’t tag with that intent those objects in the 1st place).

    This has been researched for many decades already.

    This is why expert systems, ontologies and corpuses get built.

    Sure they are hard to use.

    But natural language is not exact nor very expressive (with the common vocabulary).

    In summary: tagging is great for personal sense-making and a bit of social peek-a-boo, but not much else.

    There will never be intelligent computation that makes sense out of people’s tags and actually put them in similar categories.

    Think about it, not even smart people can do that (unless they were there when the tagging happened and taggers thought out loud!)

  23. Children, they have a curious, yet non-judgemental approach to investigating their world. When they begin their — “education” — they learn to label things and place them in specific categories. The given logic for this states that this saves us time in investigating every intricate nuance in life and speeds us onto whatever influences us the most. We become specialists in whatever category our circumstances permit and our individual life paths begin.

    But something happens to us along the way, a trade off occurs. The current economic systems and social institutions tell us that it’s not “efficient” for us to ask questions, to seek out new methods, to probe into new territory and investigate the root cause of life’s everyday events.

    We learn to forget our curious nature as children, and not for all, but most, we learn to stereotype. We learn to label ourselves: Christian, Republican, Muslim, Jew, Pro Union, Feminist, Teacher, Basketball Player, Buddhist, Straight Edge, Geek, Vegan, Communist, Anarchist, Man, Woman, Black, Brown, White etc etc…

    Ostensibly, we perceive the place, “where everybody knows our name”, safe. Our belief systems keep us headed on the same course, for many, our whole lives.

    I read about a recent study published in the — “Journal of Child Development” — which showed that adults did far better at remembering illustrations they had seen of fictional cats versus real cats, Why?

    Because the label “cat” allowed them to see the picture of it, recognize it as a “cat”, and move on. But the pictures of the never seen before, fictional cats, were so new that a quick easy stereotypical label for it wasn’t available yet. So, they had to think about it, investigate it, and not simply dismiss it as just another “cat”.

    Children on the other hand, hadn’t had the label or category ingrained in them as strongly yet, and therefore scored much higher on the test of remembering what they saw.

    What can we learn from this? That absolute rampant categorization makes one mentally weak and unable to cope with the many interconnected dynamics of life. It leads to a breakdown in diagnosis and resolution of problems; it leads to attributing a single cause, often the wrong cause, to one’s problems.

    These old reptilian/mammalian instincts don’t work in today’s world; we have to break free from habits, from dogmas, from absolute totalitarian systems of control and classification.

    Why do people decide to put absolute faith in things?

    Because they have the illusory promise of control in doing so. I suspect that through natural selection, our ancestors who were better suited to rally around a group of people with a shared belief system sustained a far greater chance of survival from the elements. Those who could peddle control could successfully rally support behind a shared meme/idea. In the past, groups with differing views and interests clashed. This method, while not perfect, posed no threat to the species, But what about today?

    Systems based on deception, faith, and in misinterpretations of life “being” a certain way, breakdown and fail because they only take into account an either/or view of the situation.

    “You’re either with us, or against us”

    Such thinking may very well get a nuke planted in your garden.

    If we want to survive in today’s world, not only do we have to learn to categorize non-absolutely, we have to learn to pay attention to the diversity of life, ask more questions, listen more often, and upgrade to the better of ideas regardless of the identity our egos have enslaved us to.

    Hardening of the Categories

    “Hardening of the Categories”. Memory processes tend to work with generalized categories. If people do not have an appropriate category for something, they are unlikely to perceive it, store it in memory, or be able to retrieve it from memory later. If categories are drawn incorrectly, people are likely to perceive and remember things inaccurately. When information about phenomena that are different in important respects nonetheless gets stored in memory under a single concept, errors of analysis may result. For example, many observers of international affairs had the impression that Communism was a monolithic movement, that it was the same everywhere and controlled from Moscow. All Communist countries were grouped together in a single, undifferentiated category called “international Communism” or “the Communist bloc.” In 1948, this led many in the United States to downplay the importance of the Stalin-Tito split. According to one authority, it “may help explain why many Western minds, including scholars, remained relatively blind to the existence and significance of Sino-Soviet differences long after they had been made manifest in the realm of ideological formulae.”

    “Hardening of the categories” is a common analytical weakness. Fine distinctions among categories and tolerance for ambiguity contribute to more effective analysis.

    http://www.cia.gov/csi/books/19104/art6.html

  24. Great Article (and great comments)!

    My comment is from the corporate Intranet world where often it is equally vital to prohibit access to information as it is to facilitate it.

    I would really like all of us “taggers” to think a little bit about how we could expand tagging to the realm of controlled access information.

    Consider a tag called “Oracle Bid” turning up in a “tag cloud” at a corporate Intranet. Even if the information tagged with “Oracle Bid” is inaccessible, the interpretation of the very tag will probably be the main topic by the water cooler.

    Since tagging is equally powerful, important and needed in large corporate Intranets as it is on the Internet, we should include this ingredient and deal with it.

    Cheers!

    //SH

  25. There will never be intelligent computation that makes sense out of people’s tags and actually put them in similar categories.

    One of the Flickr team, Serguei Mourachov, has used analysis of tags and their relatedness to other tags to show patterns and generate categories… without any modification to the way people tag their photos, he found emergent clusters of meaning…

    See:

    http://flickr.com/photos/tags/lion/clusters/
    http://flickr.com/photos/tags/urban/clusters/
    http://flickr.com/photos/tags/party/clusters/

    Beware! You won’t be able to stop surfing ;)
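
    A rough sketch of how clusters like these could emerge from tags alone. This is not Flickr's actual algorithm (it is not described in this thread); it simply assumes that tags which keep appearing together on photos carrying the target tag belong to the same cluster. The function names, the min_count threshold, and the sample photos are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count how often each unordered pair of tags appears on the same photo."""
    counts = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def clusters_for(tag, photos, min_count=2):
    """Group the tags that accompany `tag` into clusters.

    Two companion tags share a cluster when they co-occur at least
    `min_count` times on photos carrying `tag`, directly or via a chain.
    """
    companions = [set(tags) - {tag} for tags in photos if tag in tags]
    counts = cooccurrence_counts(companions)

    # Build an undirected graph of sufficiently frequent co-occurrences.
    neighbours = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Each connected component of that graph is one cluster.
    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical sample data: each set is the tags on one photo.
photos = [
    {"lion", "zoo", "cat"},
    {"lion", "zoo", "cat"},
    {"lion", "africa", "safari"},
    {"lion", "africa", "safari"},
    {"lion", "statue"},
]
print(clusters_for("lion", photos))
```

    Note that nobody had to tag differently for the “zoo” and “safari” senses of “lion” to separate; the grouping falls out of which tags travel together.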

  26. I think it is dangerous to have an “either or” mentality when it comes to the debate surrounding tagging vs. taxonomy. One of the key ideas in cognitive science is the notion of the semantic network – a flow of semantic associations or related words. The clusters that you see on flickr are just semantic networks formed from tags.

    Semantic networks meander and sometimes that is not a good thing. It’s a great way to increase serendipity but not so good at forming understanding. Tagging would be an awesome way to casually browse through books or music. You could find interesting connections that you did not know existed. Tagging is at its weakest when structuring content for both retrieval and subject comprehension. In this case it would be better to have a more authoritative way to categorize.

    The difficulty we face is that information is flowing faster than it can be categorized and so informal tagging seems to be a better method when IMO it is just a more expedient solution.

    Frankly I think a hybrid model may be the best solution. Imagine a faceted system of metadata that uses informal tagging as a means to populate a structured set of facets. You could even use methods found in cognitive, or even linguistic, anthropology to reach a decision on an “authoritative” value for a given facet once enough data has been collected through tagging.

    Makes one think.
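
    One way the hybrid idea could look in practice, as a sketch only. It assumes a hand-made table mapping free-form tags to (facet, value) pairs and a simple majority vote standing in for the more careful methods mentioned above; FACET_OF, min_votes and min_share are illustrative names and thresholds, not an established scheme.

```python
from collections import Counter

# Hypothetical mapping from informal tags to (facet, value) pairs.
FACET_OF = {
    "paperback": ("format", "paperback"),
    "hardcover": ("format", "hardcover"),
    "scifi": ("genre", "science fiction"),
    "science-fiction": ("genre", "science fiction"),
    "cyberpunk": ("genre", "cyberpunk"),
}


def authoritative_facets(tags, min_votes=3, min_share=0.5):
    """Pick a facet value only once enough taggers agree on it."""
    votes = {}
    for tag in tags:
        if tag in FACET_OF:  # tags outside the scheme are simply ignored
            facet, value = FACET_OF[tag]
            votes.setdefault(facet, Counter())[value] += 1

    decided = {}
    for facet, counter in votes.items():
        value, n = counter.most_common(1)[0]
        total = sum(counter.values())
        # Require both enough data and a clear majority for the top value.
        if total >= min_votes and n / total > min_share:
            decided[facet] = value
    return decided


# All tags applied to one book by many users (made-up data).
tags = ["scifi", "science-fiction", "cyberpunk", "scifi", "paperback", "favorite"]
print(authoritative_facets(tags))
```

    With this data the genre facet is settled but the format facet is left open, because a single “paperback” vote is not yet enough evidence.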

  27. I find the last two comments interesting. First, the example of the Flickr clustering shows what can be done with an exceptionally simple dimensional model where a very straightforward unsupervised clustering can provide interesting results. I bet the FP/FN rate for set inclusion is surprisingly robust.

    The second comment hints at the emergent semantic network evolving as communal tagging is used in a context (like Flickr). The first clustering likely does not take into account the relationships between the tagger and the tagger’s vocabulary, the vocabulary and the things being tagged, the things being tagged and the taggers. There is a wealth of information there and we’re not even including temporal information or other weights like how many taggers tagged it or the intersections of the tagger/vocabulary clouds etc…

    As for a strict ontology, it may not be needed. I think that a probability-based system (Bayesian, ANN or whatever) created from the network that already exists might very well produce a good query-based capability. I think what the users have to buy into is that it’s like a search engine query and not an SQL query.

    jos

  28. Many thanks for this useful article – I think that most of what I’d want to say has been covered by someone already!

    As far as I’m concerned, I certainly need to look more at “tagging” and less at “categorising” when I’m blogging.

  29. Rashmi – excellent post!

    Seems (from reading the comments) that the stumbling block is still the “findability”, thus the pressure (internal and/or external) to categorise following standards or at least a cultural common denominator – to allow others to find the object/subject.

    In my humble opinion that stumbling block is inherited from earlier two dimensional (paper based) mechanisms to find an object or a subject. A “categorical scheme” as you call it, a single and common one-dimensional stencil.

    Suppose instead the “finding mechanism” were multidimensional, like overlaying a set of flexible and personal stencils that blot out any tag irrelevant to me but freely include any tag of my liking.
    Iffy tags, personal tags, it should not matter: as long as I can overlap many at the same time, I should be able to find the object/subject.

    Say we’re looking for a dandelion – you might use “yellow”, “blow little umbrellas”, “lawn”, “weed” and some others.
    I might use “flower”, “salad”, “bitter”, “diuretic” and some others.
    If that object had been tagged by many I’m sure we could find most of the above in its “namespace” – and we both could find the same precise object using very different sets of tags, or “stencils”.
    Logic, cultural background, training – all irrelevant…

    Add that the total set of tags added by many – the “namespace” – does not only tell something about the taggers, it delivers “knowledge” about the object as well. Your tags might enlighten me about the dandelion and vice versa :)
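
    The overlapping stencils can be sketched as a plain set-inclusion test. This assumes each object's “namespace” is just the pooled set of tags everyone has applied to it; the find_objects helper and the sample plants are invented for the example.

```python
def find_objects(namespaces, stencil):
    """Return the objects whose pooled tags include every tag in `stencil`."""
    wanted = set(stencil)
    return sorted(obj for obj, tags in namespaces.items() if wanted <= tags)


# Pooled tags per object, as contributed by many different taggers (made up).
namespaces = {
    "dandelion": {"yellow", "lawn", "weed", "flower", "salad", "bitter", "diuretic"},
    "buttercup": {"yellow", "flower", "meadow"},
    "rocket": {"salad", "bitter", "green"},
}

# Two people, two very different stencils, the same object.
print(find_objects(namespaces, ["yellow", "lawn", "weed"]))
print(find_objects(namespaces, ["flower", "salad", "bitter"]))
```

    Each extra tag in a stencil narrows the result, so neither searcher needs to know or share the other's vocabulary.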

  30. I found that ZieTag gives a flexible environment for relating things: not only files such as links and images, but also text excerpts. Relating interests to interests is another point. The main thing, however, is that I can create a tag with multiple words. Sometimes this comes in handy, I think, even though it is not the way del.icio.us does it. Btw, does anybody know of other applications that support tagging? I’d love to try them out.

  31. Great stuff, Rashmi! You mention that there is evidence that babies can categorize. The hyperlink behind “babies” doesn’t seem to work. Do you have a reference to babies tagging ;-) ? I happen to have a five-week-old, so perhaps I can capture some empirical evidence…

  32. Great insight. What I find difficult with tags is consistency and the fading of tags, i.e., different tags may come to mind for similar things on different days, and I may forget the tags I used to tag something.

    The existing tagging methods I have seen tag a whole page. I designed “deep tags” that can tag any content inside a page, e.g., a phrase or a paragraph. This way, when a user searches and finds a tag, clicking on the tag will bring up the content to which the tag is attached. This is especially useful for long pages. Comments on the usefulness of the deep tags will be appreciated.

  34. Hope you are still reading this .. it is quite an old post I guess. I am also an ex cognitive psychologist … now computer scientist … who is fascinated by, and doing work on, user tags.

    I am puzzled by one aspect of your explanation… the very first stage! I know you include a disclaimer about the simplicity of the explanation .. and I accept this, but as a result I find the first stage completely mysterious.

    How exactly are “candidate concepts” identified in the first place? Maybe this is too difficult a question. Here are some simpler ones. Is there something interesting to say about the sorts of candidate concepts that are activated? Can we predict anything about the distribution of the sorts of tags which might be activated for an arbitrary item?

    I agree with most of what you say in stage 2 … categorizing in the “real world” was never going to be as simple as categorizing in a typical category learning experiment in which categories tend to be artificial. And you have pointed out some of the added complexities.

    Having said that, the puzzle still remains …. is there something interesting to say about the candidate concepts themselves? Some of these will end up as tags … and therefore the same question carries over to the tags themselves.

  35. There are plenty of categorization schemes that allow the categorizer to choose more than one category. It’s a false dichotomy between tagging and one category only. A categorization scheme can easily allow “as many categories as fit!”.

    However, you are right that in any scheme that’s all formally laid out for you in advance, there’s a cognitive burden in mapping your own internal conceptual model to the scheme’s model. Even IF you can choose more than one category. The burden is in the mapping of internal model to standardized model. With tagging, there is no need, your internal model is it. (Well, you do need to map your internal model to words or character strings; possibly without using spaces; but that’s a much smaller burden than mapping to a standardized external scheme, especially where there might be ‘wrong’ mappings, wrong choices of categories. There are no wrong tags).

  36. This is an excellent little article. It highlights some of the serious misgivings I have about tagging, based on how most web applications implement the process.

    Most web 2.0 applications simply let the user tag, without providing any guidelines as to how to tag effectively. The presumption is that each user has their own mind about what information means, and thus is the best judge of how to categorize it. Unfortunately in the real world, most users don’t have a CLUE as to how to effectively categorize a piece of information. Some people will be effective, for example by creating a small number of tags which they can manage, or by duplicating their real-world A-Z list. But most users, since they have the freedom to, will simply apply whatever tags pop into their mind at that particular moment. Sometimes those choices are correct, based on how they normally think. But often, this creates situations of redundant tags (using “mac” this time vs “macintosh”), situations of incomplete tags (using “mac” and “computer” this time, but only using “mac” a future time), and situations of unique-only tags that may be too complicated to remember applying in future situations, or have one-time application (using “buy-a-new-house” for a project, instead of simpler “house”-type tags).

    These mistakes quickly add up, causing even more confusion than the previous file/folder system. This is because the user is allowed to think less about what they are doing. This frees them up in the short term, but is disastrous in the long term. In other words, the user has the freedom to completely destroy their own logical cognitive recall process, simply because they can tag however they want. People must be forced to think a bit up front: “how will I search for this?”, “what does this mean to me?” etc. Not use whatever pops into their head at the moment. Because the state of searching is NOT the same state as categorizing.

    The way to fix tagging is to force the user to design a consistent tagging methodology. Either upfront, or on the fly when bookmarking occurs. One way to do this is to force some semblance of hierarchy, or rather, hard-lined relationship between related tags. A quick example of what I’m getting at is to have the system automatically tag “computer” every time “mac” is tagged. The user establishes that “computer” is above “mac” and thus the two should always be included together. Another important direction is to eliminate redundancy. Tagging “mac” should give a suggestion of also applying “apple”, “macintosh”, etc. In this way, the user can use any form of these tags to return to the piece of information.

    I guess what I’m advocating is for the system to “help” the user tag by taking feedback on what relationship one tag has to another, and automatically making future tagging suggestions based upon this logic.
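
    A minimal sketch of that kind of help, assuming two hypothetical user-maintained tables: IMPLIES for “this tag always brings that one along” and RELATED for “suggest these as well”. The names and the sample rules are illustrative and not taken from any real bookmarking service.

```python
# Hypothetical rules the user has established over time.
IMPLIES = {"mac": {"computer"}}              # "computer" is above "mac"
RELATED = {"mac": {"apple", "macintosh"}}    # alternative forms worth suggesting


def expand(tags):
    """Return the tags plus everything they imply, followed transitively."""
    result = set(tags)
    pending = list(result)
    while pending:
        tag = pending.pop()
        for implied in IMPLIES.get(tag, ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result


def suggest(tags):
    """Return related tags the user has not applied yet."""
    applied = expand(tags)
    suggestions = set()
    for tag in applied:
        suggestions |= RELATED.get(tag, set())
    return suggestions - applied


print(sorted(expand(["mac"])))    # tags actually stored with the bookmark
print(sorted(suggest(["mac"])))   # tags offered to the user as suggestions
```

    Keeping implied tags automatic and related tags as mere suggestions is a design choice here: the first enforces the hierarchy, the second only nudges the user away from redundancy.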

    I would also eliminate showing the user what tags other users have applied to the same bookmark. This conflicts with the basic objective of tagging, which is to enable the user to return to the tagged information. “Silent tags” should be applied to your bookmarks, which means, the top 10-20 popular tags (user configurable) for the piece of information are automatically applied. Thus, we keep community-tags separate from user tags. Which is critical, because they have different purposes. For example, community bookmarks are there so my friends can find my bookmarks, but may have nothing to do with me returning to the bookmark.

    I also advocate a system which, at tagging time, is visually designed to promote application of the right sorts of tags. Most bookmarklet forms (del.icio.us, blinkmarks, etc.) simply give all the tags the user has used previously in one huge alphabetical list, a list which changes constantly as tags are added and deleted. This is unproductive for the user to gain the muscle memory needed to apply tags consistently. I often have to scan ALL of my tags in a blind effort to find a singular tag I used before that I want to re-apply. Failing that, I will most likely create a redundant tag in frustration.

  37. As people are starting to examine tagging there’s a danger they’ll ‘reinvent the wheel’.

    As with many things there’s a hidden history. Much work and thinking went on before the web was even created!

    Tagging arose from library classification systems. The key thinker was Shiyali Ranganathan (1892-1972).
    Peter Memes was the first to introduce tagging on the web.

    In information management tagging is better known as a ‘facet-based classification scheme’.

    One of the best known is Ranganathan’s Colon Classification scheme.

    There is some information on the web (Search using key words from the above) but you may also have to delve into paper based information science texts to get more information.

    Having spent a lifetime studying these, Ranganathan’s main conclusion was that facet-based classification schemes usually don’t work well standing alone – they require some other supportive structure (framework) in order to be useful and intelligible.
    He defined any object or idea as being describable by five basic categories:

    Personality (the primary facet)

    Matter (Physical materials)

    Energy (Action)

    Space (Location)

    Time (Time period)

    Tags can then add information within the above framework.

    The consideration in tagging has to be the retrieval of information.

    In a bookmarking service for example the user is effectively building a relational database where the ‘tags’ represent both field headings and the query criteria.
    Unfortunately most users have no experience of databases or the process of normalisation. (Again, a lot of thought has gone into these issues in the field of database design, another branch of information management.)

    Consequently their tag cloud billows in size and they end up adding more and more tags in the hope of finding something again, a process that becomes self defeating, or at the least more lengthy, as they have to try more and more tags to get the item back.
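
    To make the database analogy concrete, here is a small sketch using Python's built-in sqlite3 module. The normalised three-table layout is a common textbook design, not how any particular bookmarking service stores its data, and the table names, helper functions and example.com URLs are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- One row per (bookmark, tag) pair: each tag name is stored only once.
    CREATE TABLE bookmark_tag (
        bookmark_id INTEGER REFERENCES bookmark(id),
        tag_id INTEGER REFERENCES tag(id),
        PRIMARY KEY (bookmark_id, tag_id)
    );
""")


def add_bookmark(url, tags):
    """Store a bookmark and link it to each of its tags."""
    conn.execute("INSERT OR IGNORE INTO bookmark (url) VALUES (?)", (url,))
    (bookmark_id,) = conn.execute(
        "SELECT id FROM bookmark WHERE url = ?", (url,)).fetchone()
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        (tag_id,) = conn.execute(
            "SELECT id FROM tag WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO bookmark_tag VALUES (?, ?)",
                     (bookmark_id, tag_id))


def urls_tagged(name):
    """The tag doubles as the query criterion when retrieving."""
    rows = conn.execute("""
        SELECT b.url FROM bookmark b
        JOIN bookmark_tag bt ON bt.bookmark_id = b.id
        JOIN tag t ON t.id = bt.tag_id
        WHERE t.name = ? ORDER BY b.url
    """, (name,))
    return [url for (url,) in rows]


add_bookmark("http://example.com/a", ["mac", "computer"])
add_bookmark("http://example.com/b", ["mac"])
print(urls_tagged("mac"))
print(urls_tagged("computer"))
```

    The second lookup shows the retrieval problem described above: the bookmark tagged only “mac” is invisible to a search for “computer”, even though the user would consider it one.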

    There is another prehistoric use of tags on the Internet. In the early days Search engines used to search for the metatags web designers placed in websites to describe them.
    The problem was other sites (porn) also began using popular metatags to get their sites into search results. It almost ended the life of some search engines as users deserted them and led to the rise of Google (pageranking) and Yahoo (directory) search engines whose systems didn’t depend on metatags.

    I’m hopeful that we’ll learn from history, but there does seem to be a generation that thinks tagging is a completely ‘new’ system, unique to the web and only used since 2001.
    Ahh to be young and innocent :-)

  38. Thanks Rashmi for a great post! We’ve quoted you at Tyner Blain in our post, Software testing series: Organizing a test suite with tags part one. Thanks for helping us with our research and inspiring some ideation!

  39. An excellent post that has generated a lot of discussion. Unfortunately some of this seems to have confused your point. Many people think you claim that categorization and tagging are distinctly different processes. Perhaps this misconception originates in a loose reading of your statement: ” … cognitive process that kicks into place when we tag an item, and how this differs than the process of categorizing.”

    But your claim is very clear that both processes share a lot of cognitive activity (your Stage 1). The confusion comes at Stage 2, which should actually be 2a and 2b.

    2a) is a process of automatic categorization, which you admit is a rather easy task. You say: “We see an animal – it could be a dog or wolf. We make a quick judgment.” “Even birds do it,” as you say. So clearly this is an act of categorization that does NOT lead to any kind of cognitive paralysis. It also seems to me that this kind of process is involved in tagging (for example we assign the tag “wolf” to a site about “wolves”).

    2b) is where difficulties arise. This is when we go beyond our automatic “reflex” to categorize and attempt to synthesize interesting new category structures. This is a hard task.

    So to clear up the confusion, tagging DOES involve categorization of type 2a (plus other stuff like adding descriptive adjectives and other associations). It probably does not involve categorization of type 2b.

  40. Hi Rashmi,

    is there something missing in this blog post now?
    Last time I accessed it (I guess it was at http://www.rashmisinha.com/archives/05_09/tagging-cognitive.html then) I remember two graphics visualizing the underlying cognitive processes for categorizing vs tagging. I can’t find these graphics anymore. Is that on purpose or am I mixing up something here?

    Thanx for your help and BTW thanx for this great blog post (I cited it in several papers by now).

    cu
    mc

  41. Thank you for such an eloquent description. I was looking for just this type of explanation to present to a committee at work.

    You’re a life-saver!

  43. Dear Rashmi,
    thank you for this inspiring post, it helped me a lot during my research on tagging and ontologies. I’ve got one specific request, hope you’ll read this although the article is nearly 4 years old :-)

    Could you please post here any references to scientific articles / journals / reviews that are related to your ideas or that inspired you? I am definitely “hungry” for more discussion on this topic!

    Thanks!

    Juraj Frank
    Comenius University Bratislava

  45. Some interesting observations. But really a shame that this author hasn’t bothered to cite (or perhaps is not aware of) a great deal of good preceding work — “cognitive” and otherwise. In the cognitive realm, interested readers might start by looking at the work of Barsalou (and also Brian Ross) on evidence for goal-directed categories in human cognition. In the realm of HCI, consider the article by Civan et al. “Better to Organize Personal Information by Folders Or by Tags?…” which describes actual research on the benefits (and pitfalls) of tagging: http://kftf.ischool.washington.edu/rerecentpublicationaboutplanner/Civan%20et%20al,%202008,%20Better%20to%20organize%20bv%20folders%20or%20by%20tags.pdf

  46. My experience with tagging has been del.icio.us (boom boom!) and I was interested to read this article. The inclusive nature of tagging in del.icio.us, flickr etc. should be encouraged even more, even to the detriment of ease of use. I have noticed recently that I can tag items in my del.icio.us account without putting much thought into it at all. All I need to do is click on recommended tags and hit save. If I’m in a rush, which I usually am, this is what I do. This has a drawback as the personal involvement and “radical trust” elements of Web 2.0 are being squashed before they get started. What good is tagging if you’re lazy and constantly back referencing as most of us will do, given the option?

Comments are closed.