Tagging, Searching, Linking

categorization vs ranked search. the old google-vs-yahoo! war of 1998-1999.
it struck me the other day that these two paradigms aren’t as orthogonal as we make them out to be in our minds.

  • full-text search is really just categorization where the categorical tags are the words in the text.
  • it’s a very rough heuristic, but works much of the time.  you might miss big-picture categorization, but as far as content-driven tagging goes, odds are that the categories you care about are mentioned in the article.
  • the big problem is scale.  at 250 tags per item, it gets too noisy to browse easily, so you need to prune your tags.  This is an NLP problem.  What words do we emphasize, what words do we de-emphasize?  Stemming and removing stop words are the standard ways of de-emphasizing.  Emphasizing?  That’s quite nontrivial.  I don’t know of anyone in our field who’s tried it yet.
  • Things get even more interesting (and even more nontrivial) when we can create terms ex nihilo: categorizations whose names aren’t found anywhere in the articles.
  • Clustering of tags will prove hugely useful.  But can you generate human-readable names for your clusters (and hierarchical clusters, if that’s your style)?
  • Hyperlinking is a form of tagging too.
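
A rough sketch of the "words as tags, then prune" idea from the list above. Everything here is an assumption for illustration: the tiny stop-word list, the function names, and especially the use of TF-IDF weighting as a stand-in for the "emphasizing" step (it is one well-known weighting, not something this post proposes). Stemming is left out to keep it short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "on", "for", "with", "that", "this", "are", "as"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def candidate_tags(text):
    """Treat every non-stop word as a candidate tag (de-emphasis by removal)."""
    return [w for w in tokenize(text) if w not in STOP_WORDS]

def top_tags(docs, k=3):
    """Keep the k highest-scoring words per document, scored by TF-IDF."""
    tokenized = [candidate_tags(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for words in tokenized for w in set(words))
    result = []
    for words in tokenized:
        tf = Counter(words)
        # Words that appear in every document score 0 and sink to the bottom.
        scores = {w: tf[w] * math.log(n / df[w]) for w in tf}
        # Sort by score descending, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result

if __name__ == "__main__":
    docs = [
        "search engines rank pages and search results",
        "tags and categories organize pages",
        "clustering tags groups similar tags",
    ]
    for doc, tags in zip(docs, top_tags(docs)):
        print(tags, "<-", doc)
```

Note what this does not do: every tag it picks is still a word that appears in the text, so it says nothing about naming clusters or creating terms ex nihilo.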

Machine-generated categorizations are a huge unexploited area in folksonomy and tag-based IA.  I don’t know why no one’s done anything with them yet.

2 Comments

  1. Claire

    visit and say hi!
    Thank you for the message on my blog.

    I find that you don’t have a message board on your blog.

    Posted on 12-Mar-06 at 03:52 | Permalink
  2. mote

    Claire: nope… American blogs are quite a different animal than Taiwanese BBSs =)

    Posted on 23-Mar-06 at 18:27 | Permalink