Extracting Semantics from Folksonomy?

So, we have some really nice folksonomies out there now. And they’re really good for humans. But what can machines do with them? Can we use del.icio.us to further the Sisyphean goals of the semantic web?

More specifically, I was talking to HaoChuan tonight about how one might automatically use a folksonomy to populate an ontology.

The difficult part in this is to extract meaningful semantic relationships from a folksonomy. In your standard, nonhierarchical tag-based folksonomy, all you know is “a is-a-tag-of b” or “b is-tagged-by a”. You don’t know what the tag implies about the page: content? origin? summary? intended use?

The problem, then, can be re-stated as this:

A tagsonomy is a large collection of (tag, object) pairs. How can I infer the relationship between the tag and the tagee, and expand these pairs into (tag, relationship, object) triplets?
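
To make the shape of the problem concrete, here is a minimal Python sketch of the two data forms. The names `Tagging`, `Triple`, and `expand` are made up for illustration, the URLs are placeholders, and the relationship guesser is deliberately left as a pluggable function, since how to write it is exactly the open question.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Tagging(NamedTuple):
    """One raw folksonomy entry: a tag applied to an object (here, a URL)."""
    tag: str
    obj: str


class Triple(NamedTuple):
    """The expanded form: tag, guessed relationship, object."""
    tag: str
    relationship: Optional[str]  # None means "could not guess"
    obj: str


def expand(pairs: Iterable[Tagging],
           guess: Callable[[str, str], Optional[str]]) -> List[Triple]:
    """Turn (tag, object) pairs into (tag, relationship, object) triples."""
    return [Triple(p.tag, guess(p.tag, p.obj), p.obj) for p in pairs]


# Hypothetical data; the example.org URLs are placeholders.
pairs = [
    Tagging("tutorial", "http://example.org/learn-python"),
    Tagging("toread", "http://example.org/long-essay"),
]

# A placeholder guesser that knows nothing yet, so every relationship is None.
triples = expand(pairs, lambda tag, obj: None)
```

Everything interesting lives inside `guess`; the rest is bookkeeping.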

Looking at the popular tags on del.icio.us, I see a number of different relationships at work: “software”, “web”, “reference”, “toread”, “tutorial”, “fun”, “free”, “cool”.

There are a few interesting things about these popular tags:

  • Each tag can be classified into one of several distinct relationship types: “summarizes-content”, “is-of-format”, “is-of-use”, “makes-reader-feel”.
  • The relationship is easily predictable and relatively unambiguous from the tag alone, without looking at the corresponding tagged web pages.

So how could we go about automating the process of seeing what tags mean in context?

  • Look at the tags themselves, as I have done. This will not work in all cases (in fact, I am curious to see how well it works on the less popular links).
  • Look at data mining techniques for extracting relationships from unstructured plain-text documents.
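
The first option, looking at the tag alone, can be sketched as a plain lookup table. This is only a rough illustration in Python: the tag-to-relationship assignments below are my own guesses rather than anything measured, the `observed` list is invented, and `guess_relationship` and `coverage` are hypothetical helper names. The coverage figure is there because the interesting question is how often the table has no answer.

```python
from typing import Iterable, Optional

# Hand-assigned guesses for some popular tags. These assignments are
# assumptions for illustration, not derived from any real data set.
TAG_RELATIONSHIP = {
    "software": "summarizes-content",
    "web": "summarizes-content",
    "reference": "is-of-use",
    "toread": "is-of-use",
    "tutorial": "is-of-format",
    "fun": "makes-reader-feel",
    "cool": "makes-reader-feel",
}


def guess_relationship(tag: str) -> Optional[str]:
    """Guess the relationship from the tag text alone; None if unknown."""
    return TAG_RELATIONSHIP.get(tag.strip().lower())


def coverage(tags: Iterable[str]) -> float:
    """Fraction of tag uses that the lookup table can classify."""
    tags = list(tags)
    if not tags:
        return 0.0
    known = sum(1 for t in tags if guess_relationship(t) is not None)
    return known / len(tags)


# Invented sample of tag uses, mixing popular and less popular tags.
observed = ["software", "toread", "Fun", "free", "nyc"]
share_classified = coverage(observed)
```

Tags like “free” or a place name fall straight through the table, which is where this approach would need help from one of the other options.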

Data mining, in its barest essence, does extract relationships, but the relationships are fixed and predefined, and it’s all about finding many instances of data pairs that fit one strictly defined relationship. This problem is related, but different in some not-so-subtle ways.

Thoughts?

Updates
Related reading

  • Mimi Yin of Chandler has a good entry on the relative strengths of faceted vs hierarchical organization, and how facet use easily devolves into chaotic tagsonomy. It addresses the lack of semantics in tagsonomic relations as a problem, but doesn’t treat it as a solvable or even surmountable problem.
  • Clay Shirky talks about how cool and useful folksonomy is. His focus there was not the specific usefulness of ontology (machine-parsability) that I was looking for… But his examples (e.g. looking at sets of tags assigned to URLs, and how they differ per user and over time) do provide interesting data sets for this thought problem of mine.

Spent a little time talking to Patrick Pantel, one of ISI’s resident automated ontology generation researchers. He brought up another merging of folksonomy and ontology, namely automatically creating an ontology of tags. This problem is straightforward, interesting, and a bit easier than what I’m trying to do here. Joshua has already accomplished this, in the form of his “related tags” function.

Addressing my problem, he suggests that, because the tags themselves seem to be good enough predictors of the semantic relationship between tags and tagees, I could come up with a list of possible relationships, and use some pre-existing ontologies to map tags and their ontological neighbors into these relationship sets. Might be a good first stab at things, and I wouldn’t even need to begin the computationally intensive task of examining the text of tagged objects.
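
One way that suggestion might look, as a toy Python sketch: the “ontology” here is a tiny invented table of broader-term links standing in for a real pre-existing ontology, each candidate relationship is anchored to one concept in it, and a tag inherits the relationship of the first anchored concept found by walking upward. All names (`BROADER`, `ANCHORS`, `relationship_via_ontology`) and all entries are assumptions for illustration.

```python
from typing import Optional

# Toy stand-in for a pre-existing ontology: each term points to a broader term.
# Every entry is invented for illustration.
BROADER = {
    "tutorial": "document-genre",
    "howto": "tutorial",
    "reference": "document-genre",
    "fun": "emotion",
    "cool": "emotion",
    "software": "topic",
    "python": "software",
    "toread": "task",
}

# Candidate relationships, each anchored to one concept in the toy ontology.
ANCHORS = {
    "document-genre": "is-of-format",
    "emotion": "makes-reader-feel",
    "topic": "summarizes-content",
    "task": "is-of-use",
}


def relationship_via_ontology(tag: str, max_hops: int = 5) -> Optional[str]:
    """Walk up the broader-term links until an anchored concept is reached."""
    term = tag.strip().lower()
    for _ in range(max_hops + 1):
        if term in ANCHORS:
            return ANCHORS[term]
        if term not in BROADER:
            return None  # fell off the ontology without meeting an anchor
        term = BROADER[term]
    return None  # gave up after max_hops links


# "howto" is not anchored itself, but its neighbors lead to "document-genre".
howto_relationship = relationship_via_ontology("howto")
```

The appeal is that a less popular tag such as “howto” gets classified through its neighbors without anyone having to list it by hand, and without touching the text of the tagged pages.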