The New School of Ontologies

A paper I wrote on ontologies (and, more directly, folksonomy) for a databases class I’m taking.

Summarizes much of the thought that’s out there in the blogosphere. There’s a lot more I wanted to cover, but I was space-constrained. Here’s a preliminary version, full of rushed thoughts and unpolished wordsmithing. Will upload a cleaner, more fluent, and web-friendly version later, along with my summarization notes.

Later, in my copious spare time.

Some caveats:

  • I’m an armchair information architect who’s done lots of thinking but had no formal education outside of web-punditry
  • This is a rough, last-minute hack for a grad-school paper. Might be revised if I get around to it.
  • I really wanted to put more information into this (Kmconnection has some great comparisons of the Dewey Decimal System vs other ontologies I would have liked to quote. Additionally, I’m actually not as firm a subscriber to this whole “down with hierarchy” movement as the paper would have you believe.), but was constrained by assignment size.
  • Intended audience has little-to-no exposure to social software/computing. A degree of introduction is necessary that some of my readership might find needless.
  • Please forgive my overuse of passive voice–my writing is a bit rusty
  • UPDATE: whoops this got leaked before I was ready for it. Keep in mind that this isn’t a final final draft

That being said, here it is:
Folksonomies: The New School of Ontology
(or, plain html (in-progress–no footnotes))

Oh Little Voices Of the Throats of Men

An untitled, unpublished-’til-1996 favorite of mine from T.S. Eliot’s youth.
A little rough compared to his later work, but the earlier Eliot feels more approachable in his unpolishedness. Simple and eloquent.

We blow against the wind and spit against the rain

I love that

Ontologies, Taxonomies, and Folksonomies

Am writing a paper on “folksonomies” as contrasted with vanilla ontologies for a databases class. Really, it’s a contrast between distributed classification vs professional annotation (which is, in turn, a microcosm of what’s happening thru the web at large–blogs vs newspapers, blah blah blah).

The paper is just a short thing, 5 pages. And for such a short paper, there’s way too much reading material available on the blogosphere–it seems this is an idea that’s spread like wildfire, due to the popularity of things like del.icio.us, flickr, and furl.

My working delicious feed for the paper (del has been very useful during the brainstorming/information-gathering process, as a way to keep track of all the links I’ve found, but given its recent flaky 503-ing nature, I just need to pray it doesn’t crash on me tonight).

Will be using this space to keep summaries of different pages I’ve read, and will eventually put up the paper.

whoops

How to reset the admin password in wordpress.

=p

Tags, Relationships, Links

It hit me after installing tomboy a little while ago:
the current state of tag-based information representation theory, as painted by flickr, gmail, delicious, etc., is that of tags as very limited-scope metadata. What I mean is this: Given a document, say this blog entry, I have different scopes of metadata. There is the “date” metadata item (2004/11/08 @ 18:40 PST), the subject metadata, and this “category” metadata (“meta”, “thought”), which tries to implement the same sort of tag-based categorization that flickr and gmail and all our organization-by-tagging friends have.

And I’ve said before that I can’t wait for tags to be better-implemented in blogs and wikis (and filesystems too!).

But should we look at tags this way?

What about looking at tags as entities just as solid as the things that they tag? This would shift the paradigm from:

document has-date-metadata $date
document has-tag-metadata tag1
document has-tag-metadata tag2
document has-tag-metadata tag3
document links-to document2
document links-to document3

to

document links-to tag1
document links-to tag2
document links-to tag3
document links-to document2
document links-to document3

The difference is subtle. In effect, tags themselves become documents, and documents that a page links to (and that a page is linked from) become tags. Tags become a sort of generalized URL, and documents become a sort of specialized tag. (This makes a certain sense when you think about it. Every external page that I link to on my www home page provides some information and perspective on what I want my home page to represent. And every page I link to on my blog defines, by slow, exact, painstaking enumeration, the big-picture content and aim of my blog.) Do you see what this all does? It, in effect, blurs the line between context and content.

So, how can we make this work for us, practically?

Assuming unidirectional links:

  1. “view all things that this document links to”: this returns the current tagset of the document, where normal tags function as they did before, and where linked document URLs are a sort of very specific tag.
  2. Because tags are documents as well, tags of tags become a structure to our ontology. “Windows” tag and “Linux” tag could both link to “OS” tag, “computers” tag, etc. (Does this work? Directed, binary relationships between tags could possibly be a solution to creating an ontology that is both successfully emergent and hierarchical).
  3. “view all things that link to this document”: this view is the opposite of what we are accustomed to when browsing the www. But in the case that “this document” is a tag, it returns all objects that have been given that tag.
  4. It doesn’t have to stop at user-created tags. Links could be created automatically, say by document similarity clustering, which could further add structure to our emergent tag ontology.
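A minimal sketch of the views listed above, assuming nothing more than a set of one-way "links-to" edges in which tags and documents are the same kind of node. The class name `LinkGraph`, its method names, and the sample nodes such as `howto-page` are made up for illustration; they are not taken from any existing tagging system.

```python
from collections import defaultdict


class LinkGraph:
    """Tags and documents are both plain nodes; the only relation is links-to."""

    def __init__(self):
        self.out = defaultdict(set)   # node -> nodes it links to
        self.back = defaultdict(set)  # node -> nodes that link to it

    def link(self, source, target):
        # Record one unidirectional link and its reverse index entry.
        self.out[source].add(target)
        self.back[target].add(source)

    def links_from(self, node):
        """View 1: everything this node links to (its tagset)."""
        return set(self.out.get(node, ()))

    def links_to(self, node):
        """View 3: everything that links to this node."""
        return set(self.back.get(node, ()))

    def ancestors(self, node):
        """View 2: follow links transitively, i.e. tags of tags."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for target in self.out.get(current, ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Hypothetical sample data, loosely following the examples in the post.
graph = LinkGraph()
graph.link("blog-entry", "meta")
graph.link("blog-entry", "thought")
graph.link("blog-entry", "document2")
graph.link("howto-page", "Linux")
graph.link("Linux", "OS")
graph.link("Windows", "OS")
graph.link("OS", "computers")
```

Here `graph.links_to("OS")` plays the role of "show me everything tagged OS", while `graph.ancestors("howto-page")` walks up through the tag-of-tag links. Automatically generated links (point 4) would simply be more calls to `link`.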

(I might be inspired by Latent Semantic Indexing, which attempts to represent the meaning of a word using high-dimensional vectors in n-space. These meaning-vectors are derived by looking at what words co-occur in different documents. We know that, over hundreds of thousands of documents, “pomegranate” appears in the company of the words “tree” and “ripe”, and is less likely to co-occur with “linux” or “mousepad” or “SUV” or “Manchester”. This says something about the meaning of “pomegranate”. Likewise, Bush has been able to form some link between “Al Qaeda” and “Iraq” by using them together a lot ;). Perform a little bit of dimensional reduction and we’ve got ourselves rough vectors that correspond to word meanings. But more talk on Latent Semantic Indexing later. The inspirational bit was that the words surrounding the word you’re specifically looking at are not only content but also “tagging” in a rough sort of way. The line between context and content is blurred.)
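The co-occurrence half of that idea can be sketched in a few lines. This is only an illustration under loud assumptions: it counts which words share a document and compares the resulting count vectors with cosine similarity, and it skips the dimensional reduction step (typically a singular value decomposition) that Latent Semantic Indexing actually relies on. The function names and the four-sentence corpus are invented for the example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(documents):
    """Map each word to a Counter of the words it shares a document with."""
    vectors = defaultdict(Counter)
    for text in documents:
        words = set(text.lower().split())
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, made up for illustration; real LSI uses far more documents.
corpus = [
    "ripe pomegranate on the tree",
    "ripe fig on the tree",
    "linux on the server",
    "kernel on the server",
]
vectors = cooccurrence_vectors(corpus)
fruit_similarity = cosine(vectors["pomegranate"], vectors["fig"])
linux_similarity = cosine(vectors["pomegranate"], vectors["linux"])
```

On this toy corpus `fruit_similarity` comes out higher than `linux_similarity`, because “pomegranate” and “fig” keep the same company. The surrounding words are acting as rough tags for each word.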

Would love to hear any reader thoughts on this! Please!

Stabilization of Technology

I can’t wait for the day when computer and display technology stabilizes the way that mechanical technology has. I can buy a quality wrist-watch, and 5, 20, 50 years from now, it will still be functional, aesthetic, and useful. The cell phone that I bought this year will be behind-the-times a year from now and most likely broken 5 years from now.

Will I live to see the day that displays will be built into furniture like desks, and not render them obsolete fodder for the scrapyard in 5 years?

Don’t get me wrong, I love the fact that this year I can buy twice the hard drive for the same price that I could have bought it at last year. And I love my LCDs compared to my CRTs. And don’t get me started on camera megapixels.

But there’s another part of me that longs for stability, for today’s gadgetry to move beyond its adolescent growth spurt–and mature into something that is both stable and has reached its potential.

(Huh, and I never thought of myself like this before. What’s the term for a luddite born 200 years before his time?).

Time Enough For Research

Just realized I’ve been spending altogether too much time in my PhD research doing software development, and not enough time doing actual research.

So, as of now I’m going to spend a self-enforced 15% of my time doing actual research-type stuff, keeping abreast of current literature in the field, etc (and hopefully blogging it here too).

It was a scary thing when, in reflecting on life yesterday, I realized that I hadn’t read an actual academic paper related to my work in a LONG time. And that’s not good. Hopefully it’ll change now…

Human-Computer Collaborative Approach to Computer Aided Assessment

Trying to blog my research:
Mary Wood from University of Manchester just gave a talk on “A Human-Computer Collaborative Approach to Computer Aided Assessment”.

Abstract:

The ABC (Assess by Computer) system has been developed and used in the School of Computer Science at the University of Manchester for formative and (principally) summative assessment at undergraduate and postgraduate level. We believe that fully automatic marking of constructed answers - especially free text answers - is not a sensible aim. Instead - drawing on parallels in the history of machine translation - we take a “human-computer collaborative” approach, in which the system does what it can to support the efficiency and consistency of the human marker, who keeps the final judgement.

Our current work focuses on what are generally referred to as “short text answers” as contrasted to “essays”. However we prefer to contrast “factual” with “discursive” answers, and speculate that the former may be amenable to simple statistical techniques, while the latter require more sophisticated natural language analysis. I will show some examples of real exam data and the techniques we are using and developing to handle them.

Interesting. Rather than try to make computers do a job perfectly, they team up a computer with a human and try to exploit the strengths of each. To enable adoption in the teaching community they have made the machine aid as subtle as possible, so that it remains in the background and doesn’t “steal control” from the teacher during the grading process. Most interaction is thus passive (sorting and clustering of student answers, keyword highlighting, etc.).

Challenges:

  • System must deal with different question types, each requiring different approaches (multiple choice could easily be judged by a machine system, short answer could theoretically be graded using clustering and keyword matches, and long essays should theoretically be graded using natural language understanding techniques).
  • System must deal with dirty data (spelling mistakes, many ways of expressing the same thing, context-specific synonyms, deep domain knowledge).

Solutions:

  • Take advantage of the effectiveness of simple tools: use LSA and other techniques for clustering, sort by length, highlight keywords (both exact matches to keywords and fuzzy matches)
  • Let teachers adaptively build domain knowledge on-the-fly

(Fuzzy matching to keywords was done using string edit distance. Naive attempts have also been made at pronunciation edit distance. And what about keyboard-typo edit distance?)
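For readers who haven’t met string edit distance, here is a rough sketch of how fuzzy keyword highlighting could work with it. It uses the standard Levenshtein distance; nothing here comes from the ABC system itself. The function names, the `max_distance` threshold of 1, and the sample answer are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_keyword_hits(answer, keywords, max_distance=1):
    """Return (word, keyword) pairs where a word in the answer is
    within max_distance edits of a keyword."""
    hits = []
    for word in answer.lower().split():
        token = word.strip(".,;:!?()\"'")
        for keyword in keywords:
            if edit_distance(token, keyword.lower()) <= max_distance:
                hits.append((token, keyword))
    return hits


# Hypothetical student answer with a spelling mistake.
hits = fuzzy_keyword_hits("The compiller parses tokens.", ["compiler", "token"])
```

A pronunciation or keyboard-typo variant would keep the same table-filling structure and only change the `cost` of each substitution, for example by charging less for keys that sit next to each other.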

Benefits:

  • improves grading speed by a factor of 2-3 (pessimistic) or 6-8 (optimistic)
  • anonymizes students’ work
  • allows different “views” on data (“let me see all answers from this student’s test” or “let me see all the students’ answers for this question”)
  • as an NLP project, system use, data gathering, and system usability improvement is a bootstrapping cycle
  • no noticeable bad effects on grading reliability

Very cool, just a shame that Manchester doesn’t have a bigger NLP group to do work on this.

Elections, The Morning After (or, Why I Didn’t Vote)

And Bush Jr. has won. As a Christian I support the morality he says he stands for, but can I really trust the moral integrity of a man that attempted (and, most scarily, actually succeeded!) to pull the wool over his country’s eyes and link the 9/11 tragedy to his completely separate agenda for Iraq?

And, I’m afraid for my civil liberties, if this whole Patriot Act thing keeps up.

But the question remains: would it have mattered if I had voted or not? This is the problem I see with voting: Judged from a solely utilitarian viewpoint (how useful are my actions), voting just isn’t worth it. If you judge the “total usefulness” of an action to be the measure of the good that it accomplishes divided by the measure of effort it takes to accomplish, voting just doesn’t rank very high.
usefulness = good_accomplished / effort

As far as effort goes, you have the effort it takes to see through the political doubletalk and people-pleasing to hear what each proposition or candidate is really saying (plus the minimal effort of getting to the polls). As far as what it accomplishes… … …
I figure my power in choosing the next President of the United States is 1 / (however-many-people-vote). In this case, 1/(120 million). That is a very small number. Small enough that it brings the “total usefulness” of the voting action as close to zero as I practically want to measure. To me, I interpret this to mean that it’s not really worth my effort to vote.

I have explained this before to people, and they say “well, if everyone did this then the country would be in trouble”. I can only answer “well, if (practically) everyone did this, then my vote would have a lot of power and I would vote!”.

I am not advocating political apathy. I think that what the Internet Veterans for Truth did was a great thing. I think that political {pundits,propagandists,evangelists} can convince many people to vote. And groups of people voting actually make a difference, just not individuals. If you can inspire many people to vote the way you want them to, that act of inspiring can rank pretty high on the “total usefulness” scale.

The one flaw in my thinking is that voting isn’t just a pragmatic action–it’s also a symbolic action. When I vote, I’m not only accomplishing something practically (adding my extra pull, however small, to the tug-of-war for the next president). I’m also becoming a partner and participant in the democratic process. I have some vague feeling of linkedness, of ownership, or participation. And crossing the line from observer to participant is a powerful thing, psychologically. This is the only reason I could see to vote.

But I didn’t, so that’s that.

My Vote

For technorati’s pseudo-election,
I’m Unregistering to Vote.

The linked site wasn’t made by me, but I commiserate with it regardless of whether it was intended to be satire.
feeling disillusioned…
Quoted below:

Dear Secretary of State;

I am a registered voter of Kansas who would like to rectify that. First, let me say that I take my duties and responsibilities as a citizen seriously and want to do what is best for my community as well as me. With that said, I believe that the responsible thing to do is to have my name removed as a registered voter. I do not have the time, interest, nor ability to stay appraised of the candidates and issues that I will have to vote on. I feel removing my ability to vote will be more effective in accomplishing this than placing the onus on myself to just not vote. Most likely the temptation to vote that will be instilled in me by MTV and various commercials over the coming months will be too much for me, and I will end up casting a vote on issues and candidates that I am, at best, totally ignorant of.

Take that last sentence of the previous paragraph as an example of how I take the easy (ignorant) way out. I ended it with a preposition. I know you’re not to do that, yet I did anyway because it was easier than rewriting it to not end with the word “of”. This is why I know I can’t count on myself to just not vote and would instead like to prevent myself from ignorantly voting by removing my name as a registered voter. So, please tell me, short of committing a felony, what can I do to unregister to vote?

Thank you for your assistance,

Jason Curless
jason@porkjerky.com