On The Success of LaTeX

I suspect that the success of LaTeX–and its ubiquity as a format for thesis-writing–is in part due to the fact that learning its arcane subtleties is a wonderful source of procrastination.

What a glorious escape from having to do actual paper-writing!

AskMefi on Language Learning

A fun thread on mefi this morning, “What is the most efficient way to learn the native language of a country while traveling there on your own?”.  One of the most interesting parts of this thread is that some of the commenters seem to have a natural feel for the language learning techniques I only learned in theory, in second language acquisition classes.  Most of the good advice points to the relational aspect of language.  Language’s base purpose is communication, and communication’s base purpose is relationship building… I guess it’s natural that the best way to learn language is to pursue relationships in the target culture.  I used to think that it was just because target-culture relationships afforded more opportunity to practice the language.  Now I think it’s more than that–there’s also something deeply, inherently motivating about target-culture relationships, how friendships drive us to want to communicate, and how the desire to communicate drives us in turn to learn the language.

Thesis Blues

Ugh. I should be busily typing away at my thesis proposal(s!) right now. But I’m not. Perhaps it wasn’t the best idea to plan on finishing my MS Thesis and proposing my PhD Thesis in the same semester. Hrm. Well, my wife has her MS Thesis finishing up by summer too, so at least we get to procrastinate together.
The Big Picture:

1. M.S. Thesis: Modeling Language Learner Pronunciation Errors

2. PhD Thesis: Modeling Language Learner Errors

3. CALICO 2006 Presentation: Re-ranking Language Learner Errors by importance as gauged by native speakers.

Now, to think about how I can maximize overlap between these…

On the other hand, my coding work in modeling non-pronunciation errors is going well. It’s amazing how motivated you are to code when you need to be writing. “Structured Procrastination” they call it. My web-based annotation is up and running smoothly (But my, it IS an ugly hack of a Plone site… we won’t talk about that). And lots of Pashto annotators are coming in soon to help with that.
And my citeulike to-read queue is getting huge…

So much to do. I think I’m going to work hard for the next 2 days, and set aside tomorrow night for poetry reading. You see, there’s this Neruda book I’ve been waiting to crack open…

Tagging, Searching, Linking

categorization vs ranked search. the old google-vs-yahoo! war of 1998-1999.
it struck me the other day that these two paradigms aren’t as orthogonal as we make them out to be in our minds.

  • full-text search is really just categorization where the categorical tags are the words in the text.
  • it’s a very rough heuristic, but works much of the time.  you might miss big-picture categorization, but as far as content-driven tagging, odds are that the categories you care about are mentioned in the article.
  • the big problem is scale.  250 tags per item, and it gets too noisy to browse easily.  so you need to prune your tags.  This is an NLP problem.  What words do we emphasize, what words do we de-emphasize?  Stemming and removing stop words are the standard ways to de-emphasize.  Emphasizing?  That’s quite nontrivial.  I don’t know of anyone in our field who’s tried it yet.
  • Things get even more interesting (and even more nontrivial) when we can create terms ex nihilo, create categorizations whose names aren’t found anywhere in articles.
  • Clustering of tags will prove hugely useful.  But can you generate human-readable names for your clusters (and hierarchical clusters, if that’s your style)?
  • Hyperlinking is a form of tagging too.

Machine-generated categorizations are a huge unexploited area in folksonomy and tag-based IA.  I don’t know why no-one’s done anything with them yet.
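
To make the words-as-tags idea above a bit more concrete, here is a rough Python sketch.  Everything in it is my own illustration: the tiny stop-word list, the sample documents, and the function names are all made up, and the TF-IDF weighting is just one stock way to decide which tags to keep, not a claim about how anyone actually does it.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption); a real one is much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"}

def tags_for(text):
    """Treat every non-stop-word in the text as a tag, with its count."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def build_index(docs):
    """Map each tag to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tag in tags_for(text):
            index[tag].add(doc_id)
    return index

def top_tags(doc_id, docs, index, k=3):
    """Prune a document's tags down to the k with the highest TF-IDF score."""
    n_docs = len(docs)
    counts = tags_for(docs[doc_id])
    scores = {t: c * math.log(n_docs / len(index[t])) for t, c in counts.items()}
    # Sort by score (highest first), breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Made-up sample documents, purely for illustration.
docs = {
    "gentoo": "The Gentoo live CD installer is brittle and the installer fails",
    "latex": "LaTeX is a glorious source of thesis procrastination",
    "thesis": "The thesis proposal is due and procrastination is winning",
}
index = build_index(docs)
```

Browsing by tag is then just a lookup like `index["thesis"]`, and `top_tags("gentoo", docs, index, k=1)` is the pruning step.  Note that this only ever picks words already in the text; it does nothing for the ex nihilo case.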

Gentoo 2006.0

Dusted off my old 5-year-old laptop–thinking it might make a good server–low power consumption, built-in UPS.  The new Gentoo Live CD is very slick.  Too bad it doesn’t quite work:

  • The LiveCD auto-starts a gnome session.  Too bad it doesn’t allow the screen resolution to be any higher than 640×480.  This size is unfortunately too small for the graphical auto-installer to be useable.
  • The command line auto-installer is brittle.  After 6 failed tries with an error dump that closes too quickly to look at, I’m going for the old-school command-line install.  Oh well.

a short braindump.

hmm, haven’t posted anything in a while.
a smattering of notes from life:

  • my mother-in-law just took a trip to Yunnan, China to take photos. I’ve posted some of them in this flickr set
  • went on ISI’s bi-annual AI retreat a few weekends ago. a few interesting things that I may fill out later:
  • SIMILE (project aiming towards “inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services”. distributed semantic web, like a delicious of metadata sets/organizational structures).
  • Craig’s group, researching mashups of mashups
  • my first taste of lisp programming (gives (me (of-type (headache big)))). (or something like that)
  • and now, for some lunacy: last night had a conversation with friends about the magnetosphere around earth. this morning i woke up wondering, if the field extends far enough–say, to the moon–whether we could lay down some long electrical wires and generate “free” (err, compared to the oversized kinetic energy of the moon) electricity as the moon orbits through the field. hmmm.