
Thesis Blues

15-Mar-06

Ugh. I should be busily typing away at my thesis proposal(s!) right now. But I’m not. Perhaps it wasn’t the best idea to plan on finishing my MS Thesis and proposing my PhD Thesis in the same semester. Hrm. Well, my wife has her MS Thesis finishing up by summer too, so at least we get to procrastinate together.
The Big Picture:

1. M.S. Thesis: Modeling Language Learner Pronunciation Errors

2. PhD Thesis: Modeling Language Learner Errors

3. CALICO 2006 Presentation: Re-ranking Language Learner Errors by importance as gauged by native speakers.

Now, to think about how I can maximize overlap between these…

On the other hand, my coding work in modeling non-pronunciation errors is going well. It’s amazing how motivated you are to code when you need to be writing. “Structured Procrastination,” they call it. My web-based annotation tool is up and running smoothly (but my, it IS an ugly hack of a Plone site… we won’t talk about that). And lots of Pashto annotators are coming in soon to help with that.
And my citeulike to-read queue is getting huge…

So much to do. I think I’m going to work hard for the next 2 days, and set aside tomorrow night for poetry reading. You see, there’s this Neruda book I’ve been waiting to crack open…

Tagging, Searching, Linking

08-Mar-06

categorization vs ranked search. the old google-vs-yahoo! war of 1998-1999.
it struck me the other day that these two paradigms aren’t as orthogonal as we make them out to be in our minds.

  • full-text search is really just categorization where the categorical tags are the words in the text.
  • it’s a very rough heuristic, but works much of the time.  you might miss big-picture categorization, but as far as content-driven tagging, odds are that the categories you care about are mentioned in the article.
  • the big problem is scale.  250 tags per item, and it gets too noisy to browse easily.  so you need to prune your tags.  This is an NLP problem.  What words do we emphasize, what words do we de-emphasize?  Stemming, removing stop words are the standards for de-emphasizing.  Emphasizing?  That’s quite nontrivial.  I don’t know of anyone in our field that’s tried it yet.
  • Things get even more interesting (and even more nontrivial) when we can create terms ex nihilo: categorizations whose names aren’t found anywhere in the articles.
  • Clustering of tags will prove hugely useful.  But can you generate human-readable names for your clusters (and hierarchical clusters, if that’s your style?)
  • Hyperlinking is a form of tagging too.
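
The pruning step in the list above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method from this post: the stop list and the suffix-stripping `crude_stem` are toy stand-ins for a real stop list and stemmer, and TF-IDF is a standard weighting swapped in here only as one crude baseline for the “emphasizing” question. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stop list (assumption); a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "it", "of", "on", "the", "to", "with"}


def crude_stem(word):
    """Naive suffix stripping: a hypothetical stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def candidate_tags(text):
    """Full-text 'tags': every word, after the de-emphasizing steps (stop words, stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def top_tags(docs, k=3):
    """Keep the k highest TF-IDF tags per document (TF-IDF is an assumed emphasis weighting)."""
    tag_lists = [candidate_tags(d) for d in docs]
    # Document frequency: how many documents mention each tag at least once.
    doc_freq = Counter(tag for tags in tag_lists for tag in set(tags))
    n = len(docs)
    pruned = []
    for tags in tag_lists:
        tf = Counter(tags)
        # Frequent here but rare elsewhere scores high; a tag found in every doc scores 0.
        scores = {t: count * math.log(n / doc_freq[t]) for t, count in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
        pruned.append(ranked[:k])
    return pruned


docs = [
    "Pashto learners and pronunciation errors in Pashto",
    "searching tags and searching words",
    "errors made by language learners",
]
print(top_tags(docs))
# [['pashto', 'pronunciation', 'error'], ['search', 'tag', 'word'], ['language', 'made', 'error']]
```

Note how “error” survives only where rarer words don’t crowd it out: that is the whole point of weighting, and also its weakness, since nothing here can produce a tag that isn’t already in the text.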

Machine-generated categorization is a huge unexploited area in folksonomy and tag-based IA. I don’t know why no one’s done anything with it yet.
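
For the clustering question in the list above, one crude baseline is to link tags that co-occur on enough items and name each cluster after its most frequent member. The sketch below assumes exactly that (union-find over co-occurring tag pairs, with a hypothetical `min_cooccur` threshold and made-up example tags); it is illustrative only, and picking the most frequent tag is a placeholder for real human-readable cluster naming, not a solution to it.

```python
from collections import Counter
from itertools import combinations


def name_tag_clusters(tagged_items, min_cooccur=2):
    """Cluster tags that co-occur on >= min_cooccur items; name each cluster by its top tag."""
    freq = Counter(tag for tags in tagged_items for tag in set(tags))

    # Count how often each unordered pair of tags appears on the same item.
    pair_counts = Counter()
    for tags in tagged_items:
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1

    # Union-find: strongly co-occurring tags end up in the same component.
    parent = {tag: tag for tag in freq}

    def find(tag):
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            parent[find(a)] = find(b)

    components = {}
    for tag in freq:
        components.setdefault(find(tag), set()).add(tag)

    # Placeholder naming rule (assumption): the most frequent member, ties alphabetical.
    return {
        min(members, key=lambda t: (-freq[t], t)): sorted(members)
        for members in components.values()
    }


items = [
    ["python", "sqlobject", "orm"],
    ["python", "sqlobject"],
    ["python", "webpy"],
    ["python", "webpy", "web"],
    ["gentoo", "linux"],
    ["gentoo", "linux", "laptop"],
]
print(name_tag_clusters(items))
```

Here “python” absorbs “sqlobject” and “webpy”, while one-off tags like “orm” stay as singletons; a flat component structure like this is also the simplest thing one could later replace with hierarchical clustering.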

Gentoo 2006.0

08-Mar-06

Dusted off my 5-year-old laptop, thinking it might make a good server: low power consumption, built-in UPS. The new Gentoo Live CD is very slick. Too bad it doesn’t quite work:

  • The LiveCD auto-starts a GNOME session. Too bad it doesn’t allow a screen resolution any higher than 640×480, which is unfortunately too small for the graphical auto-installer to be usable.
  • The command-line auto-installer is brittle. After 6 failed tries with an error dump that closes too quickly to read, I’m going for the old-school manual command-line install. Oh well.

a short braindump.

02-Mar-06

hmm, haven’t posted anything in a while.
a smattering of notes from life:

  • my mother-in-law just took a trip to Yunnan, China to take photos. I’ve posted some of them in this flickr set
  • went on ISI’s bi-annual AI retreat a few weekends ago. a few interesting things that I may flesh out later:
      • SIMILE (a project aiming towards “inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services”; a distributed semantic web, like a delicious for metadata sets/organizational structures).
      • Craig’s group, researching mashups of mashups
      • my first taste of lisp programming (gives (me (of-type (headache big)))). (or something like that)
  • and now, for some lunacy: last night had a conversation with friends about the magnetosphere around earth. this morning i woke up wondering: if the field extends far enough, say to the moon, could we lay down some long electrical wires and generate “free” (err, compared to the oversized kinetic energy of the moon) electricity as the moon orbits through the field? hmmm.
SQLObject Woes

27-Jan-06

Don’t get me wrong, I still love SQLObject. The way it abstractifies database engines is very nice, and the way it treats SQL code in a pythonic way is absolute poetry. But it doesn’t seem ready for primetime.

Two things that have been painful this last week:

  • Evidently, you can’t have a boolean table column named “dirty” in SQLObject, because it uses that name for internal voodoo. I spent a couple of hours this weekend chasing recursion errors I couldn’t trace until I guessed at the “dirty” flag. They do the same black magic with their “id” table column. Now, there has to be a better way of implementing their code so there are no overlaps. Private variables? Namespaced variables? Compile-time warnings?
  • No way to do orderBy=RAND() in SQLObject queries. I would think this is a common feature people would want. Google turned up no attempted solutions, and the SQLObject discussion forums had a thread full of suggestions that don’t work.
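
The “dirty” collision above is easy to reproduce in miniature. The sketch below is a hypothetical toy ORM, not SQLObject’s real internals: `Column`, `NamespacedColumn`, `Row`, `NaiveDoc`, and `SafeDoc` are made-up names. It shows how an internal flag that shares the attribute namespace with user columns can recurse forever, and how keeping internal state in its own namespace (one of the fixes wondered about above) avoids it.

```python
class Column:
    """Hypothetical column descriptor for a toy ORM (not SQLObject's real API)."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj._values[self.name]

    def __set__(self, obj, value):
        obj._values[self.name] = value
        self._mark_dirty(obj)

    def _mark_dirty(self, obj):
        # Naive bookkeeping: the internal flag shares the attribute namespace with
        # user columns, so a column named "dirty" re-enters __set__ forever.
        obj.dirty = True


class NamespacedColumn(Column):
    def _mark_dirty(self, obj):
        # Fix: internal state lives in its own dict, which no column name can shadow.
        obj._orm_state["dirty"] = True


class Row:
    def __init__(self, **values):
        self._values = {}
        self._orm_state = {"dirty": False}
        for name, value in values.items():
            setattr(self, name, value)


class NaiveDoc(Row):
    title = Column()
    dirty = Column()  # user column that collides with the internal flag


class SafeDoc(Row):
    title = NamespacedColumn()
    dirty = NamespacedColumn()


try:
    NaiveDoc(title="draft", dirty=False)
except RecursionError:
    print("NaiveDoc: RecursionError, the 'dirty' column shadows the internal flag")

doc = SafeDoc(title="draft", dirty=False)
print(doc.dirty, doc._orm_state["dirty"])  # False True
```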

Sigh.
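
On the orderBy=RAND() complaint: random ordering is plain SQL, though the spelling varies by engine (MySQL has `RAND()`, SQLite and PostgreSQL have `RANDOM()`). The sketch below uses Python’s built-in `sqlite3` rather than SQLObject, with a hypothetical `sentences` table, and adds an engine-agnostic fallback that samples ids in Python. It makes no claim about what SQLObject itself supports.

```python
import random
import sqlite3


def random_rows_sql(conn, limit):
    # Let the database shuffle; SQLite spells it RANDOM(), MySQL spells it RAND().
    return conn.execute(
        "SELECT id, text FROM sentences ORDER BY RANDOM() LIMIT ?", (limit,)
    ).fetchall()


def random_rows_portable(conn, limit, rng=random):
    # Engine-agnostic fallback: sample ids in Python, then fetch just those rows.
    ids = [row[0] for row in conn.execute("SELECT id FROM sentences")]
    chosen = rng.sample(ids, min(limit, len(ids)))
    if not chosen:
        return []
    placeholders = ",".join("?" * len(chosen))
    fetched = conn.execute(
        f"SELECT id, text FROM sentences WHERE id IN ({placeholders})", chosen
    ).fetchall()
    by_id = {row[0]: row for row in fetched}
    return [by_id[i] for i in chosen]  # keep the random order, not the engine's order


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO sentences (text) VALUES (?)", [(f"sentence {i}",) for i in range(10)]
)
print(random_rows_sql(conn, 3))
print(random_rows_portable(conn, 3, rng=random.Random(0)))
```

Both versions read every row id, so neither is free on very large tables; the fallback’s main advantage is that it doesn’t depend on the engine’s name for the random function.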

Also on a pythonic note, our benevolent dictator Guido had a good post today requesting advice on Python web frameworks. The quite-good discussion that ensued in the comments just reaffirms my belief that there are many, many options for Python web development platforms/frameworks, but no really good ones. In the gamut from painfully feature-lacking to woefully overcomplicated, there are plenty of options on the peripheries… but nothing in the sweet spot. Where is the Rails of the Python world?
A few notes on the comments:

  • It looks like Aaron finally released webpy a week ago. It still looks a little feature-lacking, but it’s definitely going in the right direction. Need to check that out. I respect the zen-like simplicity/usefulness of Aaron’s code, and his coding philosophy.
  • I need to buckle down and try out paste someday too.
  • TurboGears will probably be very good when it makes it to 1.0.

</nerdery>

Settling In

26-Jan-06

An entry of more personal flavor:

It’s been almost a month now that I’ve been married. Life is slowly settling back to normal. Our bed finally arrived today (no more sleeping on two unequal-height twin mattresses pushed together, THANK GOD). The books are getting unpacked and the pictures are going up on the walls… There’s still a tremendous backlog of things to do: re-establish relationships that had been neglected during the hectic months of wedding planning, write thank-you notes, continue settling into the apartment… Also, I’ll be proposing my thesis in three months or so, so there’s quite a bit of work there too.

It’s busy times, but happy times.

Art, (In)security, Surveillance

12-Jan-06

While doing my best to shake off digital guilt, I’ve been slowly reading through the backlog of weeks of unread blogs in my spare time. We Make Money Not Art had a great commentary, Panoptic Insecurity, on an installation entitled Gun Control.

Gun Control is an electromechanical installation, which explores underlying issues of both security and surveillance. Each of the four units incorporates a police-issue revolver and a small video camera. As people move into the installation space, the cameras track the movement and the guns follow. However, the technology is imperfect. The cameras do not always function properly. The revolvers point at different targets. They sometimes twirl about playfully. The armatures shake and rattle. We are directly in the line of fire. This piece raises questions about our security-surveillance apparatus by prompting a visceral reaction.

Beautiful and reactionary, and unsettling in a very proper and emotionally manipulative way. And the message is so good.

Back

10-Jan-06

Back now. And married, too! Strangely enough, not much is different, with a few subtle exceptions:

  • our kitchen is a bit more well-equipped from the wedding gifts
  • I get to wake up every morning next to a beautiful woman
  • our apartment is an absolute mess from all the moving boxes
  • I suddenly have a little free time, because there’s no more wedding to plan
  • I don’t have to say goodbye to Mindy at night any more.

The wedding turned out wonderfully, if a bit hectic (all the “usual” last-minute wedding preparations, plus the emotional strain, plus moving Mindy’s stuff into my apartment, plus the wonderful-but-tiring opportunity to host 6 out-of-country guests: bridesmaids, the parents-in-law, and my sister-in-law-in-law (err, my brother-in-law’s wife… what’s the name of that relationship?)). It was Mindy’s parents’ first time in the States, which meant it was the first time our parents met, and also the first time that I got an opportunity to really host them. That last bit was really good: the opportunity to host them. In the past, I had always been visiting them, which meant it was them driving me around, them treating me to good restaurants, them cooking for me. I find it hard, generally, to serve Taiwanese parents. (Don’t misunderstand: the hard part is not in finding motivation to serve, but in getting them to let you serve them. From my ?? perspective, the parent-child relationship is so fixed that it’s almost awkward for them to receive care instead of giving care.) However, now that they were on my home turf, heh… I finally got to treat them at restaurants, drive them around, cook for them. It was wonderful to be able to return the love, finally.

But man, that last week before the wedding was hectic.

More later, hopefully with plenty of pictures, an itinerary of wine tasting (the blogosphere was disappointingly uninformative on good Santa Barbara vineyards to visit), and Santa Barbara food.

twenty eight days

02-Dec-05

???!

Nature: “Scientists must embrace a culture of sharing and rethink their vision of databases”

01-Dec-05

Good editorial in Nature, “Let Data Speak to Data”:

Web tools now allow data sharing and informal debate to take place alongside published papers. But to take full advantage, scientists must embrace a culture of sharing and rethink their vision of databases.

That being said, I find it more than a little ironic that this diatribe is published in Nature, of all places. Nature, the stalwart of the old academic regime… its method of publication (high subscription costs, stringent peer-review-by-the-few instead of the peer-review-by-the-masses that blogs offer) seems quite opposed to the open, “let information be free!” attitude in this editorial.

But then again, isn’t it a good sign that the readership is thinking things like this, and that the editors are publishing such sentiments?