Thought for the night:
We humans are amazingly good at finding patterns in data. So good, in fact, that we can even find patterns where there are none.
The Skeptic’s Dictionary’s Law of Truly Large Numbers talks about just this…
Disclaimer: The following web space does not contain my own opinions, merely linguistic representations thereof.
Finally got down to it and installed a wiki.
MoinMoin was easy to configure on Gentoo but a pain on Fedora Core 1. MediaWiki, in contrast, was a breeze to install on both. It feels like a bit of overkill for my needs (way too much cruft for a personal “offboard brain” with only one user), but at least it’s there now. I’ll look into customization later.
The only bad part is that all the page data seems to reside in MySQL. That means no filesystem access for easy hacking (script-generated pages, or pages that scripts can read). Disappointing, because I was looking to hack together something similar to the app described in this page: a Unix calendar-like script that takes a wiki page as input.
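For the record, the kind of script I had in mind is trivial once pages live on disk as plain text. Here's a minimal sketch in the spirit of the Unix calendar utility. It assumes a made-up page convention of one event per line, written as `* YYYY-MM-DD: description`; that format, and the names `upcoming` and `print_upcoming`, are my own hypothetical choices, not anything MoinMoin or MediaWiki defines.

```python
import datetime as dt
import re

# Hypothetical page convention: one event per line, e.g. " * 2004-12-10: Talk at ISI"
EVENT_LINE = re.compile(r"^\s*\*\s*(\d{4}-\d{2}-\d{2})\s*:\s*(.+?)\s*$")


def upcoming(page_text, today, days=7):
    """Return sorted (date, description) pairs falling within [today, today + days]."""
    horizon = today + dt.timedelta(days=days)
    events = []
    for line in page_text.splitlines():
        match = EVENT_LINE.match(line)
        if not match:
            continue  # ordinary wiki prose; ignore it
        try:
            when = dt.date.fromisoformat(match.group(1))
        except ValueError:
            continue  # malformed date such as 2004-13-40
        if today <= when <= horizon:
            events.append((when, match.group(2)))
    return sorted(events)


def print_upcoming(path, days=7):
    """Read a plain-text wiki page from disk and print the coming week's events."""
    with open(path, encoding="utf-8") as f:
        for when, what in upcoming(f.read(), dt.date.today(), days):
            print(when.strftime("%b %d"), what, sep="\t")
```

With a file-based wiki, `print_upcoming("Calendar.txt")` (the file name is hypothetical) would be the whole "app"; with MediaWiki, the page text would first have to be pulled back out of MySQL.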
Time, time, time…
Edit:
A year later, I find MoinMoin has matured a bit and is now easier to install. It’s a lot less of a Heavyweight Bruiser, much more suited to my needs.
Somehow, it looks as if all the DOT sensors under the 10 freeway between the 110 and the 405 are down. Or, at least, Sigalert.com isn’t showing them.
Bummer. Same with the 405S between the 10 and the 90. This, coincidentally, is the exact route I take to work every morning, if I take the freeway.
Looks like I’m going to have to get around to parsing all the data I’ve been collecting and crunching some travel-time statistics that can at least give me approximate traffic patterns, if I can’t get access to reality any more.
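The number crunching I have in mind is nothing fancy: bucket the historical readings by weekday and time of day, average them, and use the averages as a stand-in for live sensors. A rough sketch, assuming each reading is a hypothetical `(timestamp, speed_mph)` pair for a single sensor and a 15-minute bin width (both my own assumptions about how the collected data would be reshaped):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

BIN_MINUTES = 15  # assumed bin width; coarser bins smooth out noisy readings


def time_bin(ts):
    """Map a timestamp to (weekday, minutes since midnight rounded down to the bin)."""
    minutes = ts.hour * 60 + ts.minute
    return ts.weekday(), minutes - minutes % BIN_MINUTES


def speed_profile(samples):
    """samples: iterable of (datetime, speed_mph) readings for one sensor.
    Returns {(weekday, bin_start_minute): mean speed over all history in that slot}."""
    buckets = defaultdict(list)
    for ts, speed in samples:
        buckets[time_bin(ts)].append(speed)
    return {key: mean(speeds) for key, speeds in buckets.items()}


def expected_minutes(profile, when, miles):
    """Rough travel time over a segment of the given length, from the historical mean speed."""
    speed = profile.get(time_bin(when))
    if not speed:
        return None  # no history for this slot (or a zero-speed average)
    return miles / speed * 60
```

Summing `expected_minutes` over each sensor's stretch of the 10 and the 405 would give an approximate door-to-door estimate for a given departure time.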
So it goes.
Faina Vakhreva (蔣方良, Chiang Fang-liang), wife of Chiang Ching-kuo (蔣經國; R.O.C. Nationalist leader, President of the Republic of China on Taiwan, and eldest son of Chiang Kai-shek), passed away at around noon today. This site is a nice memorial.
A wonderful bit of web-based English pedagogy. Perhaps I can use it as inspiration for my work on our Tactical Language Training System.
The World is Too Much With Us; Late and Soon
The world is too much with us; late and soon,
Getting and spending, we lay waste our powers:
Little we see in Nature that is ours;
We have given our hearts away, a sordid boon!
This Sea that bares her bosom to the moon;
The winds that will be howling at all hours,
And are up-gathered now like sleeping flowers;
For this, for everything, we are out of tune;
It moves us not.–Great God! I’d rather be
A Pagan suckled in a creed outworn;
So might I, standing on this pleasant lea,
Have glimpses that would make me less forlorn;
Have sight of Proteus rising from the sea;
Or hear old Triton blow his wreathed horn.
-William Wordsworth, 1806.
I’m one hour away from my presentation.
Here are my PowerPoint slides (with perhaps overly verbose speaker notes attached, because I’m like that).
While I hoped to have more actual statistical results for my talk, a lack of both time and annotation data kept me from that. I will definitely have them by the time papers come around, though.
The world is abuzz with talk of ontology, folksonomy, facets.
Some cursory thoughts on Leonard’s entry. He says:
I believe that both disambiguation and synonym merging are relative non-issues. For the former, the ease of intersections almost makes it moot from the practical perspective of searching. For the latter issue, we are already beginning to see automated solutions (related tags).
And, later:
The first hurdle is, I suppose coming up with a convincing argument that hierarchies are worthwhile. I think quite obvious that in everyday life, we categorize and subcategorize often and that being a first-class object isn’t completely out of the realm of sense. The real question is if there’s a way of reintroducing hierarchy that doesn’t reintroduce the problems they caused in the first place.
I totally agree with him. Even though my paper lauded the advantages of a lack of hierarchy (for instance, that without hierarchy it is easier to merge the personal ontologies of the masses, because in many cases there is no single obvious ideal hierarchical structure), I still think the flexibility afforded by hierarchical faceted systems is really, really cool. This whitepaper from the makers of ReiserFS, describing their vision for the future of filesystems, has long been an inspiration for me. Among other things, it has some pretty nifty ways to transparently merge faceted hierarchy with non-faceted (that is, modern-day vanilla) hierarchy.
Del.icio.us doesn’t use hierarchy right now in its faceted organization. I may be misunderstanding Joshua Schachter’s comments on the delicious-discuss list, but his primary reason for avoiding hierarchy seems to be more that it’s computationally heavy, and less that it’s ambiguous and that order is harder to glean from hierarchical personal ontologies. I don’t know whether this is short-sighted.
Leonard talks about namespace masking to simplify hierarchy, and I think that’ll be part of the eventual solution. The neat part is that this hierarchy can be generated statistically, just like semantic-level tag splitting and merging. I’m not sure Leonard’s incarnation solves the sparsity problem as he claims it does, but that doesn’t make it bad. It just means we need to keep looking for solutions that strengthen it.
I wager that automated metonymy disambiguation and synonym merging (two sides of the same coin, really), along with support for faceted hierarchy, are missing simply because the field is so young. But, really, I can’t wait to see what the field looks like a few years from now.
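To make the "path as intersection" idea concrete (and Leonard's point that intersections make disambiguation nearly moot), here is a toy sketch of a namespace where each path component is a facet and a lookup returns the items carrying all of them, so `/music/jazz` and `/jazz/music` name the same set. This is my own illustration under that assumption, not the design from the ReiserFS whitepaper; `FacetStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class FacetStore:
    """Toy faceted namespace: every item carries a set of facet names, and a
    path is read as the intersection of its components, so order is irrelevant."""

    def __init__(self):
        self.index = defaultdict(set)  # facet name -> set of item names

    def add(self, item, *facets):
        for facet in facets:
            self.index[facet].add(item)

    def lookup(self, path):
        parts = [p for p in path.split("/") if p]
        if not parts:
            return set()  # this sketch treats the empty path as matching nothing
        # .get() avoids creating empty entries in the defaultdict for unknown facets
        result = set(self.index.get(parts[0], set()))
        for part in parts[1:]:
            result &= self.index.get(part, set())
        return result
```

A vanilla hierarchy falls out as the special case where every item carries exactly one chain of nested facets; adding more facets per item is what buys the flexibility.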
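As one illustration of how a hierarchy could be generated statistically, here is a simple co-occurrence "subsumption" heuristic: call tag X a parent of tag Y when nearly everything tagged Y is also tagged X, but not vice versa. This is a swapped-in technique for illustration only, not Leonard's namespace masking; the 0.8 threshold and the input format (one tag set per bookmark) are assumptions.

```python
from collections import Counter
from itertools import combinations


def subsumptions(tagged_items, threshold=0.8):
    """tagged_items: list of tag sets, one per bookmark.
    Returns sorted (parent, child) pairs where
    P(parent | child) >= threshold and P(child | parent) < 1."""
    count = Counter()  # tag -> number of bookmarks carrying it
    pair = Counter()   # (tag_a, tag_b), alphabetically ordered -> co-occurrences
    for tags in tagged_items:
        tags = set(tags)
        count.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair[a, b] += 1
    result = []
    for (a, b), both in pair.items():
        # test the relationship in both directions
        for parent, child in ((a, b), (b, a)):
            if both / count[child] >= threshold and both / count[parent] < 1:
                result.append((parent, child))
    return sorted(result)
```

The sparsity problem shows up immediately here: a tag used on only one or two bookmarks gets confidently attached to whatever it happened to co-occur with.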
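For the synonym-merging side, the "related tags" style of solution can be sketched by comparing tags by the company they keep: true synonyms rarely appear together, but they co-occur with the same neighbors. A minimal sketch, assuming cosine similarity over co-occurrence counts and an arbitrary 0.9 cutoff (both my choices, not del.icio.us's actual method):

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(tagged_items):
    """Map each tag to a Counter of the other tags it appears alongside."""
    vectors = defaultdict(Counter)
    for tags in tagged_items:
        tags = set(tags)
        for t in tags:
            vectors[t].update(tags - {t})
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def merge_candidates(tagged_items, threshold=0.9):
    """Pairs of tags whose co-occurrence profiles are nearly identical: likely synonyms."""
    vectors = cooccurrence_vectors(tagged_items)
    tags = sorted(vectors)
    return [(a, b) for i, a in enumerate(tags) for b in tags[i + 1:]
            if cosine(vectors[a], vectors[b]) >= threshold]
```

The same vectors, clustered within a single tag's usages instead of compared across tags, are one plausible route to the other side of the coin: splitting an ambiguous tag.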
I like this picture. It talks to me about what it is to be a good artist, and a good designer. To see the potential as clearly as (or even more clearly than) one sees the actual… I love the way the artist is staring so intently at the egg, as if copying down from reference every edge and shade of the bird-to-be.
Painted by one of my favorite artists, René Magritte.

A week from now I’m giving a talk on creating a language model for second-language-learner speech (basically, my PhD research up to this point, and what will eventually become my thesis).
Information:
Speaker: Nick Mote
Date: 10 Dec 04
Time: 3:00pm – 4:30pm
Location: Information Sciences Institute (Marina Del Rey, California)
Abstract:
ISI’s Tactical Language Project is a system designed to teach Americans how to speak Arabic through a video game environment. We’ve taken an FPS engine (Unreal Tournament 2003), added skins and maps so it looks like you’re in a typical Lebanese village, taken away the guns, added speech recognition, and set the player in the middle of it all. The theory is that if you learn well in a classroom, you’ll perform well in a classroom, but if you learn well in a pseudo-naturalistic environment, you’ll perform better in real life. My research comes into play because speech recognition is hard, especially when you’re trying to understand language-learner speech, with all of its mispronunciations, disfluencies, and grammatical errors. Understanding that speech is hopeless unless you have a good approximation of what kinds of mistakes learners make and can anticipate them.
Say an English learner says “Water”. Is he asking you for water? Is he telling you there’s a puddle in front of you? Is he saying his name is “Walter”, but mispronouncing it? There’s a lot of ambiguity involved. To disambiguate, we need to look at the context, the learner’s past language performance, and how the learner’s native language relates to English, so that we can guess what he is actually trying to say.
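A standard way to frame that guess is the noisy-channel model: pick the intended word w that maximizes P(what we heard | the learner intended w) × P(w | context). The sketch below is only a toy illustration of that framing, not a description of how our system actually works; every probability in it is a made-up placeholder, and `rank_interpretations` is a hypothetical name.

```python
def rank_interpretations(heard, candidates, context_prior, confusion):
    """Noisy-channel ranking: score(w) = P(heard | intended w) * P(w | context).
    confusion[(heard, w)]: chance a learner intending w produces `heard` (illustrative).
    context_prior[w]: chance of w given the dialogue context (illustrative).
    Returns (candidate, posterior) pairs, best first; posteriors are normalized."""
    scores = {w: confusion.get((heard, w), 0.0) * context_prior.get(w, 0.0)
              for w in candidates}
    total = sum(scores.values())
    if total == 0:
        return []  # nothing in the candidate set explains what we heard
    return sorted(((w, s / total) for w, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)
```

With made-up numbers for an introductions context, say `context_prior = {"water": 0.1, "Walter": 0.6}` and `confusion = {("water", "water"): 0.9, ("water", "Walter"): 0.3}`, "Walter" wins with a posterior of 2/3, even though "water" is the better acoustic match. The learner model would feed the confusion term, and the dialogue state would feed the context term.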
And then, of course, once we have a good guess at what the learner has said, what do we do about it? How do we correct him? How serious are different speech disfluencies in terms of native-listener comprehension, pedagogical objectives, and social politeness? (The Lebanese word ra’iib (sergeant), for example, is dangerously close to the word rahiib (terrible), and we want to take special corrective care to make sure learners don’t make errors like that.) And how do we compensate for poorly performing speech recognition? (ASR works great with a lot of data, but there isn’t much annotated data of Americans learning specific subdialects of Arabic.)
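One cheap way to find the ra’iib/rahiib kind of trap ahead of time is to apply plausible learner substitutions to every vocabulary word and flag any result that lands on another real word. The sketch below assumes a simple romanization and a tiny, entirely hypothetical rule list (for example, a glottal stop produced as “h” or dropped); real rules would have to come from learner data and phonology.

```python
# Hypothetical learner substitution rules over a simple romanization:
# (target sound, what an English speaker might produce instead).
RULES = [("'", "h"), ("'", ""), ("q", "k")]


def variants(word, rules=RULES):
    """All forms reachable by applying one rule at one position in the word."""
    out = set()
    for target, substitute in rules:
        start = word.find(target)
        while start != -1:
            out.add(word[:start] + substitute + word[start + len(target):])
            start = word.find(target, start + 1)
    out.discard(word)  # a no-op substitution is not a confusion
    return out


def dangerous_confusions(lexicon):
    """Sorted (intended, heard_as) pairs where one slip turns a word into another real word."""
    words = set(lexicon)
    return sorted((w, v) for w in words for v in variants(w) if v in words)
```

Pairs flagged this way could then be weighted by how bad the resulting misunderstanding would be (calling a sergeant “terrible” being near the top of that list).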
This is basically what I’m doing. I use a lot of natural language processing, primarily statistical NLP, with a bit of pedagogical theory and linguistic theory (SLA and phonology) sprinkled in.
Let me know if you want to come; I can give you more details. I’ll also put my presentation slides up here (once I’m finished writing them).